EDF(+) programming guidelines

European Data Format

Guidelines for development of EDF/EDF+ software

When implementing EDF(+), some of its details require careful thought, and some specifications have raised questions. These led to discussions, mainly in the EDF users group. This page is intended to save time and avoid errors by listing the results of those discussions. It is NOT part of the specification of the EDF or EDF+ format; it merely lists some non-obligatory but helpful suggestions for developing (semi-)automatic EDF+ software.

Please pose any remaining questions to the EDF users group (link elsewhere on this site). I sincerely thank Jesus Olivan, Marco Roessen, Paul Koster, Raphael Schneider, Patrick Berg, Nizar Kerkeni and others for their contributions.

The guidelines

1, EDF+ vs EDF (Oct 2004).
Carefully read the EDF and EDF+ format specifications, including the page about standard texts. Even if you do not need to handle annotations or events, realize that EDF+ is still a better and more comprehensive format than EDF. In that case, your Annotations signal need only contain the start time of each data record.

2, viewers (Oct 2004).
When matching pixel positions to signal samples, realize that the signal samples refer to a continuous time (in seconds) and amplitude (for example in volts) of the physical signal. In the same way, the signal pane shows a continuous time interval and amplitude range, and the pixels are samples of this continuous signal pane. Simply find the pixel position that is closest to the continuous time and amplitude of the signal sample. Assume that both samples and pixels lie in the middle of a small continuous time/amplitude area.
        For instance, if an 8-pixel screen displays the first second of a 2 Hz signal, then the two samples lie at 0.25 s and 0.75 s, so they are displayed by the 2nd/3rd and 6th/7th pixel. More practically, if a 1024-pixel screen is used, the two samples are displayed by the 256th/257th and 768th/769th pixel, respectively. The same computations apply to the vertical amplitude positions. Realize that the formula to compute the physical value, P, from the digitally stored (16-bit integer) value, D, reads: P = Pmin + (Pmax - Pmin) * (D - Dmin) / (Dmax - Dmin). In this formula, Pmin and Pmax are the physical minimum and maximum, while Dmin and Dmax are the digital minimum and maximum, all taken from the EDF header. Of course, interpolation between the pixels computed in this way can be applied.
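The two computations above can be sketched in a few lines of Python. This is only an illustration under the stated assumption that samples and pixels sit in the middle of their interval; the function and parameter names are made up for this example and do not come from the EDF specification.

```python
def digital_to_physical(d, dig_min, dig_max, phys_min, phys_max):
    """Apply P = Pmin + (Pmax - Pmin) * (D - Dmin) / (Dmax - Dmin)."""
    return phys_min + (phys_max - phys_min) * (d - dig_min) / (dig_max - dig_min)


def sample_time(index, sample_rate_hz):
    """Continuous time (s) at the middle of the interval covered by
    sample `index` (0-based), counted from the start of the signal."""
    return (index + 0.5) / sample_rate_hz


def pixel_position(t, t_start, t_end, n_pixels):
    """Continuous horizontal position of time `t`, in pixel units, for a
    pane that shows the interval from t_start to t_end.

    Pixel k (0-based) covers positions k to k + 1, so a result of 2.0
    lies exactly on the border between the 2nd and 3rd pixel (1-based).
    """
    return (t - t_start) / (t_end - t_start) * n_pixels


if __name__ == "__main__":
    # 2 Hz signal, first second shown on an 8-pixel pane.
    for i in range(2):
        print(pixel_position(sample_time(i, 2), 0.0, 1.0, 8))
    # A full-range 16-bit signal calibrated to -100..100 (e.g. uV).
    print(digital_to_physical(32767, -32768, 32767, -100.0, 100.0))
```

For the 8-pixel case the two samples land at positions 2.0 and 6.0, that is, on the borders between the 2nd/3rd and the 6th/7th pixel. The vertical position follows from the same kind of scaling applied to the physical value.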

2a, E-notation (May 2009). Numbers in EDF(+) headers may use the scientific E-notation, as in 1E2345, +012E+34, -1.34E09 and +1.234E-5. Note, though, that the 8 characters of some EDF(+) number fields are used more efficiently by -123.456 uV (six significant digits) than by -1.23E-4 V (three significant digits).
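A reader that wants to accept such fields could parse them as sketched below. The helper name is hypothetical. Python's decimal module is used here as one possible choice because a binary float would overflow on an exponent as large as the one in 1E2345.

```python
from decimal import Decimal


def parse_header_number(field):
    """Parse one number field of an EDF(+) header, given as text that
    was already decoded from ASCII. Surrounding spaces are ignored and
    E-notation is accepted."""
    return Decimal(field.strip())


if __name__ == "__main__":
    for text in ("1E2345", "+012E+34", "-1.34E09", "+1.234E-5"):
        print(repr(text), "->", parse_header_number(text))
```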

3 (Oct 2004). Specify numbers, dimensions and signals in annotations according to the same rules that apply to the EDF+ header. So, numbers must not contain any digit grouping symbol, and a dot (".") must be used as the decimal separator. Physical dimensions must be specified according to the EDF+ standard texts. Additionally, physical dimensions must be preceded by a number, either directly or separated by a space character (for example 50uV or 50 uV). Signals must be specified by their signal label as defined in the EDF+ standard texts.
    By these rules, the sloppy annotation "EMG artifact of 1,234 microvolt in FpzCz" changes to "EMG artifact of 1234 uV in EEG Fpz-Cz". Only the latter text can be automatically attached to the EEG Fpz-Cz signal and related to its amplitude calibration.
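As an illustration of why these rules help automatic processing, the sketch below extracts number/dimension pairs from an annotation text. The dimension list is a small hypothetical subset chosen for this example (the full list is in the EDF+ standard texts), and the regular expression is just one possible way to apply the rules.

```python
import re

# Illustrative subset only; see the EDF+ standard texts for the real list.
DIMENSIONS = ("uV", "mV", "V")

# A number without digit grouping, with "." as decimal separator, followed
# directly or after one space by a dimension. The look-behind rejects
# digits that are the tail of a grouped number such as "1,234".
_QUANTITY = re.compile(
    r"(?<![\w.,])([+-]?\d+(?:\.\d+)?)\s?(" + "|".join(DIMENSIONS) + r")\b"
)


def find_quantities(annotation):
    """Return a list of (value, dimension) pairs found in the text."""
    return [(float(num), dim) for num, dim in _QUANTITY.findall(annotation)]


if __name__ == "__main__":
    print(find_quantities("EMG artifact of 1234 uV in EEG Fpz-Cz"))
    print(find_quantities("EMG artifact of 1,234 microvolt in FpzCz"))
```

The first text yields one pair (1234.0 with dimension uV), while the sloppy variant yields nothing that software could relate to the amplitude calibration.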

4 (Nov 2004). The size limit of 61440 bytes for data records was already recommended for EDF files. In practice, almost all EDF files abide by this limit. The 61440-byte limit is obligatory for all EDF+ files.
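A writer can check this limit before creating a file. The following minimal sketch assumes 2 bytes per sample (16-bit integers) and takes the number of samples per data record of every signal, the Annotations signal included; the names are invented for this example.

```python
MAX_RECORD_BYTES = 61440   # size limit of one data record
BYTES_PER_SAMPLE = 2       # samples are stored as 16-bit integers


def data_record_bytes(samples_per_record):
    """Size in bytes of one data record, given the number of samples
    that each signal contributes to a data record."""
    return BYTES_PER_SAMPLE * sum(samples_per_record)


def fits_size_limit(samples_per_record):
    """True if one data record stays within the 61440-byte limit."""
    return data_record_bytes(samples_per_record) <= MAX_RECORD_BYTES


if __name__ == "__main__":
    layout = [256] * 16 + [60]   # 16 signals plus an Annotations signal
    print(data_record_bytes(layout), fits_size_limit(layout))
```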

5 (Dec 2004). Quoting the EDF+ spec: "The samples of an ordinary signal must have equal sample intervals inside each data record, but the interval to the first sample of the next data record may be different". So, both in EDF+C and EDF+D, ordinary-signal samples are contiguous within a data record. Discontinuities can occur only between data records. If they do, the file is EDF+D.

6 (Dec 2004). Quoting the EDF+ spec about Annotations: "Onset as well as Duration are coded using US-ASCII characters with byte values 43, 45, 46 and 48-57 (the '+', '-', '.' and '0'-'9' characters, respectively)". Therefore scientific notations like 3.2E-05 are not allowed at this place in EDF+ files. Use 0.000032 (in this example) instead. Such scientific notations are allowed inside the Annotations themselves (see item 3). Viewers/analysers can be more relaxed and also accept the scientific notation in Onset and Duration.
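Many languages print small or large floating-point values in scientific notation by default, so a writer needs to format Onset explicitly. The sketch below shows one possible way in Python, using the decimal module to obtain fixed-point text; the function name is made up for this example.

```python
from decimal import Decimal


def format_onset(seconds):
    """Format an Onset using only '+', '-', '.' and the digits 0-9."""
    value = Decimal(str(seconds))
    text = format(value, "f")            # fixed-point, never an exponent
    if "." in text:
        text = text.rstrip("0").rstrip(".")   # drop trailing zeros
    sign = "-" if value < 0 else "+"     # Onset must start with a sign
    return sign + text.lstrip("+-")


if __name__ == "__main__":
    print(format_onset(3.2e-05))
    print(format_onset(3245604.4554))
```

A Duration can be formatted in the same way, but without the leading sign.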

7 (Dec 2004). Data records in an EDF+D file can be non-contiguous, meaning that the end of a data record does not necessarily connect to the beginning of the next data record. Of course, the file can contain continuous segments in which the data records do connect to each other. For example, an MSLT test in one single EDF+D file can contain 5 continuous 20-minute segments starting at 9AM, 11AM, 1PM, 3PM and 5PM. Another EDF+D file might contain only those segments of a Holter ECG recording that are characterized by arrhythmia.
    The timing annotations at the beginning of each EDF+ data record enable any EDF+ viewer/analyser to detect those continuous segments. Depending on the application, the viewer can then analyse/display the signals by segment, by data record or in any other way. In the MSLT case, the viewer would probably show 30s pages for sleep scoring. In the ECG case, the viewer might page from one arrhythmia segment to the next, or within an arrhythmia segment from one QRS complex to the next.
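Detecting the continuous segments from these timing annotations could look like the sketch below. It assumes that the start time of every data record (in seconds) has already been read from the Annotations signal, and it compares with a small tolerance that is an arbitrary choice for this example.

```python
def continuous_segments(onsets, record_duration, tolerance=1e-6):
    """Group data records into continuous segments.

    onsets          -- start time in seconds of each data record
    record_duration -- duration in seconds of one data record
    Returns a list of (first_record, last_record) index pairs.
    """
    segments = []
    for i, onset in enumerate(onsets):
        connects = (
            i > 0
            and abs(onset - (onsets[i - 1] + record_duration)) <= tolerance
        )
        if connects:
            segments[-1][1] = i          # extend the current segment
        else:
            segments.append([i, i])      # a gap: start a new segment
    return [tuple(segment) for segment in segments]


if __name__ == "__main__":
    # Two segments of 30 s data records, the second one starting 2 h later.
    print(continuous_segments([0, 30, 60, 7200, 7230], 30))
```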

8 (Feb 2005). The section "Analysis results" specifies that hypnograms can be stored as an ordinary signal by coding the sleep stages as the integer numbers 0-6 and 9 in the data records. Some have argued that it may not be completely clear whether digital or physical values are meant here. To avoid any ambiguity, do the following when creating such a signal. Put the digital values 0-6 and 9 in the file and, in the header, set the digital as well as the physical minimum and maximum of this signal to 0 and 9, respectively. In this way, the values 0-6 and 9 are both the digital and the physical values.

9 (Sep 2005). EDF+ does not explicitly forbid assigning a duration to the annotation that specifies the start time of each data record, i.e. the Onset. However, such a duration makes no sense because the start time is a point in time and not a period of time. Therefore, the first TAL in the first Annotations signal in each data record must start with '+Onset' immediately followed by two bytes with value 20 (so without a Duration), in which 'Onset' can be, for instance, 3245604.4554 if that data record starts 3245604.4554 seconds after the start time of the file.
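The byte layout of such a time-keeping TAL can be sketched as follows. The sketch relies on the TAL layout of the EDF+ specification: Onset, then optionally a byte 21 plus Duration, then a byte 20, then annotations that each end with a byte 20, and finally a byte 0. The function names are invented for this example, and the builder produces only the minimal TAL without any further annotation text.

```python
ONSET_CHARS = set("+-.0123456789")


def timekeeping_tal(onset):
    """Build the minimal time-keeping TAL for a data record.

    onset -- text such as '+3245604.4554' (seconds after file start)
    """
    if not onset.startswith("+") or not set(onset) <= ONSET_CHARS:
        raise ValueError("Onset must look like '+123.45'")
    # No byte 21 and no Duration: Onset, byte 20, empty annotation
    # closed by byte 20, and byte 0 to end the TAL.
    return onset.encode("ascii") + b"\x14\x14\x00"


def has_valid_timekeeping_start(annotation_bytes):
    """Check that the bytes start with '+Onset', byte 20, byte 20."""
    first = annotation_bytes.find(b"\x14")
    if first <= 0:
        return False
    onset = annotation_bytes[:first]
    return (
        onset.startswith(b"+")
        and b"\x15" not in onset              # no Duration present
        and annotation_bytes[first + 1:first + 2] == b"\x14"
    )


if __name__ == "__main__":
    print(timekeeping_tal("+3245604.4554"))
```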

10 (Dec 2005). Storing many signals with high sampling frequencies in at most 61440 bytes per data record is not always straightforward. For instance, how can 124 signals that are each sampled at 1006 Hz be stored? The solution works as follows:
            In EDF(+), data record Durations are specified in an 8-character string, for instance 0.123456 or 1234567. So, a Duration can only take one of the following values:
0.000000
0.000001
0.000002
0.000003
...
0.999998
0.999999
1
...
99999998
99999999
              For each value of Duration, the number of 1006 Hz samples that fit into Duration seconds equals NrSamples = Duration x 1006. This value of NrSamples is generally not an integer, but EDF accepts only an integer number of samples per data record. Therefore there is an error. This Error equals MOD(NrSamples), the fractional part of NrSamples that is lost when it is truncated to an integer. The RelativeError is MOD(NrSamples) / NrSamples. Now, simply write a few lines of code that check all values of Duration and keep the best one, that is, the one with the smallest RelativeError.
            In this example, the Duration must also be smaller than 0.246265 s (61440 / (124 x 2 x 1006)) in order to fit the size limit of 61440 bytes. The best Duration is 0.221670 s. For this Duration, NrSamples = 223.00002. However, in EDF, we must specify that NrSamples is 223. The Error is 0.00002 samples per data record and the RelativeError is 0.0000000897. So, in a full 24h recording, the accumulated error is less than 0.008 seconds. No ADC in this universe has that accuracy.
            The 124 signals take 124 * 223 * 2 = 55304 bytes in each data record. In one data record, at most 61440 bytes are available for all signals (including the Annotations signal). So, there are 6136 bytes left for the Annotations signal and/or other additional signals.
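The "few lines" mentioned above could look like the following sketch. It makes some assumptions of its own: only Durations below 1 s on the 0.000001 s grid are tried, the sampling frequency is an integer number of Hz, all signals share that frequency, and the size limit is applied to the truncated number of samples. Integer arithmetic in units of one millionth of a sample avoids rounding problems.

```python
def best_record_duration(sample_rate_hz, n_signals,
                         max_record_bytes=61440, bytes_per_sample=2):
    """Search Durations of k microseconds (k = 1 .. 999999) and return
    (k, samples_per_record, relative_error) for the smallest
    RelativeError, or None if no Duration fits."""
    best = None
    for k in range(1, 1_000_000):
        # NrSamples = k * rate / 1e6; keep it as an exact integer
        # counted in millionths of a sample.
        micro_samples = k * sample_rate_hz
        n_samples, remainder = divmod(micro_samples, 1_000_000)
        if n_samples * n_signals * bytes_per_sample > max_record_bytes:
            break                       # longer Durations cannot fit
        if n_samples == 0:
            continue                    # less than one sample per record
        relative_error = remainder / micro_samples
        if best is None or relative_error < best[2]:
            best = (k, n_samples, relative_error)
    return best


if __name__ == "__main__":
    k, n_samples, rel = best_record_duration(1006, 124)
    print(f"Duration = {k / 1_000_000:.6f} s")
    print(f"NrSamples = {n_samples}, RelativeError = {rel:.3e}")
    print(f"bytes used = {124 * n_samples * 2}")
```

For 124 signals at 1006 Hz, this search arrives at a Duration of 0.221670 s with 223 samples per signal in each data record.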