Guidelines for development of EDF/EDF+ software
When implementing EDF(+), some of its details require considerable thought, and some parts of the specification have raised questions. This resulted in discussions, mainly in the EDF usersgroup. This page is intended to save time and avoid errors by listing the results of those discussions.
This page is NOT part of the specification of the EDF or EDF+ format. It merely lists some non-obligatory but helpful suggestions for developing (semi-)automatic EDF+ software.
Please pose any remaining questions to the EDF usersgroup (link elsewhere on this site). I sincerely thank Jesus Olivan, Marco Roessen, Paul Koster, Raphael Schneider, Patrick Berg, Nizar Kerkeni and others for their contributions.
The guidelines
1, EDF+ vs EDF (Oct 2004).
Carefully read the EDF and EDF+ format specifications, including the page about standard texts. Even if you do not need to handle annotations or events, realize that EDF+ is still a better and more comprehensive format than EDF. In that case, your Annotations signal need only contain the start time of each data record.
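A minimal Python sketch of such a minimal Annotations signal, assuming that each data record carries only the time-keeping TAL: '+Onset' followed by the bytes 20, 20 and 0, as in the EDF+ specification, with the remaining bytes of the signal filled with 0. The function name `timekeeping_annotations` is hypothetical.

```python
def timekeeping_annotations(onset: str, samples_per_record: int) -> bytes:
    """Build the Annotations-signal bytes of one data record that holds only
    the time-keeping TAL ("+Onset" 20 20 0), padded with 0 bytes."""
    if not onset.startswith(("+", "-")):
        onset = "+" + onset                # Onset must carry an explicit sign
    tal = onset.encode("ascii") + b"\x14\x14\x00"
    size = 2 * samples_per_record          # every EDF sample occupies 2 bytes
    if len(tal) > size:
        raise ValueError("Annotations signal too short for the time-keeping TAL")
    return tal + b"\x00" * (size - len(tal))


# Data record starting 30 s after the file start; Annotations signal of 6 samples.
print(timekeeping_annotations("30", 6))
# b'+30\x14\x14\x00\x00\x00\x00\x00\x00\x00'
```

Choose the number of Annotations samples per record large enough for the longest Onset you expect, plus any other TALs you may want to add later.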
2, Viewers (Oct 2004).
When matching pixel positions to signal samples, realize that the signal samples refer to a continuous time (seconds) and amplitude (for example, Volt) of the physical signal. In the same way, the signal pane shows a continuous time interval and amplitude range, and the pixels are samples of this continuous signal pane. Just find the pixel position that is closest to the continuous time and amplitude of the signal sample, assuming that both samples and pixels lie in the middle of a small continuous time/amplitude area.
For instance, if an 8-pixel screen displays the first second of a 2 Hz signal, the two samples fall halfway between the 2nd and 3rd pixels and halfway between the 6th and 7th pixels. More practically, if a 1024-pixel screen is used, the two samples fall between the 256th/257th and the 768th/769th pixels, respectively. The same computations apply to the vertical amplitude positions. Realize that the formula to compute the physical value, P, from the digitally stored (16-bit integer) value, D, reads: P = Pmin + (Pmax - Pmin) * (D - Dmin) / (Dmax - Dmin). In this formula, Pmin and Pmax are the physimin and physimax values, while Dmin and Dmax are the digimin and digimax values, all from the EDF header. Of course, interpolation between the pixels computed in this way can be applied.
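The mapping and the conversion formula can be sketched in Python as follows, assuming 1-based pixel numbering and sample k (0-based) placed at the centre of its 1/fs interval. All function names are hypothetical. A result of k + 0.5 means the sample falls halfway between pixels k and k + 1. The same `time_to_pixel` function can map an amplitude to a pixel row if you pass the amplitude range instead of the time range (counting rows from the bottom of the pane).

```python
def sample_centre_time(index: int, fs_hz: float) -> float:
    """Time (s) of sample `index`, placed in the middle of its 1/fs interval."""
    return (index + 0.5) / fs_hz


def time_to_pixel(t: float, pane_start: float, pane_span: float, n_pixels: int) -> float:
    """Fractional 1-based pixel position for time t. Pixel k has its centre at
    pane_start + (k - 0.5) * pane_span / n_pixels."""
    return (t - pane_start) / pane_span * n_pixels + 0.5


def digital_to_physical(d: int, dmin: int, dmax: int, pmin: float, pmax: float) -> float:
    """P = Pmin + (Pmax - Pmin) * (D - Dmin) / (Dmax - Dmin), values from the header."""
    return pmin + (pmax - pmin) * (d - dmin) / (dmax - dmin)


# First second of a 2 Hz signal on an 8-pixel and a 1024-pixel screen.
for i in range(2):
    t = sample_centre_time(i, 2.0)
    print(time_to_pixel(t, 0.0, 1.0, 8), time_to_pixel(t, 0.0, 1.0, 1024))
# 2.5 256.5
# 6.5 768.5
```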
2a, E-notation (May 2009).
Numbers in EDF(+) headers may use the scientific E notation, as in 1E2345, +012E+34, -1.34E09 and +1.234E-5. Note, though, that the 8 characters of some EDF(+) number fields are used more efficiently by -123.456 uV than by -1.23E-4 V.
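A minimal Python sketch of a reader for such 8-character header numbers. Python's `float()` understands the E notation, but it also accepts strings that the EDF rules do not allow (for example '1_234', 'inf' or 'nan'), so the sketch first checks an assumed regular expression for plain decimal or E notation. The name `parse_header_number` is hypothetical. Both -123.456 and -1.23E-4 fill exactly 8 characters, but the first keeps six significant digits and the second only three.

```python
import re

# Optional sign, digits with an optional decimal point, optional E exponent.
_NUMBER = re.compile(r"[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([Ee][+-]?[0-9]+)?")


def parse_header_number(field: str) -> float:
    """Parse a space-padded EDF(+) header number field."""
    text = field.strip(" ")
    if not _NUMBER.fullmatch(text):
        raise ValueError(f"not a valid EDF(+) header number: {field!r}")
    return float(text)


print(parse_header_number("+012E+34"), parse_header_number("-123.456"))
```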
3 (Oct 2004).
Specify numbers, dimensions and signals in annotations according to the same rules that apply to the EDF+ header. So, numbers must not contain any digit grouping symbol, and a dot (".") must be used as the decimal separator. Physical dimensions must be specified according to the EDF+ standard texts. Additionally, physical dimensions must be preceded by a number, either directly or separated by a space character (for example 50uV or 50 uV). Signals must be specified by their signal label as defined in the EDF+ standard texts.
By these rules, the sloppy annotation "EMG artifact of 1,234 microvolt in FpzCz" changes to "EMG artifact of 1234 uV in EEG Fpz-Cz". Only the latter text can be automatically attached to the EEG Fpz-Cz signal and related to its amplitude calibration.
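The following Python sketch shows why these rules enable automatic processing. It assumes a small, hypothetical subset of dimensions (uV, mV, V), not the full list from the EDF+ standard texts, and takes the signal labels from the file header. A number is only recognized when it is not preceded by a digit, dot or comma, so "1,234" is never misread as 234. Substring matching of labels is a simplification; `parse_annotation` is a hypothetical name.

```python
import re

# Hypothetical subset of EDF+ dimensions; longer names first so "uV" is not read as "V".
DIMENSIONS = ("uV", "mV", "V")
_QUANTITY = re.compile(
    r"(?<![0-9.,])([+-]?[0-9]+(?:\.[0-9]+)?(?:[Ee][+-]?[0-9]+)?) ?("
    + "|".join(DIMENSIONS) + r")\b"
)


def parse_annotation(text: str, signal_labels: list[str]):
    """Return ([(value, dimension), ...], [signal labels mentioned in text])."""
    quantities = [(float(num), dim) for num, dim in _QUANTITY.findall(text)]
    signals = [label for label in signal_labels if label in text]
    return quantities, signals


labels = ["EEG Fpz-Cz", "EEG Pz-Oz"]
print(parse_annotation("EMG artifact of 1234 uV in EEG Fpz-Cz", labels))
# ([(1234.0, 'uV')], ['EEG Fpz-Cz'])
print(parse_annotation("EMG artifact of 1,234 microvolt in FpzCz", labels))
# ([], [])
```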
4 (Nov 2004). The size limit of 61440 bytes per data record was already recommended for EDF files, and in practice almost all EDF files abide by it. For EDF+ files, the 61440-byte limit is obligatory.
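A short Python check of this limit, assuming you know the number of samples per data record of every signal (including the Annotations signal). Each EDF sample takes 2 bytes. The names are hypothetical.

```python
EDF_PLUS_MAX_RECORD_BYTES = 61440


def check_record_size(samples_per_record: list[int]) -> int:
    """Return the data-record size in bytes; raise if it exceeds 61440."""
    size = 2 * sum(samples_per_record)     # 2 bytes per sample, all signals
    if size > EDF_PLUS_MAX_RECORD_BYTES:
        raise ValueError(f"data record of {size} bytes exceeds {EDF_PLUS_MAX_RECORD_BYTES}")
    return size


print(check_record_size([223] * 124))     # 124 signals of 223 samples each
# 55304
```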
5 (Dec 2004). Quoting the EDF+ spec: "The samples of an ordinary signal must have equal sample intervals inside each data record, but the interval to the first sample of the next data record may be different". So, in both EDF+C and EDF+D, ordinary-signal samples are contiguous within a data record. Discontinuities can only occur between data records. If they do, the file is EDF+D.
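A minimal Python sketch of this decision, assuming you have the Onset strings of the time-keeping TALs of all data records in file order. `Fraction` parses the decimal strings exactly, so a record boundary is not misjudged because of floating-point rounding. The name `edf_plus_type` is hypothetical; it returns 'EDF+C' only when every record starts exactly one Duration after the previous one.

```python
from fractions import Fraction


def edf_plus_type(onsets: list[str], record_duration: str) -> str:
    """Return 'EDF+D' if any data record does not connect to the previous one."""
    dur = Fraction(record_duration)
    times = [Fraction(o) for o in onsets]
    for prev, cur in zip(times, times[1:]):
        if cur != prev + dur:              # gap or overlap between records
            return "EDF+D"
    return "EDF+C"


print(edf_plus_type(["+0", "+0.5", "+1"], "0.5"))   # EDF+C
print(edf_plus_type(["+0", "+0.5", "+7"], "0.5"))   # EDF+D
```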
6 (Dec 2004). Quoting the EDF+ spec about Annotations: "Onset as well as Duration are coded using US-ASCII characters with byte values 43, 45, 46 and 48-57 (the '+', '-', '.' and '0'-'9' characters, respectively)". Therefore, scientific notations like 3.2E-05 are not allowed at this place in EDF+ files; use 0.000032 (in this example) instead. Such scientific notations are allowed inside the annotation texts themselves (see items 2a and 3). Viewers/analysers can be more relaxed and also accept the scientific notation in Onset and Duration.
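A Python sketch of both sides, assuming a writer that must never emit E notation and a relaxed reader. Formatting a `Decimal` with the 'f' format always gives fixed-point digits, and `str()` of a float round-trips its value. The names are hypothetical.

```python
from decimal import Decimal

ALLOWED = set("+-.0123456789")             # byte values 43, 45, 46 and 48-57


def format_onset(seconds) -> str:
    """Format an Onset in plain fixed-point notation with an explicit sign."""
    text = format(Decimal(str(seconds)), "f")   # never produces E notation
    return text if text.startswith("-") else "+" + text


def is_strict_onset(text: str) -> bool:
    """True if text uses only the characters the EDF+ spec allows here."""
    return text[:1] in ("+", "-") and set(text) <= ALLOWED


def parse_onset_relaxed(text: str) -> float:
    """A relaxed viewer may also accept E notation such as +3.2E-05."""
    return float(text)


print(format_onset(3.2e-05), format_onset(3245604.4554))
# +0.000032 +3245604.4554
```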
7 (Dec 2004). Data records in an EDF+D file can be non-contiguous, meaning that the end of a data record does not necessarily connect to the beginning of the next data record. Of course, the file can contain continuous segments in which the data records do connect to each other. For example, an MSLT test in one single EDF+D file can contain five continuous 20-minute segments starting at 9AM, 11AM, 1PM, 3PM and 5PM. Another EDF+D file might contain only those segments of a Holter ECG recording that are characterized by arrhythmia.
The timekeeping annotations at the beginning of each EDF+ data record enable any EDF+ viewer/analyser to detect those continuous segments. Depending on the application, the viewer can then analyse/display the signals by segment, by data record or otherwise. In the MSLT case, the viewer would probably show 30 s pages for sleep scoring. In the ECG case, the viewer might page from one arrhythmia segment to the next or, within an arrhythmia segment, from one QRS complex to the next.
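A minimal Python sketch of segment detection from those timekeeping annotations, assuming the Onset strings of all data records are available in file order. A record starts a new segment unless its Onset equals the end of the previous record; exact `Fraction` arithmetic avoids rounding drift. The name `continuous_segments` is hypothetical.

```python
from fractions import Fraction


def continuous_segments(onsets: list[str], record_duration: str):
    """Group data records into continuous segments.
    Returns (start_s, end_s, first_record, n_records) tuples."""
    dur = Fraction(record_duration)
    segments = []
    for index, text in enumerate(onsets):
        onset = Fraction(text)
        if segments and onset == segments[-1][1]:
            segments[-1][1] += dur         # record continues the current segment
            segments[-1][3] += 1
        else:
            segments.append([onset, onset + dur, index, 1])   # a new segment starts
    return [tuple(seg) for seg in segments]


# Two short segments of 30 s records, two hours apart.
for seg in continuous_segments(["+0", "+30", "+60", "+7200", "+7230"], "30"):
    print(*(str(x) for x in seg))
# 0 90 0 3
# 7200 7260 3 2
```

A viewer can then page within one segment (for example in 30 s pages) and jump between segments.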
8 (Feb 2005). The section "Analysis results" specifies that hypnograms can be stored as an ordinary signal by coding the sleep stages as the integer numbers 0-6 and 9 in the data records. Some have argued that it may not be completely clear whether digital or physical values are meant here. To avoid any ambiguity, do the following when creating such a signal: put the digital values 0-6 and 9 in the file, and in the header set both the digital and the physical minimum and maximum of this signal to 0 and 9, respectively. In this way, 0-6 and 9 are both the digital and the physical values.
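A Python sketch of this convention, assuming hypothetical header keys and function names. With all four range values set as above, the digital-to-physical formula from item 2 reduces to the identity. EDF samples are stored as 16-bit little-endian two's-complement integers, which `struct` format '<h' produces.

```python
import struct

# Header settings suggested above: digital and physical ranges are both 0..9.
HYPNOGRAM_HEADER = {"digital_min": 0, "digital_max": 9,
                    "physical_min": 0, "physical_max": 9}
STAGE_CODES = {0, 1, 2, 3, 4, 5, 6, 9}


def to_physical(d: int, h: dict = HYPNOGRAM_HEADER) -> float:
    """EDF digital-to-physical conversion; with the ranges above it is the identity."""
    return h["physical_min"] + (h["physical_max"] - h["physical_min"]) * (
        d - h["digital_min"]) / (h["digital_max"] - h["digital_min"])


def encode_hypnogram(stages: list[int]) -> bytes:
    """Pack stage codes as EDF samples (16-bit little-endian integers)."""
    invalid = [s for s in stages if s not in STAGE_CODES]
    if invalid:
        raise ValueError(f"not a hypnogram stage code: {invalid}")
    return struct.pack(f"<{len(stages)}h", *stages)


print([to_physical(d) for d in sorted(STAGE_CODES)])
# [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 9.0]
```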
9 (Sep 2005). EDF+ does not explicitly forbid assigning a duration to the annotation that specifies the start time of each data record, i.e. the Onset. However, such a duration makes no sense because the start time is a point in time and not a period of time. Therefore, the first TAL in the first Annotations signal in each data record must start with '+Onset', directly followed by the byte 20 (so without a Duration). Here 'Onset' can be, for instance, 3245604.4554 if that data record starts 3245604.4554 seconds after the start time of the file.
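A Python sketch of a reader for this time-keeping TAL, assuming the bytes of the first Annotations signal of one data record as input. In a TAL, a Duration would follow the Onset after a byte 21, and the Onset part ends at the first byte 20. This sketch rejects a Duration; a more lenient reader could simply ignore it. Note that `float()` also accepts E notation, in line with the relaxed reading of item 6. The name `record_start_time` is hypothetical.

```python
def record_start_time(annotations: bytes) -> float:
    """Return the data-record start time (s) from the first TAL of the first
    Annotations signal, rejecting a Duration on this time-keeping TAL."""
    first_tal = annotations.split(b"\x00", 1)[0]
    onset_field = first_tal.split(b"\x14", 1)[0]   # "+Onset", possibly "21 Duration"
    if b"\x15" in onset_field:
        raise ValueError("time-keeping TAL should not carry a Duration")
    text = onset_field.decode("ascii")
    if not text.startswith(("+", "-")):
        raise ValueError(f"Onset must start with '+' or '-': {text!r}")
    return float(text)


print(record_start_time(b"+3245604.4554\x14\x14\x00\x00\x00"))
# 3245604.4554
```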
10 (Dec 2005). Storing many signals with high sampling frequencies in at most 61440 bytes per data record is not always straightforward. For instance, how do you store 124 signals that are each sampled at 1006 Hz? The solution works as follows:
In EDF(+), data record Durations are specified in an 8-character string, for instance 0.123456 or 1234567. So, Durations can only be one of the following values:
0.000000
0.000001
0.000002
0.000003
...
0.999998
0.999999
1
...
99999998
99999999
For each value of Duration, the number of 1006 Hz samples that fit into Duration seconds equals NrSamples = Duration x 1006. This value of NrSamples is generally not an integer, but EDF accepts only an integer number of samples, so there is an error. This Error equals MOD(NrSamples), the fractional part of NrSamples (NrSamples minus its integer part). The RelativeError is MOD(NrSamples) / NrSamples. Now, simply write a few lines that check all values of Duration and keep the best one, that is, the one with the smallest RelativeError.
In this example, the Duration must also be smaller than 0.246265 in order to fit the size limit of 61440 bytes. The best Duration is then 0.221670 s. For this Duration, NrSamples = 223.00002. However, in EDF we must specify that NrSamples is 223. The error is 0.00002 samples per data record, and the RelativeError is approximately 0.0000000897. So, in a full 24 h recording, the accumulated error is less than 0.008 seconds. No ADC in this universe has that accuracy.
The 124 signals take 124 * 223 * 2 = 55304 bytes in each data record. One data record has a maximum of 61440 bytes available for all signals (including the Annotations signal). So, there are 6136 bytes left for the Annotations signal and/or other additional signals.
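The "few lines" of this search can be sketched in Python as follows. The sketch assumes an integer sampling frequency in Hz and only sub-second Durations with six decimals (0.000001 to 0.999999 s), which is enough here. It uses the Error definition above (the fractional part of NrSamples) and exact integer arithmetic, comparing two ratios by cross-multiplication. The parameter `reserved_bytes` (space kept free for the Annotations signal) and the function name are hypothetical.

```python
def best_record_duration(n_signals: int, fs_hz: int,
                         limit_bytes: int = 61440, reserved_bytes: int = 0):
    """Return (duration_string, NrSamples, RelativeError) for the sub-second
    6-decimal Duration with the smallest RelativeError that fits the limit."""
    best = None                                # (frac_num, exact_num, micros, n)
    for micros in range(1, 1_000_000):         # Duration = micros / 10**6 s
        exact_num = micros * fs_hz             # NrSamples * 10**6, exactly
        n = exact_num // 1_000_000             # integer part of NrSamples
        if n == 0:
            continue
        if n_signals * n * 2 + reserved_bytes > limit_bytes:
            break                              # n only grows with the Duration
        frac_num = exact_num - n * 1_000_000   # MOD(NrSamples) * 10**6
        # frac_num / exact_num < best ratio, compared without floating point
        if best is None or frac_num * best[1] < best[0] * exact_num:
            best = (frac_num, exact_num, micros, n)
    if best is None:
        raise ValueError("no sub-second Duration fits the size limit")
    frac_num, exact_num, micros, n = best
    return f"0.{micros:06d}", n, frac_num / exact_num


duration, n, rel = best_record_duration(124, 1006)
print(duration, n, f"{rel:.3g}")
# 0.221670 223 8.97e-08
```

If you instead round NrSamples to the nearest integer (allowing, say, 223.999984 to be stored as 224), a few Durations such as 0.222664 s do slightly better, at a RelativeError of about 0.000000071. Either way, the error is negligible.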