Cloudnet NetCDF Convention
Robin Hogan
Version 1: 5 December 2001 - original specification for Level 1 radar
and lidar data
Version 2: 5 August 2002 - some refinements to conform to the
Climate and Forecast (CF) convention
Version 3: 23 August 2004 - more detailed specifications relevant to
Level 2 and above
Motivation
It has been agreed that Cloudnet instrument data should be provided
by participants in either NetCDF or ASCII format. For instruments
such as radiometers that simply produce a time series of a few
parameters, ASCII is sufficient. For instruments such as radar and
lidar that produce large two-dimensional arrays of data, NetCDF is a
much more suitable format, in large part because it is
self-describing. It is also much more suitable for subsequent
meteorological products. However, there is much freedom in how to
arrange the data in a NetCDF file, so it makes sense to define a
standard that participants should aim to conform to, both in the
instrument data that they provide and the meteorological products
derived from them. This allows generic programs to read and plot data
produced by any participants, specifically the chilncplot
program, part of the chil package,
which is used to produce the quicklooks currently on the
Cloudnet web site.
The Cloudnet convention is applicable to any dataset on a
time-height grid, including radar and lidar data, single-site model
forecasts and derived meteorological products. It adopts many of the
components of other
NetCDF conventions, specifically the Climate and
Forecast (CF) convention, which is favoured by the British Atmospheric Data Centre
(BADC). Conventions generally relate to the attributes that should be
supplied, or those that it is recommended to use if a certain piece of
information needs to be conveyed.
Suggestions for improvements/clarifications to the convention would
be welcome.
Sample radar and lidar NetCDF
files (older data so may not conform fully)
Files and filenames
Each level 1 (instrumental or model data) or level 2
(meteorological product) file should contain data from a single day.
The times reported in the file should be in hours UTC, so for
instruments that operate continuously, each individual file should run
from midnight to midnight UTC.
Filenames should be of the form YYYYMMDD_WHERE_WHAT.nc,
where the fields are as follows
- YYYYMMDD
- The date, UTC.
- WHERE
- A lower case string identifying the site (currently one of
chilbolton, cabauw, palaiseau,
arm-sgp, arm-nsa, arm-darwin,
arm-manus and arm-nauru).
- WHAT
- This field either identifies the instrument (e.g. galileo
for the Galileo radar), the model (met-office-global-12-35
for the 12-35 hour forecast of the Met Office global model) or the
meteorological product (e.g. iwc-Z-T-method for ice water content
derived using the reflectivity+temperature method).
The fields should not contain underscores (_); hyphens
(-) should be used to separate information within
fields. This then allows for an additional field to be added to
indicate the version number, under consideration for a future revision
of the convention. Thus filenames should contain only the characters
[-_.a-z0-9]. Spaces are forbidden as they have the habit of
breaking unix scripts.
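The naming rule above can be checked mechanically. The following is a minimal Python sketch, not part of the convention itself: the function name parse_cloudnet_filename is hypothetical, and the pattern applies the stated character set strictly, so upper case letters within a field are rejected.

```python
import re
from datetime import date, datetime

# YYYYMMDD_WHERE_WHAT.nc: underscores separate the fields, so the fields
# themselves may only contain lower case letters, digits and hyphens.
_FILENAME = re.compile(
    r"(?P<date>\d{8})_(?P<where>[-a-z0-9]+)_(?P<what>[-a-z0-9]+)\.nc"
)


def parse_cloudnet_filename(name):
    """Split a filename into (date, where, what), or raise ValueError."""
    match = _FILENAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid Cloudnet filename: {name!r}")
    # strptime also rejects impossible dates such as 20020230.
    day = datetime.strptime(match["date"], "%Y%m%d").date()
    return day, match["where"], match["what"]


print(parse_cloudnet_filename("20040823_chilbolton_galileo.nc"))
```

If a version field is added in a future revision, only the pattern would need to change.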
Processed data (i.e. data in which the clear-sky noise has been removed
and set to a constant value) compress to a very small size, because the
file then contains many repeated values.
Dimensions
The NetCDF dataset should contain the dimension time,
which should be the first dimension defined. The vertical dimension
may be range, height or level:
- time
- This dimension may have the length unlimited. NetCDF
permits one dimension to be unlimited, which means that variables
using this dimension can grow along this dimension. However, if the
data are read one variable at a time then the use of an unlimited
dimension seems to slow down the read speed.
- range
- This dimension is used for instrumental data up to level 1b, and
indicates that distance is measured from the instrument rather than
from mean sea level, and also allows for instruments not pointing at
zenith.
- height
- For level 1c and 2, ranges from instruments are converted to
heights above mean sea level, and this dimension name is used.
- level
- For level 1 model data this is used to indicate model level rather
than height, since model levels often do not correspond to
unique heights.
Other dimensions may be defined. For example, the level 1b model data
contains microwave propagation parameters derived from the model
fields for several different frequencies, so uses the dimension
frequency. The level 1c Instrument Synergy/Target
Categorization dataset holds model data on the original vertical model
grid (to save space), which is referenced using the
model_height dimension.
Variables
Compulsory variables
The following compulsory variables are stored as variables rather
than global attributes because they have a unit or other describing
attribute associated with them; the attributes that should be set are
shown indented after each variable name. Each NetCDF attribute
consists of a "name" and a "value", where the value can be a text
string or a vector of numbers. All these variables are of type
float, i.e. a 4-byte floating-point number (real*4
in FORTRAN nomenclature).
- latitude
-
- units = "degrees_north"
- long_name = "Latitude of site"
- longitude
- It is conventional to always report positive longitudes, i.e. +359.0
rather than -1.0.
- units = "degrees_east"
- long_name = "Longitude of site"
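As a small illustration of the positive-longitude rule, a longitude given in the range -180 to 180 can be converted with a modulo operation. This is only a sketch; the function name positive_longitude is hypothetical.

```python
def positive_longitude(lon_degrees_east):
    """Map a longitude in degrees east onto the range 0 <= lon < 360."""
    # Python's % operator returns a non-negative result for a positive divisor.
    return lon_degrees_east % 360.0


print(positive_longitude(-1.0))
```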
For each dimension a "coordinate variable" must be defined, i.e. a
vector variable with the same name as the dimension. Typically these
would be of type float. Thus all datasets should contain a
time variable:
- time(time)
- Note that the float type has enough precision for time in
hours to be discretised to better than 0.007 seconds.
- units = "hours since YYYY-MM-DD 00:00:00 00:00"
- where YYYY-MM-DD must contain the date that the data were taken
(e.g. 2002-09-05). The zeros at the end indicate that the time is from
midnight UTC (i.e. timezone 00:00). This reporting of time is from the
CF
convention. Note that reporting time in hours rather than seconds
from midnight is much more convenient for the user.
- long_name = "Time UTC"
- axis = "T"
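The following Python sketch builds the units string for a given date and checks the precision note above by measuring the spacing between adjacent 4-byte floats late in the day. The helper names time_units and float32_spacing are hypothetical, and only the standard library is used.

```python
import struct
from datetime import date


def time_units(day):
    """Return the units attribute of the time variable for one UTC day."""
    return f"hours since {day.isoformat()} 00:00:00 00:00"


def float32_spacing(x):
    """Gap between x (rounded to a 4-byte float) and the next larger one."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lower = struct.unpack("<f", struct.pack("<I", bits))[0]
    upper = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return upper - lower


units = time_units(date(2002, 9, 5))
# The spacing is largest at the end of the day, so test just before 24 hours.
resolution_seconds = float32_spacing(23.99) * 3600.0
print(units)
print(resolution_seconds)
```

The spacing is coarsest between 16 and 24 hours, so the value computed here is the worst case within a single day.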
A range, height or level variable should
then also be defined, depending on the dimensions present, e.g.
- range(range)
-
- units = "km"
- Note that reporting range in metres ("m") is also permissible.
- long_name = "Range from antenna to the centre of each range gate"
- An example long name.
- axis = "Z"
- height(height)
-
- units = "m"
- long_name = "Height above mean sea level"
- axis = "Z"
Note that the axis attribute is the CF way of
stating the dominant temporal and vertical variables against which 2D
variables in the file should be plotted. No more than one axis of a
given type should be present in the file.
Compulsory variable attributes
All variables should set the following two attributes:
- units
- The units should be readable by the UDUNITS
package, as required by the CF
convention description. An additionally accepted unit is
dBZ. If possible, units should be SI. Consider also using the
units_html attribute discussed below. The main points for
uniform use of units are as follows:
- Exponents should be expressed by "g m-3", not "g
m^{-3}", "gm-3", "g/m3", "g(m)^-1"
etc.
- If conventional modifiers such as "kilo" are used, please use the
correct case, i.e. "km" not "Km" for kilometers.
- The appropriate way to express microns is "um", not
"microns" or "1e-6 m".
- The units for time should conform to the use indicated in
the section above.
- Dimensionless variables use the unit "1".
Note that bit fields and status fields, defined below,
need not use the units attribute.
- long_name
- This should be a concise but informative phrase describing the
variable, short enough to fit comfortably in the axis or title of a
plot (i.e. shorter than around 60 characters). It should start with an
upper case letter.
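A file-checking tool could test the two compulsory attributes along the following lines. This is a minimal sketch that assumes the attributes of a variable have already been read into a plain Python dictionary; the function name check_variable_attributes and the is_flag_field argument are hypothetical, and no attempt is made to validate the units string against UDUNITS.

```python
def check_variable_attributes(attrs, is_flag_field=False):
    """Return a list of problems with a variable's compulsory attributes."""
    problems = []
    long_name = attrs.get("long_name")
    if not long_name:
        problems.append("long_name is missing")
    else:
        if not long_name[0].isupper():
            problems.append("long_name should start with an upper case letter")
        if len(long_name) > 60:
            problems.append("long_name is longer than around 60 characters")
    # Bit fields and status fields need not carry a units attribute.
    if "units" not in attrs and not is_flag_field:
        problems.append("units is missing")
    return problems


range_attrs = {
    "units": "km",
    "long_name": "Range from antenna to the centre of each range gate",
}
print(check_variable_attributes(range_attrs))
```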
Recommended variable attributes
The following attributes are good ways to express information about
a variable. They should conform to the conventions indicated.
- comment
- This is by far the most important attribute that a variable can
have as it describes to the user what the variable is. Do not assume
that the user has a copy of documentation that should have been
distributed with the file: put enough information here to explain what
the variable contains, how it was derived, what the calibration
convention was and things the user should be aware of when using this
variable. If there are references specific to this variable
(i.e. those that would be inappropriate in the global
references attribute) then include them here. Ideally this
attribute should start with "This variable contains...", such
that it may be used as a general description of the variable for use
with programs that generate automatic documentation from a NetCDF
file; for example, the detailed descriptions of variables on the IWC product
page were contained in comment attributes. Use complete
sentences terminated with a full-stop/period so that extra comments
can be easily appended. New line characters (ASCII code: decimal 10)
should be used to break long lines. Note that the use of the plural
comments has been deprecated.
- _FillValue and missing_value
- If the variable contains missing data (e.g. because an instrument
was not working or the variable indicates cloud particle size but no
cloud is present etc.) then both _FillValue and
missing_value should be present to indicate which value has
been used to flag that no valid data are available. They must be of
the same type as the variable itself. The use of two different
attributes is an unfortunate consequence of the fact that older
programs may only expect missing_value while newer programs
tend to use _FillValue.
- units_html
- If units contains subscripts or superscripts, consider
adding a units_html attribute containing
<sup></sup> or <sub></sub>
HTML tags, which display programs (specifically chilncplot)
can use to show exponents properly. So if units was "g
m-3" then units_html would be "g
m<sup>-3</sup>".
- plot_range
- A two-element vector of numbers with the same units as the
variable itself, which indicate the recommended range to plot the
variable over. This does not mean that values outside this range
are invalid. The attribute must be of the same type as the variable
(i.e. float, short etc.). This attribute should be
used in combination with plot_scale.
- plot_scale
- This attribute either contains "linear" or
"logarithmic", indicating the best way to plot the
variable. It should be used in combination with plot_range
and is interpreted by programs such as chilncplot.
- source
- For datasets containing variables derived from different sources,
it is useful to indicate the particular source here. Typically one
would take the global source attribute from the dataset from
which this variable was derived.
Variables indicating error and sensitivity
All derived meteorological products at level 2 and above should
ideally be accompanied by an indication of their error. Typically
errors can be divided into random error that decorrelates
rapidly with time, and a bias due to the accuracy with which an
instrument was calibrated and which may affect all measurements in a
day uniformly. Additionally, many instruments and the products
derived from them have a sensitivity, or a minimum
detectable value, which should be reported in order that
comparison with models be fair. Variables affected in this way should
define one or more of the following attributes:
- error_variable
- Contains the name of the variable in the file that indicates the
random error of the variable in question. Typically if the variable
name were Z, then the corresponding error_variable
would be Z_error.
- bias_variable
- As above, but for the bias. Similarly, the typical name for the
bias in Z would be Z_bias.
- sensitivity_variable
- As above, but for the sensitivity. The typical name for the
minimum detectable Z would be Z_sensitivity.
Sometimes errors can have a long (and difficult to define)
decorrelation time, and it is not obvious how to differentiate between
random error and bias. In this case only an error_variable
need be defined. The variables used to report error and sensitivity
should conform to the following conventions:
- An error/sensitivity variable may be a function of all, some or
none of the dimensions used by the corresponding meteorological
variable. For example, a random error might vary with time
and height, while a bias might be a constant value for the
whole file, indicating the expected accuracy of the calibration of the
instrument from which the meteorological variable was derived (so
therefore needing no dimensions). In the case of radar, the
sensitivity of radar reflectivity is predominantly a function of
height, as will be parameters derived from it.
- The units of the error/sensitivity variable would
typically be the same as those of the meteorological variable.
However, for errors and biases, two additional units are permissible:
- %
- If the error is of a fractional nature then it is appropriate to
report a percentage error. However, this becomes ambiguous in the case
of large fractional errors, as an error of 200% implies that a
variable might range from 300% of its value to the negative
(i.e. -100%) of its value. When errors are likely to exceed 25%, it is
better to use units of dB.
- dB
- While decibels are often not familiar to non-instrumentalists,
they are convenient for expressing fractional errors of any
magnitude. Simply put, a "bel" is an order of magnitude so a "decibel"
is a tenth of an order of magnitude. Hence an error of 3 dB
corresponds to a "factor of 2" or +100%/-50% and an error of 10 dB
corresponds to a "factor of 10" or +900%/-90%. Hence an error of a
"factor of X" corresponds to 10 log10(X) dB.
- The error should correspond to one standard deviation, not
be a 95% confidence interval, which is around two standard deviations.
This should be stated in the comment attribute, and ideally
the long_name attribute. So for variable Z, its
random error variable would typically have the long name Random
error in Z, one standard deviation.
- The comment attribute should start with, for example,
"This variable is the [random error in|expected bias
in|approximate calibration error in|minimum detectable]...", and
indicate how it was calculated.
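The relationship between a "factor of X" error, its value in dB and the equivalent asymmetric percentage errors can be sketched in Python as follows. The function names are hypothetical; the arithmetic is simply the 10 log10(X) relation quoted above.

```python
import math


def factor_to_db(factor):
    """Convert a 'factor of X' error into decibels."""
    return 10.0 * math.log10(factor)


def db_to_percent_bounds(error_db):
    """Return the (upper, lower) percentage errors equivalent to a dB error."""
    factor = 10.0 ** (error_db / 10.0)
    upper = (factor - 1.0) * 100.0        # value multiplied by the factor
    lower = (1.0 / factor - 1.0) * 100.0  # value divided by the factor
    return upper, lower


print(factor_to_db(2.0))           # close to 3 dB
print(db_to_percent_bounds(10.0))  # close to +900% and -90%
```

The asymmetry between the upper and lower bounds is the reason a single percentage becomes ambiguous for large fractional errors.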
Bit fields and status fields
It is often necessary to indicate the status of a retrieval,
enabling the user to distinguish pixels for which the retrieval was
(for example) "reliable", "probably reliable but...", "unreliable",
"not possible". Sometimes targets need to be distinguished between a
number of different types, such as "liquid clouds", "ice clouds",
"aerosol", "insects". In this case one can use a status field,
where the integer variable will be one of a limited number of values,
or a bit field, where each bit of the integer variable should
be interpreted as a separate flag. Such variables should always be of
NetCDF type byte, to avoid the byte-order confusion that is
likely to arise with two-byte and four-byte integers due to different
CPU architectures. Additionally, it is probably best not to use the
most significant bit of the field, as this is used to indicate the
sign of the byte and could easily be misread by badly written
programs. Hence use no more than 7 bits per byte, and if you need more
bits, consider providing two bit fields. Rather than use a
units attribute, the variable should use a
definition attribute, where each line (separated by the
new-line character) indicates the meaning either of each value, or of
each bit. In the case of status fields, we could have:
- definition =
- "0: No cloud present
1: Reliable retrieval
2: Possibly unreliable retrieval due to spiders in the waveguide
3: Unreliable retrieval"
while in the case of bit fields we could have:
- definition =
- "Bit 0: Liquid droplets are present
Bit 1: Ice particles are present
Bit 2: Raindrops are present
Bit 3: Aerosol particles are present"
Note that definition is used by programs such as
chilncplot in the key at the side of the plot to indicate the
meaning of each colour, so the descriptions should be fairly
concise. Use of a long_definition attribute is therefore
recommended where more complete descriptions may be placed, but the
same format should be used, with a single line terminated by a
new-line character (except the last) for each entry.
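A reading program might interpret the definition attribute roughly as in the following Python sketch. The function names parse_definition and decode_bits are hypothetical, and the sketch assumes each line has the form "key: description" as in the examples above.

```python
def parse_definition(definition):
    """Map each key ("0", "Bit 2", ...) to its description."""
    entries = {}
    for line in definition.split("\n"):
        key, _, meaning = line.partition(":")
        entries[key.strip()] = meaning.strip()
    return entries


def decode_bits(value, definition):
    """List the meanings of the bits that are set in a byte bit field."""
    meanings = parse_definition(definition)
    # Only bits 0 to 6 are examined; the most significant bit is left unused.
    return [meanings[f"Bit {bit}"] for bit in range(7)
            if value & (1 << bit) and f"Bit {bit}" in meanings]


bit_definition = ("Bit 0: Liquid droplets are present\n"
                  "Bit 1: Ice particles are present\n"
                  "Bit 2: Raindrops are present\n"
                  "Bit 3: Aerosol particles are present")
print(decode_bits(5, bit_definition))
```

For a status field the dictionary returned by parse_definition can be indexed directly with the string form of the stored value.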
Typically status fields are very suitable to be plotted, so
to assist plotting programs it is helpful if the following attributes
are defined:
- plot_range
- As defined above (although it need not be accompanied by
plot_scale), this attribute would be a vector of two bytes
indicating the lowest (invariably 0) and highest value to be
displayed.
- legend_key_red, legend_key_green and
legend_key_blue
- Each attribute is a vector of type float with length
equal to the number of categories in the status field. The numbers
should lie between 0.0 and 1.0 and indicate the RGB
values recommended for displaying the field.
Global attributes
Global attributes provide important information about the data in a
NetCDF file.
Compulsory global attributes
The following attributes should be present and of type
short. They replicate information present in the
units attribute of the time variable, but are much
easier to obtain from scalar global attributes than by parsing a
string.
- day
- The day of the month on which the data were taken.
- month
- The month of the year, where January = 1 etc.
- year
- The year as a full four-digit number (e.g. 2001)
The following attributes should be present and of type text:
- Conventions = "CF-1.0"
- Indicates that your data satisfies the CF
conventions. If your data doesn't satisfy the CF conventions,
don't include this attribute.
- location
- The site at which the instrument was operating, such as
"Chilbolton", "Cabauw", "Palaiseau" and
"ARM Southern Great Plains".
- title
- A suitable title for plots created from the dataset, such as
"Ice water content from Chilbolton", "Chilbolton 94-GHz
Cloud Radar (Galileo)" or "Cabauw 905-nm CT75K Vaisala Lidar
Ceilometer".
- history
- Each program that acts on the file should append to this attribute
a brief description of what it did, and when it did it (again
using the new-line character as a separator). Extra information can
include the user and the name of the machine. For example, "Wed
Nov 28 18:38:12 GMT 2001 - NetCDF generated from original data by
Robin Hogan <r.j.hogan@reading.ac.uk> on voldemort". If
the calibration needs to be changed then it may be appended by
"\nThu Nov 29 18:38:12 GMT 2001 - Recalibrated (+3 dB) by Robin
Hogan <r.j.hogan@reading.ac.uk> on voldemort", where
'\n' indicates the new-line character (i.e. not a backslash
character followed by an "n" character).
- institution
- The institution that produced the data, such as "Royal Dutch
Meteorological Institute (KNMI)". It may be necessary to refer to
several institutes, in which case they should be separated by a new
line, e.g. "Data recorded at Chilbolton Observatory
(part of the Radio Communications Research Unit, RAL, UK)\n
Processed by the University of Reading".
- source
- In the case of instrumental data, this would contain a brief
specification of the instrument. The spec of a radar should include
frequency, antenna diameter, pulse repetition frequency, pulse width
(in microseconds) and peak power, and the spec of a lidar should
include wavelength, divergence, field of view and pulse repetition
frequency. The fields would be new-line separated. In the case of
model data a single-line title for the model is sufficient,
e.g. "UK Met Office mesoscale model". Data derived from a
variety of sources should concatenate the global source
attributes from the input datasets, separated by semi-colon
(;) and new-line.
- references
- Any web-based or published information about the data, e.g.
"Information on the data is available at
http://www.met.rdg.ac.uk/radar/doc/galileo.html". Obviously
please ensure that the web site referred to is maintained for the
likely lifetime of the data.
Recommended global attributes
- comment
- Any further general information for the user (that is not specific
to individual variables) should be added here. Use complete sentences
terminated with a full-stop/period so that extra comments can be
easily appended. It is also useful to add new-line characters to break
up long lines.
- command_line
- The full Unix (or DOS) command line used to call the program that
generated the data. This is essential for Chilbolton data where the
various processing options (such as the calibration figure applied)
are all decided by command-line arguments, and one often needs to know
exactly what processing was applied. If more than one program
operates on the file (such as if the data need to be recalibrated)
then each program should append its own command line, separated by a
new-line character. Therefore each element of the
command_line attribute should correspond to each element of
the history attribute.
- software_version
- If the processing program changes over time then it is useful
to store the version number (as a string) of the program here.
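Keeping the history and command_line attributes in step can be sketched in Python as follows. The helper append_line is hypothetical, as are the program names radar2nc and recalibrate in the example command lines; the history text is abbreviated from the example given for the history attribute.

```python
def append_line(existing, entry):
    """Append one entry to a new-line separated attribute value."""
    return entry if not existing else existing + "\n" + entry


history = ""
command_line = ""

# First program: creates the file (hypothetical command line).
history = append_line(
    history, "Wed Nov 28 18:38:12 GMT 2001 - NetCDF generated from original data")
command_line = append_line(command_line, "radar2nc --calibration 0")

# Second program: recalibrates the data (hypothetical command line).
history = append_line(
    history, "Thu Nov 29 18:38:12 GMT 2001 - Recalibrated (+3 dB)")
command_line = append_line(command_line, "recalibrate --offset 3")

# Each element of command_line corresponds to one element of history.
print(len(history.split("\n")) == len(command_line.split("\n")))
```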
Recommended variables for radar and lidar data
The following describes additional conventions that should make
radar and lidar data from different sites as similar as possible.
Scalar variables
The following variables are single values that are stored as
variables rather than global attributes because they have a unit or
other describing attribute associated with them; the attributes that
should be set are shown indented after each variable name. All these
variables are of type float.
- altitude
- To get the altitude of each range gate above mean sea level, the
user of this data should add this value to the values in the
range variable (assuming the instrument is vertically
pointing, and taking account of the fact that altitude is in
metres and range is in km).
- units = "m"
- long_name = "Altitude of antenna above mean sea level"
- elevation
- Most radars will be vertically pointing, so their elevation will
be 90°. Lidars may be deployed off-zenith to avoid specular
reflection from horizontally aligned plate crystals, in which case the
elevation will be less than 90°.
- units = "degrees"
- long_name = "Elevation above horizon"
- azimuth
- An optional variable that gives the azimuth of instruments that
are not vertically pointing.
- units = "degrees"
- long_name = "Azimuth clockwise from due north"
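The altitude calculation described above, for a vertically pointing instrument, amounts to the following Python sketch. The function name gate_heights_m and the numbers used are illustrative assumptions only.

```python
def gate_heights_m(altitude_m, range_km):
    """Height above mean sea level (m) of each gate of a zenith-pointing instrument."""
    # altitude is in metres but range is in km, so convert before adding.
    return [altitude_m + 1000.0 * r for r in range_km]


# Illustrative values: an antenna 84 m above mean sea level, 60-m gates.
print(gate_heights_m(84.0, [0.06, 0.12, 0.18]))
```

If range is reported in metres, which is also permitted, the factor of 1000 must be dropped.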
For radar the following should also be defined:
- frequency
-
- units = "GHz"
- long_name = "Radar frequency"
For lidar, use:
- wavelength
- If this is a multi-wavelength lidar, then wavelength
should be a one-dimensional array containing all the wavelengths available.
This requires an extra dimension, also with name wavelength.
- units = "nm"
- long_name = "Lidar wavelength"
Two-dimensional variables
Most two-dimensional variables will be of type float.
However, for some data it may make sense to use the short
data type (a signed 2-byte integer; integer*2 in FORTRAN
nomenclature). The CT75K lidar ceilometer is a good candidate as the
raw data are stored to this precision so no information is lost. You
may then use scale_factor and/or add_offset
attributes to get the data into suitable units and to provide the
correct calibration. If both are present then the data in the file
should be scaled first before the offset is added. Note also that the
missing_value and _FillValue
attributes apply to the data before it has been scaled and
shifted in this way. Usually scale_factor and
add_offset would be of type float.
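The order of operations when reading packed data can be sketched in Python as follows: the missing-value test is made on the stored integer, then the scale is applied, then the offset. The function name unpack_values and the numbers are illustrative assumptions; plain lists are used to keep the sketch free of dependencies.

```python
def unpack_values(raw_values, scale_factor=1.0, add_offset=0.0, fill_value=None):
    """Convert stored short integers to physical values, None where missing."""
    unpacked = []
    for raw in raw_values:
        # _FillValue / missing_value refer to the data before scaling.
        if fill_value is not None and raw == fill_value:
            unpacked.append(None)
        else:
            # Scale first, then add the offset.
            unpacked.append(raw * scale_factor + add_offset)
    return unpacked


print(unpack_values([0, 100, -32768], scale_factor=0.5, add_offset=10.0,
                    fill_value=-32768))
```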
For some variables, notably radar reflectivity, accurate
calibration can be difficult and the data may need to be recalibrated
after the initial release. These variables should therefore indicate
the calibration that has been applied to them in the processing stage
in the calibration_applied attribute.
The following are variable names that could be used in radar data,
and some of the attributes that should be present:
- Z(time, range)
-
- units = "dBZ"
- long_name = "Radar reflectivity factor"
- comment = "Calibration convention: in the absence of attenuation, a cloud at 273 K containing one million 100-micron droplets per cubic metre will have a reflectivity of 0 dBZ at all frequencies."
- calibration_applied
- ...in dB.
- v(time, range)
-
- units = "m s-1"
- units_html = "m s<sup>-1</sup>"
- long_name = "Doppler velocity"
- comment = "Positive velocities are away from the radar."
- folding_velocity
- This attribute indicates that the velocities may be folded, lying
in the range -folding_velocity to folding_velocity.
- width(time, range)
-
- units = "m s-1"
- units_html = "m s<sup>-1</sup>"
- long_name = "Spectral width"
- comment = "This variable is the standard deviation
of the reflectivity-weighted velocities in the radar pulse
volume."
- sigma_v(time, range)
- Level 1 data is typically averaged to 30 seconds, so the velocity
variable in the NetCDF file is typically an average of a number of
high-resolution mean velocity values measured in the averaging time.
The sigma_v variable is the standard deviation of these
high-resolution mean velocities. Spectral width is the standard
deviation of actual particle velocities measured within the radar
pulse volume in a short time (typically around 1 second), so tends to
be dominated by the differential fall speeds of the different sized
particles. This variable, on the other hand, is dominated by
turbulence.
- units = "m s-1"
- units_html = "m s<sup>-1</sup>"
- long_name = "Standard deviation of mean velocity"
- comment = "The data in this file are at a lower
resolution than the raw data, and this variable is the standard
deviation of the raw Doppler velocities measured within each output
gate and ray."
- Ldr(time, range)
-
- units = "dB"
- long_name = "Linear depolarisation ratio"
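To illustrate what the folding_velocity attribute of v implies, the following Python sketch shows how a true velocity would appear once folded into the reported interval. The wrap-around formula is the usual aliasing relation and is an assumption here rather than something the convention specifies; the function name fold_velocity is hypothetical.

```python
def fold_velocity(true_velocity, folding_velocity):
    """Velocity as it would be reported after folding into [-fv, fv)."""
    span = 2.0 * folding_velocity
    return (true_velocity + folding_velocity) % span - folding_velocity


# With a folding velocity of 5 m s-1, a true velocity of 7 m s-1
# away from the radar is indistinguishable from 3 m s-1 towards it.
print(fold_velocity(7.0, 5.0))
```

Velocities already inside the interval are returned unchanged.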
Similarly, the following are variable names that could be used with
lidar data:
- beta(time, range)
- If attenuated backscatter coefficient is measured at more than one
wavelength, then the wavelength could be indicated in the variable
name, such as beta1064, beta532 etc.
- units = "m-1 sr-1"
- units_html = "m<sup>-1</sup> sr<sup>-1</sup>"
- long_name = "Attenuated backscatter coefficient"
- Ldr(time, range)
-
- units = "1"
- Lidar depolarisation ratio normally lies in the range 0 to 1.
- long_name = "Linear depolarisation ratio"
If there is a need to have an unprocessed version of a variable in
the file then I suggest using the names Z_raw,
beta_raw and so on.
These pages are maintained by Ewan O'Connor.