We will use a small netCDF example to illustrate the concepts of netCDF dimensions, variables, and attributes. The notation used to describe this simple netCDF object is called CDL (network Common Data form Language). It provides an easily comprehended text version of the structure and contents of a binary netCDF file:
    netcdf example_1 {  // example of CDL notation for a netCDF file

    dimensions:         // dimension names and sizes are declared first
            lat = 5, lon = 10, level = 4, time = unlimited;

    variables:          // variable types, names, shapes, attributes
            float   temp(time,level,lat,lon);
                        temp:long_name = "temperature";
                        temp:units = "celsius";
            float   rh(time,lat,lon);
                        rh:long_name = "relative humidity";
                        rh:valid_range = 0.0, 1.0;      // min and max
            int     lat(lat), lon(lon), level(level);
                        lat:units = "degrees_north";
                        lon:units = "degrees_east";
                        level:units = "millibars";
            short   time(time);
                        time:units = "hours since 1990-11-25 12:00 UTC";
            // global attributes
                        :source = "National Weather Service";

    data:               // optional data assignments
            level   = 1000, 850, 700, 500;
            lat     = 20, 30, 40, 50, 60;
            lon     = -160,-140,-118,-96,-84,-52,-45,-35,-25,-15;
            time    = 12;
            rh      =.5,.2,.4,.2,.3,.2,.4,.5,.6,.7,
                     .1,.3,.1,.1,.1,.1,.5,.7,.8,.8,
                     .1,.2,.2,.2,.2,.5,.7,.8,.9,.9,
                     .1,.2,.3,.3,.3,.3,.7,.8,.9,.9,
                      0,.1,.2,.4,.4,.4,.4,.7,.9,.9;
    }
The CDL notation for a netCDF file can be generated automatically by using ncdump, a utility program described later (see section ncdump). Another netCDF utility, ncgen, generates a netCDF file (or optionally C or FORTRAN source code containing calls needed to produce a netCDF file) from CDL input (see section ncgen). It is not necessary to learn much about CDL notation to use the netCDF library; we use it in this document as a concise way of presenting netCDF examples.
The CDL notation will be explained more fully as we describe the components of a netCDF file. For now, note that CDL statements are terminated by a semicolon. Spaces, tabs, and newlines can be used freely for readability. Comments in CDL follow the characters `//' on any line. A CDL description of a netCDF file takes the form
    netCDF name {
      dimensions: ...
      variables: ...
      data: ...
    }
where the name is used only as a default in constructing the name of the file generated by the ncgen utility. The CDL description consists of three optional parts, introduced by the keywords dimensions, variables, and data. NetCDF dimension declarations appear after the dimensions keyword, netCDF variables and attributes are defined after the variables keyword, and variable data assignments appear after the data keyword.
A netCDF dimension is a named integer used to specify the shape of one or more of the multi-dimensional variables contained in a netCDF file. A dimension may be used to represent a real physical dimension, for example, time, latitude, longitude, or height. A dimension might also be used to index more abstract quantities, for example, color-table entry number, instrument number, station-time pair, or model-run ID.
Every netCDF dimension has both a name and a size. A dimension name is an arbitrary sequence of alphanumeric characters (as well as the underscore character, `_', and the hyphen character, `-') beginning with a letter. Case is distinguished in netCDF names. A dimension size is an arbitrary positive integer, except that one dimension in a netCDF file can have the size UNLIMITED. Such a dimension is called the unlimited dimension or the record dimension. A variable with an unlimited dimension can grow to any length along that dimension. The unlimited dimension is like a record number in conventional record-oriented files.
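The naming rule above can be sketched as a small check. This is only an illustration in Python, not part of the netCDF library; the function name is_valid_name is hypothetical, and "alphanumeric" is assumed here to mean ASCII letters and digits.

```python
import re

# Letters, digits, underscore, and hyphen, beginning with a letter.
# Assumption: "alphanumeric" means ASCII letters and digits.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")

def is_valid_name(name):
    """Return True if name follows the naming rule described in the text."""
    return NAME_PATTERN.fullmatch(name) is not None

print(is_valid_name("lat"))                # True
print(is_valid_name("station-time_pair"))  # True
print(is_valid_name("2m_temp"))            # False: begins with a digit
print("Lat" == "lat")                      # False: case is distinguished
```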
CDL dimension declarations may appear on one or more lines following the CDL keyword dimensions. Multiple dimension declarations on the same line may be separated by commas. Each declaration is of the form name = size.
There are four dimensions in the example: lat, lon, level, and time. The first three are assigned fixed sizes; time is assigned the size UNLIMITED, which means it is the unlimited dimension. A netCDF file can have at most one unlimited dimension, but need not have any.
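To make the declaration rules concrete, here is a toy parser for the "name = size" form. It is a sketch under stated assumptions, not how ncgen works: the function name parse_dimensions is hypothetical, CDL comments are not handled, the unlimited size is accepted in any letter case, and the unlimited dimension is represented by None.

```python
import re

def parse_dimensions(text):
    """Parse CDL-style 'name = size' declarations into a dict.

    The unlimited dimension is stored with size None (a choice made
    for this sketch only).
    """
    dims = {}
    for decl in re.split(r"[,;]", text):
        if not decl.strip():
            continue  # skip the empty piece after the final semicolon
        name, _, size = decl.partition("=")
        name, size = name.strip(), size.strip()
        if size.lower() == "unlimited":
            if None in dims.values():
                raise ValueError("at most one unlimited dimension is allowed")
            dims[name] = None
        else:
            count = int(size)
            if count <= 0:
                raise ValueError("dimension size must be a positive integer")
            dims[name] = count
    return dims

dims = parse_dimensions("lat = 5, lon = 10, level = 4, time = unlimited;")
print(dims)  # {'lat': 5, 'lon': 10, 'level': 4, 'time': None}
```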
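There are several uses for netCDF dimensions: specifying the shapes of variables, relating variables to one another, and defining coordinate systems.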
The basic unit of named data in a netCDF file is a variable. In general, a variable is a multidimensional object that has, among other characteristics, a shape, which is defined by the number, order, and sizes of its dimensions. When a netCDF variable is defined, the number and order of the dimensions that define its shape are specified. Hence you must first create the necessary dimensions before creating a netCDF variable that uses them.
It is possible to use the same dimension more than once in specifying a variable shape, for example var(dim, dim), but it does not make much sense to do this; it is contrary to the intuitive meaning of a physical dimension. A variable that has two dimensions that happen to be the same size is more accurately modeled by using two dimensions with different names but the same size.
Two dimensions may have the same size, perhaps by coincidence, without being related in any other way. Dimension names provide a way to distinguish dimensions regardless of size.
Variables are related by the dimensions they share. For example, if two variables are defined with the same dimensions, they might represent observations or model output for the same set of points. In the example netCDF file, variables temp and rh share the time, lat, and lon dimensions. The variable rh does not have level as a dimension, perhaps because it is only defined for a single level.
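The relationship between temp and rh can be read directly off their shapes. The following sketch uses a plain Python dictionary (not a netCDF interface) to hold the shapes from the example; the helper name shared_dimensions is hypothetical.

```python
# Variable shapes from the example, as ordered tuples of dimension names.
shapes = {
    "temp": ("time", "level", "lat", "lon"),
    "rh": ("time", "lat", "lon"),
}

def shared_dimensions(a, b):
    """Dimensions of variable a that variable b also uses, in a's order."""
    return [d for d in shapes[a] if d in shapes[b]]

print(shared_dimensions("temp", "rh"))  # ['time', 'lat', 'lon']
```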
Besides serving as sizes for integer indexes to multidimensional variables, dimensions may be used to define coordinate systems for variable data. To do this, create a variable with the same name as a dimension and specify coordinate values for that variable. A variable should only be given the same name as a dimension in a netCDF file when it is to be used as a coordinate variable. Such variables are indexed by the dimension for which they provide coordinate values, for example, lat(lat).
It is not necessary to provide a coordinate variable for each dimension; if no such variable is defined, the coordinate values of the dimension are assumed to be 0, 1, 2, ... (for C programs) or 1, 2, 3, ... (for FORTRAN programs). Although the C and FORTRAN interfaces support different conventions for index numbering, there is no difference between the actual netCDF files written by C and FORTRAN programs. Programs written in either language can be used to access data written by programs using the other interface.
In the CDL example, each dimension has an associated coordinate variable with the same name as the dimension. The four values of the level index, 0, 1, 2, 3 (1, 2, 3, 4 in FORTRAN), are related in coordinate-like fashion to the four values (1000, 850, 700, 500) of the level variable. Note that there is no requirement that coordinates be equally spaced or increasing. It would not make much sense for two coordinate values to be the same, but the meaning of coordinate variables is enforced only by conventions of application packages and utilities, not by the netCDF interface. The application packages and utilities that make use of netCDF coordinate variables assume they are one-dimensional and monotonic (all values either increasing or decreasing).
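The coordinate conventions above can be sketched in a few lines. This is an illustration only: the function names and the dimension name "station" are hypothetical, and monotonic is taken to mean strictly increasing or strictly decreasing, since the text notes that repeated coordinate values make little sense.

```python
def coordinate_values(dim_name, dim_size, coord_vars, index_base=0):
    """Coordinate values for a dimension.

    Falls back to index_base, index_base + 1, ... when no coordinate
    variable is defined (0 mirrors the C convention, 1 the FORTRAN one).
    """
    if dim_name in coord_vars:
        return list(coord_vars[dim_name])
    return list(range(index_base, index_base + dim_size))

def is_monotonic(values):
    """True if values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

coords = {"level": [1000, 850, 700, 500]}
print(coordinate_values("level", 4, coords))       # [1000, 850, 700, 500]
print(coordinate_values("station", 3, coords))     # [0, 1, 2]
print(coordinate_values("station", 3, coords, 1))  # [1, 2, 3]
print(is_monotonic(coords["level"]))               # True
```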
A variable represents a multidimensional array of values of the same type. A variable has a name, a data type, and a shape described by its list of dimensions, all of which are specified when the variable is created. Each variable may also have data values and associated attributes, which may be added or changed after the variable is created. Variables are used to store the bulk of the data in a netCDF file, and are the primary component used by utilities to identify sub-parts of a netCDF file.
Like a dimension name, a variable name is an arbitrary sequence of alphanumeric characters (also including `_' and `-') beginning with a letter. Case is distinguished in variable names. Long names help to make a netCDF file self-documenting, but ancillary information about a variable is better stored in variable attributes (discussed below) than encoded as part of the name.
A variable data type is one of a small set of netCDF types that have the names NC_BYTE, NC_CHAR, NC_SHORT, NC_LONG, NC_FLOAT, and NC_DOUBLE in the C interface and the corresponding names NCBYTE, NCCHAR, NCSHORT, NCLONG, NCFLOAT, and NCDOUBLE in the FORTRAN interface. In the CDL notation, these types are given the simpler names byte, char, short, long, float, and double. int may be used as a synonym for long and real may be used as a synonym for float in the CDL notation. We will postpone a discussion of the exact meaning of each of the types until the discussion of data, below. For now, it suffices to know that the choice of the type used to represent variable data depends on the range of values it can have, the precision to which values are known, and the number of bits required to represent the variable in a netCDF file on disk.
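The correspondence between the three sets of type names can be laid out as a lookup table. The table below simply restates the names given in the text; the function name resolve_type is hypothetical and nothing here calls the netCDF library.

```python
# CDL type name -> (C interface name, FORTRAN interface name).
TYPES = {
    "byte": ("NC_BYTE", "NCBYTE"),
    "char": ("NC_CHAR", "NCCHAR"),
    "short": ("NC_SHORT", "NCSHORT"),
    "long": ("NC_LONG", "NCLONG"),
    "float": ("NC_FLOAT", "NCFLOAT"),
    "double": ("NC_DOUBLE", "NCDOUBLE"),
}

# CDL synonyms mentioned in the text.
SYNONYMS = {"int": "long", "real": "float"}

def resolve_type(cdl_name):
    """Return the canonical CDL name and its C and FORTRAN names."""
    canonical = SYNONYMS.get(cdl_name, cdl_name)
    return canonical, TYPES[canonical]

print(resolve_type("int"))    # ('long', ('NC_LONG', 'NCLONG'))
print(resolve_type("short"))  # ('short', ('NC_SHORT', 'NCSHORT'))
```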
The shape of a variable is specified by its list of dimensions. If a variable has an unlimited dimension, that dimension must appear first in the list of dimensions in CDL. It is possible to define variables with no dimensions, also called scalar variables. There are no scalar variables in the example netCDF file.
CDL variable declarations appear after the variables keyword in a CDL unit. They have the form
    type variable_name ( dim_name_1, dim_name_2, ... ) ;
for variables with dimensions, or
    type variable_name ;
for scalar variables.
In the CDL example there are six variables. As discussed above, four of these are coordinate variables for dimensions. The remaining variables, temp and rh, contain what is usually thought of as the data. Each of these variables has the unlimited dimension time as its first dimension, so they are called record variables. A variable that is not a record variable has a fixed size (number of data values) given by the product of its dimensions. A record variable has a current size, given by the product of the maximum record written so far and the other dimensions of the variable. Only record variables may grow after they are defined.
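The size rules for fixed and record variables amount to a product over the dimension sizes. The sketch below assumes the dimensions of the example file, uses None to mark the unlimited dimension, and takes the number of records written so far as a parameter; the function name variable_size is hypothetical.

```python
from math import prod

# Dimensions of the example file; None marks the unlimited dimension.
dims = {"lat": 5, "lon": 10, "level": 4, "time": None}

def variable_size(shape, dims, records_written=0):
    """Number of data values in a variable with the given shape.

    For a record variable, the unlimited dimension contributes the
    number of records written so far.
    """
    return prod(records_written if dims[d] is None else dims[d] for d in shape)

print(variable_size(("lat",), dims))                            # 5
print(variable_size(("time", "lat", "lon"), dims, 1))           # 50
print(variable_size(("time", "level", "lat", "lon"), dims, 1))  # 200
print(variable_size(("time", "lat", "lon"), dims, 3))           # 150
```

With one record written, rh holds 50 values, which matches the 50 values assigned to rh in the data section of the example.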
A netCDF attribute is meant to contain information about a netCDF variable or about an entire netCDF file. This information is ancillary data, or data about data, analogous to the information stored in data dictionaries and schema in conventional database systems. An attribute has an associated variable, a name, a data type, a length, and a value. Individual attributes are identified by specifying a variable and an attribute name.
Each attribute is associated with a single variable when it is created. Attributes for different variables may differ in data type, length, and values even though they share the same name.
A global attribute is one that applies to the whole netCDF rather than any particular variable. Global attributes are defined and accessed similarly to variable attributes; the details for defining global attributes in the CDL notation and in the netCDF procedural interface are presented later.
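Because an attribute is identified by a variable together with an attribute name, one way to picture the attributes of the example file is a table keyed by that pair. This is only a mental model in Python, not the library's storage scheme; using None to stand for "global" is a choice made for this sketch.

```python
# Key: (variable name, attribute name). None stands for a global attribute.
attributes = {
    ("temp", "long_name"): "temperature",
    ("temp", "units"): "celsius",
    ("rh", "long_name"): "relative humidity",
    ("rh", "valid_range"): [0.0, 1.0],
    ("lat", "units"): "degrees_north",
    ("lon", "units"): "degrees_east",
    (None, "source"): "National Weather Service",
}

def get_attribute(variable, name):
    """Look up an attribute by its variable and its name."""
    return attributes[(variable, name)]

# The same attribute name can carry different values for different variables.
print(get_attribute("temp", "units"))  # celsius
print(get_attribute("lat", "units"))   # degrees_north
print(get_attribute(None, "source"))   # National Weather Service
```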
Attribute names follow the same rules as dimension and variable names. Providing meaningful names for attributes is important, but using agreed-upon conventional names is also required if generic applications and utility programs will be used on a netCDF file. For example, every variable for which units make sense should have a units attribute defined, so the units can be printed in labels. Furthermore, if the netCDF file is ever to be used as input to generic units-aware software, the values of the units attributes should be expressed in a conventional form as a character string that can be interpreted by that software.
The type of an attribute is specified when it is created. The types permitted for attributes are exactly the same as the netCDF data types used in creating variables. Attributes with the same name for different variables should sometimes be of different types. For example, the attribute valid_max specifying the maximum valid data value for a variable of type long should be of type long, whereas the attribute valid_max for a variable of type double should instead be of type double.
In addition to specifying the associated variable, attribute name, and type, the length and value of an attribute must also be specified when it is created. The information in an attribute is represented by either a single value (length 1) or a vector of values of the same type. Since "character string" is not a basic netCDF data type, string-valued attributes have a vector of characters as their value, with a length equal to the length of the character string.
Attributes are more dynamic than variables or dimensions; they can have their type, length, and values changed after they are created. For example, an attribute max_value might store the maximum value seen so far for a record variable, and might be updated every time a new record is written.
The CDL notation for defining an attribute is
    variable_name:attribute_name = list_of_values ;
for a variable attribute, or
    :attribute_name = list_of_values ;
for a global attribute. The type and length of each attribute are not explicitly declared in CDL; they are derived from the values assigned to the attribute. All values of an attribute must be of the same type. The notation used for constant values of the various netCDF types is discussed later (see section CDL Notation for Data Constants).
In the example netCDF (see section Components of a NetCDF File), units is an attribute for the variable lat that has a 13-character array value `degrees_north'. And valid_range is an attribute for the variable rh that has length 2 and values `0.0' and `1.0'.
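The attribute lengths quoted above follow from the rule that a string-valued attribute is a vector of characters. A minimal Python sketch of that rule, with the hypothetical helper name attribute_length, looks like this:

```python
def attribute_length(value):
    """Length of an attribute value.

    A string counts its characters, a list or tuple counts its values,
    and any other single value has length 1.
    """
    if isinstance(value, (str, list, tuple)):
        return len(value)
    return 1

print(attribute_length("degrees_north"))  # 13
print(attribute_length([0.0, 1.0]))       # 2
```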
One global attribute, source, is defined for the example netCDF file. This is a character array intended for documenting the data. Real netCDF files typically have more global attributes to document the origin, history, accuracy, and other characteristics of the data.
Generic applications that take netCDF files as input may by convention expect certain variable and global attributes. A few other attributes are handled in special ways by the netCDF library (these reserved attributes will have names that begin with a leading underscore character `_'). If you want to be able to use generic applications with your files, you should use the following conventional names for these commonly used attributes:
units
A character string that specifies the units used for the variable's data.
long_name
A long descriptive name for the variable. If a variable has no long_name attribute assigned, the variable name will be used as a default.
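valid_range
A vector of two numbers specifying the minimum and maximum valid values for the variable. The type of each valid_range attribute should match the type of its variable.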
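valid_min
valid_max
A single number specifying the minimum (valid_min) or maximum (valid_max) valid value for the variable. Either attribute may be used instead of valid_range; this handles the case where it only makes sense to bound the data below or above.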
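scale_factor
If present for a variable, the data are to be multiplied by this factor after they are read.
add_offset
If present for a variable, this number is to be added to the data after they are read. If both scale_factor and add_offset attributes are present, the data are first scaled before the offset is added.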
The attributes scale_factor and add_offset can be used together to provide simple data compression to store low-resolution floating-point data as small integers in a netCDF file. When scaled data are written, the application should first subtract the offset and then divide by the scale factor.
When scale_factor and add_offset are used for packing, the associated variable (containing the packed data) is typically of type byte or short, whereas the unpacked values are intended to be of type float or double. The attributes scale_factor and add_offset should both be of the type intended for the unpacked data, e.g. float or double.
_FillValue
The value used to fill a variable's data that have not been written; it is the value returned when such data are read. This value should be outside the range specified by valid_range for a variable. It is not necessary to define your own _FillValue attribute for a variable if the default fill value for the type of the variable is adequate. Note that if you change the value of this attribute, the changed value only applies to subsequent writes; previously written data are not changed. See section Missing Values, for more information.
missing_value
missing_value is a conventional name for a missing value that will not be treated in any special way by the library, as the _FillValue attribute is. It is also useful when it is necessary to distinguish between two kinds of missing values. For example, _FillValue might be useful to indicate data that was expected but did not appear, whereas missing_value might be used to indicate grid regions that are not intended to contain data. See section Missing Values, for more information.
signedness
Indicates whether a variable's integer values are to be interpreted as signed or unsigned. A variable whose values are to be interpreted as signed can be given the variable attribute signedness with value "signed". Similarly, if you define a variable for an array of short or long integers and you intend that the values be interpreted as unsigned, it would be appropriate to define the variable attribute signedness = "unsigned". This attribute is ignored by the netCDF library, but applications may use it. Since there are no standard FORTRAN types corresponding to unsigned integers, FORTRAN programs that compute with or use the ordering of data values may need to handle this attribute.
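C_format
A character array giving the format that C applications should use to print values of the variable. For example, for a variable known to only three significant digits, it would be appropriate to define the C_format attribute as "%.3g". The ncdump utility program uses this attribute for variables for which it is defined.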
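FORTRAN_format
A character array giving the format that FORTRAN applications should use to print values of the variable.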
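title
A global attribute that is a character array providing a succinct description of what is in the file.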
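history
A global attribute for an audit trail: a character array with a line for each invocation of a program that has modified the file.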
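Conventions
A global attribute that is a character array naming the conventions followed by the file. The name is interpreted as a directory name relative to the directory pub/netcdf/Conventions/ on the host machine ftp.unidata.ucar.edu.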
For example, if a group named NUWG agrees upon a set of conventions for dimension names, variable names, required attributes, and netCDF representations for certain discipline-specific data structures, they may store a document describing the agreed-upon conventions in a file in the NUWG/ subdirectory of the Conventions directory, and files that followed these conventions would contain a global Conventions attribute with value "NUWG".
Later, if the group agrees upon some additional conventions for a specific subset of NUWG data, for example time series data, the description of the additional conventions might be stored in the NUWG/Time_series/ subdirectory, and files that adhered to these additional conventions would use the global Conventions attribute with value "NUWG/Time_series", implying that this file adheres to the NUWG conventions and also to the additional NUWG time-series conventions.
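Returning to the scale_factor and add_offset conventions in the list above, the packing arithmetic can be sketched as follows. This is an illustration under assumptions, not library code: the function names pack and unpack are hypothetical, the scale and offset values are made up, and rounding to the nearest integer when packing is an assumption, since the text does not say how the fractional part is handled. A real application would also need to check that the packed value fits in the byte or short variable.

```python
def pack(value, scale_factor, add_offset):
    """Writing: subtract the offset, then divide by the scale factor.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return round((value - add_offset) / scale_factor)

def unpack(packed, scale_factor, add_offset):
    """Reading: scale first, then add the offset."""
    return packed * scale_factor + add_offset

# Hypothetical packing parameters for a temperature-like quantity.
scale_factor, add_offset = 0.01, 20.0

packed = pack(23.47, scale_factor, add_offset)
print(packed)  # 347
print(unpack(packed, scale_factor, add_offset))  # approximately 23.47
```

The unpacked value agrees with the original only to within half a scale factor, which is the resolution given up in exchange for the smaller integer type.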
In contrast to variables, which are intended for bulk data, attributes are intended for ancillary data, or information about the data. Typically the data in variables of an open netCDF will reside on disk, because the data are too large to fit in memory all at once. In contrast, the total amount of ancillary data associated with a netCDF object and stored in its attributes is typically small enough to be memory-resident.
Another difference between attributes and variables is that variables may be multidimensional. Attributes are all either scalars (single-valued) or vectors (a single, fixed dimension).
Variables are created with a name, type, and shape before they are assigned data values, so a variable may exist with no values. The value of an attribute must be specified when it is created, so no attribute ever exists without a value.
A variable may have attributes, but an attribute cannot have attributes.
Attributes assigned to variables may have the same units as the variable (for example, valid_range) or have no units (for example, scale_factor). If you want to store data in a netCDF file that requires units different from those of the associated variable, it is better to use a variable than an attribute. More generally, if data require ancillary data to describe them, are multidimensional, require any of the defined netCDF dimensions to index their values, or require a significant amount of storage, the data should be represented using variables rather than attributes.