
Components of a NetCDF File

A netCDF file has dimensions, variables, and attributes. These components can be used together to capture the meaning of data and relations among data fields in a scientific data set.

We will use a small netCDF example to illustrate the concepts of netCDF dimensions, variables, and attributes. The notation used to describe this simple netCDF object is called CDL (network Common Data form Language). It provides an easily comprehended text version of the structure and contents of a binary netCDF file:

netcdf example_1 {  // example of CDL notation for a netCDF file

dimensions:          // dimension names and sizes are declared first
        lat = 5, lon = 10, level = 4, time = unlimited;

variables:           // variable types, names, shapes, attributes
        float   temp(time,level,lat,lon);
                    temp:long_name     = "temperature";
                    temp:units         = "celsius";
        float   rh(time,lat,lon);
                    rh:long_name = "relative humidity";
                    rh:valid_range = 0.0, 1.0;      // min and max
        int     lat(lat), lon(lon), level(level);
                    lat:units       = "degrees_north";      
                    lon:units       = "degrees_east";
                    level:units     = "millibars";
        short   time(time);
                    time:units      = "hours since 1990-11-25 12:00 UTC";
        // global attributes
                    :source = "National Weather Service";

data:                // optional data assignments
        level   = 1000, 850, 700, 500;
        lat     = 20, 30, 40, 50, 60;
        lon     = -160,-140,-118,-96,-84,-52,-45,-35,-25,-15;
        time    = 12;
        rh      =.5,.2,.4,.2,.3,.2,.4,.5,.6,.7,
                 .1,.3,.1,.1,.1,.1,.5,.7,.8,.8,
                 .1,.2,.2,.2,.2,.5,.7,.8,.9,.9,
                 .1,.2,.3,.3,.3,.3,.7,.8,.9,.9,
                  0,.1,.2,.4,.4,.4,.4,.7,.9,.9;
}

The CDL notation for a netCDF file can be generated automatically by using ncdump, a utility program described later (see section ncdump). Another netCDF utility, ncgen, generates a netCDF file (or optionally C or FORTRAN source code containing calls needed to produce a netCDF file) from CDL input (see section ncgen). It is not necessary to learn much about CDL notation to use the netCDF library; we use it in this document as a concise way of presenting netCDF examples.

The CDL notation will be explained more fully as we describe the components of a netCDF file. For now, note that CDL statements are terminated by a semicolon. Spaces, tabs, and newlines can be used freely for readability. Comments in CDL follow the characters `//' on any line. A CDL description of a netCDF file takes the form

  netcdf name {
    dimensions: ...
    variables: ...
    data: ...
  }

where the name is used only as a default in constructing the name of the file generated by the ncgen utility. The CDL description consists of three optional parts, introduced by the keywords dimensions, variables, and data. NetCDF dimension declarations appear after the dimensions keyword, netCDF variables and attributes are defined after the variables keyword, and variable data assignments appear after the data keyword.

Dimensions

A netCDF dimension is a named integer used to specify the shape of one or more of the multi-dimensional variables contained in a netCDF file. A dimension may be used to represent a real physical dimension, for example, time, latitude, longitude, or height. A dimension might also be used to index more abstract quantities, for example, color-table entry number, instrument number, station-time pair, or model-run ID.

Every netCDF dimension has both a name and a size. A dimension name is an arbitrary sequence of alphanumeric characters (as well as the underscore character, `_', and the hyphen character, `-') beginning with a letter. Case is distinguished in netCDF names. A dimension size is an arbitrary positive integer, except that one dimension in a netCDF file can have the size UNLIMITED. Such a dimension is called the unlimited dimension or the record dimension. A variable with an unlimited dimension can grow to any length along that dimension. The unlimited dimension is like a record number in conventional record-oriented files.
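The naming rule above can be written as a regular expression. The following Python sketch is only an illustration: the names NAME_PATTERN and is_valid_name are hypothetical and not part of any netCDF interface, and the sketch assumes that "alphanumeric" means ASCII letters and digits.

```python
import re

# Rule as stated in the text: a letter, followed by letters, digits, '_' or '-'.
# NAME_PATTERN and is_valid_name are hypothetical names used for illustration.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def is_valid_name(name: str) -> bool:
    """Return True if name follows the dimension naming rule described above."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("lat", "model-run_ID", "2m_temp"):
        print(candidate, is_valid_name(candidate))
```

Because case is distinguished, Lat and lat would both pass this check and would name two different dimensions.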

CDL dimension declarations may appear on one or more lines following the CDL keyword dimensions. Multiple dimension declarations on the same line may be separated by commas. Each declaration is of the form name = size.
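As a rough illustration of the name = size form, the Python sketch below builds the dimensions part of a CDL description from a mapping of names to sizes. The function name cdl_dimensions and the use of None to mark the unlimited dimension are assumptions made for this sketch; ncgen and ncdump are not involved.

```python
def cdl_dimensions(dims):
    """Render a mapping of dimension names to sizes as a CDL dimensions section.

    A size of None stands for the unlimited dimension (a convention of this
    sketch only).
    """
    decls = ", ".join(
        f"{name} = {'unlimited' if size is None else size}"
        for name, size in dims.items()
    )
    return f"dimensions:\n        {decls};"


if __name__ == "__main__":
    print(cdl_dimensions({"lat": 5, "lon": 10, "level": 4, "time": None}))
```

For the sizes used in the example file, this reproduces the declaration line shown there, apart from the comments.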

There are four dimensions in the example: lat, lon, level, and time. The first three are assigned fixed sizes; time is assigned the size UNLIMITED, which means it is the unlimited dimension. A netCDF file can have at most one unlimited dimension, but need not have any.

There are several uses for netCDF dimensions:

Specifying the shapes and sizes of variables.
Identifying and relating variables that are defined on a common grid.
Providing a way to define coordinate systems.

We discuss each of these uses below.

Using Dimensions to Specify Variable Shapes

The basic unit of named data in a netCDF file is a variable. In general, a variable is a multidimensional object that has, among other characteristics, a shape, which is defined by the number, order, and sizes of its dimensions. When a netCDF variable is defined, the number and order of the dimensions that define its shape are specified. Hence you must first create the necessary dimensions before creating a netCDF variable that uses them.

It is possible to use the same dimension more than once in specifying a variable shape, for example var(dim, dim), but it does not make much sense to do this; it is contrary to the intuitive meaning of a physical dimension. A variable that has two dimensions that happen to be the same size is more accurately modeled by using two dimensions with different names but the same size.

Using Dimensions to Relate Variables

Two dimensions may have the same size, perhaps by coincidence, without being related in any other way. Dimension names provide a way to distinguish dimensions regardless of size.

Variables are related by the dimensions they share. For example, if two variables are defined with the same dimensions, they might represent observations or model output for the same set of points. In the example netCDF file, variables temp and rh share the time, lat, and lon dimensions. The variable rh does not have level as a dimension, perhaps because it is only defined for a single level.
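One way to see this relationship is to compare the dimension lists of two variables. The Python sketch below uses the shapes from the example file; SHAPES and shared_dimensions are hypothetical names introduced only for this illustration.

```python
# Dimension lists copied from the CDL example.
SHAPES = {
    "temp": ("time", "level", "lat", "lon"),
    "rh": ("time", "lat", "lon"),
}


def shared_dimensions(a, b):
    """Return the dimensions of variable a that variable b also uses, in a's order."""
    return tuple(dim for dim in SHAPES[a] if dim in SHAPES[b])


if __name__ == "__main__":
    print(shared_dimensions("temp", "rh"))
```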

Using Dimensions to Define Coordinate Systems

Besides serving as sizes for integer indexes to multidimensional variables, dimensions may be used to define coordinate systems for variable data. To do this, create a variable with the same name as a dimension and specify coordinate values for that variable. A variable should only be given the same name as a dimension in a netCDF file when it is to be used as a coordinate variable. Such variables are indexed by the dimension for which they provide coordinate values, for example, lat(lat).

It is not necessary to provide a coordinate variable for each dimension; if no such variable is defined, the coordinate values of the dimension are assumed to be 0, 1, 2, ... (for C programs) or 1, 2, 3, ... (for FORTRAN programs). Although the C and FORTRAN interfaces support different conventions for index numbering, there is no difference between the actual netCDF files written by C and FORTRAN programs. Programs written in either language can be used to access data written by programs using the other interface.

In the CDL example, each dimension has an associated coordinate variable with the same name as the dimension. The four values of the level index, 0, 1, 2, 3 (1, 2, 3, 4 in FORTRAN), are related in coordinate-like fashion to the four values (1000, 850, 700, 500) of the level variable. Note that there is no requirement that coordinates be equally spaced or increasing. It would not make much sense for two coordinate values to be the same, but the meaning of coordinate variables is enforced only by conventions of application packages and utilities, not by the netCDF interface. The application packages and utilities that make use of netCDF coordinate variables assume they are one-dimensional and monotonic (all values either increasing or decreasing).
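The relationship between index values and coordinate values can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the monotonicity check mirrors what the text says applications assume, not anything the netCDF interface enforces.

```python
LEVEL = [1000, 850, 700, 500]  # values of the level coordinate variable


def coordinate_value(coords, index, fortran=False):
    """Map an index to its coordinate value; FORTRAN-style indexes start at 1."""
    position = index - 1 if fortran else index
    if not 0 <= position < len(coords):
        raise IndexError("index outside the dimension")
    return coords[position]


def is_monotonic(coords):
    """True if the values are all increasing or all decreasing (no repeats)."""
    pairs = list(zip(coords, coords[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


if __name__ == "__main__":
    print(coordinate_value(LEVEL, 0), coordinate_value(LEVEL, 1, fortran=True))
    print(is_monotonic(LEVEL))
```

Index 0 in the C convention and index 1 in the FORTRAN convention select the same stored value, which is why files written through either interface are interchangeable.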

Variables

A variable represents a multidimensional array of values of the same type. A variable has a name, a data type, and a shape described by its list of dimensions, all of which are specified when the variable is created. Each variable may also have data values and associated attributes, which may be added or changed after the variable is created. Variables are used to store the bulk of the data in a netCDF file, and are the primary component used by utilities to identify sub-parts of a netCDF file.

Like a dimension name, a variable name is an arbitrary sequence of alphanumeric characters (also including `_' and `-') beginning with a letter. Case is distinguished in variable names. Long names help to make a netCDF file self-documenting, but ancillary information about a variable is better stored in variable attributes (discussed below) than encoded as part of the name.

A variable data type is one of a small set of netCDF types that have the names NC_BYTE, NC_CHAR, NC_SHORT, NC_LONG, NC_FLOAT, and NC_DOUBLE in the C interface and the corresponding names NCBYTE, NCCHAR, NCSHORT, NCLONG, NCFLOAT, and NCDOUBLE in the FORTRAN interface. In the CDL notation, these types are given the simpler names byte, char, short, long, float, and double. int may be used as a synonym for long and real may be used as a synonym for float in the CDL notation. We will postpone a discussion of the exact meaning of each of the types until the discussion of data, below. For now, it suffices to know that the choice of the type used to represent variable data depends on the range of values it can have, the precision to which values are known, and the number of bits required to represent the variable in a netCDF file on disk.

The shape of a variable is specified by its list of dimensions. If a variable has an unlimited dimension, that dimension must appear first in the list of dimensions in CDL. It is possible to define variables with no dimensions, also called scalar variables. There are no scalar variables in the example netCDF file.

CDL variable declarations appear after the variables keyword in a CDL unit. They have the form

     type variable_name  ( dim_name_1, dim_name_2, ... ) ;

for variables with dimensions, or

     type variable_name ;

for scalar variables.

In the CDL example there are six variables. As discussed above, four of these are coordinate variables for dimensions. The remaining variables, temp and rh, contain what is usually thought of as the data. Each of these two variables has the unlimited dimension time as its first dimension, so they are called record variables. A variable that is not a record variable has a fixed size (number of data values) given by the product of the sizes of its dimensions. A record variable has a current size, given by the product of the number of records written so far and the sizes of its other dimensions. Only record variables may grow after they are defined.
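The size rules can be checked with a short calculation. The Python sketch below uses the dimension sizes from the example; DIMS, current_size, and the use of None to mark the unlimited dimension are conventions of this sketch only.

```python
from math import prod

# Dimension sizes from the CDL example; None marks the unlimited dimension.
DIMS = {"lat": 5, "lon": 10, "level": 4, "time": None}


def current_size(shape, num_records):
    """Number of data values in a variable with the given dimension list.

    For the unlimited dimension, the number of records written so far is used
    in place of a fixed size.
    """
    return prod(num_records if DIMS[dim] is None else DIMS[dim] for dim in shape)


if __name__ == "__main__":
    print(current_size(("time", "lat", "lon"), 1))           # rh
    print(current_size(("time", "level", "lat", "lon"), 1))  # temp
```

With one record written, rh holds 1 x 5 x 10 = 50 values, which matches the 50 values assigned to rh in the data part of the example.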

Attributes

A netCDF attribute is meant to contain information about a netCDF variable or about an entire netCDF file. This information is ancillary data, or data about data, analogous to the information stored in data dictionaries and schema in conventional database systems. An attribute has an associated variable, a name, a data type, a length, and a value. Individual attributes are identified by specifying a variable and an attribute name.

Each attribute is associated with a single variable when it is created. Attributes for different variables may differ in data type, length, and values even though they share the same name.

A global attribute is one that applies to the whole netCDF file rather than to any particular variable. Global attributes are defined and accessed similarly to variable attributes; the details for defining global attributes in the CDL notation and in the netCDF procedural interface are presented later.

Attribute names follow the same rules as dimension and variable names. Providing meaningful names for attributes is important, but using agreed-upon conventional names is also required if generic applications and utility programs will be used on a netCDF file. For example, every variable for which units make sense should have a units attribute defined, so the units can be printed in labels. Furthermore, if the netCDF file is ever to be used as input to generic units-aware software, the values of the units attributes should be expressed in a conventional form as a character string that can be interpreted by that software.

The type of an attribute is specified when it is created. The types permitted for attributes are exactly the same as the netCDF data types used in creating variables. Attributes with the same name for different variables should sometimes be of different types. For example, the attribute valid_max specifying the maximum valid data value for a variable of type long should be of type long, whereas the attribute valid_max for a variable of type double should instead be of type double.

In addition to specifying the associated variable, attribute name, and type, the length and value of an attribute must also be specified when it is created. The information in an attribute is represented by either a single value (length 1) or a vector of values of the same type. Since "character string" is not a basic netCDF data type, string-valued attributes have a vector of characters as their value, with a length equal to the length of the character string.

Attributes are more dynamic than variables or dimensions; they can have their type, length, and values changed after they are created. For example, an attribute max_value might store the maximum value seen so far for a record variable, and might be updated every time a new record is written.

The CDL notation for defining an attribute is

    variable_name:attribute_name = list_of_values ;

for a variable attribute, or

    :attribute_name = list_of_values ;

for a global attribute. The type and length of each attribute are not explicitly declared in CDL; they are derived from the values assigned to the attribute. All values of an attribute must be of the same type. The notation used for constant values of the various netCDF types is discussed later (see section CDL Notation for Data Constants).

In the example netCDF file (see section Components of a NetCDF File), units is an attribute for the variable lat that has a 13-character array value `degrees_north'. Similarly, valid_range is an attribute for the variable rh that has length 2 and values `0.0' and `1.0'.
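The lengths quoted above follow directly from the values. The Python sketch below models an attribute as a small record and derives its length from its value; the Attribute class is hypothetical and omits the data type for brevity.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Attribute:
    """Hypothetical model of an attribute: associated variable, name, and value."""

    variable: str  # name of the associated variable
    name: str
    value: Union[str, Sequence[float]]

    @property
    def length(self) -> int:
        # A string value is a vector of characters, so its length is the
        # number of characters; a numeric value is a vector of numbers.
        return len(self.value)


units = Attribute("lat", "units", "degrees_north")
valid_range = Attribute("rh", "valid_range", (0.0, 1.0))

if __name__ == "__main__":
    print(units.length, valid_range.length)
```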

One global attribute, source, is defined for the example netCDF file. This is a character array intended for documenting the data. Real netCDF files typically have more global attributes to document the origin, history, accuracy, and other characteristics of the data.

Attribute Conventions

Generic applications that take netCDF files as input may by convention expect certain variable and global attributes. A few other attributes are handled in special ways by the netCDF library (these reserved attributes have names that begin with an underscore character `_'). If you want to be able to use generic applications with your files, you should use the following conventional names for these commonly used attributes:

units

A character array that specifies the units used for the variable's data. A standard for conventional ways to name units in each specific discipline should be used, if available. Unidata has developed a freely-available library of routines to convert between character string and binary forms of unit specifications and to perform various useful operations on the binary forms. This library is used in some netCDF applications. Using the recommended units syntax permits data represented in conformable units to be automatically converted to common units for algebraic operations. See section Units, for more information.

long_name

A long descriptive name. This could be used for labelling plots, for example. If a variable has no long_name attribute assigned, the variable name will be used as a default.

valid_range

An array of two numbers specifying the minimum and maximum valid values for this variable. The type of each valid_range attribute should match the type of its variable.

valid_min

valid_max

One or both of these may be used instead of valid_range; this handles the case where it only makes sense to bound the data below or above.
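An application might apply these three attributes along the following lines. This Python sketch is an assumption about how a reader could use them, not a description of any particular program: is_valid is a hypothetical helper, the attributes are held in a plain dictionary, and the bounds are treated as inclusive.

```python
def is_valid(value, attrs):
    """Check value against valid_range, or against valid_min and valid_max.

    attrs is a plain dict of attribute names to values (an assumption of this
    sketch). Bounds that are not present are not checked.
    """
    if "valid_range" in attrs:
        low, high = attrs["valid_range"]
        return low <= value <= high
    low = attrs.get("valid_min")
    high = attrs.get("valid_max")
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


if __name__ == "__main__":
    rh_attrs = {"valid_range": (0.0, 1.0)}
    print(is_valid(0.5, rh_attrs), is_valid(1.2, rh_attrs))
```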

scale_factor

If present for a variable, the data are to be multiplied by this factor after the data are read by the application that accesses the data.

add_offset

If present for a variable, this number is to be added to the data after it is read by the application that accesses the data. If both scale_factor and add_offset attributes are present, the data are first scaled before the offset is added. The attributes scale_factor and add_offset can be used together to provide simple data compression to store low-resolution floating-point data as small integers in a netCDF file. When scaled data are written, the application should first subtract the offset and then divide by the scale factor.

When scale_factor and add_offset are used for packing, the associated variable (containing the packed data) is typically of type byte or short, whereas the unpacked values are intended to be of type float or double. The attributes scale_factor and add_offset should both be of the type intended for the unpacked data, e.g. float or double.
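The order of operations described above can be sketched as a pair of Python functions. The function names and the example values (a scale factor of 0.5 and an offset of 10.0) are hypothetical, and rounding to the nearest integer when packing is an assumption of this sketch; the text specifies only the subtraction and division.

```python
def pack(value, scale_factor, add_offset):
    """Writer side: subtract the offset, divide by the scale factor, then round."""
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor, add_offset):
    """Reader side: multiply by the scale factor first, then add the offset."""
    return packed * scale_factor + add_offset


if __name__ == "__main__":
    stored = pack(12.5, scale_factor=0.5, add_offset=10.0)
    print(stored, unpack(stored, scale_factor=0.5, add_offset=10.0))
```

A value that is not an exact multiple of the scale factor above the offset is recovered only approximately, with an error of at most half the scale factor; this is the resolution lost by packing.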

_FillValue

If a scalar attribute with this name is defined for a variable and is of the same type as the variable, it will be subsequently used as the fill value for that variable. The purpose of this attribute is to save the applications programmer the work of prefilling the data and also to eliminate the duplicate writes that result from netCDF filling in missing data with its default fill value, only to be immediately overwritten by the programmer's preferred value. This value is considered to be a special value that indicates missing data, and is returned when reading values that were not written. The missing value should be outside the range specified by valid_range for a variable. It is not necessary to define your own _FillValue attribute for a variable if the default fill value for the type of the variable is adequate. Note that if you change the value of this attribute, the changed value only applies to subsequent writes; previously written data are not changed. See section Missing Values, for more information.

missing_value

missing_value is a conventional name for a missing value that will not be treated in any special way by the library, as the _FillValue attribute is. It is also useful when it is necessary to distinguish between two kinds of missing values. For example, _FillValue might be useful to indicate data that was expected but did not appear, whereas missing_value might be used to indicate grid regions that are not intended to contain data. See section Missing Values, for more information.
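The distinction between the two kinds of missing values can be illustrated with a small Python sketch. The two sentinel values below are hypothetical choices made for this example (they are not library defaults), and the prefilled list stands in for storage that the library has filled before any data are written.

```python
FILL_VALUE = -999.0     # hypothetical _FillValue: data expected but never written
MISSING_VALUE = -888.0  # hypothetical missing_value: region not meant to hold data


def classify(value):
    """Label a stored value as 'fill', 'missing', or 'data'."""
    if value == FILL_VALUE:
        return "fill"
    if value == MISSING_VALUE:
        return "missing"
    return "data"


# Storage starts out prefilled; only some slots are written afterwards.
row = [FILL_VALUE] * 4
row[0] = 0.5            # an actual observation
row[2] = MISSING_VALUE  # the application marks this slot as intentionally empty
labels = [classify(v) for v in row]

if __name__ == "__main__":
    print(labels)
```

Both sentinel values lie outside the valid range of 0.0 to 1.0 used for rh in the example, in keeping with the recommendation above.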

signedness

Used to indicate a nondefault interpretation of the signedness of integer values. By default, applications that deal with values should treat netCDF byte data as unsigned and netCDF short and long integer data as signed. If you declare a netCDF variable for storing bytes, and you intend that the values represent signed quantities, you should declare the variable attribute signedness with value "signed". Similarly, if you define a variable for an array of short or long integers and you intend that the values be interpreted as unsigned, it would be appropriate to define the variable attribute signedness = "unsigned". This attribute is ignored by the netCDF library, but applications may use it. Since there are no standard FORTRAN types corresponding to unsigned integers, FORTRAN programs that compute with or use the ordering of data values may need to handle this attribute.
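An application could honor this attribute roughly as follows. The Python sketch below is an assumption about application behavior, since the library itself ignores the attribute: the interpret function is hypothetical, stored values are passed in as raw bit patterns, signed values are assumed to be two's complement, and the bit widths assumed are 8 for byte, 16 for short, and 32 for long.

```python
# Default interpretations as stated in the text: byte is unsigned,
# short and long are signed.
DEFAULT_SIGNED = {"byte": False, "short": True, "long": True}
BITS = {"byte": 8, "short": 16, "long": 32}


def interpret(raw, nc_type, attrs=None):
    """Interpret a raw bit pattern according to the signedness attribute.

    raw is the stored bit pattern as an integer; attrs is a plain dict of
    attribute names to values (an assumption of this sketch).
    """
    bits = BITS[nc_type]
    setting = (attrs or {}).get("signedness")
    signed = DEFAULT_SIGNED[nc_type] if setting is None else setting == "signed"
    raw &= (1 << bits) - 1  # keep only the bits that are actually stored
    if signed and raw >= 1 << (bits - 1):
        return raw - (1 << bits)  # two's-complement negative value
    return raw


if __name__ == "__main__":
    print(interpret(0xFF, "byte"))
    print(interpret(0xFF, "byte", {"signedness": "signed"}))
```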

C_format

A character array for the format that should be used to print values for this variable by C applications. For example, if you know a variable is only accurate to three significant digits, it would be appropriate to define the C_format attribute as "%.3g". The ncdump utility program uses this attribute for variables for which it is defined.
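The effect of such a format can be tried out in Python, whose % operator accepts C-style conversions such as "%.3g". The format_value helper below is hypothetical; it assumes the attribute holds a single conversion specification and falls back to "%g" when the attribute is absent, which is a choice made for this sketch.

```python
def format_value(value, attrs, default="%g"):
    """Format value with the variable's C_format attribute, if it has one."""
    return attrs.get("C_format", default) % value


if __name__ == "__main__":
    temp_attrs = {"C_format": "%.3g"}
    print(format_value(3.14159, temp_attrs))
    print(format_value(1234.5678, temp_attrs))
```

With "%.3g", only three significant digits are printed, so values that need more digits before the decimal point switch to exponent notation.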

FORTRAN_format

A character array for the format that should be used to print values for this variable by FORTRAN applications.

title

A global attribute that is a character array providing a succinct description of what is in the data set.

history

A global attribute that is a character array with a line for each invocation of a program and arguments that were used to derive the file. Well-behaved generic netCDF filters (programs that take netCDF files as input and produce netCDF files as output) will automatically append their name and the parameters with which they were invoked to the global history attribute of an input netCDF file.

Conventions

If present, `Conventions' is a global attribute that is a character array for the name of the conventions followed by the file, in the form of a string that is interpreted as a directory name relative to a directory that is a repository of documents describing sets of discipline-specific conventions. This permits a hierarchical structure for conventions and provides a place where descriptions and examples of the conventions may be maintained by the defining institutions and groups. The conventions path name is currently interpreted relative to the directory pub/netcdf/Conventions/ on the host machine ftp.unidata.ucar.edu.

For example, if a group named NUWG agrees upon a set of conventions for dimension names, variable names, required attributes, and netCDF representations for certain discipline-specific data structures, they may store a document describing the agreed-upon conventions in a file in the NUWG/ subdirectory of the Conventions directory, and files that followed these conventions would contain a global Conventions attribute with value "NUWG".

Later, if the group agrees upon some additional conventions for a specific subset of NUWG data, for example time series data, the description of the additional conventions might be stored in the NUWG/Time_series/ subdirectory, and files that adhered to these additional conventions would use the global Conventions attribute with value "NUWG/Time_series", implying that this file adheres to the NUWG conventions and also to the additional NUWG time-series conventions.
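The hierarchical reading of a Conventions value can be sketched in Python as follows. The function name implied_conventions is hypothetical, and the sketch only splits the path; it does not consult the repository of convention documents.

```python
def implied_conventions(value):
    """List every convention set implied by a Conventions value, outermost first."""
    parts = [part for part in value.split("/") if part]
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


if __name__ == "__main__":
    print(implied_conventions("NUWG/Time_series"))
```

A file whose Conventions attribute is "NUWG/Time_series" is thus read as following both the NUWG conventions and the additional time-series conventions.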

Attributes may be added to a netCDF file long after it is first defined, so you don't have to anticipate all potentially useful attributes. However, adding new attributes to an existing file can incur the same expense as copying the file. See section NetCDF File Structure and Performance, for a more extensive discussion.

Differences between Attributes and Variables

In contrast to variables, which are intended for bulk data, attributes are intended for ancillary data, or information about the data. Typically the data in variables of an open netCDF file will reside on disk, because the data are too large to fit in memory all at once. In contrast, the total amount of ancillary data associated with a netCDF file and stored in its attributes is typically small enough to be memory-resident.

Another difference between attributes and variables is that variables may be multidimensional. Attributes are all either scalars (single-valued) or vectors (a single, fixed dimension).

Variables are created with a name, type, and shape before they are assigned data values, so a variable may exist with no values. The value of an attribute must be specified when it is created, so no attribute ever exists without a value.

A variable may have attributes, but an attribute cannot have attributes. Attributes assigned to variables may have the same units as the variable (for example, valid_range) or have no units (for example, scale_factor). If you want to store data in a netCDF file that requires units different from those of the associated variable, it is better to use a variable than an attribute. More generally, if data require ancillary data to describe them, are multidimensional, require any of the defined netCDF dimensions to index their values, or require a significant amount of storage, the data should be represented using variables rather than attributes.
