
Data

This chapter discusses the six primitive netCDF data types, the kinds of data access supported by the netCDF interface, and how data structures other than multidimensional arrays may be implemented in a netCDF file.

NetCDF Data Types

The primitive types currently supported by the netCDF interface are:

byte: Eight-bit data; especially good for saving space when only a few values are possible or resolution is low.
character: Currently synonymous with byte; intended for representing text strings as arrays of ASCII characters.
short: 16-bit integers.
long: 32-bit integers.
float: 32-bit IEEE floating-point.
double: 64-bit IEEE floating-point.

Except for the added byte and the lack of unsigned types, netCDF supports the same primitive data types as C. The names for the primitive data types are reserved words in CDL, so the names of variables, dimensions, and attributes must not be type names. Whether byte, short, or long data is interpreted as signed or unsigned is not part of the netCDF interface; since no netCDF operations depend on the sign or order of variable data, you are free to interpret a byte, for example, as holding values between 0 and 255 or between -128 and 127. For convenience, short and long constants are interpreted as signed in the CDL notation. See section Attribute Conventions for more information on representing signedness of values.
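
The point that signedness is a matter of interpretation rather than storage can be sketched in C. The function names below are hypothetical and are not part of the netCDF interface; the sketch assumes the usual two's-complement reading for the signed case.

```c
#include <assert.h>

/* Read the eight stored bits (given as 0..255) as an unsigned value. */
int byte_as_unsigned(int bits)
{
    assert(bits >= 0 && bits <= 255);
    return bits;
}

/* Read the same eight bits as a two's-complement signed value (-128..127). */
int byte_as_signed(int bits)
{
    assert(bits >= 0 && bits <= 255);
    return bits < 128 ? bits : bits - 256;
}
```

The same stored bit pattern therefore yields two different values, and the choice between them belongs to the application, not to the file.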

These types were chosen because they are familiar to C and FORTRAN programmers, they have well-defined external representations independent of any particular computer (using XDR), and they are sufficient for providing a reasonably wide range of trade-offs between data precision and number of bits required for each datum.

Additional primitive types may be added in the future, but only in a way that is compatible with existing programs and files. For example, hyperlong for 64-bit integers will eventually be needed, along with a new type for multibyte characters, but these can both be added without affecting existing netCDF files or applications, and with only minor changes required for generic applications that will support them.

Data Access

The netCDF interface supports several kinds of access to data in an open netCDF file:

direct (random) access to single data values,
direct access to an arbitrary generalized cross-section of data for a single variable,
record-oriented access to data for a single variable, and
record-oriented access to a subset of the variables in a record.

To directly access a single data value, you specify a netCDF file, a variable, and a multidimensional index for the variable. Files are not specified by name every time you want to access data, but instead by a small integer obtained when the file is first created or opened. Similarly, variables are not specified by name for every data access either, but by variable IDs, small integers used to identify variables in a netCDF file.

Data in a netCDF file can be accessed as single values or as hyperslabs. A hyperslab is a generalized piece of a multidimensional variable that is specified by giving the indices of a corner point and a list of edge lengths along each of the dimensions of the variable. The corner point specified must be the one with the smallest indices, that is, the one closest to the origin of the variable index space. In the C interface, the block of data values returned (or written) has the last dimension of the variable varying fastest and the first dimension varying most slowly. For FORTRAN, the order is reversed, with the first dimension of the variable varying fastest and the last dimension varying most slowly. These ordering conventions correspond to the customary order in which multidimensional variables are stored in C and FORTRAN.

As an example of hyperslab access, assume that in the example netCDF file (see section Components of a NetCDF File), you wish to read all the data for the temp variable at the second (850 millibar) level, and assume that there are currently three records (time values) in the netCDF file. Recall that the dimensions are defined as

        lat = 5, lon = 10, level = 4, time = unlimited;

and the variable temp is declared as

        float   temp(time, level, lat, lon);

in the CDL notation.

A corresponding C variable that holds data for all four levels and all three times might be declared as

#define LATS  5
#define LONS 10
#define LEVELS 4
#define TIMES 3                 /* currently */
    ...
float   temp[TIMES*LEVELS*LATS*LONS];

to keep the data in a one-dimensional array, or

    ...
float   temp[TIMES][LEVELS][LATS][LONS];

using a multidimensional array declaration. To read in a hyperslab of data for only one level at a time, for example, you could leave out the LEVELS dimension or set it to 1.
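
The two declarations describe the same layout in memory. As a minimal sketch (the function names are hypothetical, and the macros simply repeat the ones above), the position of one element in the one-dimensional array can be computed by hand and compared with the position the compiler assigns in the multidimensional form:

```c
#include <assert.h>
#include <stddef.h>

#define LATS   5
#define LONS   10
#define LEVELS 4
#define TIMES  3   /* currently */

/* Position of temp(time, level, lat, lon) in the one-dimensional array,
 * with the last dimension (lon) varying fastest. */
size_t temp_offset(size_t t, size_t lev, size_t lat, size_t lon)
{
    assert(t < TIMES && lev < LEVELS && lat < LATS && lon < LONS);
    return ((t * LEVELS + lev) * LATS + lat) * LONS + lon;
}

/* The same position, measured in a multidimensional declaration. */
size_t temp_offset_4d(size_t t, size_t lev, size_t lat, size_t lon)
{
    static float temp[TIMES][LEVELS][LATS][LONS];

    assert(t < TIMES && lev < LEVELS && lat < LATS && lon < LONS);
    /* Byte distance from the start of the array, converted to elements. */
    return (size_t)((const char *)&temp[t][lev][lat][lon]
                    - (const char *)temp) / sizeof(float);
}
```

Both functions return the same number for any valid index, which is why either declaration can receive the data.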

In FORTRAN, the dimensions are reversed from the CDL declaration, with the first dimension varying fastest and the record dimension as the last dimension of a record variable. Thus a FORTRAN declaration for the corresponding variable that holds all times and levels is

      PARAMETER (LATS=5, LONS=10, LEVELS=4, TIMES=3)
         ...
      REAL TEMP(LONS, LATS, LEVELS, TIMES)

To specify the hyperslab of data that represents the second level, all times, all latitudes, and all longitudes, we need to provide a corner and some edge lengths. The corner should be (0, 1, 0, 0) in C--or (1, 1, 2, 1) in FORTRAN--because you want to start at the beginning of each of the time, lon, and lat dimensions, but you want to begin at the second value of the level dimension. The edge lengths should be (3, 1, 5, 10) in C--or (10, 5, 1, 3) in FORTRAN--since you want to get data for all three time values, only one level value, all five lat values, and all 10 lon values. You should expect to get a total of 150 float values returned (3 * 1 * 5 * 10), and should provide enough space in your array for this many. The order in which the data will be returned is with the last dimension, lon, varying fastest for C, or with the first dimension, LON, varying fastest for FORTRAN:

              C                  FORTRAN

     temp[0][1][0][0]      TEMP(1, 1, 2, 1)
     temp[0][1][0][1]      TEMP(2, 1, 2, 1)
     temp[0][1][0][2]      TEMP(3, 1, 2, 1)
     temp[0][1][0][3]      TEMP(4, 1, 2, 1)

           ...                 ...

     temp[2][1][4][7]      TEMP( 8, 5, 2, 3)
     temp[2][1][4][8]      TEMP( 9, 5, 2, 3)
     temp[2][1][4][9]      TEMP(10, 5, 2, 3)
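
The C column of this ordering can be reproduced without a netCDF file. The following is only a sketch of the ordering rule, not of the library's implementation: the hypothetical function hyperslab_offsets lists, for each value in the order it would be returned, its position within the full variable laid out in C order. MAX_DIMS is an arbitrary limit chosen for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIMS 8   /* arbitrary limit for this sketch */

/*
 * Write into offsets[] the position, within the full C-order variable,
 * of each hyperslab point in the order the points are returned.
 * Returns the number of points.
 */
size_t hyperslab_offsets(size_t ndims, const size_t shape[],
                         const size_t corner[], const size_t edge[],
                         size_t offsets[])
{
    size_t idx[MAX_DIMS] = {0};   /* position within the hyperslab */
    size_t n = 0;
    size_t d;

    assert(ndims >= 1 && ndims <= MAX_DIMS);
    for (d = 0; d < ndims; d++) {
        assert(edge[d] >= 1 && corner[d] + edge[d] <= shape[d]);
    }

    for (;;) {
        /* Offset of the current point in the full variable. */
        size_t off = 0;
        for (d = 0; d < ndims; d++) {
            off = off * shape[d] + corner[d] + idx[d];
        }
        offsets[n++] = off;

        /* Advance like an odometer: the last dimension varies fastest. */
        d = ndims;
        while (d > 0) {
            d--;
            if (++idx[d] < edge[d]) {
                break;
            }
            idx[d] = 0;
            if (d == 0) {
                return n;   /* every dimension has wrapped: done */
            }
        }
    }
}
```

Called with shape (3, 4, 5, 10), corner (0, 1, 0, 0), and edge lengths (3, 1, 5, 10), it produces 150 offsets, beginning with that of temp[0][1][0][0] and ending with that of temp[2][1][4][9].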

Note that the different dimension orders for the C and FORTRAN interfaces do not reflect a different order for values stored on the disk, but merely different orders supported by the procedural interfaces to the two languages. In general, it does not matter whether a netCDF file is written using the C or FORTRAN interface; netCDF files written from either language may be read by programs written in the other language.

Besides the regular hyperslab access described above, the netCDF abstraction also supports the reading and writing of generalized hyperslabs. A generalized hyperslab has the same corner point and edge length attributes as a regular hyperslab. It allows, however, more general mappings between the points of the hyperslab and both the disk-resident values of the netCDF variable and the locations for those values in memory.

In a regular hyperslab, the mapping between the points of the hyperslab and the values of a netCDF variable can be described as contiguous: along each dimension, adjacent hyperslab points correspond to adjacent netCDF variable values. In a generalized hyperslab, however, this is not necessarily true; in any dimension, adjacent generalized hyperslab points correspond to netCDF variable values that are separated by a distance of n values. n is the stride of the dimension and needn't be the same for all dimensions. Strides may reasonably vary from one (to access adjacent netCDF variable values) to the number of points in a netCDF dimension. A regular hyperslab has unity strides in all dimensions.
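
The effect of a stride along a single dimension can be sketched as follows. Both function names are hypothetical, and indices are zero-based as in the C interface.

```c
#include <assert.h>
#include <stddef.h>

/* Index along one dimension of the k-th hyperslab point (k = 0, 1, ...). */
size_t strided_index(size_t corner, size_t stride, size_t k)
{
    assert(stride >= 1);
    return corner + k * stride;
}

/* Largest edge length that stays inside a dimension of length dim_len. */
size_t max_strided_edge(size_t dim_len, size_t corner, size_t stride)
{
    assert(stride >= 1 && corner < dim_len);
    return (dim_len - 1 - corner) / stride + 1;
}
```

For the lon dimension of length 10, a corner of 0 and a stride of 3 select the indices 0, 3, 6, and 9, so the longest usable edge is 4. A stride of 1 gives back the regular hyperslab.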

The other mapping allowed by generalized hyperslabs is between the points of the hyperslab and the memory locations of the corresponding values. In a regular hyperslab, this mapping is trivial: the structure of the in-memory values (i.e., the dimensional sizes and their order) is identical to that of the hyperslab. In a generalized hyperslab, this is not necessarily true; instead, an index mapping vector defines the mapping between points in the generalized hyperslab and the memory locations of the corresponding values. The offset, in bytes, from the origin of a memory-resident array to a particular point is given by the inner product of the index mapping vector with the point's coordinate offset vector, that is, the sum over all dimensions of the index mapping value multiplied by the point's offset from the hyperslab corner along that dimension.

The index mapping vector for a regular hyperslab would have, in order from the most rapidly varying dimension to the most slowly varying, the byte size of a memory-resident datum (e.g., 4 for a floating-point value), then the product of that value with the edge length of the most rapidly varying dimension of the hyperslab, then the product of that value with the edge length of the next most rapidly varying dimension, and so on. In a generalized hyperslab, however, the correspondence between hyperslab points and memory locations can be radically different. For example, the following C definitions

struct vel {
    int flags;
    float u;
    float v;
} vel[NX][NY];
long imap[2] = {
    sizeof(struct vel),
    sizeof(struct vel)*NY};

where imap is the index mapping vector, can be used to access the memory-resident values of the netCDF variable, vel[NY][NX], even though the dimensions are transposed and the data is contained in a 2-D array of structures rather than a 2-D array of floating-point values.
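
How such an index mapping vector places values in memory can be illustrated without calling the library. The sketch below is an assumption-laden stand-in for what a generalized hyperslab read would do: scatter_floats and demo_u are hypothetical names, the sizes NX and NY are chosen arbitrarily, and the struct tag is reused from the definitions above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NX 3   /* illustrative sizes only */
#define NY 2

struct vel {
    int flags;
    float u;
    float v;
};

/*
 * Store the values of a 2-D hyperslab, supplied in C order with the second
 * dimension varying fastest, at the memory locations selected by imap.
 * origin points at the location for hyperslab point (0, 0).
 */
void scatter_floats(const float *values, const size_t edge[2],
                    const long imap[2], void *origin)
{
    size_t j, i;

    for (j = 0; j < edge[0]; j++) {
        for (i = 0; i < edge[1]; i++) {
            /* Byte offset: inner product of imap with the point (j, i). */
            long offset = (long)j * imap[0] + (long)i * imap[1];
            memcpy((char *)origin + offset, &values[j * edge[1] + i],
                   sizeof(float));
        }
    }
}

/* Fill the u members of a transposed array of structures and
 * return grid[x][y].u. */
float demo_u(size_t x, size_t y)
{
    struct vel grid[NX][NY];
    float values[NY * NX];   /* stands in for a variable shaped [NY][NX] */
    size_t edge[2] = {NY, NX};
    long imap[2] = {(long)sizeof(struct vel), (long)sizeof(struct vel) * NY};
    size_t k;

    assert(x < NX && y < NY);
    memset(grid, 0, sizeof grid);
    for (k = 0; k < NY * NX; k++) {
        values[k] = (float)k;   /* value k belongs to point [k / NX][k % NX] */
    }
    scatter_floats(values, edge, imap, &grid[0][0].u);
    return grid[x][y].u;
}
```

Because imap[0] steps by one structure and imap[1] steps by NY structures, the value for point [y][x] of the variable lands in grid[x][y].u, which is the transposition described above.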

Note that, although the netCDF abstraction allows the use of generalized hyperslab access if warranted by the situation, it does not mandate it. If you do not need generalized hyperslab access, you may ignore this capability and use regular hyperslab access instead.

To perform conventional record-oriented access, you specify a netCDF file and a record variable (one defined with an unlimited dimension), and use hyperslab access to get the record of values, giving the record number as the index along the first dimension (the last dimension in FORTRAN).
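
In terms of the corner and edge lengths used for hyperslab access, one record of a variable can be described as in the following sketch. The function name record_hyperslab is hypothetical; it only fills in the two vectors, in C dimension order, and leaves the actual read or write to the caller.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill corner[] and edge[] so that they select one whole record of a
 * record variable whose current shape is shape[0..ndims-1], with the
 * record dimension first. Returns the number of values in the record.
 */
size_t record_hyperslab(size_t ndims, const size_t shape[], size_t record,
                        size_t corner[], size_t edge[])
{
    size_t values = 1;
    size_t d;

    assert(ndims >= 1 && record < shape[0]);
    corner[0] = record;   /* the record number indexes the first dimension */
    edge[0] = 1;          /* exactly one record */
    for (d = 1; d < ndims; d++) {
        corner[d] = 0;          /* start of every other dimension */
        edge[d] = shape[d];     /* and all of it */
        values *= shape[d];
    }
    return values;
}
```

For the temp variable with three records, selecting the last record gives the corner (2, 0, 0, 0) and the edge lengths (1, 4, 5, 10), for 200 values in all.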

You may read or write multiple record variables, even if the variables are of different types, with a single call in the C interface. In this case you must write a whole record's worth of data for each desired variable. This interface is supported as a convenience for C programmers, but is not strictly necessary, since the same result may be achieved with one read or write call for each variable. An equivalent portable FORTRAN interface that replaces multiple calls with a single call is unfortunately not possible. However, data written using the C interface can still be read with the FORTRAN interface, using one call per variable.

When efficiency is a concern, you should keep in mind the order in which netCDF data are written on the disk, since the best I/O performance is achieved by reading or writing contiguous data. The data for each variable are stored with the last dimension, as declared in the C interface, varying fastest; in the FORTRAN interface, where the dimension order is reversed, this is the first dimension. This means that for record variables in particular, at least one disk access per record will be required for reading a value from each record. Hence reading a hyperslab that takes one value out of each record will require as many disk accesses as the number of values requested. For writing, the situation is even worse, since each record must first be read and then rewritten to change a single value within a record. If you have a choice about the order in which data are accessed or the order of the dimensions that define the shape of a variable, try to choose these two orders in harmony to avoid needless inefficiency.
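
A rough way to compare access patterns is to count how many separate contiguous pieces a hyperslab occupies. The sketch below is a simplified model, not a description of how the library schedules I/O: the hypothetical function contiguous_runs assumes C dimension order, unit strides, and, for record variables, that successive records of the variable are not adjacent on disk.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Estimate the number of contiguous pieces covered by a hyperslab with
 * the given edge lengths in a variable of the given shape.
 */
size_t contiguous_runs(size_t ndims, const size_t shape[],
                       const size_t edge[], int is_record_var)
{
    size_t r = 0;      /* first dimension that lies inside a single run */
    size_t runs = 1;
    size_t d;

    assert(ndims >= 1);
    /* r becomes the last dimension that is only partly spanned (0 if none);
     * everything after it is fully spanned and merges into one run. */
    for (d = 0; d < ndims; d++) {
        assert(edge[d] >= 1 && edge[d] <= shape[d]);
        if (edge[d] != shape[d]) {
            r = d;
        }
    }
    /* Assumption: separate records never merge into one run. */
    if (is_record_var && r == 0) {
        r = 1;
    }
    /* Each combination of indices in the leading dimensions is one run. */
    for (d = 0; d < r; d++) {
        runs *= edge[d];
    }
    return runs;
}
```

Under this model, taking one value from each of three records of temp (edge lengths (3, 1, 1, 1)) touches three separate pieces, while reading one whole record (edge lengths (1, 4, 5, 10)) touches only one.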

Data Structures

The only kind of data structure directly supported by the netCDF abstraction is a collection of named scalar and multidimensional variables with attached vector attributes. NetCDF is not particularly well-suited for storing linked lists, trees, sparse matrices, or other kinds of data structures requiring pointers. The underlying XDR library on which netCDF is implemented is quite suitable for storing and retrieving arbitrary data structures in a network-transparent way, but such structures will no longer be self-describing unless you encode information about the structure with the data. It is possible to build other kinds of data structures from sets of multidimensional arrays by adopting various conventions regarding the use of data in one array as pointers into another array. The netCDF library won't provide much help or hindrance with constructing such data structures, but netCDF provides the mechanisms with which such conventions can be designed.

For example, it is possible to use a variable attribute to name an associated index variable: the variable attribute `array_index_var = "v_index"' might provide the name of another associated variable to be used as an index for fast retrieval by value in the variable to which the attribute is attached.
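
What the associated index variable contains is left to the convention. As one possible reading, assumed here purely for illustration, v_index could hold the positions of the data values in ascending order of value, so that a value can be located by binary search. The function name find_by_value is hypothetical, and the arrays stand for data already read into memory.

```c
#include <assert.h>
#include <stddef.h>

/*
 * values[]  : the data of the variable carrying the attribute
 * v_index[] : assumed convention: positions in values[], sorted so that
 *             values[v_index[0]] <= values[v_index[1]] <= ...
 * Returns the position in values[] of target, or -1 if it is absent.
 */
long find_by_value(const float values[], const long v_index[], size_t n,
                   float target)
{
    size_t lo = 0;
    size_t hi = n;

    assert(values != NULL && v_index != NULL);
    /* Binary search for the first indexed value that is >= target. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (values[v_index[mid]] < target) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo < n && values[v_index[lo]] == target) {
        return v_index[lo];
    }
    return -1;
}
```

The data variable itself stays in its original order; only the index variable needs to be rebuilt when values change.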

As another example, netCDF variables may be grouped within a netCDF file by defining attributes that list the names of the variables in each group, separated by a conventional delimiter such as a space or comma. A convention can be adopted to use particular sorts of attribute names for such groupings, so that an arbitrary number of named groups of variables can be supported. If needed, a particular conventional attribute for each variable might list the names of the groups of which it is a member. Use of attributes or variables that refer to other attributes or variables provides a flexible mechanism for representing complex structures in netCDF files.
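
An application following such a grouping convention needs to test whether a variable name appears in a group's attribute value. A minimal sketch, assuming the attribute has already been read into a C string and that names are separated by spaces or commas (group_contains is a hypothetical helper, not a netCDF function):

```c
#include <assert.h>
#include <string.h>

/* Return 1 if name is one of the delimiter-separated names in members,
 * 0 otherwise. Only whole names match, so "temp" is not found in
 * "temperature". */
int group_contains(const char *members, const char *name)
{
    const char *delims = " ,";
    size_t name_len = strlen(name);
    const char *p = members;

    assert(name_len > 0);
    while (*p != '\0') {
        size_t tok_len;

        p += strspn(p, delims);         /* skip any delimiters */
        tok_len = strcspn(p, delims);   /* length of the next name */
        if (tok_len == name_len && strncmp(p, name, name_len) == 0) {
            return 1;
        }
        p += tok_len;
    }
    return 0;
}
```

The reverse convention mentioned above, in which each variable lists the groups it belongs to, can be checked with the same helper by swapping the roles of the two strings.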
