
Higher-Level netCDF Operations

One of the primary reasons for using the netCDF interface for both scientific data and applications that deal with scientific data is to take advantage of higher-level netCDF operations and generic applications for processing netCDF files. NetCDF operators read one or more netCDF input files and write a netCDF output file. NetCDF utilities read one or more netCDF files and produce some other kind of output (e.g. a text file or graphics), or read some other kind of input (e.g. a text file) and write netCDF files. Currently two netCDF utilities are available as part of the netCDF software distribution: ncgen and ncdump. A preliminary set of example netCDF operators is available as a separate package.

This chapter describes the ncgen and ncdump utilities. These two tools convert between binary netCDF files and an ASCII representation of netCDF files. The output of ncdump and the input to ncgen is an ASCII description of a netCDF file in a tiny language known as CDL (network Common data form Description Language).

The preliminary set of example netCDF operators is available via anonymous FTP of the file pub/sdm/ncopers<version>.tar.Z from host ftp.unidata.ucar.edu. The programs include: ncextr, which extracts a subset of variables from a netCDF file into an output netCDF file; ncstat, which reads a netCDF file and writes selected statistics along one or more summary dimensions into another netCDF file; and ncbarne, which performs a Barnes objective analysis of raw data contained in a netCDF file and writes the result to an output netCDF file. These programs are documented in the ncopers release.

A number of freely available and commercial software packages also exist for manipulating or graphically displaying netCDF data. A list of these packages is maintained in the file pub/netcdf/utilities.txt on host ftp.unidata.ucar.edu.

CDL Syntax

Below is an example of CDL, describing a netCDF file with several named dimensions (lat, lon, time), variables (z, t, p, rh, lat, lon, time), variable attributes (units, valid_range, _FillValue), and some data.

netcdf foo {    // example netCDF specification in CDL

dimensions:
  lat = 10, lon = 5, time = unlimited;

variables:
  long    lat(lat), lon(lon), time(time);
  float   z(time,lat,lon), t(time,lat,lon);
  double  p(time,lat,lon);
  long    rh(time,lat,lon);

  lat:units = "degrees_north";
  lon:units = "degrees_east";
  time:units = "seconds";
  z:units = "meters";
  z:valid_range = 0., 5000.;
  p:_FillValue = -9999.;
  rh:_FillValue = -1;

data:
  lat   = 0, 10, 20, 30, 40, 50, 60, 70, 80, 90;
  lon   = -140, -118, -96, -84, -52;
}

All CDL statements are terminated by a semicolon. Spaces, tabs, and newlines can be used freely for readability. Comments may follow the double slash characters // on any line.

A CDL description consists of three optional parts: dimensions, variables, and data. The variable part may contain variable declarations and attribute assignments.

A dimension is used to define the shape of one or more of the multidimensional variables described by the CDL description. A dimension has a name and a size. At most one dimension in a CDL description can have the size unlimited, which means that a variable using this dimension can grow to any length (like a record number in a file).

A variable represents a multidimensional array of values of the same type. A variable has a name, a data type, and a shape described by its list of dimensions. Each variable may also have associated attributes (see below) as well as data values. The name, data type, and shape of a variable are specified by its declaration in the variable section of a CDL description. A variable may have the same name as a dimension; by convention such a variable is one-dimensional and contains coordinates of the dimension it names. Dimensions need not have corresponding variables.
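
The coordinate-variable convention can be checked mechanically. The following is only a minimal sketch in Python, not part of netCDF: the dictionary is a hand-written copy of the declarations in the foo example above, and the function name coordinate_variables is hypothetical.

```python
# Variable name -> tuple of dimension names, copied by hand from the
# "netcdf foo" example. This is illustrative data, not parser output.
declarations = {
    "lat": ("lat",),
    "lon": ("lon",),
    "time": ("time",),
    "z": ("time", "lat", "lon"),
    "t": ("time", "lat", "lon"),
    "p": ("time", "lat", "lon"),
    "rh": ("time", "lat", "lon"),
}


def coordinate_variables(decls):
    """Return the variables that are one-dimensional and named after their dimension."""
    return sorted(name for name, dims in decls.items() if dims == (name,))


print(coordinate_variables(declarations))
```

Under these assumptions, lat, lon, and time are reported as coordinate variables, while z, t, p, and rh are not.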

An attribute contains information about a variable or about the whole netCDF data set. Attributes are used to specify such properties as units, special values, maximum and minimum valid values, scaling factors, offsets, and parameters. Attribute information is represented by single values or arrays of values. For example, units is an attribute represented by a character array such as celsius. An attribute has an associated variable, a name, a data type, a length, and a value. In contrast to variables that are intended for data, attributes are intended for ancillary data (data about data).

In CDL, an attribute is designated by a variable and attribute name, separated by a colon (`:'). It is possible to assign global attributes not associated with any variable to the netCDF file as a whole by using the colon (`:') before the attribute name. The data type of an attribute in CDL is derived from the type of the value assigned to it. The length of an attribute is the number of data values or the number of characters in the character string assigned to it. Multiple values are assigned to noncharacter attributes by separating the values with commas (`,'). All values assigned to an attribute must be of the same type.

CDL names for variables, attributes, and dimensions may be any combination of alphabetic or numeric characters as well as `_' and `-' characters, but names beginning with `_' are reserved for use by the library. Case is significant in CDL names. The netCDF library does not enforce any restrictions on netCDF names, so it is possible (though unwise) to define variables with names that are not valid CDL names. The names for the primitive data types are reserved words in CDL, so the names of variables, dimensions, and attributes must not be type names.

The optional data section of a CDL description is where netCDF variables may be initialized. The syntax of an initialization is simple:

        variable = value_1, value_2, ...;

The constants in the comma-delimited list may be separated by spaces, tabs, and newlines. For multidimensional arrays, the last dimension varies fastest. Thus, row-major order rather than column-major order is used for matrices. If fewer values are supplied than are needed to fill a variable, it is extended with a type-dependent fill value. The types of constants need not match the type declared for a variable; coercions are done to convert integers to floating point, for example. All meaningful type conversions are supported.
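
The ordering and padding rules can be illustrated outside of netCDF. The sketch below is an assumption-laden Python model, not ncgen's implementation: the function layout and the variable m(2,3) are hypothetical, and -1 merely stands in for whatever type-dependent fill value would actually be used.

```python
from itertools import product


def layout(values, shape, fill):
    """Assign a flat list of values to index tuples, last dimension varying fastest."""
    size = 1
    for n in shape:
        size *= n
    if len(values) > size:
        raise ValueError("more values than the variable can hold")
    # Pad a short list with the fill value, as described for the data section.
    padded = list(values) + [fill] * (size - len(values))
    # product() varies its last argument fastest, which is row-major order.
    return dict(zip(product(*(range(n) for n in shape)), padded))


# Hypothetical variable m(2,3) initialized with only four values.
cells = layout([1, 2, 3, 4], (2, 3), fill=-1)
print(cells[(0, 2)], cells[(1, 0)], cells[(1, 2)])  # prints "3 4 -1"
```

The first three values fill the first row, the fourth starts the second row, and the remaining two cells receive the stand-in fill value.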

CDL Data Types

The CDL data types are:

char: Characters.
byte: Eight-bit data, including zero bytes.
short: 16-bit signed integers.
long: 32-bit signed integers.
int: (Synonymous with long).
float: IEEE single-precision floating point (32 bits).
real: (Synonymous with float).
double: IEEE double-precision floating point (64 bits).

Except for the added data-type byte and the lack of the type qualifier unsigned, CDL supports the same primitive data types as C. In declarations, type names may be specified in either upper or lower case.

Bytes differ from characters in that they are intended to hold eight bits of data, and the zero byte has no special significance, as it may for character data. Ncgen converts byte declarations to char declarations in the output C code and to the nonstandard BYTE declaration (or similar platform-specific declaration) in output FORTRAN code.

Shorts can hold values between -32768 and 32767. Ncgen converts short declarations to short declarations in the output C code and to the nonstandard INTEGER*2 declaration in output FORTRAN code.

Longs can hold values between -2147483648 and 2147483647. Ncgen converts long declarations to long declarations in the output C code and to INTEGER declarations in output FORTRAN code. In CDL declarations int and integer are accepted as synonyms for long.

Floats can hold values between about -3.4e+38 and 3.4e+38. Their external representation is as 32-bit IEEE normalized single-precision floating-point numbers. Ncgen converts float declarations to float declarations in the output C code and to REAL declarations in output FORTRAN code. In CDL declarations real is accepted as a synonym for float.

Doubles can hold values between about -1.7e+308 and 1.7e+308. Their external representation is as 64-bit IEEE standard normalized double-precision floating-point numbers. Ncgen converts double declarations to double declarations in the output C code and to DOUBLE PRECISION declarations in output FORTRAN code.
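
The ranges quoted above follow from the bit widths and from the IEEE formats. As a rough cross-check, and not as anything the netCDF library does itself, the limits can be reproduced with Python's standard struct module. The helper fits_short is a hypothetical name introduced for this sketch.

```python
import struct

# Integer limits follow directly from the 16-bit and 32-bit widths.
short_min, short_max = -(2 ** 15), 2 ** 15 - 1
long_min, long_max = -(2 ** 31), 2 ** 31 - 1

# Largest finite IEEE single and double values, decoded from their bit patterns.
float_max = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]
double_max = struct.unpack(">d", bytes.fromhex("7fefffffffffffff"))[0]


def fits_short(value):
    """Report whether an integer can be packed as a 16-bit signed value."""
    try:
        struct.pack(">h", value)
        return True
    except struct.error:
        return False


print(short_min, short_max)  # prints "-32768 32767"
print(long_min, long_max)    # prints "-2147483648 2147483647"
print(float_max, double_max)
print(fits_short(32767), fits_short(32768))  # prints "True False"
```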

CDL Notation for Data Constants

This section explains the CDL notation for netCDF constants.

Attributes are initialized in the variables section of a CDL description by providing a list of constants that determines the attribute's type and length. (In the C and FORTRAN procedural interfaces to the netCDF library, the type and length of an attribute must be explicitly provided when it is defined.) Since neither C nor FORTRAN provides suitable standard syntax to distinguish between constants of type byte and char, short and long, or float and double (except that FORTRAN provides the latter), CDL defines a syntax for constant values that allows it to determine the netCDF type of any constant. The syntax for CDL constants is similar to C syntax, except that type suffixes are appended to shorts and floats to distinguish them from longs and doubles.

A byte constant is represented by a single character or a multiple-character escape sequence enclosed in single quotes. For example:

'a'     // ASCII a
'\0'    // a zero byte
'\n'    // ASCII newline character
'\33'   // ASCII escape character (33 octal)
'\x2b'  // ASCII plus (2b hex)
'\377'  // 377 octal = 255 decimal, a non-ASCII byte

Character constants are enclosed in double quotes. A character array may be represented as a string enclosed in double quotes. Multiple strings are concatenated into a single array of characters, permitting long character arrays to appear on multiple lines. To support multiple variable-length string values, a conventional delimiter such as `,' may be used, but interpretation of any such convention for a string delimiter must be implemented in software above the netCDF library layer. The usual escape conventions for C strings are honored. For example:

"a"            // ASCII `a'
"Two\nlines\n" // a 10-character string with two embedded newlines
"a bell:\007"  // a string containing an ASCII bell
"ab","cde"     // the same as "abcde"

The form of a short constant is an integer constant with an `s' or `S' appended. If a short constant begins with `0', it is interpreted as octal. When it begins with 0x, it is interpreted as a hexadecimal constant. For example:

2s      // a short 2
0123s   // octal
0x7ffs  // hexadecimal

The form of a long constant is an ordinary integer constant, although it is acceptable to append an optional `l' or `L'. If a long constant begins with `0', it is interpreted as octal. When it begins with 0x, it is interpreted as a hexadecimal constant. Examples of valid long constants include:

-2
1234567890L
0123            // octal
0x7ff           // hexadecimal

The float type is appropriate for representing data with about seven significant digits of precision. The form of a float constant is the same as a C floating-point constant with an `f' or `F' appended. A decimal point is required in a CDL float to distinguish it from an integer. For example, the following are all acceptable float constants:

-2.0f
3.14159265358979f       // will be truncated to less precision
1.f
.1f

The double type is appropriate for representing floating-point data with about 16 significant digits of precision. The form of a double constant is the same as a C floating-point constant. An optional `d' or `D' may be appended. A decimal point is required in a CDL double to distinguish it from an integer. For example, the following are all acceptable double constants:

-2.0
3.141592653589793
1.0e-20
1.d
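
The rules in this section amount to a small decision procedure: quoting distinguishes byte and char, a suffix distinguishes short from long and float from double, and a decimal point distinguishes floating-point from integer constants. The Python sketch below is a deliberately rough approximation of those rules, written for illustration only. It is not the grammar ncgen uses, and the names RULES and cdl_type are hypothetical.

```python
import re

# Decimal, octal (leading 0), or hexadecimal (leading 0x) integer text.
INT = r"[+-]?(?:0[xX][0-9a-fA-F]+|\d+)"
# A decimal point is required; the exponent is optional.
REAL = r"[+-]?(?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?"

# Checked in order; the escape handling for byte constants is loose on purpose.
RULES = [
    ("byte", re.compile(r"'(?:\\.[0-9a-fA-F]*|[^'\\])'")),
    ("char", re.compile(r'"(?:\\.|[^"\\])*"')),
    ("short", re.compile(INT + r"[sS]")),
    ("long", re.compile(INT + r"[lL]?")),
    ("float", re.compile(REAL + r"[fF]")),
    ("double", re.compile(REAL + r"[dD]?")),
]


def cdl_type(token):
    """Return the netCDF type name suggested by the spelling of one constant."""
    for name, pattern in RULES:
        if pattern.fullmatch(token):
            return name
    raise ValueError(f"not a recognized constant: {token!r}")


for token in ["'a'", '"a"', "2s", "0x7ffs", "0x7ff", "1.f", "1.d"]:
    print(token, cdl_type(token))
```

Note that 0x7ff is classified as a long even though it ends in `f', because the trailing characters are hexadecimal digits rather than a type suffix.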

ncgen

The ncgen tool generates a netCDF file or the C or FORTRAN programs required to create the netCDF file. If no options are specified in invoking ncgen, the program merely checks the syntax of the CDL input, producing error messages for any violations of CDL syntax. Other options can be used to create a netCDF file or to generate a program in C or FORTRAN that calls the routines required to create the netCDF file.

UNIX syntax for invoking ncgen:

ncgen [-b] [-o netcdf-file] [-c] [-f] [-n] [input-file]

where:

`-b': Create a (binary) netCDF file. If the `-o' option is absent, a default file name will be constructed from the netCDF name (specified after the netcdf keyword in the input) by appending the `.nc' extension. If a file already exists with the specified name it will be overwritten.
`-o netcdf-file': Name for the netCDF file created. If this option is specified, it implies the `-b' option. (This option is necessary because netCDF files are direct-access files created with seek calls, and hence cannot be written to standard output.)
`-c': Generate C source code that will create a netCDF file matching the netCDF specification. The C source code is written to standard output.
`-f': Generate FORTRAN source code that will create a netCDF file matching the netCDF specification. The FORTRAN source code is written to standard output.
`-n': Like the `-b' option, except creates a netCDF file with the obsolete `.cdf' extension instead of the `.nc' extension, in the absence of an output filename specified by the `-o' option. This option is only supported for backward compatibility.
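
The default file naming described for `-b' and `-n' can be modeled in a few lines. This Python sketch only illustrates the naming rule as stated above. It does not invoke ncgen, its regular expression is far simpler than a real CDL parser, and the function name default_output_name is hypothetical.

```python
import re


def default_output_name(cdl_text, obsolete_extension=False):
    """Build the default file name from the name after the netcdf keyword."""
    match = re.search(r"^\s*netcdf\s+([A-Za-z0-9_-]+)\s*\{", cdl_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'netcdf name {' header found")
    # `-b' appends .nc; `-n' appends the obsolete .cdf extension instead.
    extension = ".cdf" if obsolete_extension else ".nc"
    return match.group(1) + extension


header = "netcdf foo {    // example netCDF specification in CDL\n}"
print(default_output_name(header))                           # prints "foo.nc"
print(default_output_name(header, obsolete_extension=True))  # prints "foo.cdf"
```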

Examples

Check the syntax of the CDL file `foo.cdl':

ncgen foo.cdl

From the CDL file `foo.cdl', generate an equivalent binary netCDF file named `flu.nc':

ncgen -o flu.nc foo.cdl

From the CDL file `foo.cdl', generate a C program containing the netCDF function invocations necessary to create an equivalent binary netCDF file:

ncgen -c foo.cdl > foo.c

ncdump

The ncdump tool generates an ASCII representation of a netCDF file on standard output, either with or without an ASCII representation of the variable data in the file. The ASCII representation used is the CDL notation that ncgen accepts as input. Thus ncdump and ncgen can be used as inverses to convert data between binary and ASCII representations.

ncdump may also be used as a simple browser for netCDF data files, to display the dimension names and sizes; variable names, types, and shapes; attribute names and values; and optionally, the values of data for all variables or selected variables in a netCDF file.

ncdump defines a default format used for each type of netCDF variable data, but this can be overridden if a C_format attribute is defined for a netCDF variable. In this case, ncdump will use the C_format attribute to format values for that variable. For example, if floating-point data for the netCDF variable Z is known to be accurate to only three significant digits, it would be appropriate to use the variable attribute

Z:C_format = "%.3g"
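
The effect of such a format string can be previewed with any printf-style formatter. The Python sketch below applies the same conversion to a few made-up sample values. It shows only what the %.3g conversion does and assumes nothing about how ncdump itself decorates the result, which may differ (for example, in how it marks a value as floating point).

```python
# The attribute value from the example above.
c_format = "%.3g"

# Hypothetical sample data, chosen to show both fixed and exponent notation.
values = [1234.5678, 0.000123456, 2.0]

formatted = [c_format % v for v in values]
print(", ".join(formatted))  # prints "1.23e+03, 0.000123, 2"
```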

UNIX syntax for invoking ncdump:

ncdump  [-c | -h]  [-v var1,...]  [-b lang]  [-f lang]
[-l len]  [-d float_digits[,double_digits]]  [-n name]  [input-file]

where:

`-c': Show the values of coordinate variables (variables that are also dimensions) as well as the declarations of all dimensions, variables, and attribute values. Data values of non-coordinate variables are not included in the output. This is often the most suitable option to use for a brief look at the structure and contents of a netCDF file.
`-h': Show only the header information in the output, that is, output only the declarations for the netCDF dimensions, variables, and attributes of the input file, but no data values for any variables. The output is identical to using the `-c' option except that the values of coordinate variables are not included. (At most one of `-c' or `-h' options may be present.)
`-v var1,...': The output will include data values for the specified variables, in addition to the declarations of all dimensions, variables, and attributes. One or more variables must be specified by name in the comma-delimited list following this option. The list must be a single argument to the command, hence cannot contain blanks or other white space characters. The named variables must be valid netCDF variables in the input-file. The default, without this option and in the absence of the `-c' or `-h' options, is to include data values for all variables in the output.
`-b lang': A brief annotation in the form of a CDL comment (text beginning with the characters `//') will be included in the data section of the output for each `row' of data, to help identify data values for multidimensional variables. If lang begins with `C' or `c', then C language conventions will be used (zero-based indices, last dimension varying fastest). If lang begins with `F' or `f', then Fortran language conventions will be used (one-based indices, first dimension varying fastest). In either case, the data will be presented in the same order; only the annotations will differ. This option is useful for browsing through large volumes of multidimensional data.
`-f lang': Full annotations in the form of trailing CDL comments (text beginning with the characters `//') for every data value (except individual characters in character arrays) will be included in the data section. If lang begins with `C' or `c', then C language conventions will be used (zero-based indices, last dimension varying fastest). If lang begins with `F' or `f', then Fortran language conventions will be used (one-based indices, first dimension varying fastest). In either case, the data will be presented in the same order; only the annotations will differ. This option may be useful for piping data into other filters, since each data value appears on a separate line, fully identified.
`-l len': Changes the default maximum line length (80) used in formatting lists of non-character data values.
`-d float_digits[,double_digits]': Specifies the default number of significant digits to use in displaying floating-point or double-precision data values for variables that don't have a `C_format' attribute. Floating-point data will be displayed with float_digits significant digits. If double_digits is also specified, double-precision values will be displayed with that many significant digits. If a variable has a `C_format' attribute, that overrides any specified floating-point default. In the absence of any `-d' specifications, floating-point and double-precision data are displayed with 7 and 15 significant digits respectively. CDL files can be made smaller if less precision is required. If both the floating-point and double-precision digit counts are specified, the two values must appear separated by a comma (no blanks) as a single argument to the command.
`-n name': CDL requires a name for a netCDF data set, for use by `ncgen -b' in generating a default netCDF file name. By default, ncdump constructs this name from the last component of the pathname of the input netCDF file by stripping off any extension it has. Use the `-n' option to specify a different name. Although the output file name used by `ncgen -b' can be specified, it may be wise to have ncdump change the default name to avoid inadvertently overwriting a valuable netCDF file when using ncdump, editing the resulting CDL file, and using `ncgen -b' to generate a new netCDF file from the edited CDL file.
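
The overwrite risk mentioned under `-n' comes from the way the default name round-trips. The following Python sketch models the naming rule only. The path /data/run1/foo.nc and the function default_dataset_name are hypothetical, and splitext removes just the final extension, which is an assumption about how "any extension" should be read.

```python
import posixpath


def default_dataset_name(path):
    """Take the last pathname component and strip its (final) extension."""
    base = posixpath.basename(path)
    stem, _extension = posixpath.splitext(base)
    return stem


name = default_dataset_name("/data/run1/foo.nc")
# With this name in the CDL, `ncgen -b' would default to writing name + ".nc".
print(name, "->", name + ".nc")  # prints "foo -> foo.nc"
```

Because the regenerated default file name matches the original file name, running `ncgen -b' in the same directory would replace the original unless a different name is given.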

Examples

Look at the structure of the data in the netCDF file `foo.nc':

ncdump -c foo.nc

Produce an annotated CDL version of the structure and data in the netCDF file `foo.nc', using C-style indexing for the annotations:

ncdump -b c foo.nc > foo.cdl

Output data for only the variables uwind and vwind from the netCDF file `foo.nc', and show the floating-point data with only three significant digits of precision:

ncdump -v uwind,vwind -d 3 foo.nc

Produce a fully-annotated (one data value per line) listing of the data for the variable omega, using Fortran conventions for indices, and changing the netCDF dataset name in the resulting CDL file to omega:

ncdump -v omega -f fortran -n omega foo.nc > Z.cdl
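
The two annotation conventions selected by `-b' and `-f' describe the same stored order with different index labels. The Python sketch below is a minimal model under the conventions stated in the option descriptions (zero-based with the last dimension fastest for C, one-based with the first dimension fastest for Fortran). The function names are hypothetical and the code does not reproduce ncdump's exact comment format.

```python
def c_indices(offset, shape):
    """Zero-based indices for a flat offset, last dimension varying fastest."""
    indices = []
    for n in reversed(shape):
        offset, i = divmod(offset, n)
        indices.append(i)
    return tuple(reversed(indices))


def fortran_indices(offset, shape):
    """One-based indices in reversed dimension order, first dimension varying fastest."""
    return tuple(i + 1 for i in reversed(c_indices(offset, shape)))


# The last element of a hypothetical variable declared in CDL with shape (2, 3).
print(c_indices(5, (2, 3)))        # prints "(1, 2)"
print(fortran_indices(5, (2, 3)))  # prints "(3, 2)"
```

The same data value is labeled (1, 2) under C conventions and (3, 2) under Fortran conventions, while its position in the output is unchanged.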
