<!-- Generated by Harlequin WebMaker 2.2.3 (24-Apr-1996)
LispWorks 3.2.2 -->
<HTML> <HEAD>
<TITLE>2	 Components of a NetCDF Dataset</TITLE>
</HEAD>
<BODY bgcolor="#ffffff">
<A NAME=HEADING7></A>
<A HREF="guidef-8.html">[Next] </A><A HREF="guidef-6.html">[Previous] </A><A HREF="guidef-1.html">[Top] </A><A HREF="guidef-3.html">[Contents] </A><A HREF="guidef-21.html">[Index] </A><A HREF="http://www.unidata.ucar.edu/packages/netcdf/">[netCDF Home Page]</A><A HREF="http://www.unidata.ucar.edu/">[Unidata Home Page]</A><P>
NetCDF User's Guide for Fortran<P>
<A NAME=HEADING7-0></A>
<H1>2  Components of a NetCDF Dataset</H1>
<HR>
<A NAME=HEADING7-1></A>
<H2>2.1  The NetCDF Data Model</H2>
<HR>
 A netCDF dataset contains <I>dimensions</I>, <I>variables</I>, and <I>attributes</I>, which all have both a name and an ID number by which they are identified. These components can be used together to capture the meaning of data and relations among data fields in an array-oriented dataset. The netCDF library allows simultaneous access to multiple netCDF datasets which are identified by dataset ID numbers, in addition to ordinary file names. <P>
 A <A NAME=MARKER-2-1570></A><A NAME=MARKER-2-1571></A><A NAME=MARKER-2-1572></A><A NAME=MARKER-2-1573></A><A NAME=MARKER-2-1574></A>netCDF dataset contains a symbol table for variables containing their name, data type, rank (number of dimensions), dimensions, and starting disk address. Each element is stored at a disk address which is a linear function of the array indices (subscripts) by which it is identified. Hence, these indices need not be stored separately (as in a relational database). This provides a fast and compact storage method. <P>
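 The "linear function of the array indices" can be illustrated with a short Python sketch. It assumes a fixed-size (non-record) variable whose values are stored contiguously in CDL order, with the last dimension varying fastest. The function name <CODE>element_offset</CODE>, the <CODE>start</CODE> address and the element size are illustrative assumptions, not part of the netCDF interface. Subscripts here are 0-based, whereas FORTRAN indices start at 1.<P>

```python
def element_offset(start, shape, index, elem_size):
    """Byte address of one element of a contiguously stored array.

    start     -- byte address where the variable's data begins (hypothetical)
    shape     -- dimension lengths in CDL order (slowest-varying first)
    index     -- 0-based subscripts in the same order
    elem_size -- size of one value in bytes (e.g. 4 for a float)
    """
    if len(shape) != len(index):
        raise ValueError("index rank does not match variable rank")
    linear = 0
    for length, i in zip(shape, index):
        if not 0 <= i < length:
            raise IndexError(f"subscript {i} out of range for length {length}")
        # Horner-style accumulation: ((i0 * n1 + i1) * n2 + i2) ...
        linear = linear * length + i
    return start + linear * elem_size


# A fixed-size float variable shaped (level=4, lat=5, lon=10):
print(element_offset(start=1000, shape=(4, 5, 10), index=(2, 3, 7), elem_size=4))
# prints 1548
```

 Because the address is computed from the subscripts alone, the subscripts themselves never need to be written to disk.<P>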
<A NAME=HEADING7-4></A>
<H2>2.1.1  Naming Conventions</H2>
 The<A NAME=MARKER-2-1575></A><A NAME=MARKER-2-1576></A><A NAME=MARKER-2-1577></A><A NAME=MARKER-2-1578></A><A NAME=MARKER-2-1579></A><A NAME=MARKER-2-1580></A><A NAME=MARKER-2-1581></A><A NAME=MARKER-2-1582></A><A NAME=MARKER-2-1583></A><A NAME=MARKER-2-1584></A><A NAME=MARKER-2-1585></A> names of dimensions, variables and attributes consist of arbitrary sequences of alphanumeric characters (as well as underscore '<CODE>_</CODE>' and hyphen '<CODE>-</CODE>'), beginning with a letter or underscore. (However, names beginning with an underscore are reserved for system use.) Case is significant in netCDF names. <P>
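 The naming rule can be expressed as a regular expression. This Python sketch classifies a candidate name as acceptable, reserved, or invalid; the function name is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits only.<P>

```python
import re

# Naming rule described above: a letter or underscore, followed by any mix of
# letters, digits, underscores and hyphens. Assumption: "alphanumeric" means
# ASCII letters and digits only.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def check_name(name):
    """Classify a dimension, variable or attribute name.

    Returns "ok", "reserved" (well formed, but starting with an underscore,
    which is reserved for system use) or "invalid".
    """
    if not _NAME_RE.fullmatch(name):
        return "invalid"
    if name.startswith("_"):
        return "reserved"
    return "ok"


for candidate in ["temp", "rh-2", "_FillValue", "2m_temp", "air temp"]:
    print(candidate, check_name(candidate))
```

 Since case is significant, <CODE>Temp</CODE> and <CODE>temp</CODE> are distinct names; ordinary string comparison already behaves that way.<P>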
<A NAME=HEADING7-6></A>
<H2>2.1.2  <A NAME=MARKER-9-1586></A>network Common Data Form Language (CDL)</H2>
 We will use a small netCDF example to illustrate the concepts of the netCDF data model, including dimensions, variables, and attributes. The notation used to describe this simple netCDF object is called CDL (network Common Data Form Language), which provides a convenient way of describing netCDF datasets. The netCDF system includes utilities for producing human-oriented CDL text files from binary netCDF datasets and vice versa. <P>
<PRE>
netcdf example_1 {  // example of CDL notation for a netCDF dataset

dimensions:         // dimension names and lengths are declared first
        lat = 5, lon = 10, level = 4, time = unlimited;

variables:          // variable types, names, shapes, attributes
        float   temp(time,level,lat,lon);
                    temp:long_name     = &quot;temperature&quot;;
                    temp:units         = &quot;celsius&quot;;
        float   rh(time,lat,lon);
                    rh:long_name = &quot;relative humidity&quot;;
                    rh:valid_range = 0.0, 1.0;      // min and max
        int     lat(lat), lon(lon), level(level);
                    lat:units       = &quot;degrees_north&quot;;
                    lon:units       = &quot;degrees_east&quot;;
                    level:units     = &quot;millibars&quot;;
        short   time(time);
                    time:units      = &quot;hours since 1996-1-1&quot;;
        // global attributes
                    :source = &quot;Fictional Model Output&quot;;

data:                // optional data assignments
        level   = 1000, 850, 700, 500;
        lat     = 20, 30, 40, 50, 60;
        lon     = -160,-140,-118,-96,-84,-52,-45,-35,-25,-15;
        time    = 12;
        rh      =.5,.2,.4,.2,.3,.2,.4,.5,.6,.7,
                 .1,.3,.1,.1,.1,.1,.5,.7,.8,.8,
                 .1,.2,.2,.2,.2,.5,.7,.8,.9,.9,
                 .1,.2,.3,.3,.3,.3,.7,.8,.9,.9,
                  0,.1,.2,.4,.4,.4,.4,.7,.9,.9;
}
</PRE>
 The <A NAME=MARKER-2-1590></A><A NAME=MARKER-2-1591></A>CDL notation for a netCDF dataset can be generated automatically by using <CODE>ncdump</CODE>, a utility program described later (<A HREF=guidef-15.html#MARKER-9-3208>see Section 10.5 "ncdump,"  page 104</A>). Another netCDF utility, <CODE>ncgen</CODE>, generates a netCDF dataset (or optionally C or FORTRAN source code containing calls needed to produce a netCDF dataset) from CDL input (<A HREF=guidef-15.html#MARKER-9-3190>see Section 10.4 "ncgen,"  page 103</A>). <P>
 The CDL notation is simple and largely self-explanatory. It will be explained more fully as we describe the components of a netCDF dataset. For now, note that CDL statements are terminated by a semicolon. Spaces, tabs, and newlines can be used freely for readability. Comments in CDL follow the characters '<CODE>//</CODE>' on any line. A CDL description of a netCDF dataset takes the form<P>
 <P>
<PRE>
  netCDF <I>name</I> {
    dimensions: ... 
    variables: ... 
    data: ... 
  }
</PRE>
 where the <I>name</I> is used only as a default in constructing file names by the <CODE>ncgen</CODE> utility. The CDL description consists of three optional parts, introduced by the keywords <CODE>dimensions</CODE>, <CODE>variables</CODE>, and <CODE>data</CODE>. NetCDF dimension declarations appear after the <CODE>dimensions</CODE> keyword, netCDF variables and attributes are defined after the <CODE>variables</CODE> keyword, and variable data assignments appear after the <CODE>data</CODE> keyword. <P>
<A NAME=HEADING7-49></A>
<H2>2.2  Dimensions</H2>
<HR>
 A dimension may be used to represent a real physical dimension, for example, time, latitude, longitude, or height. A dimension might also be used to index other quantities, for example, station or model-run number. <P>
 A<A NAME=MARKER-2-1593></A><A NAME=MARKER-2-1594></A> netCDF dimension has both a <I>name</I> and a <I>length</I>. A dimension length is an arbitrary positive integer, except that one dimension in a netCDF dataset can have the length <CODE>UNLIMITED</CODE>. <P>
 Such a dimension is called the <I>unlimited dimension</I> or the <I>record dimension</I>. A variable with an unlimited dimension can grow to any length along that dimension. The unlimited dimension index is like a record number in conventional record-oriented files. A netCDF dataset can have at most one unlimited dimension, but need not have any. If a variable has an unlimited dimension, that dimension must be the most significant (slowest changing) one. Thus any unlimited dimension must be the first dimension in a CDL shape and the <A NAME=MARKER-10-1610></A>last dimension in corresponding <A NAME=MARKER-10-1611></A>FORTRAN array declarations. <P>
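 Because CDL lists dimensions slowest-varying first while FORTRAN varies the first subscript fastest, the same storage layout is declared with the dimensions in reverse order. A minimal Python sketch of that reordering, which also enforces the rule that the unlimited dimension must come first in the CDL shape (the function <CODE>fortran_order</CODE> is hypothetical):<P>

```python
def fortran_order(cdl_dims, unlimited=None):
    """Return the dimension names of a CDL shape in FORTRAN declaration order.

    cdl_dims  -- dimension names as written in CDL (slowest-varying first)
    unlimited -- name of the dataset's unlimited dimension, if any
    """
    if unlimited is not None and unlimited in cdl_dims and cdl_dims[0] != unlimited:
        raise ValueError(f"unlimited dimension {unlimited!r} must be first in a CDL shape")
    # Reversing describes the same layout: the last CDL dimension
    # (fastest varying) becomes the first FORTRAN subscript.
    return tuple(reversed(cdl_dims))


print(fortran_order(("time", "level", "lat", "lon"), unlimited="time"))
# prints ('lon', 'lat', 'level', 'time')
```

 With the dimension lengths of the example, <CODE>temp</CODE> would therefore correspond to a FORTRAN array declared along the lines of <CODE>REAL TEMP(10, 5, 4, NRECS)</CODE>, where <CODE>NRECS</CODE> is a hypothetical number of records held in memory.<P>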
 CDL dimension declarations may appear on one or more lines following the CDL keyword <CODE>dimensions</CODE>. Multiple dimension declarations on the same line may be separated by commas. Each declaration is of the form <I>name</I> = <I>length</I>. <P>
 There are four dimensions in the above example: <CODE>lat</CODE>, <CODE>lon</CODE>, <CODE>level</CODE>, and <CODE>time</CODE>. The first three are assigned fixed lengths; <CODE>time</CODE> is assigned the length <CODE>UNLIMITED</CODE>, which means it is the <I>unlimited</I> dimension. <P>
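 The declaration form <I>name</I> = <I>length</I> is simple enough to parse directly. The following Python sketch reads the dimension declarations of the example into a dictionary, using <CODE>None</CODE> to mark the unlimited dimension. The function name is hypothetical; it assumes comments have already been stripped, and it treats the keyword <CODE>unlimited</CODE> case-insensitively, which is an assumption of the sketch rather than a statement about <CODE>ncgen</CODE>.<P>

```python
def parse_dimensions(text):
    """Parse CDL dimension declarations such as 'lat = 5, time = unlimited;'.

    Returns a dict mapping each dimension name to its length, with None
    standing for the unlimited (record) dimension. Comments ('//') are
    assumed to have been removed already.
    """
    dims = {}
    for statement in text.split(";"):          # statements end with ';'
        for decl in statement.split(","):      # several per line, comma-separated
            decl = decl.strip()
            if not decl:
                continue
            name, sep, length = decl.partition("=")
            name, length = name.strip(), length.strip()
            if not sep or not name:
                raise ValueError(f"expected 'name = length', got {decl!r}")
            if length.lower() == "unlimited":
                if None in dims.values():
                    raise ValueError("a dataset can have at most one unlimited dimension")
                dims[name] = None
            else:
                n = int(length)
                if n <= 0:
                    raise ValueError(f"length of {name!r} must be a positive integer")
                dims[name] = n
    return dims


print(parse_dimensions("lat = 5, lon = 10, level = 4, time = unlimited;"))
# prints {'lat': 5, 'lon': 10, 'level': 4, 'time': None}
```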
 The basic unit of named data in a netCDF dataset is a <I>variable</I>. When a variable is defined, its <I>shape</I> is specified as a list of dimensions. These dimensions must already exist. The number of dimensions is called the <I>rank</I> (a.k.a. <I>dimensionality</I>). A scalar variable has rank 0, a vector has rank 1 and a matrix has rank 2. <P>
 It is possible to use the same dimension more than once in specifying a variable shape (this was not possible in earlier netCDF versions). For example, <CODE>correlation(instrument, instrument)</CODE> could be a matrix giving correlations between measurements using different instruments. However, data whose dimensions correspond to those of physical space/time should have a shape comprising different dimensions, even if some of these have the same length. <P>
<A NAME=HEADING7-57></A>
<H2>2.3  <A NAME=MARKER-9-1616></A>Variables</H2>
<HR>
 Variables<A NAME=MARKER-2-1617></A><A NAME=MARKER-2-1618></A><A NAME=MARKER-2-1623></A><A NAME=MARKER-2-1624></A><A NAME=MARKER-2-1625></A><A NAME=MARKER-2-1630></A> are used to store the bulk of the data in a netCDF dataset. A <I>variable</I> represents an array of values of the same type. A scalar value is treated as a 0-dimensional array. A variable has a name, a data type, and a shape described by its list of dimensions specified when the variable is created. A variable may also have associated attributes, which may be added, deleted or changed after the variable is created. <P>
 A variable's external data type is one of a small set of netCDF <A NAME=MARKER-2-1639></A><I>types</I> that have the names <A NAME=MARKER-10-1640></A><CODE><A NAME=MARKER-2-1641></A><A NAME=MARKER-2-1642></A>NF_BYTE</CODE> (with synonym <A NAME=MARKER-2-1643></A><A NAME=MARKER-2-1644></A><CODE>NF_INT1</CODE>), <A NAME=MARKER-10-1645></A><CODE><A NAME=MARKER-2-1646></A><A NAME=MARKER-2-1647></A>NF_CHAR</CODE>, <A NAME=MARKER-10-1648></A><CODE><A NAME=MARKER-2-1649></A><A NAME=MARKER-2-1650></A>NF_SHORT</CODE> (with synonym <A NAME=MARKER-2-1651></A><A NAME=MARKER-2-1652></A><CODE>NF_INT2</CODE>), <A NAME=MARKER-10-1653></A><CODE><A NAME=MARKER-2-1654></A>NF_INT</CODE>, <A NAME=MARKER-10-1655></A><CODE><A NAME=MARKER-2-1656></A><A NAME=MARKER-2-1657></A>NF_FLOAT</CODE> (with synonym <A NAME=MARKER-2-1658></A><A NAME=MARKER-2-1659></A><CODE>NF_REAL</CODE>), and <CODE><A NAME=MARKER-10-1660></A><A NAME=MARKER-2-1661></A><A NAME=MARKER-2-1662></A>NF_DOUBLE</CODE> in the <A NAME=MARKER-10-1663></A>FORTRAN interface. <A NAME=MARKER-10-1664></A><P>
 In the <A NAME=MARKER-2-1665></A><A NAME=MARKER-2-1666></A>CDL notation, these types are given the simpler names <CODE>byte</CODE>,<A NAME=MARKER-2-1670></A><A NAME=MARKER-2-1671></A><A NAME=MARKER-2-1672></A> <CODE>char<A NAME=MARKER-2-1673></A><A NAME=MARKER-2-1674></A><A NAME=MARKER-2-1675></A></CODE>, <CODE>short</CODE>, <A NAME=MARKER-2-1676></A><A NAME=MARKER-2-1677></A><A NAME=MARKER-2-1678></A><CODE>int</CODE>, <CODE>float</CODE>, and <CODE>double</CODE>.<A NAME=MARKER-2-1685></A><A NAME=MARKER-2-1686></A><A NAME=MARKER-2-1687></A> The name <CODE>real</CODE> may be used as a synonym for <CODE>float</CODE> in the CDL notation, and <A NAME=MARKER-2-1688></A><A NAME=MARKER-2-1689></A><A NAME=MARKER-2-1690></A><A NAME=MARKER-2-1691></A><CODE>long</CODE> is a deprecated synonym for <CODE>int</CODE>. The exact meaning of each of the types is discussed in <A HREF=guidef-8.html#MARKER-9-1741>Section 3.1 "netCDF external data types,"  page 15</A>. <P>
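 The correspondence between the CDL type names and the FORTRAN type constants can be written down as a lookup table. This Python sketch is only a reference aid: the table and <CODE>fortran_type</CODE> are hypothetical helpers, and the FORTRAN synonyms <CODE>NF_INT1</CODE>, <CODE>NF_INT2</CODE> and <CODE>NF_REAL</CODE> are omitted because they name the same types as <CODE>NF_BYTE</CODE>, <CODE>NF_SHORT</CODE> and <CODE>NF_FLOAT</CODE>.<P>

```python
# CDL type keyword -> FORTRAN interface type constant (names only).
CDL_TO_FORTRAN = {
    "byte": "NF_BYTE",
    "char": "NF_CHAR",
    "short": "NF_SHORT",
    "int": "NF_INT",
    "long": "NF_INT",      # deprecated CDL synonym for int
    "float": "NF_FLOAT",
    "real": "NF_FLOAT",    # CDL synonym for float
    "double": "NF_DOUBLE",
}


def fortran_type(cdl_type):
    """Return the name of the FORTRAN type constant for a CDL type keyword."""
    try:
        return CDL_TO_FORTRAN[cdl_type]
    except KeyError:
        raise ValueError(f"unknown CDL type: {cdl_type!r}") from None


print(fortran_type("real"))  # prints NF_FLOAT
```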
 CDL<A NAME=MARKER-2-1692></A><A NAME=MARKER-2-1693></A><A NAME=MARKER-2-1694></A> variable declarations appear after the <CODE>variables</CODE> keyword in a CDL unit. They have the form<P>
<PRE>
     <I>type</I> <I>variable_name</I> ( <I>dim_name_1, dim_name_2, ... </I>);
</PRE>
 for variables with dimensions, or<P>
<PRE>
     <I>type</I> <I>variable_name</I>;
</PRE>
 for scalar variables.<P>
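 The two declaration forms differ only in whether a parenthesized dimension list follows the name. A small Python sketch that emits either form (the function name and the scalar variable <CODE>scale</CODE> in the usage lines are hypothetical):<P>

```python
def cdl_declaration(type_name, var_name, dims=()):
    """Format a CDL variable declaration.

    dims is the variable's shape as a sequence of dimension names in CDL
    order; an empty sequence produces a scalar declaration.
    """
    if dims:
        return f"{type_name} {var_name}({', '.join(dims)});"
    return f"{type_name} {var_name};"


print(cdl_declaration("float", "temp", ("time", "level", "lat", "lon")))
# prints float temp(time, level, lat, lon);
print(cdl_declaration("double", "scale"))
# prints double scale;
```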
 In <A NAME=MARKER-2-1695></A><A NAME=MARKER-2-1696></A><A NAME=MARKER-2-1697></A><A NAME=MARKER-2-1698></A>the above CDL example there are six variables. As discussed below, four of these are coordinate variables. The remaining variables (sometimes called <I>primary variables</I>), <CODE>temp</CODE> and <CODE>rh</CODE>, contain what is usually thought of as the data. Each of these variables has the unlimited dimension <CODE>time</CODE> as its first dimension, so they are called <I>record variables</I>. A variable that is not a record variable has a fixed length (number of data values) given by the product of its dimension lengths. The length of a record variable is also the product of its dimension lengths, but in this case the product is not fixed, because it involves the length of the unlimited dimension, which can change. The length of the unlimited dimension is the number of records. <P>
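 The length rule can be checked against the example with a short Python sketch. The dimension table, the helper name <CODE>variable_length</CODE> and the record counts are illustrative assumptions; <CODE>None</CODE> marks the unlimited dimension, whose current length is the number of records.<P>

```python
from math import prod

# Dimension lengths from the example; None marks the unlimited dimension.
DIMS = {"lat": 5, "lon": 10, "level": 4, "time": None}


def variable_length(shape, dims, nrecs):
    """Number of values in a variable: the product of its dimension lengths.

    For a record variable the unlimited dimension contributes nrecs, the
    current number of records, so the result grows as records are added.
    """
    return prod(nrecs if dims[d] is None else dims[d] for d in shape)


for nrecs in (1, 2):
    print(nrecs,
          variable_length(("time", "level", "lat", "lon"), DIMS, nrecs),  # temp
          variable_length(("time", "lat", "lon"), DIMS, nrecs),           # rh
          variable_length(("lat",), DIMS, nrecs))                         # lat
# prints 1 200 50 5, then 2 400 100 5
```

 A scalar variable has an empty shape, and the empty product is 1, so it holds exactly one value.<P>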
<A NAME=HEADING7-67></A>
<H2>2.3.1  Coordinate Variables</H2>
 It is legal for a variable to have the same name as a dimension. Such variables have no special meaning to the netCDF library. However, there is a convention that such variables should be treated in a special way by software using this library. <P>
 A <A NAME=MARKER-2-1699></A><A NAME=MARKER-2-1700></A>variable with the same name as a dimension is called a <I>coordinate variable</I>. It typically defines a physical coordinate corresponding to that dimension. The above CDL example includes the coordinate variables <CODE>lat</CODE>, <CODE>lon</CODE>, <CODE>level</CODE> and <CODE>time</CODE>, defined as follows: <P>
<PRE>
        int     lat(lat), lon(lon), level(level);
        short   time(time);
... 
data:
        level   = 1000, 850, 700, 500;
        lat     = 20, 30, 40, 50, 60;
        lon     = -160,-140,-118,-96,-84,-52,-45,-35,-25,-15;
        time    = 12;
</PRE>
 These define the latitudes, longitudes, barometric pressures and times corresponding to positions along these dimensions. Thus there is data at altitudes corresponding to 1000, 850, 700 and 500 millibars; and at latitudes 20, 30, 40, 50 and 60 degrees north. Note that each coordinate variable is a vector and has a shape consisting of just the dimension with the same name. <P>
 A position along a dimension can be specified using an <I>index</I>. This is an integer with a minimum value of <A NAME=MARKER-10-1703></A>1 for FORTRAN programs. Thus the 700 millibar level would have an index value of <A NAME=MARKER-10-1704></A>3 in the example above.<P>
 If a dimension has a corresponding coordinate variable, then this provides an alternative, and often more convenient, means of specifying position along it. Current application packages that make use of coordinate variables commonly assume they are numeric vectors and strictly monotonic (all values are different and either increasing or decreasing). <P>
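 Looking up a position through a coordinate variable amounts to finding a value in that vector and converting its position to a FORTRAN index. A minimal Python sketch, assuming exact matches are wanted (real applications may instead search for the nearest value); the function names are hypothetical:<P>

```python
def is_strictly_monotonic(values):
    """True if all values differ and are either increasing or decreasing."""
    steps = list(zip(values, values[1:]))
    return all(a < b for a, b in steps) or all(a > b for a, b in steps)


def fortran_index(coords, value):
    """1-based (FORTRAN) index of value in a coordinate variable.

    Raises ValueError if the coordinates are not strictly monotonic or if
    value does not occur among them.
    """
    if not is_strictly_monotonic(coords):
        raise ValueError("coordinate variable is not strictly monotonic")
    return list(coords).index(value) + 1   # list.index is 0-based


level = [1000, 850, 700, 500]   # millibars, as in the example
lat = [20, 30, 40, 50, 60]      # degrees north
print(fortran_index(level, 700))  # prints 3
print(fortran_index(lat, 50))     # prints 4
```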
<A NAME=HEADING7-81></A>
<H2>2.4  Attributes</H2>
<HR>
 NetCDF <I>attributes</I> are used to store data about the data (<I>ancillary data</I> or <I>metadata</I>), similar in many ways to the information stored in data dictionaries and schema in conventional database systems. Most attributes provide information about a specific variable. These are identified by the name (or ID) of that variable, together with the name of the attribute. <P>
 Some attributes provide information about the dataset as a whole and are called <I>global</I> attributes. These are identified by the attribute name together with a blank variable name (in CDL) or a special null "global variable" ID (in C or Fortran).<P>
 An attribute has an associated variable (the null "global variable" for a global attribute), a name, a data type, a length, and a value. The current version treats all attributes as vectors; scalar values are treated as single-element vectors. <P>
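 The five properties of an attribute map naturally onto a small record type. In this Python sketch (a model of the concept, not the netCDF interface; the class and field names are hypothetical), <CODE>None</CODE> stands for the null "global variable", and a scalar value is stored as a one-element vector. The <CODE>scale_factor</CODE> attribute on <CODE>temp</CODE> is invented for illustration.<P>

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class Attribute:
    variable: Optional[str]       # None for a global attribute
    name: str
    xtype: str                    # external type name, e.g. "char" or "float"
    values: Union[str, Tuple]     # a string for char, else a vector of numbers

    def __post_init__(self):
        # Numeric attributes are always vectors: wrap a scalar, and turn any
        # other sequence into a tuple. A char attribute keeps its string,
        # which is itself a vector of characters.
        if isinstance(self.values, (int, float)):
            self.values = (self.values,)
        elif not isinstance(self.values, str):
            self.values = tuple(self.values)

    @property
    def length(self):
        """Number of characters for a char attribute, else number of values."""
        return len(self.values)


units = Attribute("lat", "units", "char", "degrees_north")
source = Attribute(None, "source", "char", "Fictional Model Output")
scale = Attribute("temp", "scale_factor", "float", 0.5)  # hypothetical
print(units.length, source.length, scale.values)  # prints 13 22 (0.5,)
```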
 Conventional attribute names should be used where applicable. New names should be as meaningful as possible. <P>
 The external type of an attribute is specified when it is created. The types permitted for attributes are the same as the netCDF external data types for variables. Attributes with the same name for different variables should sometimes be of different types. For example, the attribute <CODE>valid_max</CODE> specifying the maximum valid data value for a variable of type <CODE>int</CODE> should be of type <CODE>int</CODE>, whereas the attribute <CODE>valid_max</CODE> for a variable of type <CODE>double</CODE> should instead be of type <CODE>double</CODE>. <P>
 Attributes are more dynamic than variables or dimensions; they can be deleted and have their type, length, and values changed after they are created, whereas the netCDF interface provides no way to delete a variable or to change its type or shape. <P>
 The <A NAME=MARKER-2-1719></A><A NAME=MARKER-2-1720></A><A NAME=MARKER-2-1721></A>CDL notation for defining an attribute is<P>
<PRE>
    <I>variable_name:attribute_name</I> = <I>list_of_values</I>;
</PRE>
 for a variable attribute, or<P>
<PRE>
   <I> :attribute_name</I> = <I>list_of_values</I>;
</PRE>
 for a global attribute. The type and length of each attribute are not explicitly declared in CDL; they are derived from the values assigned to the attribute. All values of an attribute must be of the same type. The notation used for constant values of the various netCDF types is discussed later (<A HREF=guidef-15.html#MARKER-9-3152>see Section 10.3 "CDL Notation for Data Constants,"  page 102</A>). <P>
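 The two attribute forms, and the rule that all values of an attribute share one type, can be captured in a short Python sketch. The function name is hypothetical; it assumes character values contain no double quotes or backslashes, and it writes numbers with Python's <CODE>repr</CODE> rather than choosing among the CDL constant notations described in Section 10.3.<P>

```python
def cdl_attribute(attr_name, values, var_name=None):
    """Format a CDL attribute definition.

    var_name None gives a global attribute (':name = ...;'); otherwise a
    variable attribute ('var:name = ...;'). values is a string for a
    character attribute, or a non-empty sequence of numbers.
    """
    if isinstance(values, str):
        rhs = f'"{values}"'   # assumes the string has no double quotes or backslashes
    else:
        # All values of an attribute must be of the same type.
        if len({type(v) for v in values}) != 1:
            raise ValueError("an attribute needs at least one value, all of the same type")
        rhs = ", ".join(repr(v) for v in values)
    prefix = "" if var_name is None else var_name
    return f"{prefix}:{attr_name} = {rhs};"


print(cdl_attribute("units", "degrees_north", "lat"))
# prints lat:units = "degrees_north";
print(cdl_attribute("valid_range", (0.0, 1.0), "rh"))
# prints rh:valid_range = 0.0, 1.0;
print(cdl_attribute("source", "Fictional Model Output"))
# prints :source = "Fictional Model Output";
```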
 In the netCDF example (<A HREF=#MARKER-9-1586>see Section 2.1.2 "network Common Data Form Language (CDL),"  page 9</A>), <CODE>units</CODE> is an attribute for the variable <CODE>lat</CODE> that has a 13-character array value '<CODE>degrees_north</CODE>'. Similarly, <CODE>valid_range</CODE> is an attribute for the variable <CODE>rh</CODE> that has length 2 and values '<CODE>0.0</CODE>' and '<CODE>1.0</CODE>'. <P>
 One <A NAME=MARKER-2-1726></A><A NAME=MARKER-2-1727></A><A NAME=MARKER-2-1728></A>global attribute, <CODE>source</CODE>, is defined for the example netCDF dataset. This is a character array intended for documenting the data. Actual netCDF datasets might have more global attributes to document the origin, history, conventions, and other characteristics of the dataset as a whole. <P>
 Most generic applications that process netCDF datasets assume standard attribute conventions, and it is strongly recommended that these be followed unless there are good reasons for not doing so. <A HREF=guidef-13.html#MARKER-9-2936>See Section 8.1 "Attribute Conventions,"  page 81</A>, for information about <CODE>units</CODE>, <CODE>long_name</CODE>, <CODE>valid_min</CODE>, <CODE>valid_max</CODE>, <CODE>valid_range</CODE>, <CODE>scale_factor</CODE>, <CODE>add_offset</CODE>, <CODE>_FillValue</CODE>, and other conventional attributes. <P>
 Attributes may be added to a netCDF dataset long after it is first defined, so you don't have to anticipate all potentially useful attributes. However, adding new attributes to an existing dataset can incur the same expense as copying the dataset. <A HREF=guidef-14.html#MARKER-9-3034>See Chapter 9 "NetCDF File Structure and Performance,"  page 95</A>, for a more extensive discussion. <P>
<A NAME=HEADING7-97></A>
<H2>2.5  Differences between Attributes and Variables</H2>
<HR>
 In<A NAME=MARKER-2-1733></A><A NAME=MARKER-2-1735></A><A NAME=MARKER-2-1736></A><A NAME=MARKER-2-1737></A><A NAME=MARKER-2-1738></A> contrast to variables, which are intended for bulk data, attributes are intended for ancillary data, or information about the data. The total amount of ancillary data associated with a netCDF object, and stored in its attributes, is typically small enough to be memory-resident. However, variables are often too large to fit entirely in memory and must be split into sections for processing. <P>
 Another difference between attributes and variables is that variables may be multidimensional. Attributes are all either scalars (single-valued) or vectors (a single, fixed dimension). <P>
 Variables are created with a name, type, and shape before they are assigned data values, so a variable may exist with no values. The value of an attribute must be specified when it is created, so no attribute ever exists without a value. <P>
 A <A NAME=MARKER-2-1739></A>variable may have attributes, but an attribute cannot have attributes. Attributes assigned to variables may have the same units as the variable (for example, <CODE>valid_range</CODE>) or have no units (for example, <CODE>scale_factor</CODE>). If you want to store data that requires units different from those of the associated variable, it is better to use a variable than an attribute. More generally, if data require ancillary data to describe them, are multidimensional, require any of the defined netCDF dimensions to index their values, or require a significant amount of storage, those data should be represented using variables rather than attributes. <P>
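 The guidelines above can be summarized as a decision function. This Python sketch is only an illustration: the function, its parameters, and in particular the size threshold standing in for "a significant amount of storage" are hypothetical, since the guide gives no specific number.<P>

```python
def should_be_variable(values, units=None, parent_units=None,
                       needs_own_metadata=False, dims=(), max_attr_values=100):
    """Return True if data are better stored as a variable than as an attribute.

    values             -- the data, as a flat sequence
    units              -- units of the data, or None if they have no units
    parent_units       -- units of the variable the attribute would belong to
    needs_own_metadata -- True if the data need ancillary data of their own
    dims               -- netCDF dimensions needed to index the values
    max_attr_values    -- hypothetical limit for "a significant amount of storage"
    """
    if needs_own_metadata:        # an attribute cannot have attributes
        return True
    if dims:                      # multidimensional, or indexed by a netCDF dimension
        return True
    if units is not None and units != parent_units:
        return True               # attributes share the variable's units or have none
    return len(values) > max_attr_values


# A valid range in the same units as its variable can stay an attribute:
print(should_be_variable((-50.0, 50.0), units="celsius", parent_units="celsius"))
# prints False
# A hypothetical surface height indexed by (lat, lon) belongs in a variable:
print(should_be_variable([0.0] * 50, units="m", dims=("lat", "lon")))
# prints True
```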
<!-- TOC --><DL>
<DT><A HREF="guidef-7.html#HEADING7-1"><B>2.1 </B> - The NetCDF Data Model</A>
<DD>
<DT><A HREF="guidef-7.html#HEADING7-4"><B>2.1.1 </B> - Naming Conventions</A>
<DD>
<DT><A HREF="guidef-7.html#HEADING7-6"><B>2.1.2 </B> - network Common Data Form Language (CDL)</A>
<DD>
<DT><A HREF="guidef-7.html#HEADING7-49"><B>2.2 </B> - Dimensions</A>
<DD>
<DT><A HREF="guidef-7.html#HEADING7-57"><B>2.3 </B> - Variables</A>
<DD>
<DT><A HREF="guidef-7.html#HEADING7-67"><B>2.3.1 </B> - Coordinate Variables</A>
<DD>
<DT><A HREF="guidef-7.html#HEADING7-81"><B>2.4 </B> - Attributes</A>
<DD>
<DT><A HREF="guidef-7.html#HEADING7-97"><B>2.5 </B> - Differences between Attributes and Variables</A>
<DD>
</DL>

<HR>
<ADDRESS>NetCDF User's Guide for Fortran - 4 JUN 1997</ADDRESS>
</BODY>