File: translation.html

<html>
<body>
<h1>DAP to netCDF Translation Rules</h1>
Two translations are currently available.
<ul>
<li><a href="#ncdap3">DAP 2 Protocol to netCDF-3</a>
<li><a href="#ncdap4">DAP 2 Protocol to netCDF-4</a>
</ul>

<h2><a name="ncdap3">netCDF-3 Translation Rules</a></h2>
The current set of translation rules to convert an OPeNDAP
DAP protocol version 2 DDS to netCDF-3 is designed to mimic
as closely as possible those currently used by the libnc-dap
system.  Please note that the translation is still subject
to change to respond to unforeseen problems and user
suggestions.
<p>
For illustrative purposes, the following example will be used.
<pre>
Dataset {
  Int32 f1;
  Structure {
    Int32 f11;        
    Structure {
      Int32 f1[3];
      Int32 f2;
    } FS2[2]; 
  } S1; 
  Structure {
    Grid {
      Array:
        Float32 temp[lat=2][lon=2];
      Maps:
        Int32 lat[lat=2];
        Int32 lon[lon=2];
    } G1;
  } S2;
  Grid {
      Array:
        Float32 G2[lat=2][lon=2];
      Maps:
        Int32 lat[2];
        Int32 lon[2];
  } G2;
  Int32 lat[lat=2];
  Int32 lon[lon=2];
} D1;
</pre>

<h3>Variable Definition</h3>
The set of variables is defined by the fields with
primitive base types as they occur in
Sequences, Grids, and Structures.
Each field name is first rewritten as a fully qualified name.
For the example above, the variables are as follows.
<ol>
<li>f1
<li>S1.f11
<li>S1.FS2.f1
<li>S1.FS2.f2
<li>S2.G1.temp
<li>S2.G1.lat
<li>S2.G1.lon
<li>G2.G2
<li>G2.lat
<li>G2.lon
<li>lat
<li>lon
</ol>
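<p>
The collection step can be sketched in Python as follows. This is only an
illustration, not the library's implementation: the nested-list model of the
DDS and the name <code>collect_variables</code> are assumptions made for this
example. Grid G2 is modeled at the top level, as it appears in the DDS above.
<pre>
```python
# Hypothetical in-memory model of the example DDS: containers are
# (name, children) and fields with a primitive base type are (name, None).
DDS = [
    ("f1", None),
    ("S1", [("f11", None), ("FS2", [("f1", None), ("f2", None)])]),
    ("S2", [("G1", [("temp", None), ("lat", None), ("lon", None)])]),
    ("G2", [("G2", None), ("lat", None), ("lon", None)]),
    ("lat", None),
    ("lon", None),
]


def collect_variables(nodes, prefix=""):
    """Return the fully qualified names of all primitive fields, in DDS order."""
    names = []
    for name, children in nodes:
        qualified = prefix + name
        if children is None:
            names.append(qualified)  # primitive field: becomes a variable
        else:
            # Structure, Grid, or Sequence: descend, extending the prefix.
            names.extend(collect_variables(children, qualified + "."))
    return names


for variable in collect_variables(DDS):
    print(variable)
```
</pre>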

<h3>Variable Dimension Translation</h3>
A variable's rank is determined from three sources.
<ol>
<li>
The variable has the dimensions associated with the field
it represents (e.g. S1.FS2.f1[3] in the above example).
<li>
The variable inherits the dimensions associated with any containing
structure that has a rank greater than zero.
These dimensions precede those of case 1.
Thus, we have in our example, f1[2][3], where the first dimension
comes from the containing Structure FS2[2].
<li>
The variable's set of dimensions is altered
if any of its containers is a DAP DDS Sequence.
This is discussed more fully below.
</ol>
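<p>
The first two sources can be illustrated with a short Python sketch. The
function name <code>full_shape</code> and its list-based arguments are
assumptions made for this example; Sequences are not handled here.
<pre>
```python
def full_shape(container_dims, field_dims):
    """Combine inherited and own dimensions of a field.

    container_dims lists the dimensions of each containing structure,
    outermost first; field_dims are the field's own dimensions.
    """
    shape = []
    for dims in container_dims:
        shape.extend(dims)   # inherited dimensions come first
    shape.extend(field_dims)  # then the dimensions of the field itself
    return shape


# S1 has rank 0, FS2 is declared as FS2[2], and f1 as f1[3].
print(full_shape([[], [2]], [3]))
# f2 has no dimensions of its own, so it only inherits from FS2[2].
print(full_shape([[], [2]], []))
```
</pre>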

<h3>Dimension Translation</h3>
For dimensions, the rules are as follows.
<ol>
<li> Fields in dimensioned structures inherit the dimension
of the structure; thus the above list would have the following
dimensioned variables.
<ul>
<li>S1.FS2.f1 -&gt; S1.FS2.f1[2][3]
<li>S1.FS2.f2 -&gt; S1.FS2.f2[2]
<li>S2.G1.temp -&gt; S2.G1.temp[lat=2][lon=2]
<li>S2.G1.lat -&gt; S2.G1.lat[lat=2]
<li>S2.G1.lon -&gt; S2.G1.lon[lon=2]
<li>G2.G2 -&gt; G2.G2[lat=2][lon=2]
<li>G2.lat -&gt; G2.lat[2]
<li>G2.lon -&gt; G2.lon[2]
<li>lat -&gt; lat[lat=2]
<li>lon -&gt; lon[lon=2]
</ul>
<p>
<li>
Collect all of the dimension specifications from the DDS, both
named and anonymous (unnamed).
For each unique anonymous dimension with value NN,
create a netCDF dimension of the form "&lt;array&gt;_&lt;i&gt;=NN",
where &lt;array&gt; is the fully qualified name of the variable and &lt;i&gt; is the
zero-based position of the anonymous dimension among the (inherited) dimensions of that array.
For our example, this would create the following dimensions.
<ul>
<li>S1.FS2.f1_0 = 2 ;
<li>S1.FS2.f1_1 = 3 ;
<li>S1.FS2.f2_0 = 2 ;
<li>G2.lat_0 = 2 ;
<li>G2.lon_0 = 2 ;
</ul>
<p>
<li>
If, however, the anonymous dimension is the single dimension
of a MAP vector in a Grid, then the dimension is given the
same name as the map vector. This leads to the following.
<ul>
<li>G2.lat_0 -&gt; G2.lat
<li>G2.lon_0 -&gt; G2.lon
</ul>
</ul>
<p>
<li>
For each unique named dimension "&lt;name&gt;=NN",
create a netCDF dimension of the form "&lt;name&gt;=NN",
where name has the qualifications removed.
If this leads to duplicates (i.e. same name and same value),
then the duplicates are ignored.
This produces the following.
<ul>
<li>G2.lat -&gt; lat
<li>G2.lon -&gt; lon
</ul>
Note that this produces duplicates of the existing lat and lon dimensions, which are ignored.
<p>
<li>
At this point the only dimensions left to process should be named
dimensions with the same name as some dimension from the previous step,
but with a different value.  For those dimensions create a dimension
of the form "&lt;name&gt;M=NN" where M is a counter starting at 1.
The example has no instances of this.
<p>
<li>
Finally, if needed, define a single UNLIMITED dimension named "unlimited"
with value zero.
</ol>
This leads to the following set of dimensions.
<pre>
dimensions:
  unlimited = UNLIMITED;
  lat = 2 ;
  lon = 2 ;
  S1.FS2.f1_0 = 2 ;
  S1.FS2.f1_1 = 3 ;
  S1.FS2.f2_0 = 2 ;
</pre>
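<p>
The naming rules above can be approximated by the following Python sketch.
It is a simplification under stated assumptions, not the actual translation
code: the dictionary model and the name <code>define_nc3_dimensions</code>
are hypothetical, the map-vector renaming and the removal of qualifications
are merged into one step, and the UNLIMITED dimension is not modeled.
<pre>
```python
# Each variable maps to its inherited-plus-own dimensions as (name, size);
# name is None for an anonymous dimension.
variables = {
    "S1.FS2.f1": [(None, 2), (None, 3)],
    "S1.FS2.f2": [(None, 2)],
    "S2.G1.temp": [("lat", 2), ("lon", 2)],
    "G2.lat": [(None, 2)],  # Grid map vector with one anonymous dimension
    "G2.lon": [(None, 2)],
    "lat": [("lat", 2)],
    "lon": [("lon", 2)],
}
grid_maps = {"G2.lat", "G2.lon"}


def define_nc3_dimensions(variables, grid_maps):
    dims = {}
    for var, var_dims in variables.items():
        for i, (name, size) in enumerate(var_dims):
            if name is None and var in grid_maps and len(var_dims) == 1:
                name = var.split(".")[-1]  # map vector: unqualified map name
            elif name is None:
                name = f"{var}_{i}"        # anonymous: qualified name plus index
            if dims.get(name, size) != size:
                # Same name, different size: append a counter starting at 1.
                m = 1
                while dims.get(f"{name}{m}", size) != size:
                    m += 1
                name = f"{name}{m}"
            dims[name] = size              # exact duplicates collapse here
    return dims


for name, size in define_nc3_dimensions(variables, grid_maps).items():
    print(f"  {name} = {size} ;")
```
</pre>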

<h3>Variable Name Translation</h3>
The steps for variable name translation are as follows.
<p>
<ol>
<li>Take the set of variables captured above.
Thus for the above DDS, the following fields would be collected.
<ul>
<li>f1
<li>S1.f11
<li>S1.FS2.f1
<li>S1.FS2.f2
<li>S2.G1.temp
<li>S2.G1.lat
<li>S2.G1.lon
<li>G2.G2
<li>G2.lat
<li>G2.lon
<li>lat
<li>lon
</ul>
<p>
<li>All grid array variables are renamed to be the same as the containing
grid and the grid prefix is removed.
In the above DDS, this results in the following changes.
<ol>
<li> S2.G1.temp -&gt; S2.G1
<li> G2.G2 -&gt; G2
</ol>
Note that, for example, S2.G1.lon keeps its name.
Also note that libnc-dap just drops the grid map variables,
so this is one place where the translation differs from
libnc-dap, but in a compatible way.

</ol>
<p>
It is important to note that this process could produce duplicate
variables (i.e. with the same name); in that case they are all assumed
to have the same content and the duplicates are ignored.
If it turns out that the duplicates have different content, then
the translation will not detect this. YOU HAVE BEEN WARNED.
<p>
The final netCDF-3 schema (minus attributes) is then as follows.
<pre>
netcdf t {
dimensions:
        unlimited = UNLIMITED ;
        lat = 2 ;
        lon = 2 ;
        S1.FS2.f1_0 = 2 ;
        S1.FS2.f1_1 = 3 ;
        S1.FS2.f2_0 = 2 ;
variables:
        int f1 ;
        int lat(lat) ;
        int lon(lon) ;
        int S1.f11 ;
        int S1.FS2.f1(S1.FS2.f1_0, S1.FS2.f1_1) ;
        int S1.FS2.f2(S1.FS2.f2_0) ;
        float S2.G1(lat, lon) ;
        int S2.G1.lat(lat) ;
        int S2.G1.lon(lon) ;
        float G2(lat, lon) ;
        int G2.lat(lat) ;
        int G2.lon(lon) ;
}
</pre>
In actuality, the unlimited dimension is dropped because
it is unused.
<p>
There are differences with the original libnc-dap here
because libnc-dap technically was incorrect.  The original
would have said this, for example.
<pre>
int S1.FS2.f1(lat, lat) ;
</pre>
Note that this is incorrect because it dimensions
S1.FS2.f1(2,2) rather than S1.FS2.f1(2,3).

<h3>Translating DAP DDS Sequences</h3>
Any variable (as determined above) that is contained
directly or indirectly by a Sequence is subject to revision
of its rank using the following rules.
<ol>
<li> 
Let the variable be contained in Sequence Q1, where Q1 is the
innermost containing sequence. If Q1 is itself contained
(directly or indirectly) in a sequence,
or Q1 is contained (again directly or indirectly)
in a structure that has rank greater than 0,
then the variable will have an initial UNLIMITED
dimension.  However, all dimensions coming from "above" and including (in
the containment sense) the innermost Sequence, Q1, will be
removed and replaced by the single UNLIMITED dimension.  The
size associated with that UNLIMITED is zero, which means
that its contents are inaccessible through the netcdf-3 API.
Again, this differs from libnc-dap, which leaves out such variables.
Again, however, this difference is compatible.
<p>
<li> 
If the variable is contained in a single Sequence (i.e. not nested)
and all containing structures have rank 0, then the variable will
have an initial dimension whose size is the record count for that
Sequence. The name of the new dimension will be the name of the
Sequence.
</ol>
<p>
Consider this example.
<pre>
Dataset {
  Structure {
    Sequence {
      Int32 f1[3];
      Int32 f2;
    } SQ1;
  } S1[2]; 
  Sequence {
    Structure {
      Int32 x1[7];
    } S2[5];
  } Q2;
} D;
</pre>
The corresponding netcdf-3 translation is pretty much as follows
(the value for dimension Q2 may differ).
<pre>
dimensions:
    unlimited = UNLIMITED ; // (0 currently)
    S1.SQ1.f1_0 = 2 ;
    S1.SQ1.f1_1 = 3 ;
    S1.SQ1.f2_0 = 2 ;
    Q2.S2.x1_0 = 5 ;
    Q2.S2.x1_1 = 7 ;
    Q2 = 5 ;
variables:
    int S1.SQ1.f1(unlimited, S1.SQ1.f1_1) ;
    int S1.SQ1.f2(unlimited) ;
    int Q2.S2.x1(Q2, Q2.S2.x1_0, Q2.S2.x1_1) ;
</pre>
Note that for example S1.SQ1.f1_0
is not actually used because it has been folded
into the unlimited dimension.
<p>
Note that there is a performance cost
because the translation code has to walk the data to determine
how many records are associated with the sequence.
Since libnc-dap did something similar, it can be assumed that
the cost is not prohibitive.
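<p>
The two Sequence rules can be sketched in Python as follows. The function
<code>sequence_dims</code> and its tuple-based description of the containers
are assumptions made for this illustration; the sketch only produces the
dimension names of a variable and does not count records.
<pre>
```python
def sequence_dims(path, field_name, field_dims):
    """Return the dimension names of a field contained in a Sequence.

    path lists the containers, outermost first, as (kind, name, dims),
    where kind is "Structure" or "Sequence" and dims is a list of sizes.
    """
    qualified = ".".join([name for _, name, _ in path] + [field_name])
    # Index of the innermost containing Sequence (Q1 in the rules above).
    q = max(i for i, (kind, _, _) in enumerate(path) if kind == "Sequence")
    above, below = path[:q], path[q + 1:]
    n_above = sum(len(dims) for _, _, dims in above)
    n_below = sum(len(dims) for _, _, dims in below) + len(field_dims)
    # Dimensions below the Sequence keep the index they had before folding.
    kept = [f"{qualified}_{i}" for i in range(n_above, n_above + n_below)]
    # Rule 1: nested in a Sequence or in a Structure of rank greater than 0.
    nested = any(kind == "Sequence" or dims for kind, _, dims in above)
    first = "unlimited" if nested else path[q][1]  # rule 2: Sequence name
    return [first] + kept


s1_sq1 = [("Structure", "S1", [2]), ("Sequence", "SQ1", [])]
q2_s2 = [("Sequence", "Q2", []), ("Structure", "S2", [5])]
print(sequence_dims(s1_sq1, "f1", [3]))
print(sequence_dims(s1_sq1, "f2", []))
print(sequence_dims(q2_s2, "x1", [7]))
```
</pre>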

<h2><a name="ncdap4">netCDF-4 Translation Rules</a></h2>
The DAP to netCDF-4 translation is enabled if the
"--enable-netcdf-4" option is specified at configure time.
This translation includes some elements of the libnc-dap
translation, but attempts to provide a simpler (but not,
unfortunately, simple) set of translation rules than is used
for the netCDF-3 translation.
Please note that the translation is still subject
to change to respond to unforeseen problems or to
suggested improvements.
<p>
This text will use this running example.
<pre>
Dataset {
  Int32 f1[fdim=10];
  Structure {
    Int32 f11;        
    Structure {
      Int32 f1[3];
      Int32 f2;
    } FS2[2]; 
  } S1; 
  Grid {
    Array:
      Float32 temp[lat=2][lon=2];
    Maps:
      Int32 lat[2];
      Int32 lon[2];
  } G1;
  Sequence {
    Float64 depth;
  } Q1;
} D;
</pre>

<h3>Variable Definition</h3>
The rules for choosing variables are as follows.
<ol>
<li> Start with the names of the top-level fields of the DDS.
The term top-level means that the object is a direct subnode
of the Dataset object. In our example, this produces the set
[f1, S1, G1, Q1].
<p>
<li>
Replace all Grid objects with the fully qualified list of array
and map fields of the grid.
Our variable set then becomes [f1, S1, G1.temp, G1.lat, G1.lon, Q1].
Note that the libnc-dap practice of re-naming the array variable to be
that of the Grid is not used.
<p>
<li>
Attempt to remove the prefix Grid name from the top-level Grid array and map
variables. If that eventually conflicts with some other name,
then leave the conflicting Grids alone.
Our variable set then becomes [f1, S1, temp, lat, lon, Q1].
<li>
If the Grid array name is the same as the Grid name, then
remove the prefix Grid name (not shown).
</ol>
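<p>
The first three rules can be sketched in Python as follows. The name
<code>choose_variables</code> and the tuple model of the top-level fields
are assumptions made for this example. The sketch reads "leave the
conflicting Grids alone" as keeping every name of such a Grid qualified,
which is one possible interpretation, and it does not model the last rule.
<pre>
```python
def choose_variables(top_level):
    """top_level: list of (name, kind, fields); fields matter only for Grids."""
    qualified = []
    for name, kind, fields in top_level:
        if kind == "Grid":
            # Replace the Grid by its qualified array and map fields.
            qualified.extend((name, f"{name}.{field}") for field in fields)
        else:
            qualified.append((None, name))
    short = [q.split(".")[-1] for _, q in qualified]
    # Grids whose shortened names would collide with some other name.
    clashing = {grid for (grid, _), s in zip(qualified, short)
                if grid is not None and short.count(s) != 1}
    return [q if grid in clashing else s
            for (grid, q), s in zip(qualified, short)]


example = [
    ("f1", "Int32", []),
    ("S1", "Structure", []),
    ("G1", "Grid", ["temp", "lat", "lon"]),
    ("Q1", "Sequence", []),
]
print(choose_variables(example))
```
</pre>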

<h3>Dimension Definition</h3>
The rules for choosing and defining dimensions are as follows.
<ol>
<li>
Collect the set of dimensions (named and anonymous) directly
associated with the  variables as defined above.
This means that dimensions
within user-defined types are ignored.  From our example,
the dimension set is [fdim=10,lat=2,lon=2,2,2].  Note that the
unqualified names are used.
<p>
<li>
If an anonymous dimension is associated with a Grid Map variable,
then give the dimension the name of the map.
Our dimension set now becomes 
[fdim=10,lat=2,lon=2,lat=2,lon=2].
<p>
<li>
All remaining anonymous dimensions are given the
name "&lt;var&gt;_NN", where "&lt;var&gt;" is the
unqualified name of the variable in which the anonymous
dimension appears and NN is the relative position of that
dimension in the dimensions associated with that array.
No instances of this rule occur in the running example.
<p>
<li>
Remove duplicate dimensions (those with same name and value).
Our dimension set now becomes 
[fdim=10,lat=2,lon=2].
<p>
<li>
The final case occurs when there are dimensions with the same
name but with different values. For this case,
the size of the dimension is appended to the dimension name.
</ol>
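<p>
A Python sketch of these dimension rules follows. The name
<code>define_nc4_dimensions</code> and the input model are hypothetical,
and the sketch assumes that, on a name conflict, only the later dimension
has its size appended; the rules above do not say which one is renamed.
<pre>
```python
def define_nc4_dimensions(var_dims, map_vars):
    """var_dims: list of (variable, [(name or None, size), ...])."""
    dims = {}
    for var, entries in var_dims:
        for pos, (name, size) in enumerate(entries):
            if name is None:
                # Anonymous: map variables lend their name, others get var_NN.
                name = var if var in map_vars else f"{var}_{pos}"
            if dims.get(name, size) != size:
                name = f"{name}{size}"  # same name, different size
            dims[name] = size           # exact duplicates collapse here
    return dims


running_example = [
    ("f1", [("fdim", 10)]),
    ("temp", [("lat", 2), ("lon", 2)]),
    ("lat", [(None, 2)]),
    ("lon", [(None, 2)]),
]
print(define_nc4_dimensions(running_example, {"lat", "lon"}))
```
</pre>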

<h3>Type Definition</h3>
The rules for choosing user-defined types are as follows.
<ol>
<li>For every Structure, Sequence, and non-top-level Grid,
a netCDF-4 compound type is created whose fields are the fields
of the Structure, Sequence, or Grid. The name of the type is the
same as the Structure or Grid name suffixed with "_t".
However, the compound types derived from Sequences are instead 
suffixed with "_record_t".
<p>
The types of the fields are the types of the corresponding field
of the Structure, Sequence, or Grid. Note that this type
might be itself a user-defined type.
<p>
From the example, we get the following compound types.
<pre>
compound FS2_t {
    int f1[3];
    int f2;
};
compound S1_t {
    int f11;
    FS2_t FS2[2];  
};
compound Q1_record_t {
    double depth;
};
</pre>
<p>
<li>
For all sequences of name X,
also create this type.
<pre>
    X_record_t (*) X_t
</pre>
In our example, this produces the following type.
<pre>
    Q1_record_t (*) Q1_t
</pre>
<p>
<li>
If a Sequence Q has a single field F
whose type is a primitive type T
(e.g., int, float, string), then
do not apply the previous rule, but instead replace the whole
sequence with the following field.
<pre>
    T (*) Q.F
</pre>
<p>
<li>
Attempt to maximally shorten the type names as long as there is no conflict.
</ol>
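<p>
The type-naming conventions of the first two rules can be sketched in
Python as follows. The function <code>cdl_types</code> is a hypothetical
helper that only formats text in the style of the examples above; it does
not model the single-field Sequence rule or the shortening of type names.
<pre>
```python
def cdl_types(kind, name, fields):
    """Return the type declarations generated for one DDS container.

    kind is "Structure", "Sequence", or "Grid"; fields are already
    translated declarations such as "int f2".
    """
    suffix = "_record_t" if kind == "Sequence" else "_t"
    body = "".join(f"    {decl};\n" for decl in fields)
    types = [f"compound {name}{suffix} {{\n{body}}};"]
    if kind == "Sequence":
        # A Sequence X also gets a variable-length type over its record type.
        types.append(f"{name}_record_t (*) {name}_t")
    return types


for text in cdl_types("Structure", "FS2", ["int f1[3]", "int f2"]):
    print(text)
for text in cdl_types("Sequence", "Q1", ["double depth"]):
    print(text)
```
</pre>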

<h2>Choosing a Translation</h2>
The decision about whether to translate to netCDF-3
(libnc-dap) or netCDF-4 is determined by applying the
following rules in order.
<ol>
<li>
If the NC_CLASSIC_MODEL flag is set on nc_open(), then netcdf-3
(i.e. libnc-dap) translation is used.
<li>
If the NC_NETCDF4 flag is set on nc_open(), then netCDF-4
translation is used.
<li>
If the URL is prefixed with the string
"[mode=netcdf3]" or "[mode=libnc-dap]",
then the libnc-dap translation is used.
<li>
If the URL is prefixed with the string "[mode=netcdf4]",
then the netCDF-4 translation described above is used.
<li>
If none of the above is specified, then the default
is "[mode=libnc-dap]".
</ol>
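<p>
The order of these rules can be expressed as a small Python decision
function. This is a model of the rule order only, not library code: the
name <code>choose_translation</code> is hypothetical, the two booleans
stand in for the NC_CLASSIC_MODEL and NC_NETCDF4 flags passed to nc_open(),
and the mode tag is assumed to be the first thing in the URL.
<pre>
```python
def choose_translation(classic_model=False, netcdf4=False, url=""):
    """Return "netcdf3" or "netcdf4" by applying the rules in order."""
    if classic_model:                # rule 1: NC_CLASSIC_MODEL flag
        return "netcdf3"
    if netcdf4:                      # rule 2: NC_NETCDF4 flag
        return "netcdf4"
    if url.startswith(("[mode=netcdf3]", "[mode=libnc-dap]")):
        return "netcdf3"             # rule 3: explicit netCDF-3 prefix
    if url.startswith("[mode=netcdf4]"):
        return "netcdf4"             # rule 4: explicit netCDF-4 prefix
    return "netcdf3"                 # rule 5: default is libnc-dap


# The flag wins over the URL prefix because it is checked first.
print(choose_translation(classic_model=True, url="[mode=netcdf4]http://example.org/d"))
print(choose_translation(url="[mode=netcdf4]http://example.org/d"))
```
</pre>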

<h2>Defined Client Parameters</h2>
Currently, a limited set of client parameters is recognized.
Parameters not listed here are ignored, but no error is signalled.
<table borderwidth=1 cellpadding=2>
<tr><th>Parameter Name<th>Legal Values<th>Semantics
<tr><td>[mode=...]<td>libnc&#8209;dap|netcdf3|netcdf4
<td>Specify the translation to be applied to the DAP data source on the
client side.
<tr><td>[show=...]<td>das|dds|url
<td>This causes information to appear as specific global attributes.  The
tags may be combined using commas with no spaces
(e.g. "show=dds,url").  The currently recognized tags are "dds" to
display the underlying DDS, "das" similarly, and "url" to display
the url used to retrieve the data.
</table>
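<p>
To illustrate the bracketed parameter syntax, the following Python sketch
splits leading "[name=value]" tags from a URL. It is not the parser used by
the library: <code>parse_client_params</code> and the example.org URL are
made up for this example, and unrecognized parameters are simply returned
rather than ignored.
<pre>
```python
import re


def parse_client_params(url):
    """Split leading "[name=value]" tags from a URL.

    Returns a dict mapping each parameter name to its list of
    comma-separated values, plus the remaining URL.
    """
    params = {}
    rest = url
    while True:
        m = re.match(r"\[([^=\]]+)=([^\]]*)\]", rest)
        if m is None:
            break
        params[m.group(1)] = m.group(2).split(",")
        rest = rest[m.end():]
    return params, rest


params, rest = parse_client_params(
    "[mode=netcdf4][show=dds,url]http://example.org/data")
print(params)
print(rest)
```
</pre>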
</body>
</html>