<!doctype HTML public "-//W3C//DTD HTML 4.0 Frameset//EN">
<html>
<head>
<title>Chapter 1: The HDF5 Data Model and File Structure</title>
<!--(Meta)==========================================================-->
<!--(Links)=========================================================-->
<!--( Begin styles definition )=====================================-->
<link href="ed_styles/NewUGelect.css" rel="stylesheet" type="text/css">
<!--( End styles definition )=======================================-->
</head>
<body>
<!-- #BeginLibraryItem "/ed_libs/styles_UG.lbi" -->
<!--
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Copyright by The HDF Group. *
* Copyright by the Board of Trustees of the University of Illinois. *
* All rights reserved. *
* *
* This file is part of HDF5. The full HDF5 copyright notice, including *
* terms governing use, modification, and redistribution, is contained in *
* the files COPYING and Copyright.html. COPYING can be found at the root *
* of the source code distribution tree; Copyright.html can be found at the *
* root level of an installed copy of the electronic HDF5 document set and *
* is linked from the top-level documents page. It can also be found at *
* http://hdfgroup.org/HDF5/doc/Copyright.html. If you do not have *
* access to either file, you may request a copy from help@hdfgroup.org. *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
-->
<!-- #EndLibraryItem --><!--( TOC )=========================================================-->
<SCRIPT language="JavaScript">
<!--
document.writeln ('\
<table x-use-null-cells\
align="right"\
width=240\
cellspacing="0"\
class="tocTable">\
<tr valign="top"> \
<td class="tocTableHeaderCell" colspan="2"> \
<span class="TableHead">Chapter Contents</span></td>\
</tr>\
-->
<!-- Table Version 3 -->\
<!--
<tr valign="top"> \
<td class="tocTableContentCell2"> \
<a href="#Intro">1.</a></td>\
<td class="tocTableContentCell3">\
<a href="#Intro">Introduction</a></td> \
</tr>\
<tr valign="top"> \
<td class="tocTableContentCell2"> \
<a href="#AbstractDMod">2.</a></td>\
<td class="tocTableContentCell3">\
<a href="#AbstractDMod">The Abstract Data Model</a></td>\
</tr>\
<tr valign="top"> \
<td class="tocTableContentCell2"> \
<a href="#SModel">3.</a></td>\
<td class="tocTableContentCell3">\
<a href="#SModel">The HDF5 Storage Model</a></td> \
</tr>\
\
<tr valign="top"> \
<td class="tocTableContentCell"> \
-->
<!-- editingComment -- "tocTableContentCell" and "tocTableContentCell4" \
-->\
<!-- are the table-closing cell class.\
<td class="tocTableContentCell2"> \
-->\
<!--
<a href="#Structure">4.</a></td>\
<td class="tocTableContentCell4">\
<a href="#Structure">The Structure of an HDF5 File</a>\
-->
<!-- editingComment -- This section not currently complete or validated.\
</tr><tr valign="top"> \
<td class="tocTableContentCell"> \
<a href="#Appendix">10</a></td>\
<td class="tocTableContentCell4"><a href="#Appendix">Appendix</a></td>\
-->\
<!--
</td></tr>\
</table>\
')
-->
</SCRIPT>
<!--(End TOC)=======================================================-->
<div align="center">
<a name="TOP">
<h2>Chapter 1<br /><font size="6">The HDF5 Data Model and File Structure</font></h2>
</a>
</div>
<!-- editingComment
<span class="editingComment">[ [ [
] ] ]</span>
-->
<!-- HEADER LEFT " " -->
<!-- HEADER RIGHT " " -->
<!-- HEADER LEFT "HDF5 User's Guide" -->
<!-- HEADER RIGHT "HDF5 Data Model" -->
<a name="Intro">
<h3>1.1. Introduction</h3>
</a>
<p>The Hierarchical Data Format (HDF) implements a model for managing
and storing data. The model includes an abstract data model and an
abstract storage model (the data format), and libraries to implement the
abstract model and to map the storage model to different storage
mechanisms. The HDF5 library provides a programming interface to a
concrete implementation of the abstract models. The library also
implements a model of data transfer, i.e., efficient movement of data
from one stored representation to another stored representation. The
figure below illustrates the relationships between the models and
implementations. This chapter explains these models in detail.</p>
<table width = 600 cellspacing="0" align="center">
<tr valign="top">
<td align="center">
<hr color="green" size="3"/>
<img src="Images/Dmodel_fig1.JPG">
</td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 1. HDF5 models and implementations</b>
<hr color="green" size="3"/></td>
</tr>
</table>
<br />
<p>The <em>Abstract Data Model</em> is a conceptual model of data, data types,
and data organization. The abstract data model is independent of storage
medium or programming environment. The <em>Storage Model</em> is a
standard representation for the objects of the abstract data model.
The <a href="../H5.format.html" target="H5DocWin"><cite>HDF5 File Format
Specification</cite></a> defines the storage model.</p>
<p>The <em>Programming Model</em> is a model of the computing environment
and includes platforms from small single systems to large multiprocessors
and clusters. The programming model manipulates (instantiates, populates,
and retrieves) objects from the abstract data model.</p>
<p>The <em>Library</em> is the concrete implementation of the programming
model. The Library exports the HDF5 APIs as its interface.
In addition to implementing the objects of the abstract data model,
the Library manages data transfers from one stored form to another.
Data transfer examples include reading from disk to memory and writing
from memory to disk. </p>
<p><em>Stored Data</em> is the concrete implementation of the storage
model. The storage model is mapped to several storage
mechanisms including single disk files, multiple files (family of files),
and memory representations.</p>
<p>The HDF5 Library is a C module that implements the programming model
and abstract data model. The HDF5 Library calls the operating system
or other storage management software (e.g., the MPI/IO Library) to store and
retrieve persistent data. The HDF5 Library may also link to other software
such as filters for compression. The HDF5 Library is linked to an application
program which may be written in C, C++, Fortran, or Java. The application
program implements problem specific algorithms and data structures and calls
the HDF5 Library to store and retrieve data. The figure below shows the
dependencies of these modules.</p>
<table width = 600 cellspacing="0" align="center">
<tr valign="top">
<td align="center">
<hr color="green" size="3"/>
<img src="Images/Dmodel_fig2.JPG">
</td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 2. The library, the application
program, and other modules</b>
<hr color="green" size="3"/></td>
</tr>
</table>
<br />
<br />
<p>It is important to realize that each of the software components manages
data using models and data structures that are appropriate to the
component. When data is passed between layers (during storage or
retrieval), it is transformed from one representation to another. The
figure below <!-- Figure 3 --> suggests some of the kinds of data
structures used in the different layers.</p>
<p>The <em>Application Program</em> uses data structures that represent
the problem and its algorithms, including variables, tables, arrays, and
meshes, among others. Depending on its design and function,
an application may have many different kinds of data structures
and different numbers and sizes of objects.</p>
<p>The HDF5 Library implements the objects of the HDF5 abstract
data model. Some of these objects include groups, datasets, and
attributes. The application program maps the application data
structures to a hierarchy of HDF5 objects. Each application will create a
mapping best suited to its purposes. </p>
<!-- editingComment
<span class="editingComment">For suggestions and examples, see ???.
[ [ [ Do we have such a document? ] ] ]</em></span>
-->
<p>The objects of the HDF5 abstract data model are mapped to the objects
of the HDF5 storage model, and stored in a storage medium.
<!-- editingComment
(Section ?? below)
-->
The stored objects include header blocks, free lists, data blocks,
B-trees, and other objects. Each group or dataset is stored as one or
more header and data blocks. See the
<a href="../H5.format.html" target="H5DocWin"><cite>HDF5 File Format
Specification</cite></a> for more information on how these objects are
organized.
The HDF5 Library can also use other
libraries and modules such as compression.</p>
<!-- NEW PAGE -->
<table width = 600 cellspacing="0" align="center">
<tr valign="top">
<td align="center">
<hr color="green" size="3"/>
<img src="Images/Dmodel_fig3_a.JPG"><br />
<img src="Images/Dmodel_fig3_b.JPG"><br />
<img src="Images/Dmodel_fig3_c.JPG">
</td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 3. Data structures in different layers</b>
<hr color="green" size="3"/></td>
</tr>
</table>
<br />
<p>The important point to note is that there is not necessarily any simple
correspondence between the objects of the application program,
the abstract data model, and those of the <em>Format Specification</em>.
The organization of the application program's data, and how it is mapped
to the HDF5 abstract data model, is up to the application developer.
The application program only needs to deal with the library and the
abstract data model. Most applications need not consider any details of
the <a href="../H5.format.html" target="H5DocWin"><cite>HDF5 File Format
Specification</cite></a>
or the details of how objects of the abstract data model
are translated to and from storage.</p>
<SCRIPT language="JavaScript">
<!--
document.writeln ("
<div align="right">
<a href="#TOP"><font size="-1">(Top)</font></a>
</div>
</a>
");
-->
</SCRIPT>
<a name="AbstractDMod">
<h3 class="pagebefore">1.2. The Abstract Data Model</h3>
</a>
<!-- editingComment
<span class="editingComment">[ [ [ Note: In this section some of the figures
are included twice. These are alternative versions of the same diagram,
included for comparison and selection. ] ] ]</span>
-->
<p>The abstract data model (ADM) defines concepts for describing
complex data stored in files. The ADM is a very general model
designed to cover many specific models conceptually. Many different
kinds of data can be mapped to objects of the ADM, and therefore stored and
retrieved using HDF5. The ADM is not, however, a model of any particular problem or
application domain. Users need to map their data to the concepts of the ADM.</p>
<p>The key concepts include:</p>
<ul>
<li><em>File</em> - a contiguous string of bytes in a computer
store (memory, disk, etc.); the bytes represent zero or more
objects of the model</li>
<li><em>Group</em> - a collection of objects (including groups)</li>
<li><em>Dataset</em> - a multidimensional array of data elements
with attributes and other metadata </li>
<li><em>Dataspace</em> - a description of the dimensions of a
multidimensional array</li>
<li><em>Datatype</em> - a description of a specific class of data
element including its storage layout as a pattern of bits</li>
<li><em>Attribute</em> - a named data value associated with a group,
dataset, or named datatype</li>
<li><em>Property List</em> - a collection of parameters (some permanent
and some transient) controlling options in the library </li>
<li><em>Link</em> - the way objects are connected </li>
</ul>
<p>These key concepts are described in more detail below.</p>
<h4>1.2.1. File</h4>
<p>Abstractly, an HDF5 file is a container for an organized
collection of objects.
The objects are groups, datasets, and other objects as defined below.
The objects are organized as a rooted, directed graph. Every HDF5 file has at
least one object, the root group. See the figure below. All objects are
members of the
root group or descendants of the root group.</p>
<!-- NEW PAGE -->
<table width = 600 cellspacing="0" align="center">
<tr valign="top">
<td align="center">
<hr color="green" size="3"/></td>
</tr>
<tr>
<td><br />
<table align="center" border="1">
<tr>
<td valign="top" align="center"><code>File</code></td>
</tr>
<tr>
<td valign="top" align="left">
<code>
superblock_vers:int<br />
global_freelist_vers:int<br />
symtable_vers:int<br />
sharedobjectheader_vers:int<br />
userblock:size_t<br />
sizeof_addr:size_t<br />
sizeof_size:size_t<br />
symtable_tree_rank:int<br />
symtable_node_size:int<br />
btree_istore_size:int
</code>
</td>
</tr>
<tr>
<td align="left"> </td>
</tr>
</table>
</td>
</tr>
<tr>
<td valign="top" align="center"><img src="Images/Dmodel_fig4_a.JPG"></td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 4. The HDF5 file</b>
<hr color="green" size="3"/></td>
</tr>
</table>
<br />
<p>Each HDF5 object has a unique identity <em>within a single HDF5 file</em> and
can be accessed only by its names within the hierarchy of the file. HDF5
objects in different files do not necessarily have unique identities, and
it is not possible to access a permanent HDF5 object except through a file.
See the section <a href="#Structure">“The Structure of an HDF5
File”</a> below for an explanation of the structure of the HDF5 file.</p>
<p>When the file is created, the <em>file creation properties</em> specify
settings for the file. The file creation properties include version information
and parameters of global data structures. When the file is opened,
the <em>file access properties</em> specify settings for the current access to
the file. File access properties include parameters for storage drivers
<!-- editingComment
<span class="editingComment">(see section ?? below)</span>
-->
and parameters for caching and garbage collection.
The file creation properties are set permanently for the life
of the file, and
the file access properties can be changed by closing and reopening
the file.</p>
<!-- editingComment
<span class="editingComment">
See PPP for more information about Property Lists and properties.
</span>
-->
<p>An HDF5 file can be “mounted” as part of another HDF5 file.
This is analogous to Unix file system mounts. The root of the mounted file
is attached to a group in the mounting file, and all the contents can be
accessed as if the mounted file were part of the mounting file.</p>
<!-- editingComment
<span class="editingComment">
See XXX for an explanation of mounted files.
</span>
-->
<h4>1.2.2. Group</h4>
<p>An HDF5 group is analogous to a file system directory. Abstractly,
a group contains zero or more objects, and every object must be a member of at
least one group. The root group is a special case; it need not be a member
of any group.</p>
<p>Group membership is actually implemented via link objects. See the
figure below. A link object is owned by a group and points to a
<em>named object</em>. Each link has a <em>name</em>, and each link
points to exactly one object. Each named object has at least one and
possibly many links to it.</p>
<table width = 600 cellspacing="0" align="center">
<tr valign="top">
<td align="center">
<hr color="green" size="3"/>
<img src="Images/Dmodel_fig5.JPG">
</td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 5. Group membership via link objects</b>
<hr color="green" size="3"/></td>
</tr>
</table>
<br />
<!-- NEW PAGE -->
<p>There are three classes of named objects: group, dataset, and named
datatype. See the figure below. Each of these objects is a member of
at least one group, which means there is at least one link to it.</p>
<table width = 600 cellspacing="0" align="center">
<tr valign="top">
<td align="center">
<hr color="green" size="3"/>
<img src="Images/Dmodel_fig6.JPG">
</td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 6. Classes of named objects</b>
<hr color="green" size="3"/></td>
</tr>
</table>
<br />
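<p>The relationship between groups, links, and named objects can be
sketched in a few lines of Python. This is a toy model for illustration
only, not the HDF5 API: the names <code>NamedObject</code>,
<code>add_link</code>, and <code>resolve</code> are hypothetical. It shows
that a link is owned by a group, has a name, and points to exactly one
object, while one object may be reached through several links.</p>

```python
from dataclasses import dataclass, field


@dataclass
class NamedObject:
    """A toy named object: a group, dataset, or named datatype."""
    kind: str                                    # "group", "dataset", or "datatype"
    links: dict = field(default_factory=dict)    # link name -> NamedObject (groups only)


def add_link(group, name, target):
    """Create a link called `name`, owned by `group`, pointing to `target`."""
    if group.kind != "group":
        raise TypeError("only groups own links")
    group.links[name] = target


def resolve(root, path):
    """Follow link names from the root group, e.g. '/results/temperature'."""
    obj = root
    for name in path.strip("/").split("/"):
        if name:
            obj = obj.links[name]
    return obj


root = NamedObject("group")
results = NamedObject("group")
temperature = NamedObject("dataset")

add_link(root, "results", results)
add_link(results, "temperature", temperature)
add_link(root, "latest", temperature)   # a second link to the same object

# Both paths lead to the very same object.
same = resolve(root, "/results/temperature") is resolve(root, "/latest")
```

<p>Because the dataset is reachable by two names, removing one link in
this model would leave the object accessible through the other.</p>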
<h4>1.2.3. Dataset</h4>
<p>An HDF5 dataset is a multidimensional (rectangular) array
of data elements. See the figure below. The shape of the array
(number of dimensions, size of
each dimension) is described by the dataspace object (described
in the next section below).</p>
<p>A data element is a single unit of data which may be a number, a character,
an array of numbers or characters, or a record of heterogeneous data elements.
A data element is a set of bits. The layout of the bits is described by the
datatype (see below).</p>
<p>The dataspace and datatype are set when the
dataset is created, and they cannot be changed for the life
of the dataset. The <em>dataset creation properties</em> are set when the
dataset is created. The dataset creation properties include the fill
value and storage properties such as chunking and compression. These
properties cannot be changed after the dataset is created.</p>
<p>The dataset object manages the storage of, and access to, the data.
While the data is conceptually a contiguous rectangular array, it is
physically stored and transferred in different ways depending on the
storage properties and the storage mechanism used. The actual storage
may be a set of compressed chunks,
and the access may be through different storage mechanisms and caches.
The dataset
maps between the conceptual array of elements and the actual stored data.</p>
<!-- NEW PAGE -->
<table width = 600 cellspacing="0" align="center">
<tr valign="top">
<td align="center">
<hr color="green" size="3"/>
<img src="Images/Dmodel_fig7_b.JPG">
</td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 7. The dataset</b>
<hr color="green" size="3"/></td>
</tr>
</table>
<br />
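<p>As a rough illustration of how a dataset can map between the conceptual
array and chunked storage, the sketch below computes which chunk holds a
given element and where the element sits inside that chunk. It shows only
the arithmetic idea; the function names and the chunk size are assumptions
made for the example, and the library's actual chunk indexing is more
involved.</p>

```python
def locate(coord, chunk_shape):
    """Split an element coordinate into (chunk index, offset inside that chunk)."""
    chunk_index = tuple(c // s for c, s in zip(coord, chunk_shape))
    offset = tuple(c % s for c, s in zip(coord, chunk_shape))
    return chunk_index, offset


def chunk_grid(array_shape, chunk_shape):
    """Number of chunks needed along each dimension (ceiling division)."""
    return tuple(-(-n // s) for n, s in zip(array_shape, chunk_shape))


array_shape = (10, 6)     # conceptual 10 x 6 array
chunk_shape = (4, 4)      # hypothetical chunk size

grid = chunk_grid(array_shape, chunk_shape)   # chunks per dimension
where = locate((9, 5), chunk_shape)           # chunk index and in-chunk offset
```

<p>The application still addresses element (9, 5) of the conceptual array;
only the storage layer needs to know which chunk contains it.</p>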
<h4>1.2.4. Dataspace</h4>
<p>The HDF5 dataspace describes the layout of the elements of a
multidimensional
array. Conceptually, the array is a hyper-rectangle with one to 32 dimensions.
HDF5 dataspaces can be extendable. Therefore, each dimension has a current
size and a
maximum size, and the maximum may be unlimited. The dataspace describes
this hyper-rectangle: it is a list of dimensions with the current and maximum
(or unlimited) sizes. See the figure below.</p>
<!-- NEW PAGE -->
<table width = 600 cellspacing="0" align="center">
<tr valign="top">
<td align="center">
<hr color="green" size="3"/>
<table border="1">
<tr><td align="center">
<code>Dataspace</code>
</td></tr>
<tr><td align="left">
<code>
rank:int<br />
current_size:hsize_t[ rank ] <br />
maximum_size:hsize_t[ rank ]
</code>
</td></tr>
<tr><td align="left"> </td></tr>
</table>
</td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 8. The dataspace</b><hr color="green" size="3"/></td>
</tr>
</table>
<br />
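<p>The dataspace in Figure 8 can be modeled as a small Python class. This
is a conceptual sketch, not the library's interface: the class, the
<code>extended</code> method, and the use of <code>None</code> to stand for
an unlimited maximum are all assumptions made for the example. It shows
that a dimension may grow only up to its maximum size.</p>

```python
from dataclasses import dataclass

UNLIMITED = None   # stand-in for an unlimited maximum size (assumption of this sketch)


@dataclass
class Dataspace:
    current_size: tuple
    maximum_size: tuple

    def __post_init__(self):
        if len(self.current_size) != len(self.maximum_size):
            raise ValueError("current and maximum sizes must have the same rank")
        for cur, mx in zip(self.current_size, self.maximum_size):
            if mx is not UNLIMITED and cur > mx:
                raise ValueError("current size exceeds maximum size")

    @property
    def rank(self):
        """Number of dimensions."""
        return len(self.current_size)

    def extended(self, new_size):
        """Return a dataspace grown to `new_size`, honoring the maximum sizes."""
        return Dataspace(tuple(new_size), self.maximum_size)


space = Dataspace(current_size=(100, 4), maximum_size=(UNLIMITED, 4))
bigger = space.extended((250, 4))        # allowed: first dimension is unlimited

try:
    space.extended((100, 5))             # rejected: second dimension is capped at 4
    rejected = False
except ValueError:
    rejected = True
```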
<p>Dataspace objects are also used to describe hyperslab selections
from a dataset. Any subset of the elements of a dataset can be selected
for read or write by specifying a set of hyperslabs. A non-rectangular
region can be selected by the union of several (rectangular) dataspaces.</p>
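<p>A hyperslab selection, and the union of two of them, can be sketched as
sets of element coordinates. The parameter names <code>start</code>,
<code>stride</code>, <code>count</code>, and <code>block</code> follow the
ones used by the C API's <code>H5Sselect_hyperslab</code>; the function
itself is a hypothetical illustration and does not call the library.</p>

```python
from itertools import product


def hyperslab(start, stride, count, block):
    """Return the set of element coordinates selected by one hyperslab."""
    per_dim = []
    for s, st, c, b in zip(start, stride, count, block):
        # `c` blocks of `b` consecutive elements, with block origins `st` apart
        per_dim.append([s + i * st + j for i in range(c) for j in range(b)])
    return set(product(*per_dim))


# Two rectangular selections in a 2-D dataset ...
tall = hyperslab(start=(0, 0), stride=(1, 1), count=(4, 1), block=(1, 1))  # 4 x 1 column
wide = hyperslab(start=(3, 0), stride=(1, 1), count=(1, 3), block=(1, 1))  # 1 x 3 row

# ... whose union is an L-shaped, non-rectangular region.
l_shape = tall | wide
```

<p>The two rectangles share one element, so the union holds six elements
rather than seven.</p>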
<!-- editingComment
<span class="editingComment">
See SSS for more
information about data selection and hyperslabs.
</span>
-->
<h4>1.2.5. Datatype</h4>
<p>The HDF5 datatype object describes the layout of a single data element.
A data element is a single element of the array; it may be a single number,
a character, an array of numbers or characters, or other data. The datatype
object describes the storage layout of this data. </p>
<p>Datatypes are categorized into 11 classes. Each class is
interpreted according to a set of rules and has a specific set of properties
to describe its storage. For instance, floating point numbers have exponent
position and sizes which are interpreted according to appropriate standards
for number representation. Thus, the datatype class tells what the element
means, and the datatype describes how it is stored.</p>
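<p>To make the floating point example concrete, the sketch below splits a
value stored in the IEEE 754 binary32 layout into its sign, exponent, and
mantissa fields using only the Python standard library. The function name
is hypothetical, and the code illustrates the idea of field positions and
sizes rather than how the HDF5 Library represents a datatype
internally.</p>

```python
import struct


def float32_fields(value):
    """Split a value stored as IEEE 754 binary32 into its three bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit, position 31
    exponent = (bits >> 23) & 0xFF    # 8 bits, positions 23-30, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 bits, positions 0-22
    return sign, exponent, mantissa


fields = float32_fields(-6.5)         # -6.5 = -1.625 * 2**2
```

<p>The same 32 bits would mean something entirely different if they were
interpreted as an integer, which is why the class and the layout are both
needed to read an element correctly.</p>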
<p>The figure below <!-- formerly Figure 9 --> shows the classification
of datatypes. Atomic datatypes are
indivisible. Each may be a single object: a number, a string, or some other
object. Composite datatypes are composed of multiple elements of atomic
datatypes. In addition to the standard types, users can define additional
datatypes such as a 24-bit integer or a 16-bit float.</p>
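<p>What a non-standard width means for storage can be shown with a short
sketch that packs a signed integer into exactly three bytes. The helper
names are hypothetical and the code uses only the Python standard library;
it illustrates the storage layout of a 24-bit integer, not the calls an
application would make to define such a datatype in HDF5.</p>

```python
def pack_int24(value, byteorder="little"):
    """Store a signed integer in exactly three bytes."""
    return value.to_bytes(3, byteorder, signed=True)


def unpack_int24(raw, byteorder="little"):
    """Recover the signed integer from its three-byte form."""
    if len(raw) != 3:
        raise ValueError("a 24-bit integer occupies exactly 3 bytes")
    return int.from_bytes(raw, byteorder, signed=True)


raw = pack_int24(-2)          # three bytes in little-endian order
value = unpack_int24(raw)     # round trip back to the original number
```

<p>Values outside the 24-bit signed range cannot be packed;
<code>int.to_bytes</code> raises <code>OverflowError</code> in that
case.</p>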
<p>A dataset or attribute has a single datatype object associated with
it. See Figure 7 above. The datatype object may be used in the definition
of several objects, but by default, a copy of the datatype object will
be private to the dataset. </p>
<p>Optionally, a datatype object can be stored in the HDF5 file. The
datatype is linked into a group, and therefore given a name. A
<em>named datatype</em> can be opened and used in any way that a datatype
object can be used.</p>
<p>The details of datatypes, their properties, and how they are used are
explained in the
“<a href="11_Datatypes.html">HDF5 Datatypes</a>” chapter.</p>
<!-- NEW PAGE -->
<table width = 600 cellspacing="0" align="center">
<tr valign="top">
<td align="center">
<hr color="green" size="3"/>
<img src="Images/Dmodel_fig9.JPG">
</td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 9. Datatype classifications</b>
<hr color="green" size="3"/></td>
</tr>
</table>
<br />
<h4>1.2.6. Attribute</h4>
<p>Any HDF5 named data object (group, dataset, or named datatype) may
have zero or more user defined attributes. Attributes are used to document
the object. The attributes of an object are stored with the object.</p>
<p>An HDF5 attribute has a name and data. The data portion is similar
in structure to a dataset: a dataspace defines the layout of an array of
data elements, and a datatype defines the storage layout and interpretation
of the elements. See the figure below <!-- formerly Figure 10-->.</p>
<table width = 600 cellspacing="0" align="center">
<tr valign="top">
<td align="center">
<hr color="green" size="3"/>
<img src="Images/Dmodel_fig10.JPG">
</td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 10. Attribute data elements</b>
<hr color="green" size="3"/></td>
</tr>
</table>
<br />
<p>In fact, an attribute is very similar to a dataset with
the following limitations:</p>
<ul>
<li>An attribute can only be accessed via the object</li>
<li>Attribute names are significant only within the object</li>
<li>An attribute should be a small object</li>
<li>The data of an attribute must be read or written in a single access
(partial reading or writing is not allowed)</li>
<li>Attributes do not have attributes</li>
</ul>
<p>Note that the value of an attribute can be an <em>object reference</em>.
A shared attribute or an attribute that is a large array can be implemented
as a reference to a dataset.</p>
<!-- NEW PAGE -->
<p>The name, dataspace, and datatype of an attribute are specified when it
is created and cannot be changed over the life of the attribute. An
attribute can be opened by name, by index, or by iterating through all
the attributes of the object.</p>
<h4>1.2.7. Property List</h4>
<p>HDF5 has a generic property list object. Each list is a collection of
<em>name-value</em> pairs. Each class of property list has a specific set
of properties.
Each property has an implicit name, a datatype, and a value. See the figure
below. <!-- formerly Figure 11 -->
A property list object is created and used in ways similar to the other
objects of the HDF5 library.</p>
<p>Property lists are attached to the object in the library; they can be
used by any part of the library. Some properties are permanent (e.g., the
chunking strategy for a dataset); others are transient (e.g., buffer sizes
for data transfer). A common use of a property list is to pass parameters
from the calling program to a VFL driver or a module of the pipeline.</p>
<p>Property lists are conceptually similar to attributes. Property lists hold
information relevant to the behavior of the library, while attributes hold
information relevant to the user’s data and application.</p>
<table width = 600 cellspacing="0" align="center">
<tr valign="top">
<td align="center">
<hr color="green" size="3"/>
<table width="95%" cellspacing="0" align="center">
<tr>
<td align="center"><br />
<table border="1">
<tr><td align="center">
<code>Property List</code>
</td></tr>
<tr><td align="left" valign="middle">
<code>
class:H5P_class_t
</code>
</td></tr>
<tr><td align="left" valign="middle">
<code>
create(class)<br />
get_class()
</code>
</td></tr>
</table>
</td>
</tr>
<tr>
<td valign="top" align="center">
<img src="Images/Dmodel_fig11_a.jpg"></td>
</tr>
<tr>
<td align="center">
<table border="1">
<tr><td align="center">
<code>Property</code>
</td></tr>
<tr><td align="left" valign="middle">
<code>
name:string<br />
value:H5TDatatype
</code>
</td></tr>
<tr><td align="left" valign="middle">
<code>
<br />
</code>
</td></tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 11. The property list</b>
<hr color="green" size="3"/>
</td>
</tr>
</table>
<br />
<!-- NEW PAGE -->
<p>Property lists are used to control optional behavior for file creation,
file access, dataset creation, dataset transfer (read, write), and file
mounting. Some property list classes are shown in the table below.
<!-- Table 1--> Details of the different property lists are explained in
the relevant sections of this document.</p>
<table width="600" cellspacing="0" align="center" cellpadding="0">
<tr valign="bottom">
<td colspan="3" align="left" valign="bottom">
<b>Table 1. Property list classes and their usage
</b></td>
</tr>
<tr><td colspan="3"><hr color="green" size="3" /></td></tr>
<tr valign="top">
<td width="34%">
<b>Property List Class</b></td>
<td width="33%">
<b>Used For</b></td>
<td width="33%">
<b>Examples</b></td>
</tr>
<tr><td colspan="3"><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td><code>H5P_FILE_CREATE</code></td>
<td>Properties for file creation.</td>
<td>Set size of user block.</td>
</tr>
<tr><td colspan="3"><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td><code>H5P_FILE_ACCESS</code></td>
<td>
Properties for file access.</td>
<td>
Set parameters for VFL driver. An example is MPI I/O.</td>
</tr>
<tr><td colspan="3"><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td><code>H5P_DATASET_CREATE</code></td>
<td>Properties for dataset creation.</td>
<td>Set chunking, compression, or fill
value.</td>
</tr>
<tr><td colspan="3"><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td><code>H5P_DATASET_XFER</code></td>
<td>Properties for raw data transfer
(read and write).</td>
<td>Tune buffer sizes or memory management.</td>
</tr>
<tr><td colspan="3"><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td><code>H5P_FILE_MOUNT</code></td>
<td>Properties for file mounting.</td>
<td> </td>
</tr>
<tr><td colspan="3"><hr color="green" size="3" /></td></tr>
</table>
<br />
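<p>The idea that each property list class has its own set of properties
can be sketched as follows. This is a toy model only: the class names are
taken from Table 1, but the property names, default values, and the
<code>PropertyList</code> class are hypothetical and do not reflect the
library's real property sets or API.</p>

```python
# Hypothetical property names and defaults; the real sets are defined by the library.
PROPERTY_CLASSES = {
    "H5P_DATASET_CREATE": {"chunk_shape": None, "compression": None, "fill_value": 0},
    "H5P_DATASET_XFER": {"buffer_size": 1024 * 1024},
}


class PropertyList:
    """A toy property list: name-value pairs restricted to one class."""

    def __init__(self, class_name):
        self.class_name = class_name
        self._values = dict(PROPERTY_CLASSES[class_name])   # start from class defaults

    def get_class(self):
        return self.class_name

    def set(self, name, value):
        if name not in self._values:
            raise KeyError(f"{name!r} is not a property of {self.class_name}")
        self._values[name] = value

    def get(self, name):
        return self._values[name]


dcpl = PropertyList("H5P_DATASET_CREATE")
dcpl.set("chunk_shape", (4, 4))

try:
    dcpl.set("buffer_size", 4096)     # belongs to the transfer class, not this one
    accepted = True
except KeyError:
    accepted = False
```

<p>Keeping creation and transfer properties in separate classes mirrors
the distinction drawn above between permanent and transient
settings.</p>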
<br />
<h4>1.2.8. Link</h4>
<p>This section is under construction.</p>
<br />
<!-- NEW PAGE -->
<a name="SModel">
<h3 class="pagebefore">1.3. The HDF5 Storage Model</h3>
</a>
<h4>1.3.1. The Abstract Storage Model: the HDF5 Format Specification</h4>
<p>The <a name="SupScript1" href="../H5.format.html">
<em>HDF5 File Format Specification</em></a>
defines how HDF5 objects
and data are mapped to a <em>linear address space</em>. The address space is
assumed to be a contiguous array of bytes stored on some random access
medium.<a href="#FootNote"><sup><font size="-1">1</font></sup></a>
The format defines the standard for how the objects of the
abstract data model are mapped to linear addresses. The stored
representation is self-describing in the sense that the format defines all
the information necessary to read and reconstruct the original
objects of the abstract data model.</p>
<p>The <em>HDF5 File Format Specification</em> is organized in three parts:</p>
<ol>
<li><strong>Level 0</strong>: File signature and super block</li>
<li><strong>Level 1</strong>: File infrastructure</li>
<ol type="a">
<li><strong>Level 1A</strong>: B-link trees and B-tree nodes</li>
<li><strong>Level 1B</strong>: Group</li>
<li><strong>Level 1C</strong>: Group entry</li>
<li><strong>Level 1D</strong>: Local heaps</li>
<li><strong>Level 1E</strong>: Global heap</li>
<li><strong>Level 1F</strong>: Free-space index</li>
</ol>
<li><strong>Level 2</strong>: Data object</li>
<ol type="a">
<li><strong>Level 2A</strong>: Data object headers</li>
<li><strong>Level 2B</strong>: Shared data object headers</li>
<li><strong>Level 2C</strong>: Data object data storage</li>
</ol>
</ol>
<p>The <strong>Level 0</strong> specification defines the header block for
the file. Header block elements include a signature, version information,
key parameters of the file
layout (such as which VFL file drivers are needed), and pointers to the rest of
the file. <strong>Level 1</strong> defines the data structures used throughout
the file: the B-trees, heaps, and groups. <strong>Level 2</strong> defines the
data structure for storing the data objects and data. In all cases, the data
structures are completely specified so that every bit in the file can be
faithfully interpreted.</p>
<p>It is important to realize that the structures defined in the HDF5 file
format are not the same as the abstract data model: the object headers, heaps,
and B-trees of the file specification are not represented in the abstract
data model. The format defines a number of objects for managing the storage
including header blocks, B-trees, and heaps. The <em>HDF5 File Format
Specification</em>
defines how the abstract objects (for example, groups and datasets) are
represented as headers, B-tree blocks, and other elements.</p>
<p>The HDF5 Library implements operations to write HDF5 objects to the linear
format and to read from the linear format to create HDF5 objects. It is
important to realize that a single HDF5 abstract object is usually stored
as several objects. A dataset, for example, might be stored in a header
and in one or more data blocks, and these objects
might not be contiguous on the hard disk.</p>
<!-- NEW PAGE -->
<h4>1.3.2. Concrete Storage Model</h4>
<p>The HDF5 file format defines an abstract linear address space. This
address space can be implemented on different storage media, such as a
single file, multiple files on disk, or memory.
The HDF5 Library defines an open interface called the <em>Virtual File
Layer</em> (VFL). The VFL allows different concrete storage models to be
selected.</p>
<p>The VFL defines an abstract model, an API for random access storage,
and an API to plug in alternative VFL driver modules. The model defines
the operations that the VFL driver must and may support, and the plug-in
API enables the HDF5 Library to recognize the driver and pass it control
and data.</p>
<p>A number of VFL drivers have been defined in the HDF5 Library. Some
work with a single file, and some work with multiple files split in
various ways. Some work in serial computing environments, and some
work in parallel computing environments. Most work with disk copies of
HDF5 files, but one works with a memory copy. These drivers are listed
in the table in the <a href="08_TheFile.html#Drivers">“Alternate
File Storage Layouts and Low-level File Drivers”</a>
section in “The File” chapter. </p>
<!--
<table width = 600 cellspacing="0" align="center">
<tr valign="top">
<td align="center">
<hr color="green" size="3"/>
<img src="Images/Dmodel_fig12.JPG">
</td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 12. Conceptual hierarchy of VFL drivers</b>
<hr color="green" size="3"/></td>
</tr>
</table>
<br />
9.28.2011. I removed the figure above. Drivers have changed a lot since
the figure was created. MEE -->
<p>Each driver isolates the details of reading and writing storage so
that the rest of the HDF5 Library and user program can be almost the same
for different storage methods. The exception to this rule is that some VFL
drivers need information from the calling application. This information is
passed using property lists. For example, the Parallel driver requires
certain control information that must be provided by the application.</p>
<br />
<br />
<a name="Structure">
<!-- NEW PAGE -->
<h3 class="pagebefore">1.4. The Structure of an HDF5 File</h3>
</a>
<h4>1.4.1. Overall File Structure</h4>
<p>An HDF5 file is organized as a rooted, directed graph. Named data
objects are the nodes of the graph, and links are the directed arcs. Each
arc of the graph has a name, and the root group has the name “/”.
Objects are created and then inserted into the graph with the link operation,
which creates a named link from a group to the object. For example, the
figure below <!-- formerly Figure 38 -->illustrates the structure of an HDF5
file when one dataset is created. An object can be the
target of more than one link. <a name="SupScript2">The names on the links
must be unique within each group, but there may be many links with the
same name in different groups. Link names remain unambiguous: for two links
with the same name, either some ancestor on the path has a different name or
the links lead to the same object. The graph is
navigated with path names similar to those of Unix file systems.
An object can be opened with a full path starting at the root group or
with a relative path and a starting node (group). Note that all paths are
relative to a single HDF5 file. In this sense, an HDF5 file is analogous
to a single Unix file system.</a>
<a href="#FootNote"><sup><font size="-1">2</font></sup></a></p>
<table width = 300 cellspacing="0" align="center">
<tr valign="top">
<td align="center">
<hr color="green" size="3"/>
<img src="Images/Dmodel_fig38_a.JPG"><br />
a) Newly created file: one group, <code>/</code><br />
<img src="Images/Dmodel_fig38_b.JPG"><br />
b) Create a dataset called <code>/dset1</code><br />
(<code>H5Dcreate(..., "/dset1", ...)</code>)<br /><br />
</td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 12. An HDF5 file with one dataset
<!-- formerly Figure 38 --></b>
<hr color="green" size="3"/></td>
</tr>
</table>
<br />
<p>It is important to note that, just as in a Unix file system, HDF5
objects do not have <em>names</em>. The names are associated with
<em>paths</em>. An object has a unique (within the file) <em>object
ID</em>, but a single object may have many names because there
may be many paths to the same object. An object can be renamed, or moved to
another group, by adding and deleting links. In this case, the object
itself never moves. For that matter, membership in a group has no
implication for the physical location of the stored object.</p>
<!-- NEW PAGE -->
<p><a name="SupScript3">Deleting a link to an object does not necessarily
delete the object. The object remains available as long as there is at
least one link to it. After all the links to an object are deleted, it
can no longer be opened although the storage may or may not be
reclaimed.</a><a href="#FootNote"><sup><font size="-1">3</font></sup></a></p>
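<p>The link behavior described above can be illustrated with a small,
self-contained C model. This is only a sketch and is not the HDF5 API: the
<code>Object</code>, <code>Group</code>, and <code>Link</code> types, the
<code>link_create</code>, <code>link_delete</code>, and <code>link_move</code>
functions, and the fixed table sizes are all hypothetical choices made to keep
the example short. It shows that names belong to links rather than to objects,
that names must be unique within a group, and that a rename is an added link
followed by a deleted link.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LINKS 8   /* arbitrary limit for this toy model */
#define MAX_NAME  32  /* arbitrary limit for this toy model */

/* A stored object: it has an ID and a count of links, but no name. */
typedef struct {
    int id;
    int link_count;
} Object;

/* A named arc from a group to an object. */
typedef struct {
    char    name[MAX_NAME];
    Object *target;
} Link;

/* A group is modeled as a small table of links. */
typedef struct {
    Link links[MAX_LINKS];
    int  nlinks;
} Group;

/* Returns the index of the link called `name` in `g`, or -1 if absent. */
static int find_link(const Group *g, const char *name)
{
    for (int i = 0; i < g->nlinks; i++)
        if (strcmp(g->links[i].name, name) == 0)
            return i;
    return -1;
}

/* Adds a link; fails if the name is already used in this group. */
bool link_create(Group *g, const char *name, Object *target)
{
    assert(g != NULL && name != NULL && target != NULL);
    if (g->nlinks == MAX_LINKS || strlen(name) >= MAX_NAME)
        return false;
    if (find_link(g, name) >= 0)
        return false;
    Link *l = &g->links[g->nlinks++];
    strcpy(l->name, name);
    l->target = target;
    target->link_count++;
    return true;
}

/* Removes a link; the object is untouched apart from its link count. */
bool link_delete(Group *g, const char *name)
{
    assert(g != NULL && name != NULL);
    int i = find_link(g, name);
    if (i < 0)
        return false;
    g->links[i].target->link_count--;
    g->links[i] = g->links[--g->nlinks];  /* fill the gap with the last link */
    return true;
}

/* "Renames" an object by adding the new link, then deleting the old one. */
bool link_move(Group *from, const char *old_name,
               Group *to, const char *new_name)
{
    int i = find_link(from, old_name);
    if (i < 0)
        return false;
    return link_create(to, new_name, from->links[i].target)
        && link_delete(from, old_name);
}
```

<p>In this model, an object whose <code>link_count</code> has dropped to zero
can no longer be reached from any group, which mirrors the statement that an
object with no remaining links can no longer be opened. Whether its storage is
reclaimed is a separate question that the model does not address.</p>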
<p>It is important to realize that the linking mechanism can be used to
construct very complex graphs of objects. For example, it is possible for
an object to be shared between several groups and even to have more than
one name in the same group. It is also possible for a group to be a
member of itself or to be in a “cycle” in the graph. An example of a cycle
is where a child is the parent of one of its own ancestors. </p>
<!-- move the following paragraph to the Links chapter when it is written:
<p>HDF5 also has <em>soft links</em> similar to Unix soft links.
A soft link is an object that contains a name and a path name for the
target object. The soft link can be followed to open the target of the link
just like a regular (hard) link. Unlike hard links, the target of a soft
link has no count of the soft link to it. The reference count of an object
is the number of hard Links (which must be >= 1). A second difference is
that the hard link cannot be created if the target object does not exist,
and always points to the same object. A Soft Link can be created with any
path name, whether or not the object exists. Therefore, it may or may not
be possible to follow a Soft Link, or the target object
may change from one access to another access of the same Soft Link. </p>-->
<h4>1.4.2. HDF5 Path Names and Navigation</h4>
<p>The structure of the file constitutes the name space for the objects
in the file. A path name is a string of components separated by
‘/’. Each component is the name of a link or the special
character “.” for the current group. Link names (components)
can be any string of ASCII characters not containing ‘/’
(except the string “.” which is reserved). However, users
are advised to avoid the use of punctuation and non-printing characters
because they may create problems for other software. The figure below
<!-- formerly Figure 39 -->gives a BNF grammar for HDF5 path names.</p>
<table width = 600 cellspacing="0" align="center">
<tr valign="top">
<td align="left">
<hr color="green" size="3"/>
<pre>
PathName ::= AbsolutePathName | RelativePathName
Separator ::= "/" ["/"]*
AbsolutePathName ::= Separator [ RelativePathName ]
RelativePathName ::= Component [ Separator RelativePathName ]*
Component ::= "." | Name
Name ::= Character+ - {"."}
Character ::= {<em>c</em>: <em>c</em> in {{ <em>legal ASCII characters</em> } - {'/'}}}
</pre></td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 13. A BNF grammar for path names
<!-- formerly Figure 39--></b>
<hr color="green" size="3"/></td>
</tr>
</table>
<br />
<p>An object can always be addressed by a <em>full or absolute path</em>,
which starts at the root group. As already noted, a given object
can have more than one full path name. An object can also be addressed
by a <em>relative path</em>, which starts at a given group and continues
along the links to the object.</p>
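<p>The path name grammar can be checked mechanically. The following is a
minimal sketch of a recognizer for the <code>PathName</code> production,
written as a hypothetical helper named <code>path_name_is_valid</code>; it is
not a function of the HDF5 Library. It assumes that "legal ASCII characters"
means byte values 1 through 127. Under the grammar as written, a path cannot
end with a separator unless the whole path is the root, so the sketch rejects
paths such as <code>group1/</code>; the library's own path handling may be
more lenient.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `path` can be derived from the PathName production. */
bool path_name_is_valid(const char *path)
{
    if (path == NULL || *path == '\0')
        return false;               /* a path name is never empty */

    const char *p = path;

    /* AbsolutePathName ::= Separator [ RelativePathName ] */
    if (*p == '/') {
        while (*p == '/')
            p++;                    /* Separator is one or more '/' */
        if (*p == '\0')
            return true;            /* the root group on its own */
    }

    /* RelativePathName ::= Component [ Separator RelativePathName ]* */
    for (;;) {
        const char *start = p;
        while (*p != '\0' && *p != '/') {
            if ((unsigned char)*p > 127)
                return false;       /* not an ASCII character */
            p++;
        }
        if (p == start)
            return false;           /* a component needs at least one character */
        if (*p == '\0')
            return true;            /* the path ended on a component */

        assert(*p == '/');          /* only a separator can follow here */
        while (*p == '/')
            p++;
        if (*p == '\0')
            return false;           /* a separator must be followed by a component */
    }
}
```

<p>With this sketch, <code>/group1/dset1</code>, <code>./dset1</code>, and
<code>group1//dset1</code> are accepted. Both <code>.</code> and any other
non-empty run of legal characters count as a component, so the recognizer
does not need to distinguish <code>Name</code> from <code>"."</code>.</p>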
<p>The structure of an HDF5 file is “self-describing.” This
means that it is possible to navigate the file to discover all the
objects in the file. Basically, the structure is traversed as a graph
starting at one node and recursively visiting the nodes of the graph.</p>
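<p>Because an object can be reached by more than one path, and a group can
be part of a cycle, a recursive traversal has to remember which objects it has
already visited. The sketch below shows this on a toy in-memory graph. The
<code>Node</code> type and the <code>count_objects</code> function are
hypothetical and are not part of the HDF5 API; the fixed member limit is an
assumption made for brevity.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MEMBERS 4   /* arbitrary limit for this toy model */

/* A node of the file graph: a group (with members) or a dataset. */
typedef struct Node {
    int          id;
    bool         is_group;
    struct Node *members[MAX_MEMBERS];  /* targets of this group's links */
    int          nmembers;
    bool         visited;               /* set once the node has been counted */
} Node;

/*
 * Depth-first traversal starting at `n`.
 * Returns the number of distinct, not yet visited objects reachable from `n`.
 * The `visited` flag stops the recursion on shared objects and on cycles.
 */
int count_objects(Node *n)
{
    assert(n != NULL);
    if (n->visited)
        return 0;
    n->visited = true;

    int total = 1;
    for (int i = 0; i < n->nmembers; i++)
        total += count_objects(n->members[i]);
    return total;
}
```

<p>For a root group that links to a group and a dataset, where the group links
to the same dataset and back to the root, the traversal counts three objects
and terminates. Without the <code>visited</code> flag, the link back to the
root would cause unbounded recursion.</p>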
<!-- move the following paragraph to the Links chapter when it is written:
<p>The members of a group can be discovered with the H5Giterate function,
and a description of the object can be retrieved with the H5Gget_obj_info
function. In this way, all the members of a given group can be determined,
and each can be opened to retrieve a description, or the data and
attributes of the object.</p> -->
<h4>1.4.3. Examples of HDF5 File Structures</h4>
<p>The figure below <!-- formerly Figure 40 -->shows some possible HDF5
file structures with groups and datasets. Part a of the figure shows the
structure of a file with three groups.
Part b shows a dataset created in “/group1”. Part
c shows the structure after a dataset called dset2 has been added to the
root group. Part d shows the structure after another group and dataset have
been added.</p>
<!-- NEW PAGE -->
<table width = 600 cellspacing="0" align="center">
<tr valign="top">
<td align="center">
<hr color="green" size="3"/>
<table width="100%">
<tr valign="top" align="left">
<td width="48%">a) Three groups; two are members of the root
group,<br />
<code>/group1</code> and <code>/group2</code></td>
<td width="4%"> </td>
<td width="48%">b) Create a dataset in <code>/group1</code>:
<br />
<code>/group1/dset1</code></td>
</tr>
<tr valign="middle" align="center">
<td width="48%"><img src="Images/Dmodel_fig40_a.JPG"
width="230"></td>
<td width="4%"> </td>
<td width="48%"><img src="Images/Dmodel_fig40_b.JPG"
width="230"></td>
</tr>
<tr valign="top" align="left">
<td width="48%">c) Another dataset, a member of the root
group: <br />
<code>/dset2</code></td>
<td width="4%"> </td>
<td width="48%">d) And another group and dataset, reusing
object names: <br />
<code>/group2/group2/dset2</code></td>
</tr>
<tr valign="middle" align="center">
<td width="48%"><img src="Images/Dmodel_fig40_c.JPG"
width="230"></td>
<td width="4%"> </td>
<td width="48%"><img src="Images/Dmodel_fig40_d.JPG"
width="282"></td>
</tr>
</table>
</td>
</tr>
<tr><td><hr color="green" size="1" /></td></tr>
<tr valign="top">
<td align="left" >
<b>Figure 14.
<!-- formerly Figure 40: -->
Examples of HDF5 file structures with groups and datasets</b>
<hr color="green" size="3"/></td>
</tr>
</table>
<br />
<!-- FOR USE WITH ELECTRONIC VERSION ----------------------------------->
<br /><br /><br />
<!-- FOR USE WITH ELECTRONIC VERSION ----------------------------------->
<p> </p>
<a name="FootNote"><hr width="200px" align="left"></a>
<font size="-1"><sup><a href="#SupScript1">1</a></sup>HDF5 requires random
access to the linear address space. For this reason it is not well suited for
some data media such as streams.</font>
<br />
<font size="-1"><sup><a href="#SupScript2">2</a></sup>It could be said that
HDF5 extends the organizing concepts of a file system to the internal
structure of a single file.</font>
<br />
<font size="-1"><sup><a href="#SupScript3">3</a></sup>As of HDF5-1.4, the
storage used for an object is not reclaimed, even if all links
are deleted.</font>
</body>
</html>