<html>
<head>
<title>Reserved File Address Space</title>
</head>
<!--
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Copyright by The HDF Group. *
* Copyright by the Board of Trustees of the University of Illinois. *
* All rights reserved. *
* *
* This file is part of HDF5. The full HDF5 copyright notice, including *
* terms governing use, modification, and redistribution, is contained in *
* the files COPYING and Copyright.html. COPYING can be found at the root *
* of the source code distribution tree; Copyright.html can be found at the *
* root level of an installed copy of the electronic HDF5 document set and *
* is linked from the top-level documents page. It can also be found at *
* http://hdfgroup.org/HDF5/doc/Copyright.html. If you do not have *
* access to either file, you may request a copy from help@hdfgroup.org. *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
-->
<body>
<h1>Reserved File Address Space</h1>
<p><b>Note: this tech note is no longer accurate for HDF5 1.8. While users can
still adjust the size of HDF5 addresses, HDF5 no longer uses lazy allocations
due to the consequences of doing so in parallel when using the new metadata
cache. -JML 2006-09-28</b></p>
<p>HDF5 files use 8-byte addresses by default, but users can change this to 2,
4, or even 16 bytes. A file with 2-byte addresses can address only 64 KB of
space, so HDF5 must handle files that have enough space on disk but not enough
internal address space to be written.</p>
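<p>The arithmetic behind the 64 KB figure can be sketched as follows. This is
only an illustration: <code>max_file_address</code> is a hypothetical helper, not
an HDF5 function, and a real library may set aside some address values for
internal use.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of HDF5): the largest byte offset that an
 * address of `sizeof_addr` bytes can hold. Returns 0 for widths that do not
 * fit in a uint64_t, such as 16-byte addresses. */
uint64_t max_file_address(size_t sizeof_addr)
{
    if (sizeof_addr == 0 || sizeof_addr > 8)
        return 0;
    if (sizeof_addr == 8)
        return UINT64_MAX;          /* shifting by 64 bits would be undefined */
    return ((uint64_t)1 << (8 * sizeof_addr)) - 1;
}
```

<p>For example, <code>max_file_address(2)</code> returns 65535, so the valid
offsets 0 through 65535 cover exactly 64 KB.</p>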
<p>Thus, every time space is allocated in a file, HDF5 needs to check that this
allocation is within the file's address space. If not, HDF5 should output an
error and ensure that all the data currently in the file (everything that is
still addressable) is successfully written to disk.</p>
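<p>A minimal sketch of such a bounds check, assuming space is always appended at
the current end of the allocated region. The function name and parameters are
hypothetical and do not reflect the library's internal interface.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: can `size` bytes be placed at offset `eoa` (the end of
 * the space allocated so far) without passing `max_addr`, the largest valid
 * offset in the file? */
bool alloc_fits(uint64_t eoa, uint64_t size, uint64_t max_addr)
{
    if (eoa > max_addr)
        return false;               /* already past the end of the address space */
    if (size == 0)
        return true;
    /* The block occupies eoa .. eoa + size - 1. Comparing against
     * max_addr - eoa avoids overflowing eoa + size. */
    return size - 1 <= max_addr - eoa;
}
```

<p>With 2-byte addresses (<code>max_addr</code> of 65535), an empty file can accept
a 65536-byte allocation but not a 65537-byte one. A caller that gets
<code>false</code> would report the error and flush what is already
addressable.</p>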
<p>Unfortunately, some structures (object headers and the local heap) are held in
memory and do not allocate file space for themselves until the file is actually
flushed to disk. This is good for efficiency, since these structures can grow without
creating the fragmentation that would result from frequent allocation and deallocation,
but it means that if the file runs out of addressable space before the flush,
these structures will not be present in the file. Without them, HDF5 does not know
how to parse the data in the file, rendering it unreadable.</p>
<p>Thus, HDF5 keeps track of the file space that has been reserved for later
allocation (in the H5FD_t struct). When a function tries to allocate space in the
file, it first checks that the allocation would not overflow the address space,
taking the reserved space into account. When object headers or the heap finally
allocate the space they have reserved, they free the reserved space before
allocating file space.</p>
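<p>The reservation scheme can be sketched roughly as below. The struct and
function names are hypothetical stand-ins for the bookkeeping the note places in
H5FD_t; the sketch only models allocation at the end of the file.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping, not the real H5FD_t layout. */
typedef struct {
    uint64_t capacity;  /* total addressable bytes in the file */
    uint64_t eoa;       /* bytes already allocated (end of allocated space) */
    uint64_t reserved;  /* bytes promised to structures still held in memory */
} file_space_t;

/* Bytes that are neither allocated nor reserved. */
uint64_t space_left(const file_space_t *f)
{
    return f->capacity - f->eoa - f->reserved;
}

/* Set aside n bytes for a later allocation; fails if they would not fit. */
bool space_reserve(file_space_t *f, uint64_t n)
{
    if (n > space_left(f))
        return false;
    f->reserved += n;
    return true;
}

/* Give back up to n reserved bytes, typically just before allocating them. */
void space_unreserve(file_space_t *f, uint64_t n)
{
    f->reserved -= (n > f->reserved) ? f->reserved : n;
}

/* Allocate n bytes at the end of the file, respecting reservations. */
bool space_alloc(file_space_t *f, uint64_t n, uint64_t *addr)
{
    if (n > space_left(f))
        return false;
    *addr = f->eoa;
    f->eoa += n;
    return true;
}
```

<p>In a 65536-byte address space, reserving 1000 bytes for an object header
leaves room for 64536 bytes of raw data; a further one-byte allocation fails.
At flush time the header calls <code>space_unreserve</code> and then
<code>space_alloc</code> for the same 1000 bytes, which is guaranteed to
succeed.</p>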
<p>A given object header is only flushed to disk once, but the heap can be flushed
to disk multiple times over the life of the file and will require contiguous space
every time. To handle this, the heap keeps track of how much space it has reserved.
This allows it to reserve space only when it grows (when it is dirty and needs to be
re-written to disk).</p>
<p>For instance, if the heap is flushed to disk, it frees its reserved space. If new
data is inserted into the heap in memory, the heap may need to flush to disk again in
a new, larger section of the file. Thus, not only does it reserve space in the file
for this new data, but also for all of the previously-existing data in the heap to
be re-written. The next insert, however, will only need to reserve space for its new
data, since the rest of the heap already has space reserved for it.</p>
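<p>The heap's bookkeeping in this example can be sketched as follows. The type
and functions are hypothetical and model only the byte counts; they do not check
the reservation against the file's address space or free the heap's old
location.</p>

```c
#include <stdint.h>

/* Hypothetical model of the heap's reservation counters. */
typedef struct {
    uint64_t size;      /* bytes of heap data currently held in memory */
    uint64_t reserved;  /* file bytes reserved for the heap's next flush */
} heap_t;

/* Insert n bytes into the heap. Returns how many additional file bytes must
 * be reserved so that the whole heap can be written contiguously later. */
uint64_t heap_insert(heap_t *h, uint64_t n)
{
    uint64_t extra;

    h->size += n;
    extra = h->size - h->reserved;  /* everything not yet covered */
    h->reserved = h->size;
    return extra;
}

/* Flush the heap. The reservation is released first; the return value is the
 * number of contiguous file bytes the flush must then allocate. */
uint64_t heap_flush(heap_t *h)
{
    h->reserved = 0;
    return h->size;
}
```

<p>Starting from an empty heap, inserting 100 bytes reserves 100 and the flush
writes 100. A following insert of 20 bytes reserves 120 (the old data plus the
new), while the next insert of 30 bytes reserves only 30.</p>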
<p>Potential issues:
<ol>
<li>This system does not take into account deleted space. Deleted space can still
be allocated as usual, but "reserved" space is always taken off the end of a file.
This means that a file that was filled to capacity but then had a significant number
of objects deleted will still throw errors if more data is written. This occurs
because the file's free space is in the middle of the file, not at the end. A
more complete space-reservation system would test if the reserved data can fit
into the file's free list, but this would be significantly more complicated to
implement.</li>
<li>HDF5 currently claims to support 16-byte addresses, but a number of platforms
do not support 16-byte integers, so addresses of this size cannot be represented
in memory. This solution does not attempt to address this issue.</li>
</ol>
</p>
<hr>
<address>
Last modified: 28 September 2006
</address>
</body>
</html>