1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530
|
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Memory Management in HDF5</title>
</head>
<!--
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Copyright by The HDF Group. *
* Copyright by the Board of Trustees of the University of Illinois. *
* All rights reserved. *
* *
* This file is part of HDF5. The full HDF5 copyright notice, including *
* terms governing use, modification, and redistribution, is contained in *
* the files COPYING and Copyright.html. COPYING can be found at the root *
* of the source code distribution tree; Copyright.html can be found at the *
* root level of an installed copy of the electronic HDF5 document set and *
* is linked from the top-level documents page. It can also be found at *
* http://hdfgroup.org/HDF5/doc/Copyright.html. If you do not have *
* access to either file, you may request a copy from help@hdfgroup.org. *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
-->
<body>
<h1>Memory Management in HDF5</h1>
<!-- ---------------------------------------------------------------- -->
<h2>Is a Memory Manager Necessary?</h2>
<p>Some form of memory management may be necessary in HDF5 when
the various deletion operators are implemented so that the
file memory is not permanently orphaned. However, since an
HDF5 file was designed with persistent data in mind, the
importance of a memory manager is questionable.
<p>On the other hand, when certain meta data containers (file glue)
grow, they may need to be relocated in order to keep the
container contiguous.
<blockquote>
<b>Example:</b> An object header consists of up to two
chunks of contiguous memory. The first chunk is a fixed
size at a fixed location when the header link count is
greater than one. Thus, inserting additional items into an
object header may require the second chunk to expand. When
this occurs, the second chunk may need to move to another
location in the file, freeing the file memory which that
chunk originally occupied.
</blockquote>
<p>The relocation of meta data containers could potentially
orphan a significant amount of file memory if the application
has made poor estimates for preallocation sizes.
<!-- ---------------------------------------------------------------- -->
<h2>Levels of Memory Management</h2>
<p>Memory management by the library can be independent of memory
management support by the file format. The file format can
support no memory management, some memory management, or full
memory management. Similarly with the library.
<h3>Support in the Library</h3>
<dl>
<dt><b>No Support: I</b>
<dd>When memory is deallocated it simply becomes unreferenced
(orphaned) in the file. Memory allocation requests are
satisfied by extending the file.
<dd>A separate off-line utility can be used to detect the
unreferenced bytes of a file and "bubble" them up to the end
of the file and then truncate the file.
<dt><b>Some Support: II</b>
<dd>The library could support partial memory management all
the time, or full memory management some of the time.
Orphaning free blocks instead of adding them to a free list
should not affect the file integrity, nor should fulfilling
new requests by extending the file instead of using the free
list.
<dt><b>Full Support: III</b>
<dd>The library supports space-efficient memory management by
always fulfilling allocation requests from the free list when
possible, and by coalescing adjacent free blocks into a
single larger free block.
</dl>
<h3>Support in the File Format</h3>
<dl>
<dt><b>No Support: A</b>
<dd>The file format does not support memory management; any
unreferenced block in the file is assumed to be free. If
the library supports full memory management then it will
have to traverse the entire file to determine which blocks
are unreferenced.
<dt><b>Some Support: B</b>
<dd>Assuming that unreferenced blocks are free can be
dangerous in a situation where the file is not consistent.
For instance, if a directory tree becomes detached from the
main directory hierarchy, then the detached directory and
everything that is referenced only through the detached
directory become unreferenced. File repair utilities will
be unable to determine which unreferenced blocks need to be
linked back into the file hierarchy.
<dd>Therefore, it might be useful to keep an unsorted,
doubly-linked list of free blocks in the file. The library
can add and remove blocks from the list in constant time,
and can generate its own internal free-block data structure
in time proportional to the number of free blocks instead of
the size of the file. Additionally, a library can use a
subset of the free blocks, an alternative which is not
feasible if the file format doesn't support any form of
memory management.
<dt><b>Full Support: C</b>
<dd>The file format can mirror library data structures for
space-efficient memory management. The free blocks are
linked in unsorted, doubly-linked lists with one list per
free block size. The heads of the lists are pointed to by a
B-tree whose nodes are sorted by free block size. At the
same time, all free blocks are the leaf nodes of another
B-tree sorted by starting and ending address. When the
trees are used in combination we can deallocate and allocate
memory in O(log <em>N</em>) time where <em>N</em> is the
number of free blocks.
</dl>
<h3>Combinations of Library and File Format Support</h3>
<p>We now evaluate each combination of library support with file
support:
<dl>
<dt><b>I-A</b>
<dd>If neither the library nor the file support memory
management, then each allocation request will come from the
end of the file and each deallocation request is a no-op
that simply leaves the free block unreferenced.
<ul>
<li>Advantages
<ul>
<li>No file overhead for allocation or deallocation.
<li>No library overhead for allocation or
deallocation.
<li>No file traversal required at time of open.
<li>No data needs to be written back to the file when
it's closed.
<li>Trivial to implement (already implemented).
</ul>
<li>Disadvantages
<ul>
<li>Inefficient use of file space.
<li>A file repair utility must reclaim lost file space.
<li>Difficulties for file repair utilities. (Is an
unreferenced block a free block or orphaned data?)
</ul>
</ul>
<dt><b>II-A</b>
<dd>In order for the library to support memory management, it
will be required to build the internal free block
representation by traversing the entire file looking for
unreferenced blocks.
<ul>
<li>Advantages
<ul>
<li>No file overhead for allocation or deallocation.
<li>Variable amount of library overhead for allocation
and deallocation depending on how much work the
library wants to do.
<li>No data needs to be written back to the file when
it's closed.
<li>Might use file space efficiently.
</ul>
<li>Disadvantages
<ul>
<li>Might use file space inefficiently.
<li>File traversal required at time of open.
<li>A file repair utility must reclaim lost file space.
<li>Difficulties for file repair utilities.
<li>Sharing of the free list between processes falls
outside the HDF5 file format documentation.
</ul>
</ul>
<dt><b>III-A</b>
<dd>In order for the library to support full memory
management, it will be required to build the internal free
block representation by traversing the entire file looking
for unreferenced blocks.
<ul>
<li>Advantages
<ul>
<li>No file overhead for allocation or deallocation.
<li>Efficient use of file space.
<li>No data needs to be written back to the file when
it's closed.
</ul>
<li>Disadvantages
<ul>
<li>Moderate amount of library overhead for allocation
and deallocation.
<li>File traversal required at time of open.
<li>A file repair utility must reclaim lost file space.
<li>Difficulties for file repair utilities.
<li>Sharing of the free list between processes falls
outside the HDF5 file format documentation.
</ul>
</ul>
<dt><b>I-B</b>
<dd>If the library doesn't support memory management but the
file format supports some level of management, then a file
repair utility will have to be run occasionally to reclaim
unreferenced blocks.
<ul>
<li>Advantages
<ul>
<li>No file overhead for allocation or deallocation.
<li>No library overhead for allocation or
deallocation.
<li>No file traversal required at time of open.
<li>No data needs to be written back to the file when
it's closed.
</ul>
<li>Disadvantages
<ul>
<li>A file repair utility must reclaim lost file space.
<li>Difficulties for file repair utilities.
</ul>
</ul>
<dt><b>II-B</b>
<dd>Both the library and the file format support some level
of memory management.
<ul>
<li>Advantages
<ul>
<li>Constant file overhead per allocation or
deallocation.
<li>Variable library overhead per allocation or
deallocation depending on how much work the library
wants to do.
<li>Traversal at file open time is on the order of the
free list size instead of the file size.
<li>The library has the option of reading only part of
the free list.
<li>No data needs to be written at file close time if
it has been amortized into the cost of allocation
and deallocation.
<li>File repair utilties don't have to be run to
reclaim memory.
<li>File repair utilities can detect whether an
unreferenced block is a free block or orphaned data.
<li>Sharing of the free list between processes might
be easier.
<li>Possible efficient use of file space.
</ul>
<li>Disadvantages
<ul>
<li>Possible inefficient use of file space.
</ul>
</ul>
<dt><b>III-B</b>
<dd>The library provides space-efficient memory management but
the file format only supports an unsorted list of free
blocks.
<ul>
<li>Advantages
<ul>
<li>Constant time file overhead per allocation or
deallocation.
<li>No data needs to be written at file close time if
it has been amortized into the cost of allocation
and deallocation.
<li>File repair utilities don't have to be run to
reclaim memory.
<li>File repair utilities can detect whether an
unreferenced block is a free block or orphaned data.
<li>Sharing of the free list between processes might
be easier.
<li>Efficient use of file space.
</ul>
<li>Disadvantages
<ul>
<li>O(log <em>N</em>) library overhead per allocation or
deallocation where <em>N</em> is the total number of
free blocks.
<li>O(<em>N</em>) time to open a file since the entire
free list must be read to construct the in-core
trees used by the library.
<li>Library is more complicated.
</ul>
</ul>
<dt><b>I-C</b>
<dd>This has the same advantages and disadvantages as I-C with
the added disadvantage that the file format is much more
complicated.
<dt><b>II-C</b>
<dd>If the library only provides partial memory management but
the file requires full memory management, then this method
degenerates to the same as II-A with the added disadvantage
that the file format is much more complicated.
<dt><b>III-C</b>
<dd>The library and file format both provide complete data
structures for space-efficient memory management.
<ul>
<li>Advantages
<ul>
<li>Files can be opened in constant time since the
free list is read on demand and amortised into the
allocation and deallocation requests.
<li>No data needs to be written back to the file when
it's closed.
<li>File repair utilities don't have to be run to
reclaim memory.
<li>File repair utilities can detect whether an
unreferenced block is a free block or orphaned data.
<li>Sharing the free list between processes is easy.
<li>Efficient use of file space.
</ul>
<li>Disadvantages
<ul>
<li>O(log <em>N</em>) file allocation and deallocation
cost where <em>N</em> is the total number of free
blocks.
<li>O(log <em>N</em>) library allocation and
deallocation cost.
<li>Much more complicated file format.
<li>More complicated library.
</ul>
</ul>
</dl>
<!-- ---------------------------------------------------------------- -->
<h2>The Algorithm for II-B</h2>
<p>The file contains an unsorted, doubly-linked list of free
blocks. The address of the head of the list appears in the
boot block. Each free block contains the following fields:
<center>
<table border cellpadding=4 width="60%">
<tr align=center>
<th width="25%">byte</th>
<th width="25%">byte</th>
<th width="25%">byte</th>
<th width="25%">byte</th>
<tr align=center>
<th colspan=4>Free Block Signature</th>
<tr align=center>
<th colspan=4>Total Free Block Size</th>
<tr align=center>
<th colspan=4>Address of Left Sibling</th>
<tr align=center>
<th colspan=4>Address of Right Sibling</th>
<tr align=center>
<th colspan=4><br><br>Remainder of Free Block<br><br><br></th>
</table>
</center>
<p>The library reads as much of the free list as convenient when
convenient and pushes those entries onto stacks. This can
occur when a file is opened or any time during the life of the
file. There is one stack for each free block size and the
stacks are sorted by size in a balanced tree in memory.
<p>Deallocation involves finding the correct stack or creating
a new one (an O(log <em>K</em>) operation where <em>K</em> is
the number of stacks), pushing the free block info onto the
stack (a constant-time operation), and inserting the free
block into the file free block list (a constant-time operation
which doesn't necessarily involve any I/O since the free blocks
can be cached like other objects). No attempt is made to
coalesce adjacent free blocks into larger blocks.
<p>Allocation involves finding the correct stack (an O(log
<em>K</em>) operation), removing the top item from the stack
(a constant-time operation), and removing the block from the
file free block list (a constant-time operation). If there is
no free block of the requested size or larger, then the file
is extended.
<p>To provide sharability of the free list between processes,
the last step of an allocation will check for the free block
signature and if it doesn't find one will repeat the process.
Alternatively, a process can temporarily remove free blocks
from the file and hold them in it's own private pool.
<p>To summarize...
<dl>
<dt>File opening
<dd>O(<em>N</em>) amortized over the time the file is open,
where <em>N</em> is the number of free blocks. The library
can still function without reading any of the file free
block list.
<dt>Deallocation
<dd>O(log <em>K</em>) where <em>K</em> is the number of unique
sizes of free blocks. File access is constant.
<dt>Allocation
<dd>O(log <em>K</em>). File access is constant.
<dt>File closing
<dd>O(1) even if the library temporarily removes free
blocks from the file to hold them in a private pool since
the pool can still be a linked list on disk.
</dl>
<!-- ---------------------------------------------------------------- -->
<h2>The Algorithm for III-C</h2>
<p>The HDF5 file format supports a general B-tree mechanism
for storing data with keys. If we use a B-tree to represent
all parts of the file that are free and the B-tree is indexed
so that a free file chunk can be found if we know the starting
or ending address, then we can efficiently determine whether a
free chunk begins or ends at the specified address. Call this
the <em>Address B-Tree</em>.
<p>If a second B-tree points to a set of stacks where the
members of a particular stack are all free chunks of the same
size, and the tree is indexed by chunk size, then we can
efficiently find the best-fit chunk size for a memory request.
Call this the <em>Size B-Tree</em>.
<p>All free blocks of a particular size can be linked together
with an unsorted, doubly-linked, circular list and the left
and right sibling addresses can be stored within the free
chunk, allowing us to remove or insert items from the list in
constant time.
<p>Deallocation of a block fo file memory consists of:
<ol type="I">
<li>Add the new free block whose address is <em>ADDR</em> to the
address B-tree.
<ol type="A">
<li>If the address B-tree contains an entry for a free
block that ends at <em>ADDR</em>-1 then remove that
block from the B-tree and from the linked list (if the
block was the first on the list then the size B-tree
must be updated). Adjust the size and address of the
block being freed to include the block just removed from
the free list. The time required to search for and
possibly remove the left block is O(log <em>N</em>)
where <em>N</em> is the number of free blocks.
<li>If the address B-tree contains an entry for the free
block that begins at <em>ADDR</em>+<em>LENGTH</em> then
remove that block from the B-tree and from the linked
list (if the block was the first on the list then the
size B-tree must be updated). Adjust the size of the
block being freed to include the block just removed from
the free list. The time required to search for and
possibly remove the right block is O(log <em>N</em>).
<li>Add the new (adjusted) block to the address B-tree.
The time for this operation is O(log <em>N</em>).
</ol>
<li>Add the new block to the size B-tree and linked list.
<ol type="A">
<li>If the size B-tree has an entry for this particular
size, then add the chunk to the tail of the list. This
is an O(log <em>K</em>) operation where <em>K</em> is
the number of unique free block sizes.
<li>Otherwise make a new entry in the B-tree for chunks of
this size. This is also O(log <em>K</em>).
</ol>
</ol>
<p>Allocation is similar to deallocation.
<p>To summarize...
<dl>
<dt>File opening
<dd>O(1)
<dt>Deallocation
<dd>O(log <em>N</em>) where <em>N</em> is the total number of
free blocks. File access time is O(log <em>N</em>).
<dt>Allocation
<dd>O(log <em>N</em>). File access time is O(log <em>N</em>).
<dt>File closing
<dd>O(1).
</dl>
<hr>
<!-- <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> -->
<!-- Created: Thu Jul 24 15:16:40 PDT 1997 -->
<!-- hhmts start -->
Last modified: Thu Jul 31 14:41:01 EST
<!-- hhmts end -->
</body>
</html>
|