1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230
|
mmap_allocator - A STL allocator that mmaps files
-------------------------------------------------
Introduction
------------
When reading large files (>100MB) into memory, read() calls are usually
not very space and time efficient, since the whole data is copiied at
least once. Furthermore, when using STL containers (like a vector for
example), data is copiied another time unless you specify the location
of the vector content as parameter to read().
It would be nice to tell the vector a filename and have the vector mmap
the file directly. This not only avoids the read() copiing (and the
STL vector copiing) but also allows different processes that read the
same file to see the same physical memory. Fortunately STL foresees
an interface to do exactly this.
Put short, if you need to handle big files that contain unstructured data
(like doubles or even text files), mmap_allocator is worth a try. It was
written as part of a consulting project I did for a large Austrian bank.
License
-------
This code is (c) Johannes Thoma 2012 and is LGPL. Please write me an email to
johannes.thoma@gmx.at if you find this useful, or if you find bugs or want
to extend the library.
How it works
------------
Many STL container classes accept allocators as part of their instantiation
and construction. The default allocator simply does a malloc on allocate and
a free on deallocate. By subclassing the default allocator and replacing the
allocate and deallocate methods with code that memory maps a file and return
a pointer to the mmaped area, we can create STL containers that use memory
mapped regions directly.
Usage
-----
To build this, do a
make
followed by a
make install
When compiling, include the mmappable_vector.h file and link with
the -lmmap_allocator library.
Example
-------
Suppose you have a file with 1024 ints. Then:
mmappable_vector<int> my_vector;
declares a vector that uses mmap_allocator to mmap the file. By calling:
my_vector.mmap_file("testfile", READ_ONLY, 0, 1024);
the mmappable_vector class will call allocate of the mmap_allocator
class, which in turn will mmap the file and set the content of the
vector to the mmapped area. The file's content can then be accessed
the usual way (my_vector[i]).
Use
my_vector.munmap_file();
to drop the mapping (it may remain in an internal cache, however). After
this call all accesses of mmappable_vector elements are invalid, same
goes for the iterators.
Do not forget to:
using namespace mmap_allocator_namespace;
and include:
#include "mmappable_vector.h"
but you probably know that yourself ;)
Please see the test_allocator.cpp file for more examples
(I used Testdriven development for this project and can highly recommend
this development model).
Mode
----
As part of the construction of the mmap_allocator object a mode field can be
specified. It does the following:
* DEFAULT_STL_ALLOCATOR: Default STL allocator (malloc based). Reason
is to have containers that do both and are compatible
* READ_ONLY: Readonly modus. Segfaults when vector content is written to.
* READ_WRITE_PRIVATE: Read/write access, writes are not propagated to disk.
* READ_WRITE_SHARED: Read/write access, writes are propagated to disk
(file is modified)
The offset parameter must be page aligned (PAGE_SIZE, usually 4K or 8K),
else mmap will return an error.
Mmap file pool
--------------
Beginning with 0.3, mmap allocator uses a pool of mmaped files. If you
map the same file with the same access mode again it gets the same
(virtual and physical) memory assigned as it was mapped previously.
We needed this because our program (that uses mmap_allocator) mmaps
a file in many small chunks (10K junks) which eventually lead to a
Out of filedescriptors error (when mapping a 100Meg file in 10K junks).
Following flags can be passed (by |'ing them together) to the mmap_file
method:
* MAP_WHOLE_FILE: Set this flag when you know that you need
the whole (or most parts of the) file later and only want to
request a part of it now.
* ALLOW_REMAP: Set this flag if you are allocating a vector
from a mmapped file and you know that you do not need previous
mappings any more. Normally this is used when the file size
changes (in particular when it grows). Be aware, however that
this could invalidate all allocations for that file that have
been made before.
* BYPASS_FILE_POOL: Set this flag if you want a per-vector
private mapping. This is useful in conjunction with READ_WRITE_PRIVATE
mappings.
Mmappable vector class
----------------------
Beginning with version 0.4, there is a special std::vector subclass
for use with mmapped vectors. This was necessary because there is
no standard way to set the size of an STL vector without initializing
the content (which is not wanted since the content is coming from the
mmapped file). From now on, please use the mmapped_vector class for
using mmap_allocator.
The old method by using the mmap_allocators constructor to set the
parameters for file mapping is still supported, however deprecated.
READ_WRITE caveats
------------------
Unlike reading from a file, a mmappable_vector that is mapped via
the mmap file pool contains all changes already made to the content
before. Suppose if you have:
mmappable_vector<int> p;
if (method == MMAP) {
p.mmap_file("testfile", READ_WRITE_PRIVATE, 0, filesize("testfile"));
} else {
readFromFileIntoVector(p, "testfile", READ_WRITE_PRIVATE, 0, filesize("testfile")
}
and then do something like:
for (it = p.begin();it != p.end();it++) {
*it += 1;
}
This will do not what you would expect at first glance when being called
twice. When the vector maps the file for the second time, it will map it
from the file pool and hence already have the values increased by one.
To avoid this, use the bypass_file_pool flag which will cause the file
being mmapped another time with different virtual memory mappings.
Exceptions
----------
There is a custom exception class mmap_allocator_exception which is
used at various places, for example when the file to be mapped is
not found. Use the e.message() method to find out what happened.
Conversion function templates
-----------------------------
To convert between a STL vector and a mmappable vector, use the
to_std_vector(mappable_vector) and to_mmappable_vector(std_vector)
function templates. However, try to avoid this because this will
copy the whole vector contents.
Verbosity
---------
To debug the allocator, there are two ways:
* call set_verbosity() with a positive value at runtime.
* define MMAP_ALLOCATOR_DEBUG as a positive value at compile time.
The latter is done by make test.
Version history
---------------
* 0.1.0, first release.
* 0.2.0, some interface changes.
* 0.3.0, mmaped file pool.
* 0.3.1, do not remap files when area fits in already mapped file.
* 0.3.2, never use MAP_FIXED.
* 0.3.3, bugfix in computing pointers.
* 0.4.0, mmapped_vector class.
* 0.5.0, cleaner interface, illegal offset bugfix.
* 0.5.1, bypass_file_pool flag.
* 0.5.2, changed bool parameters to flags argument.
* 0.5.3, set_verbosity() to toggle debug output.
* 0.6.0, to_stl_vector function template.
* 0.6.1, flags bugfix.
* 0.7.0, moved mmappable_vector to separate header.
* 0.8.0, to_mmappable_vector conversion function.
* 0.8.1, exception now knows what() method.
* 0.9.0, more standard conformant exception handling.
* 0.9.1, fixed a permission bug in MMAP_READWRITE_PRIVATE.
* 0.10.0, KEEP_FOREVER flag: never close any file.
* 0.10.1, fixed bug with allocating/deallocating 0 bytes.
Author
------
This was written by Johannes Thoma <johannes.thoma@gmx.at>
Thanks to Piotr Nycz from stackoverflow for contributing
the constructor function template in mmappable_vector.h
|