\chapter{Memory Mapping}
\label{cha:memmap}
\declaremodule{extension}{numarray.memmap}
\index{character array}
\index{string array}
\section{Introduction}
\label{sec:memmap-intro}
\code{numarray} provides support for the creation of arrays which are
mapped directly onto files with the \code{numarray.memmap} module.
Much of \code{numarray}'s design, the ability to handle misaligned and
byteswapped arrays for instance, was motivated by the desire to create
arrays from portable files which contain binary array data. One
advantage of memory mapping is efficient random access to small
regions of a large file: only the region of the mapped file which is
actually used in array operations needs to be paged into system
memory; the rest of the file remains unread and unwritten.
\code{numarray.memmap} is pure Python and is layered on top of
Python's \code{mmap} module. The basic idea behind \code{numarray}'s
memory mapping is to create a ``buffer'' referring to a region in a
mapped file and to use it as the data store for an array. The
\code{numarray.memmap} module contains two classes, one which
corresponds to an entire mapped file (\class{Memmap}) and one which
corresponds to a contiguous region within a file
(\class{MemmapSlice}). \class{MemmapSlice} objects have these
properties:
\begin{itemize}
\item MemmapSlices can be used as NumArray buffers.
\item MemmapSlices are non-overlapping.
\item MemmapSlices are resizable.
\item Changing the size of a MemmapSlice changes the parent Memmap.
\end{itemize}
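The random-access advantage described above can be illustrated with
Python's standard \code{mmap} module alone. The following is only a
sketch: it does not use \code{numarray.memmap}, and the file name, file
size and offset are arbitrary choices made for the illustration.

```python
import mmap
import os
import tempfile

# Assumption: a scratch file a few hundred pages long stands in for a "large" file.
FILE_SIZE = 256 * mmap.PAGESIZE
path = os.path.join(tempfile.mkdtemp(), "sketch.bin")
with open(path, "wb") as f:
    f.truncate(FILE_SIZE)

# Touch four bytes in the middle of the file through the mapping.
offset = 100 * mmap.PAGESIZE + 8
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mapped:     # length 0 maps the whole file
        mapped[offset:offset + 4] = b"DATA"      # only this region is accessed
        mapped.flush()                           # push the change to the file

# Read the same four bytes back with ordinary file I/O.
with open(path, "rb") as f:
    f.seek(offset)
    tail = f.read(4)
```

Only the small region that is indexed is accessed through the map; which
pages the operating system actually loads is left to its paging policy.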
\section{Opening a Memmap}
\label{sec:memmap-open}
You can create a \class{Memmap} object by calling the \function{open} function,
as in:
\begin{verbatim}
>>> m = open("memmap.tst","w+",len=48)
>>> m
<Memmap on file 'memmap.tst' with mode='w+', length=48, 0 slices>
\end{verbatim}
Here, the file ``memmap.tst'' is created/truncated to a length of 48
bytes and used to construct a Memmap object in write mode whose
contents are considered undefined.
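As a rough analogue of the ``w+'' behaviour just described, the sketch
below creates or truncates a file to a fixed length and maps it with the
standard \code{mmap} module. The helper name \code{create\_map} is
hypothetical and is not part of \code{numarray.memmap}.

```python
import mmap
import os
import tempfile


def create_map(path, length):
    """Create or truncate `path` to `length` bytes and map it read/write.

    Hypothetical helper for illustration; not the numarray.memmap API.
    """
    with open(path, "w+b") as f:       # "w+b" creates or truncates the file
        f.truncate(length)             # extend the empty file to `length` bytes
        # mmap keeps its own handle, so the file object may be closed afterwards.
        return mmap.mmap(f.fileno(), length)


path = os.path.join(tempfile.mkdtemp(), "memmap.tst")
m = create_map(path, 48)
size_of_map = len(m)
size_on_disk = os.path.getsize(path)
m.close()
```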
\section{Slicing a Memmap}
\label{sec:memmap-slicing}
Once opened, a \class{Memmap} object can be sliced into regions.
\begin{verbatim}
# Slice m into the buffers "n" and "p", which will serve as numarray buffers:
>>> n = m[0:16]
>>> n
<MemmapSlice of length:16 writable>
>>> p = m[24:48]
>>> p
<MemmapSlice of length:24 writable>
\end{verbatim}
NOTE: You cannot make \emph{overlapping} slices of a Memmap:
\begin{verbatim}
>>> q = m[20:28]
Traceback (most recent call last):
...
IndexError: Slice overlaps prior slice of same file.
\end{verbatim}
A slice can be deleted once all other references to it are gone,
e.g. once all arrays that used it have themselves been deleted.
Deleting a slice of a Memmap ``un-registers'' the slice, making that
region of the Memmap available for reallocation. Delete the region
directly from the Memmap rather than through the MemmapSlice:
\begin{verbatim}
>>> m = Memmap("memmap.tst",mode="w+",len=100)
>>> m1 = m[0:50]
>>> del m[0:50] # note: delete from m, not m1
>>> m2 = m[0:70]
\end{verbatim}
Note that since the region of m1 was deleted, there is no overlap when
m2 is created. However, deleting the region of m1 has invalidated it:
\begin{verbatim}
>>> m1
Traceback (most recent call last):
...
RuntimeError: A deleted MemmapSlice has been used.
\end{verbatim}
Don't mix operations on a Memmap which modify its data or file
structure with slice deletions. In this case, the status of the
modifications is undefined; the underlying map may or may not reflect
the modifications after the deletion.
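The bookkeeping implied by the non-overlap rule and by slice deletion
can be sketched in a few lines of plain Python. The class
\code{SliceRegistry} below is hypothetical; it only illustrates the
idea and makes no claim about how \class{Memmap} is implemented.

```python
class SliceRegistry:
    """Hypothetical bookkeeping for non-overlapping [start, stop) regions."""

    def __init__(self, length):
        self.length = length
        self.regions = []              # registered (start, stop) pairs

    def register(self, start, stop):
        if not 0 <= start <= stop <= self.length:
            raise IndexError("Slice lies outside the mapped file.")
        for s, e in self.regions:
            # Half-open intervals overlap when each starts before the other ends.
            if start < e and s < stop:
                raise IndexError("Slice overlaps prior slice of same file.")
        self.regions.append((start, stop))
        return (start, stop)

    def unregister(self, start, stop):
        # Raises ValueError if the region was never registered.
        self.regions.remove((start, stop))


reg = SliceRegistry(48)
reg.register(0, 16)
reg.register(24, 48)
try:
    reg.register(20, 28)               # collides with [24, 48)
    overlap_rejected = False
except IndexError:
    overlap_rejected = True

reg.unregister(0, 16)                  # frees [0, 16) for reallocation
reuse = reg.register(0, 20)            # allowed now; [24, 48) is untouched
```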
\section{Creating an array from a MemmapSlice}
\label{sec:memmap-array-construction}
Arrays are created from \class{MemmapSlice}s simply by specifying the
slice as the \var{buffer} parameter of the array. Since the slice is
essentially just a byte string, it's necessary to specify the
\var{type} of the binary data as well.
\begin{verbatim}
>>> import numarray as num
>>> a = num.NumArray(buffer=n, shape=(len(n)//4,), type=num.Int32)
>>> a[:] = 0 # Since the initial contents of 'n' are undefined.
>>> a += 1
>>> a
array([1, 1, 1, 1], type=Int32)
\end{verbatim}
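The same idea, typed access to a byte region of a mapped file, can be
sketched with the standard library only. This is an analogy, not the
\code{numarray} mechanism: a \code{memoryview} over an \code{mmap}
object plays the role of the slice, and its \code{cast} method supplies
the element type. The sketch assumes a platform on which a C
\code{int} is 4 bytes.

```python
import mmap
import os
import struct
import tempfile

# A 48-byte scratch file to map (name and size chosen for the illustration).
path = os.path.join(tempfile.mkdtemp(), "sketch.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 48)

with open(path, "r+b") as f:
    mapped = mmap.mmap(f.fileno(), 0)
    # View the first 16 bytes of the map as native C ints.
    with memoryview(mapped) as whole, whole[0:16] as region, \
            region.cast("i") as ints:
        count = len(ints)                  # 16 bytes / itemsize elements
        for i in range(count):
            ints[i] = i + 1                # writes land directly in the mapping
    # All views are released here, so the map can be flushed and closed.
    mapped.flush()
    mapped.close()

with open(path, "rb") as f:
    on_disk = f.read()
```

As with a \class{MemmapSlice}, the region itself is untyped; the element
type and the element count (bytes divided by item size) are supplied by
whoever builds the array view.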
\section{Resizing a MemmapSlice}
\label{sec:memmap-slice}
Arrays based on \class{MemmapSlice} objects are resizable. As soon as
such an array is resized, its slice becomes un-mapped or ``free floating''.
Resizing a slice affects the parent \class{Memmap}.
\begin{verbatim}
>>> a.resize(6)
array([1, 1, 1, 1, 1, 1], type=Int32)
\end{verbatim}
\section{Forcing file updates and closing the Memmap}
\label{sec:memmap-flushing-closing}
After doing slice resizes or inserting new slices, call
\function{flush} to synchronize the underlying map file with any free
floating slices. This explicit step is required to avoid implicitly
shuffling huge amounts of file space for every \function{resize} or
\function{insert}. After calling \function{flush}, all slices are
once again memory mapped rather than free floating.
\begin{verbatim}
>>> m.flush()
\end{verbatim}
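The motivation for an explicit consolidation step can be sketched as
follows. Both \code{FloatingRegion} and \code{consolidate} are
hypothetical names, and the back-to-back file layout is an assumption
made for the illustration; this is one way deferred consolidation could
work, not a description of the \code{numarray} implementation.

```python
import os
import tempfile


class FloatingRegion:
    """Hypothetical RAM-resident copy of one region of a file."""

    def __init__(self, data):
        self.data = bytearray(data)

    def resize(self, newsize):
        # Shrink by dropping the tail, grow by zero padding; no file I/O here.
        if newsize < len(self.data):
            del self.data[newsize:]
        else:
            self.data.extend(b"\x00" * (newsize - len(self.data)))


def consolidate(path, regions):
    """Rewrite the file once, laying the regions out back to back."""
    with open(path, "wb") as f:
        for region in regions:
            f.write(region.data)


path = os.path.join(tempfile.mkdtemp(), "sketch.bin")
first = FloatingRegion(b"AAAA")
second = FloatingRegion(b"BBBB")
first.resize(6)                        # changes RAM only
first.resize(8)                        # still no file traffic
consolidate(path, [first, second])     # a single rewrite covers both resizes

with open(path, "rb") as f:
    contents = f.read()
```

Deferring the rewrite means that any number of resizes cost only one
pass over the file, at the price of an explicit call.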
A related concept is ``syncing'' which applies even to arrays which
have not been resized. Since memory maps don't guarantee when the
underlying file will be updated with the values you have written to
the map, call \function{sync} when you want to be sure your changes
are on disk. This is similar to syncing a UNIX file system. Note
that \function{sync} does not consolidate the map file with any free
floating slices (newly inserted or resized); it merely ensures that
mapped slices whose contents have been altered are written to disk.
\begin{verbatim}
>>> m.sync()
\end{verbatim}
Following the \function{flush} above, arrays built on slices of ``m'',
such as ``a'', are memory mapped on ``memmap.tst'' again.
When you're done with the memory map and numarray, call
\function{close}. \function{close} calls \function{flush} which will
consolidate resized or inserted slices as necessary.
\begin{verbatim}
>>> m.close()
\end{verbatim}
It is an error to use ``m'' (or slices of ``m'') any further after closing
it.
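For comparison, the standard \code{mmap} module shows the same life
cycle of write, force to disk, and close. Note the difference in
vocabulary: the \code{flush} method of a standard \code{mmap} object
corresponds roughly to \function{sync} as described here, since it only
writes changed pages to the file. The file name is arbitrary.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sketch.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 16)

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 0)

m[0:5] = b"hello"
m.flush()                      # force the changed pages out to the file

with open(path, "rb") as f:
    on_disk = f.read(5)

m.close()
try:
    m[0]                       # any use after close is an error
    use_after_close_failed = False
except ValueError:
    use_after_close_failed = True
```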
\section{numarray.memmap functions}
\label{sec:memmap-functions}
\begin{funcdesc}{open}{filename, mode='r+', len=None}
\label{func:memmap-open}
\function{open} opens a \class{Memmap} object on the file
\var{filename} with the specified \var{mode}. Available \var{mode}
values include 'readonly' ('r'), 'copyonwrite' ('c'), 'readwrite'
('r+'), and 'write' ('w+'), all but the last of which have contents
defined by the file.
Neither mode 'r' nor mode 'c' can affect the underlying map file.
Readonly maps impose no requirement on system swap space and raise
exceptions when their contents are modified. Copy-on-write maps
require system swap space corresponding to their size, but have
modifiable pages which become reassociated with system swap as they
are changed leaving the original map file unaltered. Insufficient
swap space can prevent the creation of a copy-on-write memory map.
Modifications to readwrite memory maps are eventually reflected onto
the map file; see flushing and syncing.
\end{funcdesc}
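The difference between the readonly, copy-on-write and readwrite modes
can be observed with the corresponding access modes of the standard
\code{mmap} module. This is a sketch of the underlying behaviour, not of
\function{open} itself; the file name and contents are arbitrary.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sketch.bin")
with open(path, "wb") as f:
    f.write(b"original")

# Read-only: modification attempts raise an exception.
with open(path, "rb") as f:
    ro = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
try:
    ro[0:1] = b"X"
    readonly_rejected = False
except TypeError:
    readonly_rejected = True
ro.close()

# Copy-on-write: the map is modifiable, the file is left unaltered.
with open(path, "r+b") as f:
    cow = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
cow[0:1] = b"X"
cow_contents = cow[:]
cow.close()
with open(path, "rb") as f:
    file_after_cow = f.read()

# Read/write: modifications reach the file once flushed.
with open(path, "r+b") as f:
    rw = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
rw[0:1] = b"X"
rw.flush()
rw.close()
with open(path, "rb") as f:
    file_after_rw = f.read()
```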
\begin{funcdesc}{close}{map}
\label{func:memmap-close}
\function{close} closes the \class{Memmap} object specified by
\var{map}.
\end{funcdesc}
\section{Memmap methods}
\label{sec:memmap-methods}
A Memmap object represents an entire mapped file and is sliced to
create objects which can be used as array buffers. It has these
public methods:
\begin{methoddesc}[Memmap]{close}{}
\function{close} unites the \class{Memmap} and any RAM based slices with
its underlying file and removes the mapping and all references to
its slices. Once a \class{Memmap} has been closed, all of its
slices become unusable.
\end{methoddesc}
\begin{methoddesc}[Memmap]{find}{string, offset=0}
\function{find} returns the first index at or after \var{offset} at
which \var{string} is found, or -1 on failure.
\begin{verbatim}
>>> _open("memmap.tst","w+").write("this is a test")
>>> Memmap("memmap.tst",len=14).find("is")
2
>>> Memmap("memmap.tst",len=14).find("is", 3)
5
>>> _open("memmap.tst","w+").write("x")
>>> Memmap("memmap.tst",len=1).find("is")
-1
\end{verbatim}
\end{methoddesc}
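The standard \code{mmap} object offers a \code{find} method with the
same convention, which makes the search semantics easy to try out. The
sketch below uses that standard method, not \class{Memmap}, and the
file name is arbitrary.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sketch.bin")
with open(path, "wb") as f:
    f.write(b"this is a test")

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        first = mapped.find(b"is")        # the "is" inside "this"
        second = mapped.find(b"is", 3)    # search starts at index 3
        missing = mapped.find(b"xyz")     # -1 signals failure
```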
\begin{methoddesc}[Memmap]{insert}{offset, size=None, buffer=None}
\function{insert} places a new slice at the specified \var{offset} of
the \class{Memmap}. \var{size} indicates the length in bytes of the
inserted slice when \var{buffer} is not specified. If \var{buffer}
is specified, it should refer to an existing memory object created
using \code{numarray.memory.new_memory} and \var{size} should not
be specified.
\end{methoddesc}
\begin{methoddesc}[Memmap]{flush}{}
\function{flush} writes a \class{Memmap} out to its associated file,
reconciling any inserted or resized slices by backing them directly
on the map file rather than a system swap file. \function{flush}
only makes sense for write and readwrite memory maps.
\end{methoddesc}
\begin{methoddesc}[Memmap]{sync}{}
\function{sync} forces slices which are backed on the map file to be
immediately written to disk. Resized or newly inserted slices are
not affected. \function{sync} only makes sense for write and
readwrite memory maps.
\end{methoddesc}
\section{MemmapSlice methods}
\label{sec:memmapslice-methods}
A \class{MemmapSlice} object represents a subregion of a
\class{Memmap} and has these public methods:
\begin{methoddesc}[MemmapSlice]{__buffer__}{}
Returns an object which supports the Python buffer protocol and
represents this slice. The Python buffer protocol enables a C
function to obtain the pointer and size corresponding to the data
region of the slice.
\end{methoddesc}
\begin{methoddesc}[MemmapSlice]{resize}{newsize}
\function{resize} expands or contracts this slice to the specified
\var{newsize}.
\end{methoddesc}
%% mode: LaTeX
%% mode: auto-fill
%% fill-column: 79
%% indent-tabs-mode: nil
%% ispell-dictionary: "american"
%% reftex-fref-is-default: nil
%% TeX-auto-save: t
%% TeX-command-default: "pdfeLaTeX"
%% TeX-master: "numarray"
%% TeX-parse-self: t
%% End: