\chapter{Memory Mapping}
\label{cha:memmap}
\declaremodule{extension}{numarray.memmap}
\index{memory mapping}
\index{memmap}

\section{Introduction}
\label{sec:memmap-intro}

\code{numarray} provides support for the creation of arrays which are
mapped directly onto files with the \code{numarray.memmap} module.
Much of \code{numarray}'s design, the ability to handle misaligned and
byteswapped arrays for instance, was motivated by the desire to create
arrays from portable files which contain binary array data.  One
advantage of memory mapping is efficient random access to small
regions of a large file: only the region of the mapped file which is
actually used in array operations needs to be paged into system
memory; the rest of the file remains unread and unwritten.
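
As a rough illustration of this access pattern, the following sketch uses
only Python's standard \code{mmap} module (Python 3 syntax), not
\code{numarray.memmap}.  The file name \code{big.dat}, the file size and the
offsets are arbitrary choices made for the example.  It maps a 1 MiB file
and touches four bytes in the middle of it; on typical systems only the
pages containing those bytes need to be brought into memory.

```python
import mmap
import os
import tempfile

# Create a 1 MiB file of zero bytes (name and size are arbitrary).
path = os.path.join(tempfile.mkdtemp(), "big.dat")
with open(path, "wb") as f:
    f.truncate(1024 * 1024)

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 0)             # length 0 maps the whole file
    m[500000:500004] = b"\x01\x02\x03\x04"   # write one small region
    region = m[500000:500004]                # read it back through the map
    untouched = m[0:4]                       # a region that was never written
    m.close()
```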

\code{numarray.memmap} is pure Python and is layered on top of
Python's \code{mmap} module.  The basic idea behind \code{numarray}'s
memory mapping is to create a ``buffer'' referring to a region in a
mapped file and to use it as the data store for an array.  The
\code{numarray.memmap} module contains two classes, one which
corresponds to an entire mapped file (\class{Memmap}) and one which
corresponds to a contiguous region within a file
(\class{MemmapSlice}).  \class{MemmapSlice} objects have these
properties:

\begin{itemize}
\item MemmapSlices can be used as NumArray buffers.
\item MemmapSlices are non-overlapping.
\item MemmapSlices are resizable.
\item Changing the size of a MemmapSlice changes the parent Memmap.
\end{itemize}

\section{Opening a Memmap}
\label{sec:memmap-open}

You can create a \class{Memmap} object by calling the \function{open} function,
as in:

\begin{verbatim}
>>> m = open("memmap.tst","w+",len=48)
>>> m
<Memmap on file 'memmap.tst' with mode='w+', length=48, 0 slices>
\end{verbatim}

Here, the file ``memmap.tst'' is created/truncated to a length of 48
bytes and used to construct a Memmap object in write mode whose
contents are considered undefined.  

\section{Slicing a Memmap}
\label{sec:memmap-slicing}

Once opened, a \class{Memmap} object can be sliced into regions.

\begin{verbatim}
# Slice m into the buffers "n" and "p", which will back numarray arrays:

>>> n = m[0:16]
>>> n
<MemmapSlice of length:16 writable>

>>> p = m[24:48]
>>> p
<MemmapSlice of length:24 writable>
\end{verbatim}

NOTE: You cannot make \emph{overlapping} slices of a Memmap:

\begin{verbatim}
>>> q = m[20:28]
Traceback (most recent call last):
...
IndexError: Slice overlaps prior slice of same file.
\end{verbatim}

Deletion of a slice is possible once all other references to it are
forgotten, e.g. all arrays that used it have themselves been deleted.
Deleting a slice of a Memmap ``un-registers'' the slice, making that
region of the Memmap available for reallocation.  Delete directly from
the Memmap without referring to the MemmapSlice:

\begin{verbatim}
>>> m = Memmap("memmap.tst",mode="w+",len=100)
>>> m1 = m[0:50]
>>> del m[0:50]      # note: delete from m, not m1
>>> m2 = m[0:70]
\end{verbatim}

Note that since the region of \code{m1} was deleted, there is no overlap
when \code{m2} is created.  However, deleting the region of \code{m1} has
invalidated it:

\begin{verbatim}
>>> m1
Traceback (most recent call last):
...
RuntimeError: A deleted MemmapSlice has been used.
\end{verbatim}

Don't mix operations on a Memmap which modify its data or file
structure with slice deletions.  If you do, the status of the
modifications is undefined; the underlying map may or may not reflect
the modifications after the deletion.
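
The overlap and deletion rules described in this section can be modeled with
a small amount of bookkeeping.  The sketch below is not the
\code{numarray.memmap} implementation; \code{SliceRegistry}, \code{claim}
and \code{release} are hypothetical names invented for the illustration, and
the code is plain Python 3 with no dependencies.

```python
class SliceRegistry:
    """Hypothetical bookkeeping for non-overlapping byte ranges of a file."""

    def __init__(self, length):
        self.length = length
        self.ranges = []          # registered (start, stop) pairs

    def claim(self, start, stop):
        """Register [start, stop) unless it overlaps an existing range."""
        if not (0 <= start <= stop <= self.length):
            raise IndexError("Slice out of range.")
        for s, e in self.ranges:
            if start < e and s < stop:
                raise IndexError("Slice overlaps prior slice of same file.")
        self.ranges.append((start, stop))
        return (start, stop)

    def release(self, start, stop):
        """Un-register a range so the region can be claimed again."""
        self.ranges.remove((start, stop))


reg = SliceRegistry(48)
reg.claim(0, 16)
reg.claim(24, 48)

try:
    reg.claim(20, 28)             # overlaps the range (24, 48)
    overlap_rejected = False
except IndexError:
    overlap_rejected = True

reg.release(0, 16)                # analogous to "del m[0:16]"
reclaimed = reg.claim(0, 20)      # the freed region can now be reused
```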

\section{Creating an array from a MemmapSlice}
\label{sec:memmap-array-construction}

Arrays are created from \class{MemmapSlice}s simply by specifying the
slice as the \var{buffer} parameter of the array.  Since the slice is
essentially just a byte string, it's necessary to specify the
\var{type} of the binary data as well.

\begin{verbatim}
>>> import numarray as num
>>> a = num.NumArray(buffer=n, shape=(len(n)/4,), type=num.Int32)
>>> a[:] = 0  # Since the initial contents of 'n' are undefined.
>>> a += 1
>>> a
array([1, 1, 1, 1], type=Int32)
\end{verbatim}
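
The same idea, typed access to a byte region of a mapped file, can be
sketched with the standard library alone.  This is an analogy rather than
\code{numarray} code: it uses \code{mmap} and \code{memoryview.cast} from
Python 3, and it assumes that the native \code{"i"} format is 4 bytes wide,
which holds on common platforms.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "memmap.tst")
with open(path, "wb") as f:
    f.truncate(48)                       # 48 zero bytes

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 48)
    # The views are released in reverse order when the with-block exits;
    # mmap refuses to close while exported views are still alive.
    with memoryview(m) as whole, whole[0:16] as n, n.cast("i") as a:
        for i in range(len(a)):
            a[i] = 0                     # define the contents explicitly
        for i in range(len(a)):
            a[i] += 1
        values = a.tolist()              # copy out before the views go away
    m.close()
```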

\section{Resizing a MemmapSlice}
\label{sec:memmap-slice}

Arrays based on \class{MemmapSlice} objects are resizable.  As soon as
an array is resized, its slice becomes un-mapped or ``free floating''
until the next \function{flush}.  Resizing a slice affects the parent
\class{Memmap}.

\begin{verbatim}
>>> a.resize(6)
array([1, 1, 1, 1, 1, 1], type=Int32)
\end{verbatim}

\section{Forcing file updates and closing the Memmap}
\label{sec:memmap-flushing-closing}

After doing slice resizes or inserting new slices, call
\function{flush} to synchronize the underlying map file with any free
floating slices.  This explicit step is required to avoid implicitly
shuffling huge amounts of file space for every \function{resize} or
\function{insert}.  After calling \function{flush}, all slices are
once again memory mapped rather than free floating.

\begin{verbatim}
>>> m.flush()
\end{verbatim}
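
To see why consolidation is deferred, consider the toy model below.  It is
an assumption-laden sketch and not how \code{numarray.memmap} is
implemented: slices are represented as plain \code{bytearray} objects and
the helper \code{flush\_slices} is a hypothetical name.  Growing the first
slice shifts everything after it in the file, so the file is rewritten once
at flush time rather than on every resize.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "toy.dat")
with open(path, "wb") as f:
    f.write(b"AAAABBBB")

# Two adjacent regions of the file, modeled as in-memory buffers.
slices = [bytearray(b"AAAA"), bytearray(b"BBBB")]

# "Resize" the first region: it now lives only in RAM (free floating).
slices[0].extend(b"aa")

with open(path, "rb") as f:
    before = f.read()                 # the file has not changed yet


def flush_slices(path, slices):
    """Hypothetical flush: rewrite the file from the slices, in order."""
    with open(path, "wb") as f:
        for s in slices:
            f.write(s)


flush_slices(path, slices)
with open(path, "rb") as f:
    after = f.read()                  # later data has moved by two bytes
```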

A related concept is ``syncing'' which applies even to arrays which
have not been resized.  Since memory maps don't guarantee when the
underlying file will be updated with the values you have written to
the map, call \function{sync} when you want to be sure your changes
are on disk.  This is similar to syncing a UNIX file system.  Note
that \function{sync} does not consolidate the map file with any free
floating slices (newly inserted or resized); it merely ensures that
mapped slices whose contents have been altered are written to disk.

\begin{verbatim}
>>> m.sync()
\end{verbatim}

After the earlier call to \function{flush}, the array \code{a} is once
again memory mapped on ``memmap.tst''.

When you're done with the memory map and the arrays built on it, call
\function{close}.  \function{close} calls \function{flush}, which will
consolidate resized or inserted slices as necessary.

\begin{verbatim}
>>> m.close()
\end{verbatim}

It is an error to use \code{m} (or slices of \code{m}) any further after
closing it.

\section{numarray.memmap functions}
\label{sec:memmap-functions}

\begin{funcdesc}{open}{filename, mode='r+', len=None}
\label{func:memmap-open}
\function{open} opens a \class{Memmap} object on the file
\var{filename} with the specified \var{mode}.  Available \var{mode}
values include 'readonly' ('r'), 'copyonwrite' ('c'), 'readwrite'
('r+'), and 'write' ('w+'), all but the last of which have contents
defined by the file.

Neither mode 'r' nor mode 'c' can affect the underlying map file.
Readonly maps impose no requirement on system swap space and raise
exceptions when their contents are modified.  Copy-on-write maps
require system swap space corresponding to their size, but have
modifiable pages which become reassociated with system swap as they
are changed leaving the original map file unaltered.  Insufficient
swap space can prevent the creation of a copy-on-write memory map.
Modifications to readwrite memory maps are eventually reflected onto
the map file; see flushing and syncing in
section~\ref{sec:memmap-flushing-closing}.
\end{funcdesc}
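
The read-only and copy-on-write behaviors have direct counterparts in the
standard \code{mmap} module on which \code{numarray.memmap} is layered.  The
following Python 3 sketch uses \code{mmap.ACCESS_READ} and
\code{mmap.ACCESS_COPY}; the mapping from the mode strings above to these
constants is an assumption made for illustration, and the file name is
arbitrary.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.dat")
with open(path, "wb") as f:
    f.write(b"original")

# Read-only map: attempts to modify it raise an exception.
with open(path, "rb") as f:
    ro = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        ro[0:1] = b"X"
        readonly_rejected = False
    except TypeError:
        readonly_rejected = True
    ro.close()

# Copy-on-write map: changes are visible in memory but never reach the file.
with open(path, "r+b") as f:
    cow = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    cow[0:8] = b"modified"
    seen_in_map = cow[0:8]
    cow.close()

with open(path, "rb") as f:
    on_disk = f.read()
```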
   
\begin{funcdesc}{close}{map}
\label{func:memmap-close}
\function{close} closes the \class{Memmap} object specified by
\var{map}.
\end{funcdesc}
   
\section{Memmap methods}
\label{sec:memmap-methods}
A Memmap object represents an entire mapped file and is sliced to
create objects which can be used as array buffers.  It has these
public methods:

\begin{methoddesc}[Memmap]{close}{}
  \function{close} unites the \class{Memmap} and any RAM based slices with
  its underlying file and removes the mapping and all references to
  its slices.  Once a \class{Memmap} has been closed, all of its
  slices become unusable.
\end{methoddesc}

\begin{methoddesc}[Memmap]{find}{string, offset=0}
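  \function{find} returns the first index at or after \var{offset} at
  which \var{string} is found, or -1 on failure.  In the example,
  \code{_open} is presumably Python's built-in \function{open}, as
  opposed to the \function{open} function of this module.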
  \begin{verbatim}
    >>> _open("memmap.tst","w+").write("this is a test")
    >>> Memmap("memmap.tst",len=14).find("is")
    2
    >>> Memmap("memmap.tst",len=14).find("is", 3)
    5
    >>> _open("memmap.tst","w+").write("x")
    >>> Memmap("memmap.tst",len=1).find("is")
    -1
  \end{verbatim}
\end{methoddesc}
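
For comparison, the standard \code{mmap} object offers a \code{find} method
with the same flavor.  The sketch below is Python 3 and searches for byte
strings; it illustrates the stdlib call only and makes no claim about how
\class{Memmap} implements its own \function{find}.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "memmap.tst")
with open(path, "wb") as f:
    f.write(b"this is a test")

with open(path, "rb") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first = m.find(b"is")        # match inside "this"
    second = m.find(b"is", 3)    # search again, starting at index 3
    missing = m.find(b"xyz")     # no match
    m.close()
```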

\begin{methoddesc}[Memmap]{insert}{offset, size=None, buffer=None}
  \function{insert} places a new slice at the specified \var{offset} of
  the \class{Memmap}.  \var{size} indicates the length in bytes of the
  inserted slice when \var{buffer} is not specified.  If \var{buffer}
  is specified, it should refer to an existing memory object created
  using \code{numarray.memory.new_memory} and \var{size} should not
  be specified.
\end{methoddesc}

\begin{methoddesc}[Memmap]{flush}{}
  \function{flush} writes a \class{Memmap} out to its associated file,
  reconciling any inserted or resized slices by backing them directly
  on the map file rather than a system swap file.  \function{flush}
  only makes sense for write and readwrite memory maps.
\end{methoddesc}

\begin{methoddesc}[Memmap]{sync}{}
  \function{sync} forces slices which are backed on the map file to be
  immediately written to disk.  Resized or newly inserted slices are
  not affected.  \function{sync} only makes sense for write and
  readwrite memory maps.
\end{methoddesc}
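
In the standard \code{mmap} module the closest counterpart to
\function{sync} is, somewhat confusingly, the \code{flush} method of an
\code{mmap} object, which asks the operating system to write modified pages
back to the file.  The Python 3 sketch below shows that call only; the file
name is arbitrary and nothing here reproduces the slice consolidation done
by \class{Memmap}'s own \function{flush}.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sync.dat")
with open(path, "wb") as f:
    f.truncate(16)               # 16 zero bytes

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 16)
    m[0:4] = b"data"             # modify the mapped pages
    m.flush()                    # request that the changes be written out
    m.close()

with open(path, "rb") as f:
    on_disk = f.read(4)
```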

\section{MemmapSlice methods}
\label{sec:memmapslice-methods}
A \class{MemmapSlice} object represents a subregion of a
\class{Memmap} and has these public methods:

\begin{methoddesc}[MemmapSlice]{__buffer__}{}
  Returns an object which supports the Python buffer protocol and
  represents this slice.  The Python buffer protocol enables a C
  function to obtain the pointer and size corresponding to the data
  region of the slice.
\end{methoddesc}
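
At the Python level, the effect of the buffer protocol can be observed with
\code{memoryview}.  The sketch below is a stdlib analogy in Python 3, not
\class{MemmapSlice} code: it takes a view of a sub-region of an anonymous
\code{mmap} and shows that the view reports its own size and shares storage
with the map.

```python
import mmap

m = mmap.mmap(-1, 32)                 # anonymous 32-byte map, zero filled

with memoryview(m) as whole, whole[8:24] as region:
    size = region.nbytes              # size of the sub-region in bytes
    region[0] = 255                   # write through the view ...
    byte_in_map = m[8]                # ... and read it back from the map

m.close()                             # allowed once the views are released
```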

\begin{methoddesc}[MemmapSlice]{resize}{newsize}
  \function{resize} expands or contracts this slice to the specified
  \var{newsize}.
\end{methoddesc}

%% mode: LaTeX
%% mode: auto-fill
%% fill-column: 79
%% indent-tabs-mode: nil
%% ispell-dictionary: "american"
%% reftex-fref-is-default: nil
%% TeX-auto-save: t
%% TeX-command-default: "pdfeLaTeX"
%% TeX-master: "numarray"
%% TeX-parse-self: t
%% End: