1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167
|
ptmalloc2 - a multi-thread malloc implementation
================================================
Wolfram Gloger (wg@malloc.de)
15 Dec 2001
Introduction
============
This package is a modified version of Doug Lea's malloc-2.7.0
implementation (available seperately from ftp://g.oswego.edu/pub/misc)
that I adapted for multiple threads, while trying to avoid lock
contention as much as possible. Many thanks should go to Doug Lea
(dl@cs.oswego.edu) for the great original malloc implementation.
As part of the GNU C library, the source files are available under the
GNU Library General Public License (see the comments in the files).
But as part of this stand-alone package, the code is also available
under the (probably less restrictive) conditions described in the file
'COPYRIGHT'. In any case, there is no warranty whatsoever for this
package.
The current distribution should be available from:
http://www.malloc.de/malloc/ptmalloc2.tar.gz
Please note that ptmalloc2 is currently still in the testing phase.
For example, support for non-ANSI compilers is currently not complete.
For an implementation with a somewhat more proven record, you may look
at ptmalloc.tar.gz, which is based on Doug Lea's malloc-2.6.x and is
integrated into GNU libc until version glibc-2.2.x.
Compilation and usage
=====================
It should be possible to compile malloc.c on any UN*X-like system that
implements the sbrk(), mmap(), munmap() and mprotect() calls. If
mmap() is not available, it is only possible to produce a
non-threadsafe implementation from the source file. See the comments
in the source file for descriptions of the compile-time options.
Several thread interfaces are supported:
o Posix threads (pthreads), compile with `-DUSE_PTHREADS=1'
(and possibly with `-DUSE_TSD_DATA_HACK', see below)
o Solaris threads, compile with `-DUSE_THR=1'
o SGI sproc() threads, compile with `-DUSE_SPROC=1'
o When compiling malloc.c as part of the GNU C library,
i.e. when _LIBC is defined (no other defines necessary)
o no threads, compile without any of the above definitions
The distributed Makefile includes several targets (e.g. `solaris' for
Solaris threads, but you probably want `posix' for recent Solaris
versions) which cause malloc.c to be compiled with the appropriate
flags. The default is to compile for Posix threads. Note that some
compilers need special flags for multi-threaded code, e.g. with
Solaris cc one should use:
% make posix SYS_FLAGS='-mt'
Some additional targets, ending in `-libc', are also provided in the
Makefile, to compare performance of the test programs to the case when
linking with the standard malloc implementation in libc.
A potential problem remains: If any of the system-specific functions
for getting/setting thread-specific data or for locking a mutex call
one of the malloc-related functions internally, the implementation
cannot work at all due to infinite recursion. One example seems to be
Solaris 2.4; a workaround for thr_getspecific() has been inserted into
the thread-m.h file. I would like to hear if this problem occurs on
other systems, and whether similar workarounds could be applied.
For Posix threads, too, an optional hack like that has been integrated
(activated when defining USE_TSD_DATA_HACK) which depends on
`pthread_t' being convertible to an integral type (which is of course
not generally guaranteed). USE_TSD_DATA_HACK is now the default
because I haven't yet found a non-glibc pthreads system where this
hack is _not_ needed.
To use ptmalloc2 (i.e. when linking malloc.o into applications), no
special precautions are necessary.
On some systems, when overriding malloc and linking against shared
libraries, the link order becomes very important. E.g., when linking
C++ programs on Solaris, don't rely on libC being included by default,
but instead put `-lthread' behind `-lC' on the command line:
CC ... malloc.o -lC -lthread
This is because there are global constructors in libC that need
malloc/ptmalloc, which in turn needs to have the thread library to be
already initialized.
Debugging hooks
===============
All calls to malloc(), realloc(), free() and memalign() are routed
through the global function pointers __malloc_hook, __realloc_hook,
__free_hook and __memalign_hook if they are not NULL (see the malloc.h
header file for declarations of these pointers). Therefore the malloc
implementation can be changed at runtime, if care is taken not to call
free() or realloc() on pointers obtained with a different
implementation than the one currently in effect. (The easiest way to
guarantee this is to set up the hooks before any malloc call, e.g.
with a function pointed to by the global variable
__malloc_initialize_hook).
A useful application of the hooks is built-in into ptmalloc2: The
implementation is usually very unforgiving with respect to misuse,
such as free()ing a pointer twice or free()ing a pointer not obtained
with malloc() (these will typically crash the application
immediately). To debug in such situations, you can set the
environment variable `MALLOC_CHECK_' (note the trailing underscore).
Performance will suffer somewhat, but you will get more controlled
behaviour in the case of misuse. If MALLOC_CHECK_=0, wrong free()s
will be silently ignored, if MALLOC_CHECK_=1, diagnostics will be
printed on stderr, and if MALLOC_CHECK_=2, abort() will be called on
any error.
You can now also tune other malloc parameters (normally adjused via
mallopt() calls from the application) with environment variables:
MALLOC_TRIM_THRESHOLD_ for deciding to shrink the heap (in bytes)
MALLOC_TOP_PAD_ how much extra memory to allocate on
each system call (in bytes)
MALLOC_MMAP_THRESHOLD_ min. size for chunks allocated via
mmap() (in bytes)
MALLOC_MMAP_MAX_ max. number of mmapped regions to use
Tests
=====
Two testing applications, t-test1 and t-test2, are included in this
source distribution. Both perform pseudo-random sequences of
allocations/frees, and can be given numeric arguments (all arguments
are optional):
% t-test[12] <n-total> <n-parallel> <n-allocs> <size-max> <bins>
n-total = total number of threads executed (default 10)
n-parallel = number of threads running in parallel (2)
n-allocs = number of malloc()'s / free()'s per thread (10000)
size-max = max. size requested with malloc() in bytes (10000)
bins = number of bins to maintain
The first test `t-test1' maintains a completely seperate pool of
allocated bins for each thread, and should therefore show full
parallelism. On the other hand, `t-test2' creates only a single pool
of bins, and each thread randomly allocates/frees any bin. Some lock
contention is to be expected in this case, as the threads frequently
cross each others arena.
Performance results from t-test1 should be quite repeatable, while the
behaviour of t-test2 depends on scheduling variations.
Conclusion
==========
I'm always interested in performance data and feedback, just send mail
to ptmalloc@malloc.de.
Good luck!
|