ptmalloc - a multi-thread malloc implementation
===============================================

Wolfram Gloger (wg@malloc.de)

19 Dec 1999


Introduction
============

ptmalloc.c is a modified version of Doug Lea's malloc-2.6.4
implementation (available separately from ftp://g.oswego.edu/pub/misc)
that I adapted for multiple threads, while trying to avoid lock
contention as much as possible.  Many thanks should go to Doug Lea
(dl@cs.oswego.edu) for the great original malloc implementation.

As part of the GNU C library, the source files are available under the
GNU Library General Public License (see the comments in the files).
But as part of this stand-alone package, the code is available under
the (probably less restrictive) conditions described in the file
`COPYRIGHT'.  In any case, there is no warranty whatsoever for this
package.

Compilation and usage
=====================

It should be possible to compile ptmalloc.c on any UN*X-like system
that implements the sbrk(), mmap(), munmap() and mprotect() calls.  If
mmap() is not available, it is only possible to produce a
non-threadsafe implementation from the source file.  See the comments
in the source file for descriptions of the compile-time options.
Several thread interfaces are supported:

 o Posix threads (pthreads), compile with `-DUSE_PTHREADS=1'
   (and possibly with `-DUSE_TSD_DATA_HACK', see below)
 o Solaris threads, compile with `-DUSE_THR=1'
 o SGI sproc() threads, compile with `-DUSE_SPROC=1'
 o When compiling ptmalloc.c as part of the GNU C library,
   i.e. when _LIBC is defined (no other defines necessary)
 o no threads, compile without any of the above definitions

The distributed Makefile includes several targets (e.g. `solaris' for
Solaris threads, but you probably want `posix' for recent Solaris
versions) which cause ptmalloc.c to be compiled with the appropriate
flags.  The default is to compile for Posix threads.  Some additional
targets, ending in `-libc', are also provided, to compare performance
of the test programs to the case when linking with the standard malloc
implementation in libc.

A potential problem remains: If any of the system-specific functions
for getting/setting thread-specific data or for locking a mutex call
one of the malloc-related functions internally, the implementation
cannot work at all due to infinite recursion.  One example seems to be
Solaris 2.4; a workaround for thr_getspecific() has been inserted into
the thread-m.h file.  I would like to hear if this problem occurs on
other systems, and whether similar workarounds could be applied.

For Posix threads, too, an optional hack like that has been integrated
(activated when defining USE_TSD_DATA_HACK) which depends on
`pthread_t' being convertible to an integral type (which is of course
not generally guaranteed).  USE_TSD_DATA_HACK is now the default
because I haven't yet found a non-glibc pthreads system where this
hack is _not_ needed.

To use ptmalloc (i.e. when linking ptmalloc.o into applications), no
special precautions are necessary except calling an initialization
routine, ptmalloc_init(), once before the first call to malloc() (or
calloc(), etc.).  This call happens automatically when:

 o compiling ptmalloc with MALLOC_HOOKS defined (this is the default
   when using the supplied Makefile)
 o using the GNU C library

So in any of these cases, you can omit the explicit ptmalloc_init()
call from applications using ptmalloc.o.

On some systems, when overriding malloc and linking against shared
libraries, the link order becomes very important.  E.g., when linking
C++ programs on Solaris, don't rely on libC being included by default,
but instead put `-lthread' behind `-lC' on the command line:

  CC ... ptmalloc.o -lC -lthread

This is because there are global constructors in libC that need
malloc/ptmalloc, which in turn needs the thread library to be
already initialized.

Debugging hooks
===============

When the ptmalloc.c source is compiled with MALLOC_HOOKS defined (this
is recommended), all calls to malloc(), realloc(), free() and
memalign() are routed through the global function pointers
__malloc_hook, __realloc_hook, __free_hook and __memalign_hook if they
are not NULL (see the ptmalloc.h header file for declarations of these
pointers).  Therefore the malloc implementation can be changed at
runtime, if care is taken not to call free() or realloc() on pointers
obtained with a different implementation than the one currently in
effect.  (The easiest way to guarantee this is to set up the hooks
before any malloc call, e.g.  with a function pointed to by the global
variable __malloc_initialize_hook).
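
The routing described above can be pictured with a small stand-alone
sketch.  It does not use the real hook variables; every `demo_' name
is hypothetical and exists only for illustration.  The wrapper
consults a global function pointer and falls back to the default
allocator when the pointer is NULL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hook pointer such as __malloc_hook;
   NULL means "no hook installed". */
void *(*demo_malloc_hook)(size_t size) = NULL;

/* Number of requests seen by the counting hook below. */
unsigned long demo_hook_calls = 0;

/* Entry point in the style described above: route the request
   through the hook if one is set, otherwise use the default
   allocator. */
void *demo_malloc(size_t size)
{
    void *(*hook)(size_t) = demo_malloc_hook;

    if (hook != NULL)
        return hook(size);
    return malloc(size);
}

/* Example hook: count the request, then defer to the default
   allocator.  It calls malloc() directly rather than demo_malloc(),
   so it cannot recurse into itself. */
void *demo_counting_hook(size_t size)
{
    demo_hook_calls++;
    return malloc(size);
}

/* Install the hook.  Intended to run before the first allocation,
   so that all pointers come from the same implementation. */
void demo_install_hooks(void)
{
    assert(demo_malloc_hook == NULL);   /* install only once */
    demo_malloc_hook = demo_counting_hook;
}
```

In this sketch the hook hands out memory from the same allocator as
the default path, so free() stays valid for every pointer.  A hook
that switches to a different allocator would need matching free and
realloc hooks, for the reason given above.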

A useful application of the hooks is built into ptmalloc: The
implementation is usually very unforgiving with respect to misuse,
such as free()ing a pointer twice or free()ing a pointer not obtained
with malloc() (these will typically crash the application
immediately).  To debug in such situations, you can set the
environment variable `MALLOC_CHECK_' (note the trailing underscore).
Performance will suffer somewhat, but you will get more controlled
behaviour in the case of misuse.  If MALLOC_CHECK_=0, wrong free()s
will be silently ignored; if MALLOC_CHECK_=1, diagnostics will be
printed on stderr; and if MALLOC_CHECK_=2, abort() will be called on
any error.
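
As a rough illustration of the three settings, the sketch below maps
the value of `MALLOC_CHECK_' to an action.  It is not taken from
ptmalloc.c: the enum, the function name and the decision to treat an
unset, empty or unrecognized value as "checking off" are all
assumptions made for this example.

```c
#include <stdlib.h>

/* Hypothetical names for the behaviours listed above. */
enum demo_check_action {
    DEMO_CHECK_OFF    = -1,  /* variable unset: no extra checking */
    DEMO_CHECK_IGNORE =  0,  /* wrong free()s silently ignored */
    DEMO_CHECK_REPORT =  1,  /* diagnostics printed on stderr */
    DEMO_CHECK_ABORT  =  2   /* abort() called on any error */
};

/* Read MALLOC_CHECK_ from the environment and translate its first
   character into one of the actions above.  Anything other than
   '0', '1' or '2' is treated as "off" in this sketch. */
enum demo_check_action demo_check_action_from_env(void)
{
    const char *s = getenv("MALLOC_CHECK_");

    if (s == NULL)
        return DEMO_CHECK_OFF;

    switch (s[0]) {
    case '0': return DEMO_CHECK_IGNORE;
    case '1': return DEMO_CHECK_REPORT;
    case '2': return DEMO_CHECK_ABORT;
    default:  return DEMO_CHECK_OFF;
    }
}
```

The variable is meant to be set in the environment before the program
starts, e.g. by prefixing the command with `MALLOC_CHECK_=2'.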

You can now also tune other malloc parameters (normally adjusted via
mallopt() calls from the application) with environment variables:

    MALLOC_TRIM_THRESHOLD_    for deciding to shrink the heap (in bytes)

    MALLOC_TOP_PAD_           how much extra memory to allocate on
                              each system call (in bytes)

    MALLOC_MMAP_THRESHOLD_    min. size for chunks allocated via
                              mmap() (in bytes)

    MALLOC_MMAP_MAX_          max. number of mmapped regions to use
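
For comparison, the same four parameters can be set from the
application with mallopt().  The sketch below assumes the glibc
<malloc.h> header and its M_* constants; whether the stand-alone
ptmalloc.h uses identical names should be checked there.  The helper
name and the numeric values are arbitrary examples, not
recommendations.

```c
#include <malloc.h>   /* glibc header declaring mallopt() and M_* */

/* Hypothetical helper: apply the four tunables listed above.
   Returns 1 if every mallopt() call succeeded, 0 otherwise
   (mallopt() itself returns 0 on error). */
int demo_tune_malloc(void)
{
    int ok = 1;

    /* Counterpart of MALLOC_TRIM_THRESHOLD_ (bytes) */
    if (mallopt(M_TRIM_THRESHOLD, 256 * 1024) == 0)
        ok = 0;

    /* Counterpart of MALLOC_TOP_PAD_ (bytes) */
    if (mallopt(M_TOP_PAD, 64 * 1024) == 0)
        ok = 0;

    /* Counterpart of MALLOC_MMAP_THRESHOLD_ (bytes) */
    if (mallopt(M_MMAP_THRESHOLD, 128 * 1024) == 0)
        ok = 0;

    /* Counterpart of MALLOC_MMAP_MAX_ (number of regions) */
    if (mallopt(M_MMAP_MAX, 1024) == 0)
        ok = 0;

    return ok;
}
```

Calling such a helper early in main() has roughly the effect of the
environment variables, but fixes the values at compile time; the
variables allow tuning without rebuilding the application.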

Tests
=====

Two testing applications, t-test1 and t-test2, are included in this
source distribution.  Both perform pseudo-random sequences of
allocations/frees, and can be given numeric arguments (all arguments
are optional):

% t-test[12] <n-total> <n-parallel> <n-allocs> <size-max> <bins>

    n-total = total number of threads executed (default 10)
    n-parallel = number of threads running in parallel (2)
    n-allocs = number of malloc()'s / free()'s per thread (10000)
    size-max = max. size requested with malloc() in bytes (10000)
    bins = number of bins to maintain

The first test `t-test1' maintains a completely separate pool of
allocated bins for each thread, and should therefore show full
parallelism.  On the other hand, `t-test2' creates only a single pool
of bins, and each thread randomly allocates/frees any bin.  Some lock
contention is to be expected in this case, as the threads frequently
cross each other's arenas.

Performance results from t-test1 should be quite repeatable, while the
behaviour of t-test2 depends on scheduling variations.

Some performance data from t-test1
==================================

The times given are complete program execution times, obtained with
`time t-test1 ...'.

1. SGI Octane, one R12000 300MHz CPU, Irix 6.5, `sproc' threads:

20 threads (4 in parallel), 3000000 malloc calls per thread, max. size
5000 bytes, 5000 bins:

ptmalloc:                       malloc from libc:
real    3m0.521s                real    30m27.240s
user    2m45.336s               user    10m7.592s
sys     0m3.014s                sys     17m6.502s

2. Same as 1., but with POSIX threads:

20 threads (4 in parallel), 3000000 malloc calls per thread, max. size
5000 bytes, 5000 bins:

ptmalloc:                       malloc from libc:
real    3m10.667s               real    5m51.588s
user    2m57.052s               user    5m27.986s
sys     0m2.399s                sys     0m3.098s

(Comparing the two ptmalloc results probably shows the slight
performance penalty from having to compile with USE_TSD_DATA_HACK when
using pthreads on Irix.)

Special section on use of ptmalloc with Linux
=============================================

On Linux, ptmalloc should work with the libpthreads library that is
included with Linux libc-5.x (but this is untested).  Thanks to the
efforts of H.J. Lu and Ulrich Drepper, it is now an integral part of
the GNU C library 2.x releases (libc-6.x), so you don't need to
compile and link ptmalloc.o with glibc.