1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397
|
.so macro.nrf
.nh
.ce 1
WNMEM -- A Better Memory Allocation Scheme for C/UNIX
.sp 2
.ce
Will Naylor
.sp 2
.sp 5
SYNOPSIS
.sp 2
The memory allocator described
has been used successfully in several large C/UNIX applications,
reducing paging and memory leakage, saving programmer time, and
increasing program reliability.
.sp 5
INTRODUCTION
.sp 2
.pp
Dynamic memory allocation is a big advance over fixed, compile time
memory allocation (e.g. FORTRAN) because it allows more efficient
and flexible use of a process's memory. However, dynamic
memory allocation as usually provided (ie. the
.bd new(p)
interface of PASCAL and the
.bd malloc
interface of C/UNIX)
has many defects, which will be described below.
.pp
The discussion of this paper will
refer to the C/UNIX malloc interface.
Briefly, UNIX provides two functions for memory allocation:
.bd malloc(size)
returns a pointer to a memory block of size bytes, and
.bd free(ptr)
frees a block of storage (pointed to by ptr) that was allocated by
.bd malloc.
Obviously,
.bd free
needs to know the size of the memory block it is freeing; this
is stored in an integer-sized block of storage before the
block allocated by
.bd malloc.
Freed blocks are put on a sorted free list.
.bd Malloc
searches this list
to find a block that is big enough to satisfy its request.
This arrangement has several problems:
.lh
1) time efficiency:
.lb
Time to search the free list when
.bd malloc
is called and insert into the free list when
.bd free
is called can be prohibitive if there are many small blocks. Page faults
caused by this search in a large virtual environment make this problem
worse. This problem has been fixed in the new
Berkeley 4.2 release, but this "fix" has made space efficiency worse.
.le
.lh
2) space efficiency:
.lb
Every block of allocated storage comes with an integer-sized
size field, which is pure overhead.
If the actual storage requested is small, this gets
expensive.
.le
.lh
3) fragmentation:
.lb
The policy of always grabbing
the closest-fitting block off the
free list to satisfy a request will cause an accumulation of
small, possibly unuseable blocks on the free list. This will
further waste space and time.
.le
.lh
4) scattering:
.lb
No attempt is made to place memory that is accessed together on the same
page; instead the best-fit policy "scatters" memory throughout the page
space in order to satisfy the best-fit objective. The client application
thus suffers more page faults and runs more slowly than it might. Note that
the malloc interface doesn't receive enough information about memory access
locality to remedy this situation.
.le
.lh
5) lack of safety:
.lb
It is possible to free storage twice, or free storage that was never
allocated. This can result in
.bd malloc
giving out the same storage twice, in a random, maybe even
non-repeatible fashion.
This type of bug
is extremely difficult to track down.
.le
.lh
6) leakage and complexity in freeing:
.lb
Very often one wants to free all at once a large structure that is held
together by random pointers going this way and that. Examples of such
structures are
linked lists or trees. A user-written, often quite complex free
routine must traverse this structure, chasing random pointers and freeing
random blocks. There is no way to know if this routine in fact frees
the entire structure. The observation that the free routine
does not "bomb out" does not guarantee that the entire structure has
been freed. The program will appear to work, yet it will eat up
all of memory because it is "leaking" memory.
.le
.lh
7) lack of development tools
.lb
There exists no way to know where memory is going or how much is being used.
.le
.pp
The UNIX dynamic memory allocation scheme does have
one advantage: simplicity. It is easy to remember the two routines
and their parameters. The memory allocation scheme described below
sacrifices this simplicity to gain on the other 7 fronts
mentioned above.
.sp 5
DISCUSSION OF PREVIOUS WORK
.sp 2
.pp
Most previous work has focused on making the above described scheme
more efficient.
The literature contains
a number of discussions of garbage collection,
a number of discussions of first-fit and best-fit ([1],[6],[9]),
a few strange new algorithms([2],[3],[4]).
A massive study of 35 allocation algorithms in 18 environments [8]
showed that keeping separate free lists for different block sizes outperforms
the other algorithms, including those discussed above. Also outperformed
were LIFO and FIFO ordered free list, and buddy system
algorithms. The multiple free list approach ran as fast or faster than the
other algorithms (except buddy system) and used memory more efficiently (buddy
system wasted the most).
Various multiple free list algorithms were tested; according to this
study their performances are similar.
.pp
The new Berkeley UNIX release uses an algorithm that keeps
free lists for ranges of sizes rather for each size.
I am using a separate free list algorithm for my
general-free allocator (see below).
.pp
All of the above discussions imply that algorithm "efficiency" means
using as little address space as possible and fast allocator execution.
I think these are the wrong goals. Today, address space is cheap.
Allocator execution time is usually only a very small fraction of program
execution time.
What is expensive is page faults. Reducing address space usage can have the
side effect of reducing page faults, but it would be better to sacrifice
address space usage to further reduce page faults.
.pp
The literature
contains little discussion
of reducing page faults.
The literature also contains little discussion of
safety and ease of programming and debugging.
.sp 5
SYSTEM OVERVIEW
.sp 2
.pp
The following is an overview of the wnmem system:
.lh
Hierarchical grouping of allocated memory.
.lb
All memory allocated is "in" a memory group. Memory groups
are "in" other memory groups. Memory groups can be allocated
and freed. When memory or a memory group is allocated, it is allocated
from the "current" memory group.
When a memory group is freed, all member
memory and memory groups are freed. This feature solves the memory
leakage problem cleanly and allows clustering of memory to avoid page
faults.
Blocks of memory in the same group will probably be accessed together;
therefore they should be on the same page(s) to avoid page faults.
The allocator works by grabbing a (page sized) chunk of memory and handing out
pieces of it, thus memory in the same group will be on the same pages.
.le
.lh
Stack of memory groups
.lb
wnmem maintains a stack of memory
groups; the top of this stack is called the "current" memory group.
The typical subsystem pushes a (possibly newly created)
memory group upon entry, runs using
this group, and then pops the memory group upon exit (possibly freeing it also).
Inside the subsystem, "wn_alloc" and "wn_free" calls appear in place of
"malloc" and "free".
Subsystems needing only temporary memory create and push a group upon entry
and then free and pop the group upon exit. Inside, calls to "wn_alloc"
appear whenever the subsystem needs memory. No "wn_free" calls need appear,
because all memory allocated within the subsystem will be freed upon exit.
This saves the programmer the effort of keeping track of individual
memory blocks and freeing them.
.le
.lh
Ability to restrict memory freeing within groups.
.lb
When creating a memory group, it is be possible to ban freeing of
individual memory blocks in
that group.
Note that in many cases, one wants to allocate
a huge structure, figure something out using that structure,
and then free the huge structure.
The ability to
free individual memory pieces is not necessary.
This restriction ends a lot of hassles. No free list to manage,
saving memory and time overhead. No sizes attached to memory blocks,
saving memory overhead.
This saves both time and memory; time usage is reduced by
about 75%, and memory usage by about 15%.
Of course, if the user doesn't restrict freeing,
he gets none of these benefits. But in most of the cases I've come
across, banning individual memory block freeing costs little.
.le
.lh
Better error handling.
.lb
When a memory request cannot be satisfied because memory is nearly
exhausted, a
(user specified) subroutine is called.
Normally, this routine prints an informative error message and exits.
This scheme has several advantages over simply returning NULL when
out of memory, as malloc does.
It makes checking for NULL return
from the allocator unnecessary. It centralizes and simplifies handling the
out-of-memory problem.
.le
.lh
Debugging and development tools.
.lb
Counts of the total memory used and memory used by each group
are printed on demand.
This set of tools is similar in style to the
PROFILE program of UNIX: PROFILE reports where a program spends its time,
these tools report where a program spends its memory.
Tools are provided to trap common application program bugs, such as
array overflows and wild stores, use of unitialized memory, use
of freed memory, freeing memory twice or freeing unallocated memory.
.le
.pp
A precise specification of the calling interface is provided
in Appendix A, which shows the wnmem and wnmemd man pages.
.sp 5
CONCLUSIONS
.sp 2
.pp
This memory allocator has been successfully used in several large
C applications, including a schematic editor, a digital simulator,
an automatic gate array
place and route system, and an analog circuit designer.
In all cases, the results have been dramatic. A large (60,000 line)
interactive program with 1000 wn_alloc calls leaked no memory --
first time, with no debugging to plug leaks. Installation of this
allocator in a large graphics program reduced leakage from
100K per redisplay to
0, with only 1 day's work. Installation into a large, very slow
optimization program reduced paging by 80%.
Use of the debugging features on a 30,000 line
application
(shipped and supposedly working)
trapped 15 memory bugs in only 2 hours debugging time.
Presumeably, these bugs had been causing sporadic crashes
in the field, but the causes had been too elusive to track down and fix.
.pp
It is my impression that much time, both human and machine, is
needlessly wasted because of memory allocation problems.
Intelligent use of wnmem overcomes most of these difficulties;
I therefore hope that wnmem or a memory allocator like it
will soon begin replacing malloc(3) as the workhorse memory
allocator for most C/UNIX applications.
.sp 5
ACKNOWLEDGEMENTS
.sp 2
.pp
I would like to thank the many friends and work associates
who suggested many of the ideas contained
in this paper.
Doug Boyle first called my attention to some of the problems which this memory
allocator solves.
Discussions with Bill Jensen helped clarify in my mind
the overall philosophy of this memory allocator.
Doug Boyle proposed including
a tool to detect and trace memory leakage.
Mark Brown proposed checking for various corruptions, such as freeing
memory twice or freeing unallocated memory.
Mark Brown also proposed
calling an error subroutine when memory is about to run out, rather than
just returning a "no memory left" indicator.
James Bohannon brought many problems to my attention which suggested
many of the other debugging tools.
.pp
I would also like to thank the development groups at the various companies,
especially LSI Logic and Valid Logic Systems, who
endured the unfinished versions of this allocator and
suggested various improvements.
.ce 1
References Consulted
.rf [1]
Aho, A. V., Hopcroft, J. E., and Ullman, J. D. [1983].
Data Structures and Algorithms, Chap. 12, Addison-Wesley,
Reading, Mass.
.rf [2]
Beck, Leland L. [1979]. A Performance Study of the Age-Match Dynamic Storage
Allocation Technique.
.rf [3]
Beck, Leland L. [1979]. The Release-Match Method for Dynamic Storage
Allocation.
.rf [4]
Brent, R. P. [1981]. Efficient Implementation of the First-Fit Strategy for
Dynamic Storage Allocation.
.rf [5]
Kernighan, B. W. and D. M. Ritchie [1978]. The C Programming Language,
Prentice-Hall, Inc., Englewood Cliffs, NJ 07632.
.rf [6]
Knuth, D. E. THE ART OF COMPUTER PROGRAMMING, VOL. 1:
FUNDAMENTAL ALGORITHMS, Addison-Wesley, Reading, Mass., 1973, Section 2.5.
.rf [7]
Knuth, D. E. THE ART OF COMPUTER PROGRAMMING, VOL. 3:
SORTING AND SEARCHING, Addison-Wesley, Reading, Mass., 1973.
.rf [8]
Nielsen, N. R. Dynamic memory allocation in computer simulation.
COMMUNICATIONS OF THE ACM 20, 11 (November 1977), 864-873.
.rf [9]
Shore, J. E. Anomalous behavior of the fifty-percent rule in dynamic
memory allocation. COMMUNICATIONS OF THE ACM 20, 11 (November 1977), 812-820.
|