File: memory_paper.nrf

package info (click to toggle)
libwn6 6.0-12
  • links: PTS
  • area: main
  • in suites: woody
  • size: 6,004 kB
  • ctags: 3,903
  • sloc: ansic: 45,078; makefile: 958; csh: 274; sh: 26
file content (397 lines) | stat: -rwxr-xr-x 12,733 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
.so macro.nrf
.nh
.ce 1
WNMEM -- A Better Memory Allocation Scheme for C/UNIX
.sp 2
.ce
Will Naylor
.sp 2
.sp 5
SYNOPSIS
.sp 2

The memory allocator described
has been used successfully in several large C/UNIX applications,
reducing paging and memory leakage, saving programmer time, and
increasing program reliability. 

.sp 5
INTRODUCTION
.sp 2

.pp
Dynamic memory allocation is a big advance over fixed, compile time
memory allocation (e.g. FORTRAN) because it allows more efficient
and flexible use of a process's memory.  However, dynamic
memory allocation as usually provided (ie. the 
.bd new(p)
interface of PASCAL and the
.bd malloc
interface of C/UNIX)
has many defects, which will be described below.

.pp
The discussion of this paper will
refer to the C/UNIX malloc interface.
Briefly, UNIX provides two functions for memory allocation:
.bd malloc(size)
returns a pointer to a memory block of size bytes, and 
.bd free(ptr)
frees a block of storage (pointed to by ptr) that was allocated by 
.bd malloc.
Obviously, 
.bd free
needs to know the size of the memory block it is freeing; this
is stored in an integer-sized block of storage before the
block allocated by
.bd malloc.
Freed blocks are put on a sorted free list.
.bd Malloc
searches this list
to find a block that is big enough to satisfy its request.
This arrangement has several problems:


.lh
1) time efficiency:

.lb
Time to search the free list when 
.bd malloc
is called and insert into the free list when
.bd free
is called can be prohibitive if there are many small blocks.  Page faults
caused by this search in a large virtual environment make this problem
worse.  This problem has been fixed in the new
Berkeley 4.2 release, but this "fix" has made space efficiency worse.
.le

.lh
2) space efficiency:

.lb
Every block of allocated storage comes with an integer-sized
size field, which is pure overhead.
If the actual storage requested is small, this gets
expensive.  
.le

.lh
3) fragmentation:

.lb
The policy of always grabbing
the closest-fitting block off the
free list to satisfy a request will cause an accumulation of
small, possibly unuseable blocks on the free list.  This will
further waste space and time.
.le

.lh
4) scattering:

.lb
No attempt is made to place memory that is accessed together on the same
page; instead the best-fit policy "scatters" memory throughout the page
space in order to satisfy the best-fit objective.  The client application
thus suffers more page faults and runs more slowly than it might.  Note that
the malloc interface doesn't receive enough information about memory access
locality to remedy this situation.
.le

.lh
5) lack of safety:

.lb
It is possible to free storage twice, or free storage that was never
allocated.  This can result in 
.bd malloc
giving out the same storage twice, in a random, maybe even
non-repeatible fashion.
This type of bug
is extremely difficult to track down.
.le

.lh
6) leakage and complexity in freeing:

.lb
Very often one wants to free all at once a large structure that is held
together by random pointers going this way and that.  Examples of such
structures are
linked lists or trees.  A user-written, often quite complex free
routine must traverse this structure, chasing random pointers and freeing
random blocks.  There is no way to know if this routine in fact frees
the entire structure.  The observation that the free routine
does not "bomb out" does not guarantee that the entire structure has
been freed.  The program will appear to work, yet it will eat up 
all of memory because it is "leaking" memory.
.le

.lh
7) lack of development tools

.lb 
There exists no way to know where memory is going or how much is being used.
.le


.pp
The UNIX dynamic memory allocation scheme does have
one advantage: simplicity.  It is easy to remember the two routines
and their parameters.  The memory allocation scheme described below
sacrifices this simplicity to gain on the other 7 fronts 
mentioned above.
.sp 5
DISCUSSION OF PREVIOUS WORK
.sp 2
.pp 
Most previous work has focused on making the above described scheme
more efficient.
The literature contains
a number of discussions of garbage collection,
a number of discussions of first-fit and best-fit ([1],[6],[9]),
a few strange new algorithms([2],[3],[4]).
A massive study of 35 allocation algorithms in 18 environments [8]
showed that keeping separate free lists for different block sizes outperforms
the other algorithms, including those discussed above.  Also outperformed
were LIFO and FIFO ordered free list, and buddy system 
algorithms.  The multiple free list approach ran as fast or faster than the
other algorithms (except buddy system) and used memory more efficiently (buddy
system wasted the most).
Various multiple free list algorithms were tested; according to this
study their performances are similar.

.pp
The new Berkeley UNIX release uses an algorithm that keeps
free lists for ranges of sizes rather for each size.
I am using a separate free list algorithm for my
general-free allocator (see below).

.pp
All of the above discussions imply that algorithm "efficiency" means
using as little address space as possible and fast allocator execution.
I think these are the wrong goals.  Today, address space is cheap.
Allocator execution time is usually only a very small fraction of program
execution time.
What is expensive is page faults.  Reducing address space usage can have the
side effect of reducing page faults, but it would be better to sacrifice
address space usage to further reduce page faults.

.pp
The literature
contains little discussion
of reducing page faults.
The literature also contains little discussion of 
safety and ease of programming and debugging.

.sp 5
SYSTEM OVERVIEW
.sp 2

.pp
The following is an overview of the wnmem system:


.lh
Hierarchical grouping of allocated memory.

.lb
All memory allocated is "in" a memory group.  Memory groups
are "in" other memory groups.  Memory groups can be allocated
and freed.  When memory or a memory group is allocated, it is allocated
from the "current" memory group.
When a memory group is freed, all member
memory and memory groups are freed.  This feature solves the memory
leakage problem cleanly and allows clustering of memory to avoid page
faults.
Blocks of memory in the same group will probably be accessed together;
therefore they should be on the same page(s) to avoid page faults.  
The allocator works by grabbing a (page sized) chunk of memory and handing out
pieces of it, thus memory in the same group will be on the same pages.  
.le

.lh 
Stack of memory groups

.lb
wnmem maintains a stack of memory
groups; the top of this stack is called the "current" memory group.
The typical subsystem pushes a (possibly newly created)
memory group upon entry, runs using
this group, and then pops the memory group upon exit (possibly freeing it also).
Inside the subsystem, "wn_alloc" and "wn_free" calls appear in place of
"malloc" and "free".

Subsystems needing only temporary memory create and push a group upon entry
and then free and pop the group upon exit.  Inside, calls to "wn_alloc"
appear whenever the subsystem needs memory.  No "wn_free" calls need appear, 
because all memory allocated within the subsystem will be freed upon exit.
This saves the programmer the effort of keeping track of individual
memory blocks and freeing them.
.le

.lh
Ability to restrict memory freeing within groups.

.lb
When creating a memory group, it is be possible to ban freeing of
individual memory blocks in 
that group.
Note that in many cases, one wants to allocate
a huge structure, figure something out using that structure,
and then free the huge structure.
The ability to
free individual memory pieces is not necessary.
This restriction ends a lot of hassles.  No free list to manage,
saving memory and time overhead.  No sizes attached to memory blocks,
saving memory overhead.  
This saves both time and memory; time usage is reduced by
about 75%, and memory usage by about 15%.

Of course, if the user doesn't restrict freeing,
he gets none of these benefits.  But in most of the cases I've come
across, banning individual memory block freeing costs little.
.le

.lh
Better error handling.

.lb
When a memory request cannot be satisfied because memory is nearly
exhausted, a 
(user specified) subroutine is called. 
Normally, this routine prints an informative error message and exits.
This scheme has several advantages over simply returning NULL when
out of memory, as malloc does.
It makes checking for NULL return
from the allocator unnecessary.  It centralizes and simplifies handling the
out-of-memory problem.
.le

.lh
Debugging and development tools.

.lb 
Counts of the total memory used and memory used by each group
are printed on demand.
This set of tools is similar in style to the
PROFILE program of UNIX:  PROFILE reports where a program spends its time,
these tools report where a program spends its memory.

Tools are provided to trap common application program bugs, such as
array overflows and wild stores, use of unitialized memory, use
of freed memory, freeing memory twice or freeing unallocated memory.
.le


.pp
A precise specification of the calling interface is provided
in Appendix A, which shows the wnmem and wnmemd man pages.

.sp 5
CONCLUSIONS
.sp 2

.pp
This memory allocator has been successfully used in several large
C applications, including a schematic editor, a digital simulator, 
an automatic gate array
place and route system, and an analog circuit designer.
In all cases, the results have been dramatic.  A large (60,000 line)
interactive program with 1000 wn_alloc calls leaked no memory -- 
first time, with no debugging to plug leaks.  Installation of this
allocator in a large graphics program reduced leakage from
100K per redisplay to
0, with only 1 day's work.  Installation into a large, very slow
optimization program reduced paging by 80%.  
Use of the debugging features on a 30,000 line 
application
(shipped and supposedly working)
trapped 15 memory bugs in only 2 hours debugging time.
Presumeably, these bugs had been causing sporadic crashes
in the field, but the causes had been too elusive to track down and fix.

.pp 
It is my impression that much time, both human and machine, is
needlessly wasted because of memory allocation problems.  
Intelligent use of wnmem overcomes most of these difficulties;
I therefore hope that wnmem or a memory allocator like it
will soon begin replacing malloc(3) as the workhorse memory
allocator for most C/UNIX applications.

.sp 5
ACKNOWLEDGEMENTS
.sp 2

.pp
I would like to thank the many friends and work associates
who suggested many of the ideas contained
in this paper.
Doug Boyle first called my attention to some of the problems which this memory
allocator solves.
Discussions with Bill Jensen helped clarify in my mind
the overall philosophy of this memory allocator.
Doug Boyle proposed including
a tool to detect and trace memory leakage.
Mark Brown proposed checking for various corruptions, such as freeing
memory twice or freeing unallocated memory.
Mark Brown also proposed
calling an error subroutine when memory is about to run out, rather than
just returning a "no memory left" indicator.
James Bohannon brought many problems to my attention which suggested
many of the other debugging tools.

.pp 
I would also like to thank the development groups at the various companies,
especially LSI Logic and Valid Logic Systems, who 
endured the unfinished versions of this allocator and 
suggested various improvements.





.ce 1
References Consulted



.rf [1]
Aho, A. V., Hopcroft, J. E., and Ullman, J. D. [1983].
Data Structures and Algorithms, Chap. 12, Addison-Wesley,
Reading, Mass.

.rf [2]
Beck, Leland L. [1979].  A Performance Study of the Age-Match Dynamic Storage
Allocation Technique.

.rf [3]
Beck, Leland L. [1979].  The Release-Match Method for Dynamic Storage
Allocation.

.rf [4]
Brent, R. P. [1981].  Efficient Implementation of the First-Fit Strategy for
Dynamic Storage Allocation.

.rf [5]
Kernighan, B. W. and D. M. Ritchie [1978].  The C Programming Language,
Prentice-Hall, Inc., Englewood Cliffs, NJ 07632.

.rf [6]
Knuth, D. E.  THE ART OF COMPUTER PROGRAMMING, VOL. 1: 
FUNDAMENTAL ALGORITHMS, Addison-Wesley, Reading, Mass., 1973, Section 2.5.

.rf [7]
Knuth, D. E.  THE ART OF COMPUTER PROGRAMMING, VOL. 3: 
SORTING AND SEARCHING, Addison-Wesley, Reading, Mass., 1973.

.rf [8]
Nielsen, N. R.  Dynamic memory allocation in computer simulation.
COMMUNICATIONS OF THE ACM 20, 11 (November 1977), 864-873.

.rf [9]
Shore, J. E.  Anomalous behavior of the fifty-percent rule in dynamic
memory allocation.  COMMUNICATIONS OF THE ACM 20, 11 (November 1977), 812-820.