1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151
|
Miscellaneous notes on the malloc internals.
First fit malloc with boundary tags for fast freeing - based on the
techniques described in 'The Art of Computer Programming' by Donald
Knuth. The essential features of the boundary tag are described in
Algorithm C, Section 2.5. The actual algorithms here incorporate the
improvements suggested in Exercises 12 and and 15 (I think), and
adaptations to the C/Un*x environment, plus some performance
improvements. By keeping the list circular, we don't have an AVAIL at
all - just use ROVER as the pointer into the list. It also tries to
"preserve the wilderness" -- i.e. try to keep the block nearest the
data break free as far as possible, but it does go overboard about it.
(i.e. it is a simple first fit -- many wilderness preservation schemes
tend to keep the wilderness block out of the free list, and search the
rest of the free list first. I find that degrades performance
somewhat, even if it keeps memory slightly less fragmented)
This has a lot of debugging and tracing support built in - compiling
with -DDEBUG provides fairly comprehensive ASSERT protection to detect
heap corruption. (It aborts on the first sign of trouble). I hope that
every pointer reference is protected with an assertion, so the malloc
should never dump core or behave weirdly because of user program
misbehaviour -- it should blow an assertion instead. If the debugging
version ever gets a segmentation fault, bus error and dumps core
instead of blowing an assertion and then dumping core on an IO trap or
Illegal Instruction (depending on how your system does an abort()), I
want to know...
For the non-debugging malloc:
A minimum size for an allocated block is 4 units - two for the
leading and trailing size word, and two for the next and previous
pointers. Therefore the minimum size that we can allocate is 2
units, since allocated and freed blocks both have the leading and
trailing size blocks. Only freed blocks have the next, prev
pointers. The size block is actually a signed - the sign bit
indicates whether it is a free or allocated block. i.e the tag.
For the debugging malloc:
The minimum size for an allocated block is still 4 units - two for the
leading and trailing size word, and two for the next and previous
pointers. But the minimum size that we can allocate is 1 units, since
allocated and freed blocks both have the leading and trailing size
blocks. Only freed blocks have the next, prev pointers. Only allocated
blocks have the realsize word, which indicates the size of the block
requested by the user in bytes. The next pointer and the realsize word
share the same header word. So the overhead for an allocated block is
3 words when debugging, as opposed to 2 words when not debugging.
(even though the minimum block size is still 4 words)
The size block is actually a signed - the sign bit indicates whether
it is a free or allocated block. i.e the tag. Since the size is in
words, this loss of a bit is no big deal. The realsize block is a full
unsigned word, since it stores the size in bytes.
For the leak tracing malloc
The _*.c files all have the extra macros that record the blocks as
they are allocated, and deleted. The extra tracing macro also prints
the file name and line number before the usual tracing information
about size etc. This permits you to find out where block xxx was
freed, so that you can do an address to name mapping more easily.
Note that to use this stuff, you HAVE to include malloc.h and define
MALLOC_TRACE when compiling. It still works correctly when you don't
include malloc.h, but you won't get all the information.
Performance
The fastest malloc I know of is the BSD4.3 malloc, which uses a number
of bins with different sized blocks, sized in powers of two. All it
has to do to find a free block is an iterated shift to compute the
bin, if the bin is empty then split a block from one of the higher
bins. Unfortunately, it can waste spectacular amounts of space for
many programs.
The most space efficient malloc I know is Sun's Cartesian tree "better
fit" malloc (It has a hidden overhead of 8K of static data, though).
It is also unpleasantly slow.
My malloc is close to the BSD4.3 malloc in performance - the
difference starts to become obvious only as the number of blocks in
the free list climbs towards the 100000 to million range at which
point the 4.3 malloc starts to pull ahead in user time, but quite
often, its VM behaviour deteriorates and makes its elapsed times very
large. My malloc is close to Sun's malloc in space efficiency.
The key to the improved performance over typical first fit allocators
is the roving pointer, as well as the coalescing of freed blocks,
which keeps memory fragmentation and the size of the free list down.
The wilderness preservation heuristic also helps a lot.
Guide to porting it:
Should port without trouble to machines where any ugliness or
weirdness in pointers or segmentation is hidden from the programmer. I
haven't tried porting it to the 286 and below under DOS, nor do I
intend to at present.
You may need to check align.h if your machine has non-standard
alignment requirements. (I interpret standard alignment to be
alignenment on one of char, int, char *, int *, char **, int **,
whichever requires the strictest alignment) If not, define ALIGN and
NALIGN.
PLEASE LEAVE THE MAIN CODE FREE FROM #ifdefs. I'm interested in fixes
for porting it to new machines, only if they do not damage the code.
I'd prefer to find a least common denominator, or hide the difference
in a macro.
People interested in reading and understanding this code should start
in defs.h (after this file, of course) -- it contains all the macros.
It'll help if you've read the following references. After that, study
globals.c, malloc.c, verify.c and dumpheap.c. Running testmalloc and
following the development of the heap is recommended.
After you've ported it, you should be able to run testmalloc without
trouble, as well as the few runs of simumalloc listed in regress,
compiled with the debugging version of malloc. (-DDEBUG defined)
If you uncover a bug, please enhance testmalloc.c with a case that
triggers the bug, if you can. That way, I can make sure future changes
to malloc don't re-introduce that bug.
References:
%A Donald E. Knuth
%T The Art of Computer Programming (Fundamental Algorithms)
%V 1
%I Addison Wesley
%C Reading, Mass.
%D 1973
%A David G. Korn
%A Kiem-Phong Vo
%T In search of a better malloc
%J Proceedings of the USENIX 1985 Summer Conference
%C Portland, Oregon
%D June 1985
%P 489-506
%A C.J. Stephenson
%T Fast fits: new methods for dynamic storage allocation
%J Proceedings of the Ninth ACM Symposium on Operating Systems Principles
%C Bretton Woods, New Hampshire
%D October 1983
%P 30-32
%O abstract only - paper to be published in TOCS
%O published as Operating Systems Review 17:5
%O also appeared in Scientific American somewhere -- reference in Korn and Vo
|