The old NJAMD design can be found in OLD_DESIGN.
What follows is a description of the new NJAMD design. Have a look over
it while browsing the doxygen docs and source code at
http://fscked.org/proj/njamd/docs/internal/html. The .h file
documentation is probably your best bet to learn about each object.
After this long tirade, you'll find some suggestions for ideas to code.
Don't worry, I won't be working on any of them. You guys will have free
rein for the rest of the summer. Conversely, this means that if no one
does anything, my rewrite will have been for naught, and NJAMD may die.
So why the rewrite? In the course of developing NJAMD, I started to
notice that adding new features was becoming exceedingly difficult. The
code was beginning to feel 'dirty.' Data structures were starting to get
intertwined, there were hundreds of global variables, and code wasn't
being reused. I had three stack implementations and two table
implementations, and few people were able to contribute substantially to
the library itself. So I decided to take a leap of faith. I decided to
rewrite NJAMD, focusing more on making a clean, compartmentalized design
than localized optimizations. I decided to take a structural/almost
object oriented approach, yet still do it in C.
Blasphemy, you say? Well, after observing the GTK+ project, I became convinced
that by adhering to a set of naming conventions and structure definitions, I
could achieve the compartmentalization I wanted. So I defined the following
rules:
1. Everything is an object
2. Each object has its own .c and .h file
3. Each object is manifested by a struct and a set of functions whose names
start with __nj_object_name_function.
4. Constructors for objects are called _init; destructors are _fini (a short
sketch of such an object follows this list).
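To make the convention concrete, here is a minimal sketch of what a
hypothetical object obeying these rules might look like. nj_example_t and its
functions are invented for illustration and do not appear in the source.

    /* example.h / example.c: a hypothetical object following the rules above. */
    #include <string.h>

    typedef struct nj_example {
            unsigned long value;
    } nj_example_t;

    /* Constructor: the caller supplies the struct, and we only fill it in. */
    static int __nj_example_init(nj_example_t *ex, unsigned long value)
    {
            memset(ex, 0, sizeof(*ex));
            ex->value = value;
            return 0;
    }

    /* Destructor: tear down only what _init set up. */
    static void __nj_example_fini(nj_example_t *ex)
    {
            memset(ex, 0, sizeof(*ex));
    }
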
So with these rules I set out on the design. My initial approach was to start
from the bottom. I coded the stack (stack.h) and the table (table.h) objects
first. From these I defined the concept of a pool. A pool is a data structure
containing a table and a stack, which allows you to get a chunk of memory from
the table, and return it to the stack when you are done. NJAMD contains three
pools, a callstack pool (callstack_pool.h), an entry pool (entry_pool.h), and
a memory pool (memory_pool.h). Above the pools is the allocator
(allocator.h). It brings everything together and returns a fully described and
cataloged user chunk. On the same level as the allocator but independent of
the pools are the signal handlers (signals.h), the pthread wrapper
(threads.h), the output object (output.h), symbol resolution (libc_syms.h),
and the user preferences (prefs.h). These are all parts of the NJAMD object
(njamd.h). The only things the user sees (public.c) are malloc, free, realloc,
calloc, and some options and diagnostic routines.
For all NJAMD objects, there is no allocation that takes place on
construction. You supply the struct itself, and NJAMD just fills it in. The
memory may come from pieces of user-allocated segments. The only exceptions to
this are objects that come from the pools themselves.
Also, construction is split into two stages, the bootstrap stage, and the user
stage. This is because user options can't be read from the environment before
a couple of mallocs go by (to set up the environ variable itself on most
systems).
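As a rough illustration of the two stages (the object, fields, and arguments
here are invented, not taken from the source):

    typedef struct nj_widget {
            int threadsafe;
            int user_ready;
    } nj_widget_t;

    /* Stage 1: runs lazily at the first malloc; uses only safe defaults,
     * since the environment cannot be read this early. */
    static void __nj_widget_bootstrap_init(nj_widget_t *w)
    {
            w->threadsafe = 0;          /* conservative default */
            w->user_ready = 0;
    }

    /* Stage 2: runs once main is near and environ is usable, so the user's
     * preferences can now be honored. */
    static void __nj_widget_user_init(nj_widget_t *w, int threadsafe_pref)
    {
            w->threadsafe = threadsafe_pref;
            w->user_ready = 1;
    }
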
Now to describe the data structures themselves. I'm going to start from the
bottom and work my way up, much the same way I developed the design. Follow
along with the .h files in the Doxygen reference mentioned above. Most of
these data structures you will never use, even in normal NJAMD development.
You will probably spend most of your time at the allocator or pool level of
the system.
The stack is just your run-of-the-mill, everyday stack. The only difference is
that you supply the nj_stack_item pointer, already filled in, to be pushed on.
The data is a nj_generic_t, which is a u_long, so it can be a pointer or an
integer. The stack is always threadsafe.
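A minimal sketch of the caller-supplied-node idea follows; the types are
simplified, thread safety is omitted, and the real stack.h signatures may
differ.

    typedef unsigned long nj_generic_t;     /* pointer or integer payload */

    typedef struct nj_stack_item {
            nj_generic_t          data;
            struct nj_stack_item *next;
    } nj_stack_item_t;

    typedef struct nj_stack {
            nj_stack_item_t *top;
    } nj_stack_t;

    /* The caller fills in item->data; the stack itself never allocates. */
    static void stack_push(nj_stack_t *s, nj_stack_item_t *item)
    {
            item->next = s->top;
            s->top = item;
    }

    static nj_stack_item_t *stack_pop(nj_stack_t *s)
    {
            nj_stack_item_t *item = s->top;
            if (item)
                    s->top = item->next;
            return item;
    }
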
The table is a type-less array of memory with a top free byte and optional
file backing. It may resize automatically when full, and it may be threadsafe
or not, all depending on the arguments to __nj_table_bootstrap_init(). Memory
is obtained from the table by requesting exclusive access to the top for a
maximum number of bytes (__nj_table_request_top). When the size is known, you
can then release the top with __nj_table_release_top, passing the actual size
you ended up needing. This style of allocation is used in the callstack pool,
because the length of the callstack isn't known until after we get it. Or, if
you know the exact size needed, as in the entry pool and memory pool code, you
can use __nj_table_get_chunk, to get a chunk of a specific size. You can
iterate over all entries in the table (or all indices) with
__nj_table_for_all_entries and indices respectively. This used in the entry
pool to dump leaks, or to find information on a specific pointer. It can also
be used to do analysis on or releasing of leaks or mapped memory in the future.
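The request/release idiom might look roughly like this (illustrative types
and names, not the real table.h API; locking, resizing, and file backing are
omitted):

    #include <stddef.h>

    typedef struct nj_table {
            char   *base;       /* start of the typeless memory region */
            size_t  top;        /* offset of the first free byte       */
            size_t  size;       /* total bytes in the region           */
    } nj_table_t;

    /* Reserve up to max_len bytes at the top; returns NULL if they won't
     * fit.  (The real table may lock and/or resize here.) */
    static void *table_request_top(nj_table_t *t, size_t max_len)
    {
            if (t->top + max_len > t->size)
                    return NULL;
            return t->base + t->top;
    }

    /* Commit only the bytes actually used, releasing the rest. */
    static void table_release_top(nj_table_t *t, size_t used_len)
    {
            t->top += used_len;
    }

The callstack pool relies on exactly this pattern: a backtrace is written
directly into the reserved space, and only then is its length known.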
The memory pool is the source of allocated blocks. It contains a table, as
well as a 2D array of stacks. The table is actually a table of tables; that
is, each entry in memory_tables is itself a table. This is where the memory
itself comes from. Each sub-table is a 32 MB table that has no backing store
and is not resizable. When more memory is needed, we simply get a new table.
This is done to cut down on the number of mmap system calls needed to get
memory. The stacks form a 3x4 array: one row for each allocation type (over,
under, sunder), with columns for block sizes of up to 1 page, 2 pages, 3
pages, and 4 pages respectively. These caches are used so that we don't have
to munmap and then mmap again in the event that the user is using
CHK_FREE=none. This actually brings us to about one third the speed of
regular libc malloc.
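Structurally, the cache can be pictured as below; the names are invented and
the real memory_pool.h layout may differ.

    #include <stddef.h>

    typedef struct nj_stack {
            struct nj_stack_item *top;
    } nj_stack_t;

    enum nj_alloc_type { NJ_ALLOC_OVER = 0, NJ_ALLOC_UNDER, NJ_ALLOC_SUNDER,
                         NJ_ALLOC_TYPES };

    #define NJ_CACHE_WIDTH 4    /* blocks of 1, 2, 3 and 4 pages are cached */

    struct block_cache {
            nj_stack_t stacks[NJ_ALLOC_TYPES][NJ_CACHE_WIDTH];
    };

    /* Pick the cache stack for a block, or NULL if it is too big to cache. */
    static nj_stack_t *cache_for(struct block_cache *c,
                                 enum nj_alloc_type type, size_t pages)
    {
            if (pages == 0 || pages > NJ_CACHE_WIDTH)
                    return NULL;
            return &c->stacks[type][pages - 1];
    }
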
The entry pool is where the entries for each allocated segment are stored. The
entry pool consists of a table and a stack. Entries come from the table with
__nj_entry_pool_request_index. After you get an index, you initialize it with
__nj_entry_pool_init_index (though this may eventually be rolled into
request_index or vice versa; you'll see why in a moment). This init function
gets a callstack from the callstack pool for the entry, and also fills in the
fields of the nj_heap_entry that is at the index. When a segment is being
reallocated, __nj_entry_pool_renew_index is called. It will either save
current index and return a new one, or reuse the current index if the user
requested that we not save info on free in the preferences. Now, when an index
is done being used on a free, it is released with __nj_entry_pool_index_fini,
which may or may not actually release the index with the internal function
entry_pool_release_index, depending again on the save-free-info pref. The only
interface with this pool should be through indexes, so we can easily do post
mortem analysis with the same codebase (and possibly just a new init function
or something to load an existing table).
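A simplified, self-contained sketch of the index-based interface and the
reuse-on-free behavior (none of these types or signatures are the real
entry_pool.h API):

    #include <stddef.h>

    struct heap_entry {
            void  *user_addr;
            size_t user_len;
            /* ... callstack index, flags, and so on ... */
    };

    struct entry_pool {
            struct heap_entry *entries;     /* the backing table          */
            size_t             used;        /* next never-used index      */
            size_t             capacity;
            size_t            *free_idx;    /* recycled indices           */
            size_t             nfree;
            int                save_free_info;
    };

    /* Hand out an index: reuse a freed slot when free info is not being
     * kept, otherwise take a fresh one. */
    static long entry_pool_request_index(struct entry_pool *p)
    {
            if (!p->save_free_info && p->nfree > 0)
                    return (long)p->free_idx[--p->nfree];
            if (p->used < p->capacity)
                    return (long)p->used++;
            return -1;      /* the real pool would grow its table instead */
    }

    /* Finish with an index on free: either keep the record for post-mortem
     * analysis, or push the slot back for reuse. */
    static void entry_pool_index_fini(struct entry_pool *p, size_t idx)
    {
            if (!p->save_free_info && p->nfree < p->capacity)
                    p->free_idx[p->nfree++] = idx;
    }
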
We do memory leak accounting by iterating over the entry pool indices as
mentioned above. Leaked memory is detected by scanning this table for entries
that have a special 'freed' callstack index (NJ_CALLSTACK_INDEX_NOTFREE).
When the address space is being recycled (ie CHK_FREE=none is set), a stack is
maintained of free heap table entries. The entry pool is persistent, and uses
mmap to maintain a copy of itself on disk for analysis after program
termination. A utility can be written that allows you to scan this saved table
(ie the array of structs) for a specific address, namely an address access
that caused a segmentation fault, giving you the location where this address
was allocated, and possibly where it was freed. A function is also provided
(__nj_ptr_info(void *)) that can be called from gdb to lookup an arbitrary
address while the program still runs.
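For illustration, scanning such a saved table for leaks, or for the entry
covering a faulting address, might look like this; the entry layout and the
NOTFREE value are placeholders rather than the real definitions.

    #include <stdio.h>
    #include <stddef.h>

    #define NOTFREE ((unsigned long)-1)   /* stand-in for NJ_CALLSTACK_INDEX_NOTFREE */

    struct saved_entry {
            void         *user_addr;
            size_t        user_len;
            unsigned long free_callstack;   /* NOTFREE if never freed */
    };

    /* Report every entry that was never freed. */
    static void dump_leaks(const struct saved_entry *tab, size_t count)
    {
            for (size_t i = 0; i < count; i++)
                    if (tab[i].free_callstack == NOTFREE)
                            printf("leak: %p (%zu bytes)\n",
                                   tab[i].user_addr, tab[i].user_len);
    }

    /* Find the entry covering an arbitrary address, e.g. one that faulted. */
    static const struct saved_entry *lookup(const struct saved_entry *tab,
                                            size_t count, const void *addr)
    {
            for (size_t i = 0; i < count; i++) {
                    const char *base = tab[i].user_addr;
                    if ((const char *)addr >= base &&
                        (const char *)addr < base + tab[i].user_len)
                            return &tab[i];
            }
            return NULL;
    }
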
The callstack pool contains a table, a stack, and an optional file. When a
callstack is needed for an entry, you request an index in the callstack pool,
with __nj_callstack_pool_request_index. Callstacks must be of a fixed size if
the user is using the no-save-free-info pref, and this allows us to have an
__nj_callstack_pool_renew_index that renews callstacks of fixed size on
realloc, and we can also cache callstacks in __nj_callstack_pool_release_index
if this pref is set. Indexes can be printed with
__nj_callstack_pool_print_index. Callstacks can be printed from anywhere
without using the callstack pool with __nj_callstack_dump().
In obtaining the actual callstacks themselves, we can also search for the
first return address lower than the initial value of sbrk(0) (taken in
__nj_portability_init, described below). This allows us to determine where in
the user's code (ie statically linked code) the allocation actually took
place. This means that we only have to wrap malloc and calloc and friends, and
any allocator that calls them can be traced to user code. For example, we can
tell the user he leaked memory in strdup at such and such a line in his code.
There is an option to provide a full callstack trace, NJAMD_TRACE_LIBS. This
will provide a callstack trace for all libraries that wrap malloc.
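The heuristic itself is simple once a backtrace is in hand. A rough sketch
follows; the actual capture code is platform specific and lives in the
callstack pool, and the names here are illustrative.

    #include <stddef.h>
    #include <unistd.h>

    static void *nj_sbrk0;      /* recorded once, very early in initialization */

    static void record_sbrk0(void)
    {
            nj_sbrk0 = sbrk(0);
    }

    /* Given captured return addresses (innermost first), return the index of
     * the first frame below the initial program break, i.e. in the user's
     * statically linked code, so reporting can start there rather than
     * inside a wrapper such as strdup(). */
    static size_t first_user_frame(void *const *ret_addrs, size_t depth)
    {
            for (size_t i = 0; i < depth; i++)
                    if (ret_addrs[i] < nj_sbrk0)
                            return i;
            return 0;       /* fall back to reporting the full trace */
    }
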
In between the pools and the allocator are two semi-objects, the user chunk,
and the block. The block actually has a structure defined for it, but it is
mainly a go-between device: __nj_block_init formats a block for an allocation
type, sets the fencepost value in the block, and gives back two values, the
user chunk and the location of the entry index within the block.
__nj_block_renew will set up a new block, using the move_data
function pointer (either memcpy or memmove) to copy data from the last
segment. __nj_block_calc_size simply calculates the size of a block based on
the allocation type and the user length. The block object is used in
allocation and reallocation.
The user chunk is the user's pointer. There are really only two top-level user
chunk functions, __nj_user_chunk_get_entry, and __nj_user_chunk_find_entry.
get_entry() will get the entry index for an allocation type from the known
location, verify the corresponding heap entry, and return it. find_entry() is
used when the allocation type can change through the program's execution. It
finds and checks the heap entry index in the user chunk, and also returns a
heap entry.
Currently, allocation types are separated by switch statements in the block
object, and by technique or switch statement in the user_chunk object. This
could be changed to a set of function pointers, though I'm not sure
if this would buy us much in terms of speed. It would clean up the design
quite a bit in the allocator.
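For example, the dispatch could eventually look something like this; the
operation set and names are invented for illustration.

    #include <stddef.h>

    struct nj_alloc_ops {
            size_t (*block_size)(size_t user_len);
            void  *(*user_ptr)(void *block, size_t user_len);
            void   (*release)(void *block, size_t block_size);
    };

    /* One ops table per allocation type (over, under, sunder, none); chosen
     * once when the preference changes instead of switching on the type in
     * every allocation path. */
    static const struct nj_alloc_ops *current_ops;

    static void *allocate(size_t user_len, void *(*get_block)(size_t))
    {
            size_t len   = current_ops->block_size(user_len);
            void  *block = get_block(len);
            return block ? current_ops->user_ptr(block, user_len) : NULL;
    }
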
So this brings us up to the allocator. The allocator object has a memory pool,
an entry pool, and a symbol for libc_realloc. This is where the design starts
to get a little sub-optimal, because we have to take different action depending
on the allocation type the user requests in the prefs. The block size is
calculated from the user size, and the corresponding chunk is obtained from
the memory pool. The block object is then initialized as described above, and
the entry index pointer is set to an index from the entry pool. The entry pool
index is then initialized. In __nj_allocator_resize_user_chunk, we get the
entry from the user_chunk, then we perform an optimization of skipping the
memory pools if the block size or block address does not change. We then renew
the block and the entry index. In __nj_allocator_release_user_chunk, we find
the entry, and hand over the block to the memory pool to release it. We then
finish off the entry index in the entry pool (either release it or keep it).
The preferences are handled by splitting them into two parts. The first part is
the permanent or static prefs. These are read at program startup in
__nj_prefs_user_init and cannot change. The second set is the dynamic prefs.
These can change during user execution, and are protected by a lock;
__nj_prefs_get and __nj_prefs_set are used to read and write them. They are
passed through the entire allocation chain. Currently, preferences are only
read from the environment.
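A sketch of the split; the fields and locking shown are illustrative, not
the real prefs.h contents.

    #include <pthread.h>

    struct nj_static_prefs {        /* read once at startup, never changed */
            int alloc_type;
    };

    struct nj_dynamic_prefs {       /* may change while the program runs   */
            int chk_free;
            int save_free_info;
    };

    static pthread_mutex_t dyn_lock = PTHREAD_MUTEX_INITIALIZER;
    static struct nj_dynamic_prefs dyn_prefs;

    /* Copy the dynamic prefs out under the lock, so one allocation sees a
     * single consistent snapshot all the way down the chain. */
    static struct nj_dynamic_prefs prefs_get(void)
    {
            pthread_mutex_lock(&dyn_lock);
            struct nj_dynamic_prefs copy = dyn_prefs;
            pthread_mutex_unlock(&dyn_lock);
            return copy;
    }
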
The libc symbols are simply a wrapper for dlopen right now. Hopefully we can
get libltdl or our own wrapper working eventually. Currently we only need
symbols from libc or libpthread, and those are the two functions in this
object.
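For reference, the usual shape of such a wrapper, sketched here with
dlsym(RTLD_NEXT, ...), which is one common approach; the current object goes
through dlopen, and error handling is omitted.

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stddef.h>

    static void *(*libc_malloc)(size_t);
    static void  (*libc_free)(void *);

    static void resolve_libc_syms(void)
    {
            /* RTLD_NEXT resolves the next definition of the symbol after the
             * preloaded library, i.e. the real libc implementation. */
            libc_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
            libc_free   = (void (*)(void *))dlsym(RTLD_NEXT, "free");
    }
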
The threads object is used primarily so that we have some sort of marker when
a user starts using threads, as well as where the callstack starts, because
most threads implementations do not properly terminate the end of the
callstack. The way we do this is by resolving pthread_create with the libc
symbol source and building our own pthread_create that always starts with
__nj_threads_launch, and is given our pthread_arg_wrapper. This way all thread
callstacks end with __nj_threads_launch and we can stop the backtrace before
it. In the future, the threads object may serve as a wrapper for pthread
mutexes themselves, since FreeBSD has some weirdness about them being used at
all before our user_init runs.
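In outline, the wrapper might look like this; it is a sketch, and the real
threads.h code differs in details such as error handling and how the libc
symbol is obtained.

    #include <errno.h>
    #include <pthread.h>
    #include <stdlib.h>

    /* Filled in from libpthread by the symbol-resolution object. */
    static int (*libc_pthread_create)(pthread_t *, const pthread_attr_t *,
                                      void *(*)(void *), void *);

    struct pthread_arg_wrapper {
            void *(*user_fn)(void *);
            void  *user_arg;
    };

    /* Every wrapped thread begins here, so this frame marks the bottom of
     * the callstack on systems that do not terminate it cleanly. */
    static void *__nj_threads_launch(void *arg)
    {
            struct pthread_arg_wrapper w = *(struct pthread_arg_wrapper *)arg;
            free(arg);
            return w.user_fn(w.user_arg);
    }

    int pthread_create(pthread_t *t, const pthread_attr_t *attr,
                       void *(*fn)(void *), void *arg)
    {
            struct pthread_arg_wrapper *w = malloc(sizeof(*w));
            if (!w)
                    return EAGAIN;
            w->user_fn  = fn;
            w->user_arg = arg;
            return libc_pthread_create(t, attr, __nj_threads_launch, w);
    }
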
The signals object wraps signal and sigaction via
__nj_signals_register_user_signal and __nj_signals_register_user_sigaction so
that we can catch deadly signals and exit properly, or leave a core, depending
on user preference and signal type. Signals that the user registers are
recorded in this object and called in __nj_signals_dispatch.
The output object is mostly unimplemented, but the current plan is to have 4
different types of errors, fatal_user_error, fatal_internal_error,
user_warning, and internal_warning. Each message has a class and a code, both
numerical. User output has a heap entry and a ptr, and internal output has a
function and a line. Both have format strings, which are the only thing that
is used now.
Lastly, we have the portability object, which isn't really an object at all,
but rather the last three global variables that make no sense to lump into an
object, as well as the portability defines for PAGE_SIZE, PAGE_MASK,
PAGE_SHIFT and other system constants. The three globals are __nj_anonfd,
which is either -1 or a descriptor for /dev/zero, depending on the operating
system; __nj_prot, which allows reading of overflows to work around an old
glibc bug; and __nj_sbrk0, the value of sbrk(0) at initialization. This is used
to guess the location of main in the address space, which is just below the
real heap.
Ok, now that we have the basic design described, let's take a tour of how the
system works from a procedural viewpoint. The library itself is LD_PRELOAD'ed
(it can also be linked, on systems that don't support preloading). When the
system starts, at the first public function call (malloc, free, signal,
pthread_create, what have you), we run __nj_public_init(), which calls the
bootstrap init for the njamd object. Initialization is recursive,
__nj_njamd_bootstrap_init calls the bootstrap init for each object in njamd,
and those in turn call the bootstrap inits for their own objects.
If our global constructor has been called (hint_main(), see njamd.c), we can
guess that main is just about to be called, because constructors are called
last, when everything in the system is ready for the user's code. The one
problem with this approach is that C++ also uses attribute constructor for
executing constructors of globally declared objects. If a globally declared
object is constructed between our constructor and main, and it calls malloc, a
segmentation fault may result. We have done some testing, and it seems that
under Linux when the library is LD_PRELOADed, our constructor is the very last
one called, immediately before main.
Or, if the compiler did not support constructors, we can use the callstack code
to make an educated guess with __nj_callstack_crossed_main. Once we know main
has been crossed, we can call __nj_njamd_user_init, which initializes the user
code of all the njamd objects.
Once this initialization is done, and the state is updated, user allocation
can commence. Allocations can take place without user_init, and have default
values for preferences as defined by __nj_prefs_bootstrap_init.
The four versions of the library (overflow, underflow, strict, and none)
differ in their placement of the page of protected memory. Overflow places the
protected page after the user's buffer. When this is done, the user's buffer
must be offset a certain number of bytes to end at this boundary. Underflow
places the buffer length before the user's buffer, and then a protected page
before that. This is because we must know the length of the buffer to do
anything with it, and this minor leeway is a tradeoff for not faulting that
protected page. The strict underflow version (mem_sunder.c) does not make
this tradeoff, and faults the protected page and stores the length in it,
trading off roughly a 2X increase in memory and 25% decrease in speed for the
ability to catch underflows of only one byte. With the "none" option, standard
libc malloc is wrapped via dlopen(3), and memory accounting is done. This is
provided for people who simply want to check memory leaks only. No protected
pages are placed near the memory. As a result, the PROT_NONE option is much
faster and lighter than the rest. Of course, the NJAMD_CHK_FREE options do not
apply when this option is set, and have the effective value of none.
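For concreteness, a stripped-down overflow-style placement might look like
the sketch below; it ignores the memory pool, fenceposts, alignment, and the
entry index, and maps pages directly.

    #include <stddef.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static void *overflow_alloc(size_t user_len)
    {
            size_t page = (size_t)sysconf(_SC_PAGESIZE);
            size_t data = (user_len + page - 1) & ~(page - 1);

            char *block = mmap(NULL, data + page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (block == MAP_FAILED)
                    return NULL;

            /* Make the page just past the data inaccessible, so an overflow
             * of even one byte faults immediately. */
            mprotect(block + data, page, PROT_NONE);

            /* Offset the user pointer so the buffer ends at the guard page. */
            return block + (data - user_len);
    }
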
In all allocation types (including none), sentinel "fencepost" values are
placed on the opposite side of the buffer from the protected page, so the user
can be alerted of corrupted memory by the next call to free. Consistency
checking is performed on free to determine the integrity of non-protected
regions before AND after the buffer (due to alignment, or the version of the
library). At the worst, you always know of a memory error by the time you free
the memory.
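The check itself is straightforward; here is a sketch with a placeholder
sentinel value rather than the one NJAMD actually uses.

    #include <stddef.h>

    #define NJ_FENCEPOST 0x4E4A4644UL   /* placeholder sentinel value */

    /* Written into the unprotected slack on the open side of the buffer at
     * allocation time. */
    static void write_fenceposts(unsigned long *slack, size_t count)
    {
            for (size_t i = 0; i < count; i++)
                    slack[i] = NJ_FENCEPOST;
    }

    /* Verified on free; a nonzero return means the slack was overwritten at
     * some point before the free. */
    static int fenceposts_corrupted(const unsigned long *slack, size_t count)
    {
            for (size_t i = 0; i < count; i++)
                    if (slack[i] != NJ_FENCEPOST)
                            return 1;
            return 0;
    }
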
Alignment is also covered by this system. However, alignment of n bytes means
that the overflow and fast underflow versions will not detect overwrites of n
bytes or less until the next free. Detecting alignment of a buffer requires
some trickery best understood by looking at the source code with the doxygen
reference.
Upon system termination via segv, or exit, the proper error messages are
printed out (see __nj_signals_dispatch), and the destructors are run for each
object, causing them to print out any information depending on preferences.
Right now the only object that does this is the entry_pool, which prints out
leaks on exit if the preferences request it.
See also http://fscked.org/writings/SHM/shm-4.html for more information.
- Mike Perry <mikepery@fscked.org>
Future design considerations:
Instead of multiple switch statements, perhaps changing the allocation type
should change a set of function pointers, because checking those allocation
types several times during each allocation is a performance problem.
The output object still needs to be implemented, along with classes and codes.
XML or an alternative output format to make interface with a gui would be
great. If nothing else, just codifying them would allow us to output them in a
more parsable fashion.
The stats object needs to be integrated and implemented, possibly extended.
Add the ability to easily ignore sections of code you don't want to have
memory statistics or leak info for. This could simply be an addition of a
dynamic_prefs flag.
We need someone to update the documentation to reflect the ability to change
allocation types with __nj_set_user_prefs.
The ability to release memory leak info and statistics at arbitrary points.
The release of memory leak info would not actually release any heap entries,
but only place a marker or hold an index value of where to start.
The ability to release all currently protected pages. As mentioned above,
this would simply be a matter of writing an iterator function for
__nj_table_for_all_entries for the entry_pool object, and unmapping all
entries that were freed with CHK_FREE=error or segv. Calling this more than
once would be a problem though.
The ability to release all info on freed segments at arbitrary points. This is
not the same as the above; it could be used to purge the entry_pool table if
full or too large. I'm not sure why you would want to do this.
The new design currently passes all tests that are written, but we need to
write more tests to verify the ability to change allocation type on the fly
in various transitions.