File: prog-guide.sgml

539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
1474
1475
1476
1477
1478
1479
1480
1481
1482
1483
1484
1485
1486
1487
1488
1489
1490
1491
1492
1493
1494
1495
1496
1497
1498
1499
1500
1501
1502
1503
1504
1505
1506
1507
1508
1509
1510
1511
1512
1513
1514
1515
1516
1517
1518
1519
1520
1521
1522
1523
1524
1525
1526
1527
1528
1529
1530
1531
1532
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1545
1546
1547
1548
1549
1550
1551
1552
1553
1554
1555
1556
1557
1558
1559
1560
1561
1562
1563
1564
1565
1566
1567
1568
1569
1570
1571
1572
1573
1574
1575
1576
1577
1578
1579
1580
1581
1582
1583
1584
1585
1586
1587
1588
1589
1590
1591
1592
1593
1594
1595
1596
1597
1598
1599
1600
1601
1602
1603
1604
1605
1606
1607
1608
1609
1610
1611
1612
1613
1614
1615
1616
1617
1618
1619
1620
1621
1622
1623
1624
1625
1626
1627
1628
1629
1630
1631
1632
1633
1634
1635
1636
1637
1638
1639
1640
1641
1642
1643
1644
1645
1646
1647
1648
1649
1650
1651
1652
1653
1654
1655
1656
1657
1658
1659
1660
1661
1662
1663
1664
1665
1666
1667
1668
1669
1670
1671
1672
1673
1674
1675
1676
1677
1678
1679
1680
1681
1682
1683
1684
1685
1686
1687
1688
1689
1690
1691
1692
1693
1694
1695
1696
1697
1698
1699
1700
1701
1702
1703
1704
1705
1706
1707
1708
1709
1710
1711
1712
1713
1714
1715
1716
1717
1718
1719
1720
1721
1722
1723
1724
1725
1726
1727
1728
1729
1730
1731
1732
1733
1734
1735
1736
1737
1738
1739
1740
1741
1742
1743
1744
1745
1746
1747
1748
1749
1750
1751
1752
1753
1754
1755
1756
1757
1758
1759
1760
1761
1762
1763
1764
1765
1766
1767
1768
1769
1770
1771
1772
1773
1774
1775
1776
1777
1778
1779
1780
1781
1782
1783
1784
1785
1786
1787
1788
1789
1790
1791
1792
1793
1794
1795
1796
1797
1798
1799
1800
1801
1802
1803
1804
1805
1806
1807
1808
1809
1810
1811
1812
1813
1814
1815
1816
1817
1818
1819
1820
1821
1822
1823
1824
1825
1826
1827
1828
1829
1830
1831
1832
1833
1834
1835
1836
1837
1838
1839
1840
1841
1842
1843
1844
1845
1846
1847
1848
1849
1850
1851
1852
1853
1854
1855
1856
1857
1858
1859
1860
1861
1862
1863
1864
1865
1866
1867
1868
1869
1870
1871
1872
1873
1874
1875
1876
1877
1878
1879
1880
1881
1882
1883
1884
1885
1886
1887
1888
1889
1890
1891
1892
1893
1894
1895
1896
1897
1898
1899
1900
1901
1902
1903
1904
1905
1906
1907
1908
1909
1910
1911
1912
1913
1914
1915
1916
1917
1918
1919
1920
1921
1922
1923
1924
1925
1926
1927
1928
1929
1930
1931
1932
1933
1934
1935
1936
1937
1938
1939
1940
1941
1942
1943
1944
1945
1946
1947
1948
1949
1950
1951
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
1966
1967
1968
1969
1970
1971
1972
1973
1974
1975
1976
1977
1978
1979
1980
1981
1982
1983
1984
1985
1986
1987
1988
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026
2027
2028
2029
2030
2031
2032
2033
2034
2035
2036
2037
2038
2039
2040
2041
2042
2043
2044
2045
2046
2047
2048
2049
2050
2051
2052
2053
2054
2055
2056
2057
2058
2059
2060
2061
2062
2063
2064
2065
2066
2067
2068
2069
2070
2071
2072
2073
2074
2075
2076
2077
2078
2079
2080
2081
2082
2083
2084
2085
2086
2087
2088
2089
2090
2091
2092
2093
2094
2095
2096
2097
2098
2099
2100
2101
2102
2103
2104
2105
2106
2107
2108
2109
2110
2111
2112
2113
2114
2115
2116
2117
2118
2119
2120
2121
2122
2123
2124
2125
2126
2127
2128
2129
2130
2131
2132
2133
2134
2135
2136
2137
2138
2139
2140
2141
2142
2143
2144
2145
2146
2147
2148
2149
2150
2151
2152
2153
2154
2155
2156
2157
2158
2159
2160
2161
2162
2163
2164
2165
2166
2167
2168
2169
2170
2171
2172
2173
2174
2175
2176
2177
2178
2179
2180
2181
2182
2183
2184
2185
2186
2187
2188
2189
2190
2191
2192
2193
2194
2195
2196
2197
2198
2199
2200
2201
2202
2203
2204
2205
2206
2207
2208
2209
2210
2211
2212
2213
2214
2215
2216
2217
2218
2219
2220
2221
2222
2223
2224
2225
2226
2227
2228
2229
2230
2231
2232
2233
2234
2235
2236
2237
2238
2239
2240
2241
2242
2243
2244
2245
2246
2247
2248
2249
2250
2251
2252
2253
2254
2255
2256
2257
2258
2259
2260
2261
2262
2263
2264
2265
2266
2267
2268
2269
2270
2271
2272
2273
2274
2275
2276
2277
2278
2279
2280
2281
2282
2283
2284
2285
2286
2287
2288
2289
2290
2291
2292
2293
2294
2295
2296
2297
2298
2299
2300
2301
2302
2303
2304
2305
2306
2307
2308
2309
2310
2311
2312
2313
2314
2315
2316
2317
2318
2319
2320
2321
2322
2323
2324
2325
2326
2327
2328
2329
2330
2331
2332
2333
2334
2335
2336
2337
2338
2339
2340
2341
2342
2343
2344
2345
2346
2347
2348
2349
2350
2351
2352
2353
2354
2355
2356
2357
2358
2359
2360
2361
2362
2363
2364
2365
2366
2367
2368
2369
2370
2371
2372
2373
2374
2375
2376
2377
2378
2379
2380
2381
2382
2383
2384
2385
2386
2387
2388
2389
2390
2391
2392
2393
2394
2395
2396
2397
2398
2399
2400
2401
2402
2403
2404
2405
2406
2407
2408
2409
2410
2411
2412
2413
2414
2415
2416
2417
2418
2419
2420
2421
2422
2423
2424
2425
2426
2427
2428
2429
2430
2431
2432
2433
2434
2435
2436
2437
2438
2439
2440
2441
2442
2443
2444
2445
2446
2447
2448
2449
2450
2451
2452
2453
2454
2455
2456
2457
2458
2459
2460
2461
2462
2463
2464
2465
2466
2467
2468
2469
2470
2471
2472
2473
2474
2475
2476
2477
2478
2479
2480
2481
2482
2483
2484
2485
2486
2487
2488
2489
2490
2491
2492
2493
2494
2495
2496
2497
2498
2499
2500
2501
2502
2503
2504
2505
2506
2507
2508
2509
2510
2511
2512
2513
2514
2515
2516
2517
2518
2519
2520
2521
2522
2523
2524
2525
2526
2527
2528
2529
2530
2531
2532
2533
2534
2535
2536
2537
2538
<!doctype linuxdoc system>
<article>
<title>Squid Programmers Guide</title>
<author>Duane Wessels, Squid Developers
<date>$Id: prog-guide.sgml,v 1.28 2000/06/08 18:05:33 hno Exp $</date>

<abstract>
Squid is a WWW Cache application developed by the National Laboratory
for Applied Network Research and members of the Web Caching community.
Squid is implemented as a single, non-blocking process based around
a BSD select() loop.  This document describes the operation of the Squid
source code and is intended to be used by others who wish to customize
or improve it.
</abstract>

<toc>


<!-- %%%% Chapter : INTRODUCTION %%%% -->
<sect>Introduction

	<P>
	The Squid source code has evolved more from empirical
	observation and tinkering than from a solid design
	process.  It carries a legacy of being ``touched'' by
	numerous individuals, each with somewhat different techniques
	and terminology.

	<P>
	Squid is a single-process proxy server.  Every request is
	handled by the main process, with the exception of FTP.
	However, Squid does not use a ``threads package'' such as
	Pthreads.  While threads might make the code easier to
	write, they suffer from portability and performance
	problems.  Instead Squid
	maintains data structures and state information for each
	active request.

	<P>
	The code is often difficult to follow because there are no
	explicit state variables for the active requests.  Instead,
	thread execution progresses as a sequence of ``callback
	functions'' which get executed when I/O is ready to occur,
	or some other event has happened.  As a callback function
	completes, it is responsible for registering the next
	callback function for subsequent I/O.

	<P>
	Note there is only a pseudo-consistent naming scheme.  In
	most cases functions are named like <tt/moduleFooBar()/.
	However, there are also some functions named like
	<tt/module_foo_bar()/.

	<P>
	Note that the Squid source changes rapidly, and some parts
	of this document may become out-of-date.  If you find any
	inconsistencies, please feel free to notify <url
	url="mailto:squid-dev@squid-cache.org" name="the Squid Developers">.

<sect1>Conventions

	<P>
	Function names and file names will be written in a courier
	font, such as <tt/store.c/ and <tt/storeRegister()/.  Data
	structures and their members will be written in an italicized
	font, such as <em/StoreEntry/.


<sect>Overview of Squid Components

	<P>
Squid consists of the following major components:

<sect1>Client Side

	<P>
	Here new client connections are accepted, parsed, and
	processed.  This is where we determine if the request is
	a cache HIT, REFRESH, MISS, etc.  With HTTP/1.1 we may have
	multiple requests from a single TCP connection.  Per-connection
	state information is held in a data structure called
	<em/ConnStateData/.  Per-request state information is stored
	in the <em/clientHttpRequest/ structure.
    
<sect1>Server Side

	<P>
	These routines are responsible for forwarding cache misses
	to other servers, depending on the protocol.  Cache misses
	may be forwarded to either origin servers, or other proxy
	caches.  Note that all requests (FTP, Gopher) to other
	proxies are sent as HTTP requests.  <tt/gopher.c/ is somewhat
	complex and gross because it must convert from the Gopher
	protocol to HTTP.  Wais and Gopher don't receive much
	attention because they comprise a relatively insignificant
	portion of Internet traffic.

<sect1>Storage Manager

	<P>
	The Storage Manager is the glue between client and server
	sides.  Every object saved in the cache is allocated a
	<em/StoreEntry/ structure.  While the object is being
	accessed, it also has a <em/MemObject/ structure.

	<P>
	Squid can quickly locate cached objects because it keeps
	(in memory) a hash table of all <em/StoreEntry/'s.  The
	keys for the hash table are MD5 checksums of the objects'
	URIs.  In addition there is also a storage policy such
	as LRU that keeps track of the objects and determines
	the removal order when space needs to be reclaimed.
	For the LRU policy this is implemented as a doubly linked
	list.

	<P>
	For each object the <em/StoreEntry/ maps to a cache_dir
	and location via sdirn and sfilen. For the "ufs" store
	this file number (sfilen) is converted to a disk pathname
	by a simple modulo of L2 and L1, but other storage drivers may
	map sfilen in other ways.  A cache swap file consists
	of two parts: the cache metadata, and the object data. 
	Note the object data includes the full HTTP reply---headers
	and body.  The HTTP reply headers are not the same as the
	cache metadata.

	<P>
	Client-side requests register themselves with a <em/StoreEntry/
	to be notified when new data arrives.  Multiple clients
	may receive data via a single <em/StoreEntry/.  For POST
	and PUT requests, this process works in reverse.  Server-side
	functions are notified when additional data is read from
	the client.

<sect1>Request Forwarding

<sect1>Peer Selection

	<P>
	These functions are responsible for selecting one (or none)
	of the neighbor caches as the appropriate forwarding
	location.

<sect1>Access Control

	<P>
	These functions are responsible for allowing or denying a
	request, based on a number of different parameters.  These
	parameters include the client's IP address, the hostname
	of the requested resource, the request method, etc.  Some
	of the necessary information may not be immediately available,
	for example the origin server's IP address.  In these cases,
	the ACL routines initiate lookups for the necessary
	information and continue the access control checks when
	the information is available.

<sect1>Network Communication

	<P>
	These are the routines for communicating over TCP and UDP
	network sockets.  Here is where sockets are opened, closed,
	read, and written.  In addition, note that the heart of
	Squid (<tt/comm_select()/ or <tt/comm_poll()/) exists here,
	even though it handles all file descriptors, not just
	network sockets.  These routines do not support queuing
	multiple blocks of data for writing.  Consequently, a
	callback occurs for every write request.

<sect1>File/Disk I/O

	<P>
	Routines for reading and writing disk files (and FIFOs).
	Reasons for separating network and disk I/O functions are
	partly historical, and partly because of different behaviors.
	For example, we don't worry about getting a ``No space left
	on device'' error for network sockets.  The disk I/O routines
	support queuing of multiple blocks for writing.  In some
	cases, it is possible to merge multiple blocks into a single
	write request.  The write callback does not necessarily
	occur for every write request.

<sect1>Neighbors

	<P>
	Maintains the list of neighbor caches.  Sends and receives
	ICP messages to neighbors.  Decides which neighbors to
	query for a given request.  File: <tt/neighbors.c/.

<sect1>IP/FQDN Cache

	<P>
	A cache of name-to-address and address-to-name lookups.
	These are hash tables keyed on the names and addresses.
	<tt/ipcache_nbgethostbyname()/ and <tt/fqdncache_nbgethostbyaddr()/
	implement the non-blocking lookups.  Files: <tt/ipcache.c/,
	<tt/fqdncache.c/.

<sect1>Cache Manager

	<P>
	This provides access to certain information needed by the
	cache administrator.  A companion program, <em/cachemgr.cgi/
	can be used to make this information available via a Web
	browser.  Cache manager requests to Squid are made with a 
	special URL of the form
<verb>
	cache_object://hostname/operation
</verb>
	The cache manager provides essentially ``read-only'' access
	to information.  It does not provide a method for configuring
	Squid while it is running.

<sect1>Network Measurement Database

	<P>
	In a number of situations, Squid finds it useful to know the
	estimated network round-trip time (RTT) between itself and
	origin servers.  A particularly useful example is
	the peer selection algorithm.  By making RTT measurements, a
	Squid cache will know if it, or one of its neighbors, is closest
	to a given origin server.  The actual measurements are made
	with the <em/pinger/ program, described below.  The measured
	values are stored in a database indexed under two keys.  The 
	primary index field is the /24 prefix of the origin server's
	IP address.  Secondly, a hash table of fully-qualified host
	names holds data structures with links to the appropriate
	network entry.  This allows Squid to quickly look up measurements
	when given either an IP address, or a host name.  The /24 prefix
	aggregation is used to reduce the overall database size.  File:
	<tt/net_db.c/.

<sect1>Redirectors

	<P>
	Squid has the ability to rewrite requests from clients.  After
	checking the access controls, but before checking for cache hits,
	requested URLs may optionally be written to an external
	<em/redirector/ process.  This program, which can be highly
	customized, may return a new URL to replace the original request.
	Common applications for this feature are extended access controls
	and local mirroring.  File: <tt/redirect.c/.

<sect1>Autonomous System Numbers

	<P>
	Squid supports Autonomous System (AS) numbers as another 
	access control element.  The routines in <tt/asn.c/
	query databases which map AS numbers into lists of CIDR
	prefixes.  These results are stored in a radix tree which
	allows fast searching of the AS number for a given IP address.

<sect1>Configuration File Parsing

	<P>
	The primary configuration file specification is in the file
	<tt/cf.data.pre/.  A simple utility program, <tt/cf_gen/,
	reads the <tt/cf.data.pre/ file and generates <tt/cf_parser.c/
	and <tt/squid.conf/.  <tt/cf_parser.c/ is included directly
	into <tt/cache_cf.c/ at compile time.

<sect1>Callback Data Database

	<P>
	Squid's extensive use of callback functions makes it very
	susceptible to memory access errors.  Care must be taken
	so that the <tt/callback_data/ memory is still valid when
	the callback function is executed.  The routines in <tt/cbdata.c/
	provide a uniform method for managing callback data memory,
	canceling callbacks, and preventing erroneous memory accesses.

<sect1>Debugging

	<P>
	Squid includes extensive debugging statements to assist in
	tracking down bugs and strange behavior.  Every debug statement
	is assigned a section and level.  Usually, every debug statement
	in the same source file has the same section.  Levels are chosen
	depending on how much output will be generated, or how useful the
	provided information will be.  The <em/debug_options/ line 
	in the configuration file determines which debug statements will
	be shown and which will not.  The <em/debug_options/ line
	assigns a maximum level for every section.  If a given debug
	statement has a level less than or equal to the configured
	level for that section, it will be shown.  This description
	probably sounds more complicated than it really is.
	File: <em/debug.c/.  Note that <tt/debug()/ itself is a macro.

<sect1>Error Generation

	<P>
	The routines in <tt/errorpage.c/ generate error messages from
	a template file and specific request parameters.  This allows
	for customized error messages and multilingual support.

<sect1>Event Queue

	<P>
	The routines in <tt/event.c/ maintain a linked-list event
	queue for functions to be executed at a future time.  The
	event queue is used for periodic functions such as performing
	cache replacement, cleaning swap directories, as well as one-time
	functions such as ICP query timeouts.

<sect1>Filedescriptor Management

	<P>
	Here we track the number of filedescriptors in use, and the
	number of bytes which have been read from or written to each
	file descriptor.


<sect1>Hashtable Support

	<P>
	These routines implement generic hash tables.  A hash table
	is created with a function for hashing the key values, and a
	function for comparing the key values.

<sect1>HTTP Anonymization

	<P>
	These routines support anonymizing of HTTP requests leaving
	the cache.  Either specific request headers will be removed
	(the ``standard'' mode), or only specific request headers
	will be allowed (the ``paranoid'' mode).


<sect1>Internet Cache Protocol

	<P>
	Here we implement the Internet Cache Protocol.  This 
	protocol is documented in the RFC 2186 and RFC 2187.
	The bulk of the code is in the <tt/icp_v2.c/ file.  The
	other, <tt/icp_v3.c/, contains a single function for handling
	ICP queries from Netcache/Netapp caches; they use
	a different version number and a slightly different message
	format.

<sect1>Ident Lookups

	<P>
	These routines support RFC 931 ``Ident'' lookups.   An ident
	server running on a host will report the user name associated
	with a connected TCP socket.  Some sites use this facility for
	access control and logging purposes.

<sect1>Memory Management

	<P>
	These routines allocate and manage pools of memory for
	frequently-used data structures.  When the <em/memory_pools/
	configuration option is enabled, unused memory is not actually
	freed.  Instead it is kept for future use.  This may result
	in more efficient use of memory at the expense of a larger
	process size.

<sect1>Multicast Support

	<P>
	Currently, multicast is only used for ICP queries.   The
	routines in this file implement joining a UDP 
	socket to a multicast group (or groups), and setting
	the multicast TTL value on outgoing packets.

<sect1>Persistent Server Connections

	<P>
	These routines manage idle, persistent HTTP connections
	to origin servers and neighbor caches.  Idle sockets
	are indexed in a hash table by their socket address
	(IP address and port number).  Up to 10 idle sockets
	will be kept for each socket address, but only for
	15 seconds.  After 15 seconds, idle socket connections
	are closed.

<sect1>Refresh Rules

	<P>
	These routines decide whether a cached object is stale or fresh,
	based on the <em/refresh_pattern/ configuration options.
	If an object is fresh, it can be returned as a cache hit.
	If it is stale, then it must be revalidated with an	
	If-Modified-Since request.

<sect1>SNMP Support

	<P>
	These routines implement SNMP for Squid.  At the present time,
	we have made almost all of the cachemgr information available
	via SNMP.

<sect1>URN Support

	<P>
	We are experimenting with URN support in Squid version 1.2.
	Note, we're not talking full-blown generic URN's here.  This
	is primarily targeted towards using URN's as a smart way
	of handling lists of mirror sites.  For more details, please
	see <url url="http://squid.nlanr.net/Squid/urn-support.html"
	name="URN support in Squid">.


<sect>External Programs

<sect1>dnsserver

	<P>
	Because the standard <tt/gethostbyname(3)/ library call
	blocks, Squid must use external processes to actually make
	these calls.  Typically there will be ten <tt/dnsserver/
	processes spawned from Squid.  Communication occurs via
	TCP sockets bound to the loopback interface.  The functions
	in <tt/dns.c/ are primarily concerned with starting and
	stopping the dnsservers.  Reading and writing to and from
	the dnsservers occurs in the IP and FQDN cache modules.

<sect1>pinger

	<P>
	Although it would be possible for Squid to send and receive
	ICMP messages directly, we use an external process for
	two important reasons:
	<enum>
	<item>Because Squid handles many filedescriptors simultaneously,
	we get much more accurate RTT measurements when ICMP is
	handled by a separate process.
	<item>Superuser privileges are required to send and receive
	ICMP.  Rather than require Squid to be started as root,
	we prefer to have the smaller and simpler <em/pinger/
	program installed with setuid permissions.
	</enum>

<sect1>unlinkd

	<P>
	The <tt/unlink(2)/ system call can cause a process to block
	for a significant amount of time.  Therefore we do not want
	to make unlink() calls from Squid.  Instead we pass them
	to this external process.

<sect1>redirector

	<P>
	A redirector process reads URLs on stdin and writes (possibly
	changed) URLs on stdout.  It is implemented as an external
	process to maximize flexibility.

<sect>Flow of a Typical Request

	<P>
	<enum>
	<item>
	A client connection is accepted by the <em/client-side/.
	The HTTP request is parsed.

	<item>
	The access controls are checked.  The client-side builds
	an ACL state data structure and registers a callback function
	for notification when access control checking is completed.

	<item>
	After the access controls have been verified, the client-side
	looks for the requested object in the cache.  If it is a cache
	hit, then the client-side registers its interest in the
	<em/StoreEntry/.  Otherwise, Squid needs to forward the
	request, perhaps with an If-Modified-Since header.

	<item>
	The request-forwarding process begins with <tt/protoDispatch/.
	This function begins the peer selection procedure, which
	may involve sending ICP queries and receiving ICP replies.
	The peer selection procedure also involves checking
	configuration options such as <em/never_direct/ and
	<em/always_direct/.

	<item>
	When the ICP replies (if any) have been processed, we end
	up at <em/protoStart/.  This function calls an appropriate
	protocol-specific function for forwarding the request.
	Here we will assume it is an HTTP request.

	<item>
	The HTTP module first opens a connection to the origin
	server or cache peer.  If there is no idle persistent socket
	available, a new connection request is given to the Network
	Communication module with a callback function.  The
	<tt/comm.c/ routines may try establishing a connection
	multiple times before giving up.

	<item>
	When a TCP connection has been established, HTTP builds a
	request buffer and submits it for writing on the socket.
	It then registers a read handler to receive and process
	the HTTP reply.

	<item>
	As the reply is initially received, the HTTP reply headers
	are parsed and placed into a reply data structure.  As
	reply data is read, it is appended to the <em/StoreEntry/.
	Every time data is appended to the <em/StoreEntry/, the
	client-side is notified of the new data via a callback
	function.

	<item>
	As the client-side is notified of new data, it copies the
	data from the StoreEntry and submits it for writing on the
	client socket.

	<item>
	As data is appended to the <em/StoreEntry/, and the client(s)
	read it, the data may be submitted for writing to disk.

	<item>
	When the HTTP module finishes reading the reply from the
	upstream server, it marks the <em/StoreEntry/ as ``complete.''
	The server socket is either closed or given to the persistent
	connection pool for future use.

	<item>
	When the client-side has written all of the object data,
	it unregisters itself from the <em/StoreEntry/.  At the
	same time it either waits for another request from the
	client, or closes the client connection.

	</enum>

<sect>Callback Functions

<sect>The Main Loop: <tt/comm_select()/

	<P>
	At the core of Squid is the <tt/select(2)/ system call.
	Squid uses <tt/select()/ or <tt/poll(2)/ to process I/O on
	all open file descriptors.  Hereafter we'll only use
	``select'' to refer generically to either system call.

	<P>
	The <tt/select()/ and <tt/poll()/ system calls work by
	waiting for I/O events on a set of file descriptors.  Squid
	only checks for <em/read/ and <em/write/ events. Squid
	knows that it should check for reading or writing when
	there is a read or write handler registered for a given
	file descriptor.  Handler functions are registered with
	the <tt/commSetSelect/ function.  For example:
<verb>
	commSetSelect(fd, COMM_SELECT_READ, clientReadRequest, conn, 0);
</verb>
	In this example, <em/fd/ is a TCP socket to a client
	connection.  When there is data to be read from the socket,
	then the select loop will execute
<verb>
	clientReadRequest(fd, conn);
</verb>

	<P>
	The I/O handlers are reset every time they are called.  In
	other words, a handler function must re-register itself
	with <tt/commSetSelect/ if it wants to continue reading or
	writing on a file descriptor.  The I/O handler may be
	canceled before being called by providing NULL arguments,
	e.g.:
<verb>
	commSetSelect(fd, COMM_SELECT_READ, NULL, NULL, 0);
</verb>

	<P>
	These I/O handlers (and others) and their associated callback
	data pointers are saved in the <em/fde/ data structure:
<verb>
	struct _fde {
		...
		PF *read_handler;
		void *read_data;
		PF *write_handler;
		void *write_data;
		close_handler *close_handler;
		DEFER *defer_check;
		void *defer_data;
	};
</verb>
	<em/read_handler/ and <em/write_handler/ are called when
	the file descriptor is ready for reading or writing,
	respectively.  The <em/close_handler/ is called when the
	file descriptor is closed.   The <em/close_handler/ is
	actually a linked list of callback functions to be called.

	<P>
	In some situations we want to defer reading from a
	file descriptor, even though it has data for us to read.
	This may be the case when data arrives from the server-side
	faster than it can be written to the client-side.  Before
	adding a file descriptor to the ``read set'' for select, we
	call <em/defer_check/ (if it is non-NULL).  If <em/defer_check/
	returns 1, then we skip the file descriptor for that pass
	through the select loop.
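	<P>
	A sketch of that check follows.  The <em/deferRead/ policy and its
	4 KB threshold are invented for illustration; only the shape of the
	test mirrors the description above:

```c
#include <stddef.h>

/* Hypothetical sketch of the defer check: before adding a
 * descriptor to the read set, call defer_check (if set) and skip
 * the descriptor when it returns 1. */
typedef int DEFER(int fd, void *data);

struct mini_fde {
    int has_read_handler;
    DEFER *defer_check;
    void *defer_data;
};

/* Returns 1 if fd should go into the read bitmask for select(). */
static int want_read(const struct mini_fde *f, int fd)
{
    if (!f->has_read_handler)
        return 0;
    if (f->defer_check && f->defer_check(fd, f->defer_data))
        return 0;                      /* deferred for this pass */
    return 1;
}

/* Example policy: defer while the client-side queue is too full. */
static int deferRead(int fd, void *data)
{
    size_t *queued = data;
    (void) fd;
    return *queued > 4096;             /* 1 = skip reading for now */
}

static int defer_demo(size_t queued)
{
    struct mini_fde f = { 1, deferRead, &queued };
    return want_read(&f, 7);
}
```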



	<P>
	These handlers are stored in the <em/FD_ENTRY/ structure
	as defined in <tt/comm.h/.  <tt/fd_table[]/ is the global
	array of <em/FD_ENTRY/ structures.  The handler functions
	are of type <em/PF/, which is a typedef:
<verb>
    typedef void (*PF) (int, void *);
</verb>
	The close handler is really a linked list of handler
	functions.  Each handler also has an associated pointer
	<tt/(void *data)/ to some kind of data structure.

	<P>
	<tt/comm_select()/ is the function which issues the select()
	system call.  It scans the entire <tt/fd_table[]/ array
	looking for handler functions.  Each file descriptor with
	a read handler will be set in the <tt/fd_set/ read bitmask.
	Similarly, write handlers are scanned and bits set for the
	write bitmask.  <tt/select()/ is then called, and the return
	read and write bitmasks are scanned for descriptors with
	pending I/O.  For each ready descriptor, the handler is
	called.  Note that the handler is cleared from the
	<em/FD_ENTRY/ before it is called.

	<P>
	After each handler is called, <tt/comm_select_incoming()/
	is called to process new HTTP and ICP requests.

	<P>
	Typical read handlers are
	<tt/httpReadReply()/,
	<tt/diskHandleRead()/,
	<tt/icpHandleUdp()/,
	and <tt/ipcache_dnsHandleRead()/.
	Typical write handlers are
	<tt/commHandleWrite()/,
	<tt/diskHandleWrite()/,
	and <tt/icpUdpReply()/.
	The handler function is set with <tt/commSetSelect()/, with the
	exception of the close handlers, which are set with
	<tt/comm_add_close_handler()/.

	<P>
	The close handlers are normally called from <tt/comm_close()/.
	The job of the close handlers is to deallocate data structures
	associated with the file descriptor.  For this reason
	<tt/comm_close()/ must normally be the last function in a
	sequence to prevent accessing just-freed memory.
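	<P>
	The close-handler mechanism can be sketched as a per-descriptor
	linked list.  The list handling below is an illustrative
	assumption; only the push-then-run-all shape follows the text:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of the close-handler list:
 * comm_add_close_handler pushes a callback onto a per-fd linked
 * list, and comm_close pops and runs each one (most recently
 * added first) so fd-associated state can be deallocated. */
typedef void PF(int fd, void *data);

struct close_handler {
    PF *handler;
    void *data;
    struct close_handler *next;
};

static struct close_handler *close_list[8];

static void comm_add_close_handler(int fd, PF *handler, void *data)
{
    struct close_handler *ch = malloc(sizeof(*ch));
    ch->handler = handler;
    ch->data = data;
    ch->next = close_list[fd];
    close_list[fd] = ch;
}

static void comm_close(int fd)
{
    struct close_handler *ch;
    while ((ch = close_list[fd]) != NULL) {
        close_list[fd] = ch->next;
        ch->handler(fd, ch->data);     /* free fd-associated state */
        free(ch);
    }
}

static void count_call(int fd, void *data)
{
    (void) fd;
    (*(int *) data)++;
}

static int close_demo(void)
{
    int calls = 0;
    comm_add_close_handler(3, count_call, &calls);
    comm_add_close_handler(3, count_call, &calls);
    comm_close(3);                     /* both handlers run once */
    return calls;
}
```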

	<P>
	The timeout and lifetime handlers are called for file
	descriptors which have been idle for too long.  They are
	further discussed in a following chapter.

<!-- %%%% Chapter : CLIENT REQUEST PROCESSING %%%% -->
<sect>Processing Client Requests

	<P>
	To be written...

<!-- %%%% Chapter : STORAGE MANAGER %%%% -->
<sect>Storage Manager

<sect1>Introduction

	<P>
	The Storage Manager is the glue between client and server
	sides.  Every object saved in the cache is allocated a
	<em/StoreEntry/ structure.  While the object is being
	accessed, it also has a <em/MemObject/ structure.

	<P>
	Squid can quickly locate cached objects because it keeps
	(in memory) a hash table of all <em/StoreEntry/'s.  The
	keys for the hash table are MD5 checksums of the object's
	URI.  In addition there is also a storage policy such
	as LRU that keeps track of the objects and determines
	the removal order when space needs to be reclaimed.
	For the LRU policy this is implemented as a doubly linked
	list.
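	<P>
	The LRU list operations can be sketched as follows.  The dlink
	names echo Squid's <em/dlink_list/, but this toy tracks integer
	ids rather than <em/StoreEntry/'s:

```c
#include <stddef.h>

/* Hypothetical miniature of an LRU removal list: referencing an
 * entry moves it to the head, and the eviction candidate is taken
 * from the tail. */
struct dlink_node {
    struct dlink_node *prev, *next;
    int id;                            /* stands in for a StoreEntry */
};

struct dlink_list {
    struct dlink_node *head, *tail;
};

static void lru_unlink(struct dlink_list *l, struct dlink_node *n)
{
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    n->prev = n->next = NULL;
}

static void lru_push_head(struct dlink_list *l, struct dlink_node *n)
{
    n->prev = NULL;
    n->next = l->head;
    if (l->head) l->head->prev = n; else l->tail = n;
    l->head = n;
}

/* On reference: unlink and move to the head of the list. */
static void lru_touch(struct dlink_list *l, struct dlink_node *n)
{
    lru_unlink(l, n);
    lru_push_head(l, n);
}

static int lru_demo(void)
{
    struct dlink_node a = { NULL, NULL, 1 };
    struct dlink_node b = { NULL, NULL, 2 };
    struct dlink_node c = { NULL, NULL, 3 };
    struct dlink_list l = { NULL, NULL };
    lru_push_head(&l, &a);             /* list: a */
    lru_push_head(&l, &b);             /* list: b a */
    lru_push_head(&l, &c);             /* list: c b a */
    lru_touch(&l, &a);                 /* a referenced: a c b */
    return l.tail->id;                 /* LRU eviction candidate */
}
```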

	<P>
	For each object the <em/StoreEntry/ maps to a cache_dir
	and location via sdirn and sfilen. For the "ufs" store
	this file number (sfilen) is converted to a disk pathname
	by a simple modulo of L2 and L1, but other storage drivers may
	map sfilen in other ways.  A cache swap file consists
	of two parts: the cache metadata, and the object data. 
	Note the object data includes the full HTTP reply---headers
	and body.  The HTTP reply headers are not the same as the
	cache metadata.
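	<P>
	The "ufs" modulo mapping can be sketched like this.  The
	arithmetic is modeled on Squid's UFS directory layout but may
	differ between versions, and the path and L1/L2 values below are
	only examples:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the "ufs" sfilen-to-pathname mapping:
 * the file number is split into first- and second-level directory
 * indexes by division and modulo with L1 and L2. */
static void ufs_path(char *buf, size_t len, const char *dir,
                     int L1, int L2, int sfilen)
{
    int d1 = ((sfilen / L2) / L2) % L1;    /* first-level subdir */
    int d2 = (sfilen / L2) % L2;           /* second-level subdir */
    snprintf(buf, len, "%s/%02X/%02X/%08X", dir, d1, d2, sfilen);
}

/* With L1=16 and L2=256, file number 0x1234 lands in
 * /cache/00/12/00001234. */
static int ufs_path_demo(void)
{
    char buf[128];
    ufs_path(buf, sizeof buf, "/cache", 16, 256, 0x1234);
    return strcmp(buf, "/cache/00/12/00001234") == 0;
}
```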

	<P>
	Client-side requests register themselves with a <em/StoreEntry/
	to be notified when new data arrives.  Multiple clients
	may receive data via a single <em/StoreEntry/.  For POST
	and PUT requests, this process works in reverse.  Server-side
	functions are notified when additional data is read from
	the client.
	
<sect1>Object storage

	<P>
	To be written...

<sect1>Object retrieval

	<P>
	To be written...

<!-- %%%% Chapter : STORAGE INTERFACE %%%% -->
<sect>Storage Interface

<sect1>Introduction

	<P>
	Traditionally, Squid has always used the Unix filesystem (UFS)
	to store cache objects on disk.  Over the years, the
	poor performance of UFS has become very obvious.  In most
	cases, UFS limits Squid to about 30-50 requests per second.
	Our work indicates that the poor performance is mostly
	due to the synchronous nature of <tt/open()/ and <tt/unlink()/
	system calls, and perhaps thrashing of inode/buffer caches.

	<P>
	We want to try out our own, customized filesystems with Squid.
	In order to do that, we need a well-defined interface
	for the bits of Squid that access the permanent storage
	devices. We also require tighter control of the replacement
	policy by each storage module, rather than a single global
	replacement policy.

<sect1>Build structure

	<P>
	The storage types live in squid/src/fs/ . Each subdirectory corresponds
	to the name of the storage type. When a new storage type is implemented
	configure.in must be updated to autogenerate a Makefile in
	squid/src/fs/$type/ from a Makefile.in file.

	<P>
	configure will take a list of storage types through the
	<em/--enable-store-io/ parameter. This parameter takes a list of
	space-separated storage types. For example,
	--enable-store-io="ufs coss".

	<P>
	Each storage type must create an archive file
	<tt>squid/src/fs/$type.a</tt>.  This file is automatically linked into
	squid at compile time.

	<P>
	Each storefs must export a function named <tt/storeFsSetup_$type()/.
	This function is called at runtime to initialise each storage type.
	The list of storage types is passed through <tt/store_modules.sh/
	to generate the initialisation function <tt/storeFsSetup()/. This
	function lives in <tt/store_modules.c/.

	<P>
	An example of the automatically generated file:

<verb>
	/* automatically generated by ./store_modules.sh ufs coss
	 * do not edit
	 */
	#include "squid.h"

	extern STSETUP storeFsSetup_ufs;
	extern STSETUP storeFsSetup_coss;
	void storeFsSetup(void)
	{
       		storeFsAdd("ufs", storeFsSetup_ufs);
        	storeFsAdd("coss", storeFsSetup_coss);
	}
</verb>


<sect1>Initialisation of a storage type

	<P>
	Each storage type initialises through the <tt/storeFsSetup_$type()/
	function.  The <tt/storeFsSetup_$type()/ function takes a single
	argument - a <tt/storefs_entry_t/ pointer. This pointer references
	the storefs_entry to initialise. A typical setup function is as
	follows:

<verb>
	void
	storeFsSetup_ufs(storefs_entry_t *storefs)
	{   
		assert(!ufs_initialised);
		storefs->parsefunc = storeUfsDirParse;
		storefs->reconfigurefunc = storeUfsDirReconfigure;
		storefs->donefunc = storeUfsDirDone;
		ufs_state_pool = memPoolCreate("UFS IO State data", sizeof(ufsstate_t));
		ufs_initialised = 1;
	}
</verb>

	<P>
	There are five function pointers in the storefs_entry which require
	initialising. In this example, some protection is made against the
	setup function being called twice, and a memory pool is initialised
	for use inside the storage module.

	<P>
	Each function will be covered below.


<sect2>done

	<P>
<verb>
	typedef void
	STFSSHUTDOWN(void);
</verb>

	<P>
	This function is called whenever the storage system is to be shut down.
	It should take care of deallocating any resources currently allocated.


<verb>
	typedef void STFSPARSE(SwapDir *SD, int index, char *path);
	typedef void STFSRECONFIGURE(SwapDir *SD, int index, char *path);
</verb>

	<P>
	These functions handle configuring and reconfiguring a storage
	directory. Additional arguments from the cache_dir configuration
	line can be retrieved through calls to strtok() and GetInteger().
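	<P>
	A sketch of pulling those trailing arguments follows.
	<tt/GetInteger()/ is modeled with <tt/atoi()/ here, and the
	ufs-style argument order (Mbytes, L1, L2) is an assumption for
	illustration; a real <em/STFSPARSE/ tokenizes the remainder of the
	configuration line:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of parsing trailing cache_dir arguments.
 * The argument order (Mbytes, L1, L2) is assumed for the example. */
static int parse_ufs_args(char *args, int *mbytes, int *l1, int *l2)
{
    char *t;
    if ((t = strtok(args, " ")) == NULL)
        return -1;
    *mbytes = atoi(t);                 /* cache size in megabytes */
    if ((t = strtok(NULL, " ")) == NULL)
        return -1;
    *l1 = atoi(t);                     /* first-level directories */
    if ((t = strtok(NULL, " ")) == NULL)
        return -1;
    *l2 = atoi(t);                     /* second-level directories */
    return 0;
}

static int parse_demo(void)
{
    char line[] = "100 16 256";        /* trailing cache_dir args */
    int mb, l1, l2;
    if (parse_ufs_args(line, &mb, &l1, &l2) != 0)
        return -1;
    return mb + l1 + l2;               /* 100 + 16 + 256 */
}
```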

	<P>
	<em/STFSPARSE/ has the task of initialising a new swapdir. It should
	parse the remaining arguments on the cache_dir line, initialise the
	relevant function pointers and data structures, and choose the
	replacement policy. <em/STFSRECONFIGURE/ deals with reconfiguring an
	active swapdir.  It should parse the remaining arguments on the
	cache_dir line and change any active configuration parameters. The
	actual storage initialisation is done through the <em/STINIT/ function
	pointer in the SwapDir.

	<P>
<verb>
	struct _SwapDir {
		char *type;				/* Pointer to the store dir type string */
		int cur_size;				/* Current swapsize in kb */
		int low_size;				/* ?? */
		int max_size;				/* Maximum swapsize in kb */
		char *path;				/* Path to store */
		int index;				/* This entry's index into the swapDir array */
		int suggest;				/* Suggestion for UFS style stores (??) */
		size_t max_objsize;			/* Maximum object size for this store */
		union {					/* Replacement policy-specific fields */
		#ifdef HEAP_REPLACEMENT
			struct {
				heap *heap;
			} heap;
		#endif
			struct {
				dlink_list list;
				dlink_node *walker;
			} lru;
		} repl;
		int removals;
		int scanned;
		struct {
			unsigned int selected:1;	/* Currently selected for write */
			unsigned int read_only:1;	/* This store is read only */
		} flags;
		STINIT *init;				/* Initialise the fs */
		STNEWFS *newfs;				/* Create a new fs */
		STDUMP *dump;				/* Dump fs config snippet */
		STFREE *freefs;				/* Free the fs data */
		STDBLCHECK *dblcheck;			/* Double check the obj integrity */
		STSTATFS *statfs;			/* Dump fs statistics */
		STMAINTAINFS *maintainfs;		/* Replacement maintenance */
		STCHECKOBJ *checkob;			/* Check if the fs will store an object, and get the FS load */
		/* These two are notifications */
		STREFOBJ *refobj;			/* Reference this object */
		STUNREFOBJ *unrefobj;			/* Unreference this object */
		STCALLBACK *callback;			/* Handle pending callbacks */
		STSYNC *sync;				/* Sync the directory */
		struct {
			STOBJCREATE *create;		/* Create a new object */
			STOBJOPEN *open;		/* Open an existing object */
			STOBJCLOSE *close;		/* Close an open object */
			STOBJREAD *read;		/* Read from an open object */
			STOBJWRITE *write;		/* Write to a created object */
			STOBJUNLINK *unlink;		/* Remove the given object */
		} obj;
		struct {
			STLOGOPEN *open;		/* Open the log */
			STLOGCLOSE *close;		/* Close the log */
			STLOGWRITE *write;		/* Write to the log */
			struct {
				STLOGCLEANOPEN *open;	/* Open a clean log */
				STLOGCLEANWRITE *write;	/* Write to the log */
				void *state;		/* Current state */
			} clean;
		} log;
		void *fsdata;				/* FS-specific data */
	};
</verb>



<sect1>Operation of a storage module

	<P>
	Squid understands the concept of multiple diverse storage directories.
	Each storage directory provides a caching object store, with object
	storage, retrieval, indexing and replacement. 

	<P>
	Each open object has associated with it a <em/storeIOState/ object. The
	<em/storeIOState/ object is used to record the state of the current
	object. Each <em/storeIOState/ can have a storage module specific data
	structure containing information private to the storage module.

	<P>
<verb>
	struct _storeIOState {
		sdirno swap_dirn;		/* SwapDir index */
		sfileno swap_filen;		/* Unique file index number */
		StoreEntry *e;			/* Pointer to parent StoreEntry */
		mode_t mode;			/* Mode - O_RDONLY or O_WRONLY */
		size_t st_size;			/* Size of the object if known */
		off_t offset;			/* current _on-disk_ offset pointer */
		STFNCB *file_callback;		/* called on delayed sfileno assignments */
		STIOCB *callback;		/* IO Error handler callback */
		void *callback_data;		/* IO Error handler callback data */
		struct {
			STRCB *callback;	/* Read completion callback */
			void *callback_data;	/* Read completion callback data */
		} read;
		struct {
			unsigned int closing:1; /* debugging aid */
		} flags;
		void *fsstate;			/* pointer to private fs state */
	};
</verb>

	<P>
	Each <em/SwapDir/ has the concept of a maximum object size. This is used
	as a basic hint to the storage layer in first choosing a suitable
	<em/SwapDir/. The checkobj function is then called for suitable
	candidate <em/SwapDirs/ to find out whether it wants to store a
	given <em/StoreEntry/. A <em/max_objsize/ of -1 means 'any size'.

	<P>
	The specific filesystem operations listed in the SwapDir object are
	covered below.

<sect2>initfs

	<P>
<verb>
	typedef void
	STINIT(SwapDir *SD);
</verb>

	<P>
	Initialise the given <em/SwapDir/. Operations such as verifying and
	rebuilding the storage and creating any needed bitmaps are done
	here.


<sect2>newfs

	<P>
<verb>
	typedef void
	STNEWFS(SwapDir *SD);
</verb>

	<P>
	Called for each configured <em/SwapDir/ to perform filesystem
	initialisation. This happens when '-z' is given to squid on the
	command line.


<sect2>dumpfs

	<P>
<verb>
	typedef void
	STDUMP(StoreEntry *e, const char *path, SwapDir *SD);
</verb>

	<P>
	Dump the configuration of the current <em/SwapDir/ to the given
	<em/StoreEntry/.  Used to grab a configuration file dump from the
	<em/cachemgr/ interface. 'const char *' refers to the path of the
	given <em/SwapDir/, and is redundant.


<sect2>freefs

	<P>
<verb>
	typedef void
	STFREE(SwapDir *SD);
</verb>

	<P>
	Free the <em/SwapDir/ filesystem information. This routine should
	deallocate <em/SD->fsdata/.


<sect2>doublecheckfs

	<P>
<verb>
	typedef int
	STDBLCHECK(SwapDir *SD, StoreEntry *e);
</verb>

	<P>
	Double-check the given object for validity. Called during rebuild if
	the '-S' flag is given to squid on the command line. Returns 1 if the
	object is indeed valid, and 0 if the object is found invalid.


<sect2>statfs

	<P>
<verb>
	typedef void
	STSTATFS(SwapDir *SD, StoreEntry *e);
</verb>

	<P>
	Called to retrieve filesystem statistics, such as usage, load and
	errors. The information should be appended to the passed
	<em/StoreEntry/ e.


<sect2>maintainfs

	<P>
<verb>
	typedef void
	STMAINTAINFS(SwapDir *SD);
</verb>

	<P>
	Called periodically to replace objects. The active replacement policy
	should be used to timeout unused objects in order to make room for
	new objects. 

<sect2>callback

	<P>
<verb>
	typedef void
	STCALLBACK(SwapDir *SD);
</verb>

	<P>
	This function is called inside the comm_select/comm_poll loop to handle
	any callbacks pending.


<sect2>sync

	<P>
<verb>
	typedef void
	STSYNC(SwapDir *SD);
</verb>

	<P>
	This function is called whenever a sync to disk is required. This
	function should not return until all pending data has been flushed to
	disk.


<sect2>parse/reconfigure

	<P>

<sect2>checkobj

	<P>
<verb>
	typedef int
	STCHECKOBJ(SwapDir *SD, const StoreEntry *e);
</verb>

	<P>
	Called by <tt/storeDirSelectSwapDir()/ to determine whether the
	<em/SwapDir/ will store the given <em/StoreEntry/ object. If the
	<em/SwapDir/ is not willing to store the given <em/StoreEntry/
	-1 should be returned. Otherwise, a value between 0 and 1000 should
	be returned indicating the current IO load. A value of 1000 indicates
	the <em/SwapDir/ has an IO load of 100%. This is used by
	<tt/storeDirSelectSwapDir()/ to choose the <em/SwapDir/ with the
	lowest IO load.
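	<P>
	The decision can be sketched as follows.  The structure fields and
	the pending-operation scaling are illustrative assumptions, not
	Squid's code; only the -1 / 0..1000 contract follows the text:

```c
#include <stddef.h>

/* Hypothetical sketch of an STCHECKOBJ-style decision: refuse the
 * object with -1, or report an IO load scaled to 0..1000 where
 * 1000 means 100% busy. */
struct mini_swapdir {
    int read_only;                     /* flags.read_only */
    long max_objsize;                  /* -1 means "any size" */
    int pending_ops;                   /* queued disk operations */
    int max_pending;                   /* capacity used for scaling */
};

static int check_obj(const struct mini_swapdir *sd, long objsize)
{
    if (sd->read_only)
        return -1;                     /* will not store it */
    if (sd->max_objsize != -1 && objsize > sd->max_objsize)
        return -1;                     /* object too large */
    return sd->pending_ops * 1000 / sd->max_pending;   /* IO load */
}

static int check_demo(int read_only, long max_objsize, long objsize,
                      int pending)
{
    struct mini_swapdir sd = { 0, -1, 0, 100 };
    sd.read_only = read_only;
    sd.max_objsize = max_objsize;
    sd.pending_ops = pending;
    return check_obj(&sd, objsize);
}
```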


<sect2>referenceobj

	<P>
<verb>
	typedef void
	STREFOBJ(SwapDir *SD, StoreEntry *e);
</verb>

	<P>
	Called whenever an object is locked by <tt/storeLockObject()/.
	It is typically used to update the objects position in the replacement
	policy.


<sect2>unreferenceobj

	<P>
<verb>
	typedef void
	STUNREFOBJ(SwapDir *SD, StoreEntry *e);
</verb>

	<P>
	Called whenever the object is unlocked by <tt/storeUnlockObject()/
	and the lock count reaches 0. It is also typically used to update the
	objects position in the replacement policy.


<sect2>createobj

	<P>
<verb>
	typedef storeIOState *
	STOBJCREATE(SwapDir *SD, StoreEntry *e, STFNCB *file_callback, STIOCB *io_callback, void *io_callback_data);
</verb>

	<P>
	Create an object in the <em/SwapDir/ *SD. <em/file_callback/ is called
	whenever the filesystem allocates or reallocates the <em/swap_filen/.
	Note - <em/STFNCB/ is called with a generic cbdata pointer, which
	points to the <em/StoreEntry/ e.  The <em/StoreEntry/ should not be
	modified EXCEPT for the replacement policy fields.

	<P>
	The IO callback should be called when an error occurs and when the
	object is closed. Once the IO callback is called, the <em/storeIOState/
	becomes invalid.

	<P>
	<em/STOBJCREATE/ returns a <em/storeIOState/ suitable for writing on
	success, or NULL if an error occurs.


<sect2>openobj

	<P>
<verb>
	typedef storeIOState *
	STOBJOPEN(SwapDir *SD, StoreEntry *e, STFNCB *file_callback, STIOCB *io_callback, void *io_callback_data);
</verb>

	<P>
	Open the <em/StoreEntry/ in <em/SwapDir/ *SD for reading.  Much of
	the <em/STOBJCREATE/ description applies here; the major difference
	is that the data passed to <em/file_callback/ is the relevant
	<em/store_client/.


<sect2>closeobj

	<P>
<verb>
	typedef void
	STOBJCLOSE(SwapDir *SD, storeIOState *sio);
</verb>

	<P>
	Close an opened object. The <em/STIOCB/ callback should be called at
	the end of this routine.


<sect2>readobj

	<P>
<verb>
	typedef void
	STOBJREAD(SwapDir *SD, storeIOState *sio, char *buf, size_t size, off_t offset, STRCB *read_callback, void *read_callback_data);
</verb>

	<P>
	Read part of the object into <em/buf/. It is safe to request a read
	when there are other pending reads or writes. <em/STRCB/ is called at
	completion.

	<P>
	If a read operation fails, the filesystem layer notifies the
	calling module by calling the <em/STIOCB/ callback with an
	error status code.


<sect2>writeobj

	<P>
<verb>
	typedef void
	STOBJWRITE(SwapDir *SD, storeIOState *sio, char *buf, size_t size, off_t offset, FREE *freefunc);
</verb>

	<P>
	Write the given block of data to the given store object. <em/buf/ is
	allocated by the caller. When the write is complete, the data is freed
	through <em/freefunc/.

	<P>
	If a write operation fails, the filesystem layer notifies the
	calling module by calling the <em/STIOCB/ callback with an
	error status code.


<sect2>unlinkobj

	<P>
<verb>
	typedef void STOBJUNLINK(SwapDir *, StoreEntry *);
</verb>

	<P>
	Remove the <em/StoreEntry/ e from the <em/SwapDir/ SD and the
	replacement policy.



<sect1>Store IO calls

	<P>
	These routines are used inside the storage manager to create and
	retrieve objects from a storage directory.

<sect2>storeCreate()

	<P>
<verb>
	storeIOState *
	storeCreate(StoreEntry *e, STFNCB *file_callback, STIOCB *close_callback, void *callback_data)
</verb>

	<P>
	<tt/storeCreate/ is called to store the given <em/StoreEntry/ in
	a storage directory. 

	<P>
	<tt/callback/ is a function that will be called either when
	an error is encountered, or when the object is closed (by
	calling <tt/storeClose()/).  If the open request is
	successful, there is no callback.  The calling module must
	assume the open request will succeed, and may begin reading
	or writing immediately.

	<P>
	<tt/storeCreate()/ may return NULL if the requested object
	can not be created.  In this case the <tt/callback/ function
	will not be called.


<sect2>storeOpen()

	<P>
<verb>
	storeIOState *
	storeOpen(StoreEntry *e, STFNCB * file_callback, STIOCB * callback, void *callback_data)
</verb>

	<P>
	<tt/storeOpen/ is called to open the given <em/StoreEntry/ from
	the storage directory it resides on.

	<P>
	<tt/callback/ is a function that will be called either when
	an error is encountered, or when the object is closed (by
	calling <tt/storeClose()/).  If the open request is
	successful, there is no callback.  The calling module must
	assume the open request will succeed, and may begin reading
	or writing immediately.

	<P>
	<tt/storeOpen()/ may return NULL if the requested object
	can not be opened.  In this case the <tt/callback/ function
	will not be called.


<sect2>storeRead()

        <P>
<verb>
	void
	storeRead(storeIOState *sio, char *buf, size_t size, off_t offset, STRCB *callback, void *callback_data)
</verb>

	<P>
	<tt/storeRead()/ is more complicated than the other functions
	because it requires its own callback function to notify the
	caller when the requested data has actually been read.
	<em/buf/ must be a valid memory buffer of at least <em/size/
	bytes.  <em/offset/ specifies the byte offset where the
	read should begin.  Note that with the Swap Meta Headers
	prepended to each cache object, this offset does not equal
	the offset into the actual object data.

	<P>
	The caller is responsible for allocating and freeing <em/buf/.


<sect2>storeWrite()

        <P>
<verb>
	void
	storeWrite(storeIOState *sio, char *buf, size_t size, off_t offset, FREE *free_func)
</verb>

	<P>
	<tt/storeWrite()/ submits a request to write a block
	of data to the disk store.
	The caller is responsible for allocating <em/buf/, but since
	there is no per-write callback, this memory must be freed by
	the lower filesystem implementation.  Therefore, the caller
	must specify the <em/free_func/ to be used to deallocate
	the memory.

	<P>
	If a write operation fails, the filesystem layer notifies the
	calling module by calling the <em/STIOCB/ callback with an
	error status code.


<sect2>storeUnlink()

        <P>
<verb>
        void
	storeUnlink(StoreEntry *e)
</verb>

	<P>
	<tt/storeUnlink()/ removes the cached object from the disk
	store.  There is no callback function, and the object
	does not need to be opened first.  The filesystem
	layer will remove the object if it exists on the disk.


<sect2>storeOffset()

        <P>
<verb>
        off_t storeOffset(storeIOState *sio)
</verb>



	<P>
	<tt/storeOffset()/ returns the current _on-disk_ offset. This is used to
	determine how much of an object's memory can be freed to make way for
	other in-transit and cached objects. You must make sure that the
	<em/storeIOState->offset/ refers to the on-disk offset, or undefined
	results will occur. For reads, this returns the current offset of
	successfully read data, not including queued reads.


<sect1>Callbacks

<sect2><em/STIOCB/ callback

	<P>
<verb>
	void
	stiocb(void *data, int errorflag, storeIOState *sio)
</verb>

	<P>
	The <em/stiocb/ function is passed as a parameter to
	<tt/storeOpen()/.  The filesystem layer calls <em/stiocb/
	either when an I/O error occurs, or when the disk
	object is closed.

	<P>
	<em/errorflag/ is one of the following:
<verb>
	#define DISK_OK                   (0)
	#define DISK_ERROR               (-1)
	#define DISK_EOF                 (-2)
	#define DISK_NO_SPACE_LEFT       (-6)
</verb>

	<P>
	Once the <em/stiocb/ function has been called,
	the <em/sio/ structure should not be accessed further.

<sect2><em/STRCB/ callback

	<P>
<verb>
	void
	strcb(void *data, const char *buf, size_t len)
</verb>

	<P>
	The <em/strcb/ function is passed as a parameter to
	<tt/storeRead()/.  The filesystem layer calls <em/strcb/
	after a block of data has been read from the disk and placed
	into <em/buf/.  <em/len/ indicates how many bytes were
	placed into <em/buf/.  The <em/strcb/ function is only
	called if the read operation is successful.  If it fails,
	then the <em/STIOCB/ callback will be called instead.
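	<P>
	The success-vs-failure dispatch can be sketched as follows.  The
	<em/mini_sio/ fields mirror the <em/storeIOState/ callback members,
	but the bookkeeping around them is invented for illustration:

```c
#include <stddef.h>

#define DISK_OK     (0)
#define DISK_ERROR (-1)

/* Hypothetical sketch of read-completion dispatch: on success only
 * the STRCB read callback fires; on failure only the STIOCB
 * error/close callback fires. */
typedef void STRCB(void *data, const char *buf, size_t len);
typedef void STIOCB(void *data, int errorflag);

struct mini_sio {
    STRCB *read_callback;
    void *read_data;
    STIOCB *callback;
    void *callback_data;
};

static void read_complete(struct mini_sio *sio, const char *buf, int nread)
{
    if (nread < 0)                     /* read failed: STIOCB only */
        sio->callback(sio->callback_data, DISK_ERROR);
    else                               /* success: STRCB only */
        sio->read_callback(sio->read_data, buf, (size_t) nread);
}

static int got_data, got_error;

static void strcb(void *data, const char *buf, size_t len)
{
    (void) data; (void) buf; (void) len;
    got_data++;
}

static void stiocb(void *data, int errorflag)
{
    (void) data; (void) errorflag;
    got_error++;
}

static int callback_demo(void)
{
    struct mini_sio sio = { strcb, NULL, stiocb, NULL };
    read_complete(&sio, "abc", 3);     /* success path */
    read_complete(&sio, NULL, -1);     /* error path */
    return got_data * 10 + got_error;  /* one of each */
}
```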

<sect1>State Logging

	<P>
	These functions deal with state
	logging and related tasks for a squid storage system.
	These functions are used (called) in <tt/store_dir.c/.

	<P>
	Each storage system must provide the functions described
	in this section, although it may be a no-op (null) function
	that does nothing.  Each function is accessed through a
	function pointer stored in the <em/SwapDir/ structure:

<verb>
    struct _SwapDir {
        ...
        STINIT *init;
        STNEWFS *newfs;
        struct {
            STLOGOPEN *open;
            STLOGCLOSE *close;
            STLOGWRITE *write;
            struct {
                STLOGCLEANOPEN *open;
                STLOGCLEANWRITE *write;
                void *state;
            } clean;
        } log;
        ....
    };
</verb>

<sect2><tt/log.open()/

	<P>
<verb>
	void
	STLOGOPEN(SwapDir *);
</verb>

	<P>
	The <tt/log.open/ function, of type <em/STLOGOPEN/,
	is used to open or initialize the state-holding log
	files (if any) for the storage system.  For UFS this
	opens the <em/swap.state/ files.

	<P>
	The <tt/log.open/ function may be called any number of
	times during Squid's execution.  For example, the
	process of rotating or writing clean logfiles closes
	the state logs and then re-opens them.  A <em/squid -k reconfigure/
	does the same.

<sect2><tt/log.close()/

	<P>
<verb>
	void
	STLOGCLOSE(SwapDir *);
</verb>

	<P>
	The <tt/log.close/ function, of type <em/STLOGCLOSE/, is
	obviously the counterpart to <tt/log.open/.  It must close
	the open state-holding log files (if any) for the storage
	system.

<sect2><tt/log.write()/

	<P>
<verb>
	void
	STLOGWRITE(const SwapDir *, const StoreEntry *, int op);
</verb>

	<P>
	The <tt/log.write/ function, of type <em/STLOGWRITE/, is
	used to write an entry to the state-holding log file.  The
	<em/op/ argument is either <em/SWAP_LOG_ADD/ or <em/SWAP_LOG_DEL/.
	This feature may not be required by some storage systems
	and can be implemented as a null-function (no-op).

<sect2><tt/log.clean.start()/

	<P>
<verb>
	int
	STLOGCLEANSTART(SwapDir *);
</verb>

	<P>
	The <tt/log.clean.start/ function, of type <em/STLOGCLEANSTART/,
	is used for the process of writing "clean" state-holding
	log files.  The clean-writing procedure is initiated by
	the <em/squid -k rotate/ command.  This is a special case
	because we want to optimize the process as much as possible.
	This might be a no-op for some storage systems that don't
	have the same logging issues as UFS.

	<P>
	The <em/log.clean.state/ pointer may be used to
	keep state information for the clean-writing process, but
	should not be accessed by upper layers.

<sect2><tt/log.clean.nextentry()/

	<P>
<verb>
	StoreEntry *
	STLOGCLEANNEXTENTRY(SwapDir *);
</verb>

	<P>
	Gets the next entry that is a candidate for the clean log.

	<P>
	Returns NULL when there are no more objects to log.

<sect2><tt/log.clean.write()/

	<P>
<verb>
	void
	STLOGCLEANWRITE(SwapDir *, const StoreEntry *);
</verb>

	<P>
	The <tt/log.clean.write/ function, of type <em/STLOGCLEANWRITE/,
	writes an entry to the clean log file (if any).

<sect2><tt/log.clean.done()/

	<P>
<verb>
	void
	STLOGCLEANDONE(SwapDir *);
</verb>

	<P>
	Indicates the end of the clean-writing process and signals
	the storage system to close the clean logs, and rename or
	move them to become the official state-holding logs, ready
	to be opened.

<sect1>Replacement policy implementation

<P>
The replacement policy can be updated during STOBJREAD/STOBJWRITE/STOBJOPEN/
STOBJCLOSE as well as STREFOBJ and STUNREFOBJ. Care should be taken to
only modify the relevant replacement policy entries in the StoreEntry.
The responsibility of replacement policy maintenance has been moved into
each SwapDir so that the storage code can have tight control of the
replacement policy. Cyclic filesystems such as COSS require this tight
coupling between the storage layer and the replacement policy.


<sect1>Removal policy API

	<P>
	The removal policy is responsible for determining in which order
	objects are deleted when Squid needs to reclaim space for new objects.
	Such a policy is used by an object store for maintaining the stored
	objects and determining what to remove to reclaim space for new objects.
	(Together they implement a replacement policy.)
	
<sect2>API
	<P>
	It is implemented as a modular API: a storage directory (or the
	memory store) creates a policy of its choice for maintaining its
	objects, and policy modules register themselves for use through
	this API.

<sect3>createRemovalPolicy()

<P>
<verb>
	RemovalPolicy *policy = createRemovalPolicy(const char *type, const char *args)
</verb>

	<P>
	Creates a removal policy instance where object priority can be
	maintained.

	<P>
	The returned RemovalPolicy instance is cbdata registered.

<sect3>policy.Free()
	
	<P>
<verb>
	policy-&gt;Free(RemovalPolicy *policy)
</verb>

<P>
	Destroys the policy instance and frees all related memory.

<sect3>policy.Add()

<P>
<verb>
	policy-&gt;Add(RemovalPolicy *policy, StoreEntry *, RemovalPolicyNode *node)
</verb>

	<P>
	Adds a StoreEntry to the policy instance.
	
	<P>
	<em/node/ is a pointer to where policy specific data can be stored
	for the store entry, currently the size of one (void *) pointer.

<sect3>policy.Remove()

<P>
<verb>
	policy-&gt;Remove(RemovalPolicy *policy, StoreEntry *, RemovalPolicyNode *node)
</verb>

	<P>
	Removes a StoreEntry from the policy instance out of
	policy order. For example when an object is replaced
	by a newer one or is manually purged from the store.

	<P>
	<em/node/ is a pointer to where policy specific data is stored
	for the store entry, currently the size of one (void *) pointer.

<sect3>policy.Referenced()

<P>
<verb>
	policy-&gt;Referenced(RemovalPolicy *policy, const StoreEntry *, RemovalPolicyNode *node)
</verb>

	<P>
	Tells the policy that a StoreEntry is going to be referenced. Called
	whenever an entry gets locked.

	<P>
	node is a pointer to where policy specific data is stored
	for the store entry, currently the size of one (void *) pointer.

<sect3>policy.Dereferenced()

<P>
<verb>
	policy-&gt;Dereferenced(RemovalPolicy *policy, const StoreEntry *, RemovalPolicyNode *node)
</verb>

	<P>
	Tells the policy that a StoreEntry has been referenced. Called when
	an access to the entry has finished.

	<P>
	node is a pointer to where policy specific data is stored
	for the store entry, currently the size of one (void *) pointer.

<sect3>policy.WalkInit()

<P>
<verb>
	RemovalPolicyWalker walker = policy-&gt;WalkInit(RemovalPolicy *policy)
</verb>

	<P>
	Initiates a walk of all objects in the policy instance.
	The objects are returned in an order suitable for use as the
	reinsertion order when rebuilding the policy.

	<P>
	The returned RemovalPolicyWalker instance is cbdata registered.

	<P>
	Note: The walk must be performed as an atomic operation
	with no other policy actions intervening, or the outcome
	will be undefined.

<sect3>walker.Next()

	<P>
<verb>
	const StoreEntry *entry = walker-&gt;Next(RemovalPolicyWalker *walker)
</verb>

<P>
	Gets the next object in the walk chain.

	<P>
	Returns NULL when there are no further objects.

<sect3>walker.Done()

<P>
<verb>
	walker-&gt;Done(RemovalPolicyWalker *walker)
</verb>

	<P>
	Finishes a walk of the maintained objects, and destroys the
	walker.
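	<P>
	The walker calling convention (function pointers plus private
	state, driven as an atomic Init/Next/Done loop) can be sketched
	with a toy walker over integer ids.  Everything here is
	illustrative; the real API returns <em/StoreEntry/ pointers:

```c
#include <stddef.h>

/* Hypothetical sketch of the walker calling convention: the caller
 * sees only function pointers plus private _data, and drives the
 * walk as Init / Next / Done. */
typedef struct MiniWalker MiniWalker;
struct MiniWalker {
    const int *_data;                  /* private: array of entry ids */
    int _pos, _count;
    const int *(*Next) (MiniWalker *w);   /* NULL when no more objects */
    void (*Done) (MiniWalker *w);
};

static const int *mini_next(MiniWalker *w)
{
    if (w->_pos >= w->_count)
        return NULL;
    return &w->_data[w->_pos++];
}

static void mini_done(MiniWalker *w)
{
    w->_pos = 0;                       /* nothing to free in this toy */
}

/* Drive the walk the way a rebuild would: no other policy actions
 * are allowed between Init and Done. */
static int walk_demo(void)
{
    static const int ids[] = { 10, 20, 30 };
    MiniWalker w = { ids, 0, 3, mini_next, mini_done };
    const int *e;
    int sum = 0;
    while ((e = w.Next(&w)) != NULL)
        sum += *e;
    w.Done(&w);
    return sum;                        /* visited every object once */
}
```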

<sect3>policy.PurgeInit()

<P>
<verb>
	RemovalPurgeWalker purgewalker = policy-&gt;PurgeInit(RemovalPolicy *policy, int max_scan)
</verb>

	<P>
	Initiates a search for removal candidates. Search depth is indicated
	by max_scan.

	<P>
	The returned RemovalPurgeWalker instance is cbdata registered.

	<P>
	Note: The walk must be performed as an atomic operation
	with no other policy actions intervening, or the outcome
	will be undefined.

<sect3>purgewalker.Next()

<P>
<verb>
	StoreEntry *entry = purgewalker-&gt;Next(RemovalPurgeWalker *purgewalker)
</verb>

	<P>
	Gets the next object to purge. The purgewalker will remove each
	returned object from the policy.
	
	<P>It is the policy's responsibility to verify that the object
	isn't locked or otherwise prevented from being removed. What this
	means is that the policy must not return objects where
	storeEntryLocked() is true.

	<P>
	Returns NULL when there are no further purgeable objects in the policy.

<sect3>purgewalker.Done()

<P>
<verb>
	purgewalker-&gt;Done(RemovalPurgeWalker *purgewalker)
</verb>

	<P>
	Finishes a walk of the maintained objects, destroys the
	walker and restores the policy to its normal state.

<sect2>Future removal policy implementation

<sect3>Source layout

<P>
	Policy implementations reside in src/repl/&lt;name&gt;/, and a make in
	such a directory must result in an object archive src/repl/&lt;name&gt;.a
	containing all the objects implementing the policy.

<sect3>Internal structures

<sect4>RemovalPolicy

<P>
<verb>
	typedef struct _RemovalPolicy RemovalPolicy;
	struct _RemovalPolicy {
	    char *_type;
	    void *_data;
	    void (*add)(RemovalPolicy *policy, StoreEntry *);
	    ... /* see the API definition above */
	};
</verb>

<P>
	The _type member is mainly for debugging and diagnostic purposes,
	and should be a pointer to the name of the policy (the same name
	as used for creation).

<P>
	The _data member is for storing policy specific information.

<sect4>RemovalPolicyWalker

<P>
<verb>
	typedef struct _RemovalPolicyWalker RemovalPolicyWalker;
	struct _RemovalPolicyWalker {
	    RemovalPolicy *_policy;
	    void *_data;
	    StoreEntry *(*next)(RemovalPolicyWalker *);
	    ... /* see the API definition above */
	};
</verb>

<sect4>RemovalPolicyNode

<P>
<verb>
	typedef struct _RemovalPolicyNode RemovalPolicyNode;
	struct _RemovalPolicyNode {
	    void *data;
	};
</verb>

	<P>
	Stores policy specific information about an entry. Currently
	there is only space for a single pointer, but more space may
	be provided here later to allow simple policies to store all
	their data "inline" and thereby save some memory.

<sect3>Policy registration

<P>
	Policies are automatically registered in the Squid binary from the
	policy selection made by the user building Squid. In the future this
	might get extended to support loadable modules. All registered
	policies are available to any object store that wishes to use them.

<sect3>Policy instance creation

<P>
	Each policy must implement a "create/new" function "<tt/RemovalPolicy *
	createRemovalPolicy_&lt;name&gt;(char *arguments)/". This function
	creates the policy instance and populates it with at least the
	supported API methods. Currently all API calls are mandatory, but the
	policy implementation must make sure to NULL-fill the structure prior
	to populating it in order to assure future API compatibility.

<P>
	It should also populate the _data member with a pointer to policy
	specific data.

<P>
	Prior to returning, the created instance must be registered as
	callback data by calling cbdataAdd().
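	<P>
	A creation function might therefore look roughly as follows.
	This is only a sketch: the "mypolicy" name, the MyPolicyData
	type and the mypolicy_* helper functions are hypothetical.
<verb>
	RemovalPolicy *
	createRemovalPolicy_mypolicy(char *arguments)
	{
	    RemovalPolicy *policy = xcalloc(1, sizeof(*policy));
	    /* xcalloc zero-fills, taking care of the NULL filling */
	    policy-&gt;_type = "mypolicy";
	    policy-&gt;_data = mypolicy_parse_arguments(arguments);
	    policy-&gt;add = mypolicy_add;
	    ... /* populate the remaining API methods */
	    cbdataAdd(policy);
	    return policy;
	}
</verb>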

<sect3>Walker

<P>
	When a walker is created, the policy populates it with at least the
	supported API methods. Currently all API calls are mandatory, but the
	policy implementation must make sure to NULL-fill the structure prior
	to populating it in order to assure future API compatibility.

<P>
	Prior to returning, the created instance must be registered as
	callback data by calling cbdataAdd().

<sect2>Design notes/bugs

<P>
	The RemovalPolicyNode design is incomplete/insufficient. The intention
	was to abstract the location of the index pointers away from the policy
	implementation, to allow the policy to work on both on-disk and
	in-memory caches. Unfortunately the purge method for heap based
	policies needs to update this information, and it is also preferable
	if the purge method in general knows how to clear it. I think the
	agreement was that the current design of tightly coupling the two
	together on one StoreEntry is not the best design possible.

<P>
	It is debatable whether having the policy index control the
	clean index writes is the correct approach. Perhaps not. A
	more appropriate design would probably be to do the store indexing
	completely outside the policy implementation (i.e. using the hash
	index), and only ask the policy to dump its state somehow.

<P>
	The Referenced()/Dereferenced() calls are today mapped to lock/unlock,
	which is an approximation of when they are intended to be called.
	The real intention, however, is to have Referenced() called whenever
	an object is referenced, and Dereferenced() called only when the
	object has actually been used for something useful.

<!-- %%%% Chapter : FORWARDING SELECTION %%%% -->
<sect>Forwarding Selection

	<P>
	To be written...

<!-- %%%% Chapter : IP/FQDN CACHE %%%% -->
<sect>IP Cache and FQDN Cache

<sect1> Introduction

	<P>
	The IP cache is a built-in component of Squid that provides
	hostname to IP-number translation and manages the data
	structures involved. Efficiency concerns require mechanisms
	that allow non-blocking access to these mappings. The IP
	cache usually does not block on a request, except in special
	cases where this is desired (see below).

<sect1> Data Structures 

	<P>
	The data structure used for storing name-address mappings
	is a small hashtable (<em>static hash_table *ip_table</em>)
	of structures of type <em>ipcache_entry</em>, whose most
	interesting members are:

<verb>
	struct _ipcache_entry {
		char *name;
		time_t lastref;
		ipcache_addrs addrs;
		struct _ip_pending *pending_head;
		char *error_message;
		unsigned char locks;
		ipcache_status_t status:3;
	};
</verb>


<sect1> External overview

	<P>
	Main functionality
	is provided through calls to:
	<descrip>

	<tag>ipcache_nbgethostbyname(const char *name, IPH *handler,
	void *handlerdata)</tag>
	where <em/name/ is the name of the host to resolve,
	<em/handler/ is a pointer to the function to be called when
	the reply arrives from the IP cache (or from the DNS if the
	IP cache misses), and <em/handlerdata/ is information that
	is passed to the handler and does not affect the IP cache.

	<tag>ipcache_gethostbyname(const char *name,int flags)</tag>
	is different in that it only checks if an entry exists in
	its data structures and does not by default contact the
	DNS, unless this is requested by setting <em/flags/
	to <em/IP_BLOCKING_LOOKUP/ or <em/IP_LOOKUP_IF_MISS/.

	<tag>ipcache_init()</tag> is called from <em/mainInitialize()/
	after disk initialization and prior to the reverse FQDN
	cache initialization.

	<tag>ipcache_restart()</tag> is called to clear the IP
	cache's data structures and cancel all pending requests.
	Currently, it is only called from <em/mainReconfigure/.

	</descrip>
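	<P>
	A typical non-blocking lookup might be sketched as follows,
	assuming the <em/IPH/ handler receives the resulting address
	list (or NULL on failure) together with the handlerdata
	pointer; <em/lookupDone/, <em/MyState/ and <em/state/ are
	hypothetical names:
<verb>
	static void
	lookupDone(const ipcache_addrs *ia, void *data)
	{
	    MyState *state = data;
	    if (ia == NULL) {
		/* the lookup failed */
		return;
	    }
	    /* use ia-&gt;in_addrs ... */
	}

	...
	ipcache_nbgethostbyname(hostname, lookupDone, state);
</verb>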

<sect1> Internal Operation 

	<P>
	Internally, the execution flow is as follows: On a miss,
	<em/ipcache_nbgethostbyname/ checks whether a request for
	this name is already pending, and if positive, it creates
	a new entry using <em/ipcacheAddNew/ with the <em/IP_PENDING/
	flag set. Then it calls <em/ipcacheAddPending/ to add a
	request to the queue together with data and handler. Otherwise,
	<em/ipcache_dnsDispatch()/ is called to directly create a
	DNS query, or <em/ipcacheEnqueue()/ if no DNS port is
	free. <em/ipcache_call_pending()/ is called regularly
	to walk down the pending list and call handlers. LRU clean-up
	is performed through <em/ipcache_purgelru()/ according to
	the <em/ipcache_high/ threshold.

<!-- %%%% Chapter : SERVER PROTOCOLS %%%% -->
<sect>Server Protocols
<sect1>HTTP

	<P>
	To be written...

<sect1>FTP

	<P>
	To be written...

<sect1>Gopher

	<P>
	To be written...

<sect1>Wais

	<P>
	To be written...

<sect1>SSL

	<P>
	To be written...

<sect1>Passthrough

	<P>
	To be written...

<!-- %%%% Chapter : TIMEOUTS %%%% -->
<sect>Timeouts

	<P>
	To be written...

<!-- %%%% Chapter : EVENTS %%%% -->
<sect>Events

	<P>
	To be written...

<!-- %%%% Chapter : ACCESS CONTROLS %%%% -->
<sect>Access Controls

	<P>
	To be written...

<!-- %%%% Chapter : ICP %%%% -->
<sect>ICP

	<P>
	To be written...

<!-- %%%% Chapter : NETDB %%%% -->
<sect>Network Measurement Database

	<P>
	To be written...

<!-- %%%% Chapter : Error Pages %%%% -->
<sect>Error Pages

	<P>
	To be written...

<!-- %%%% Chapter : Callback Data Base %%%% -->
<sect>Callback Data Database

	<P>
	Squid's extensive use of callback functions makes it very
	susceptible to memory access errors.  For a blocking operation
	with callback functions, the normal sequence of events is as
	follows:
<verb>
	callback_data = malloc(...);
	...
	fooOperationStart(bar, callback_func, callback_data);
	...
	fooOperationComplete(...);
	callback_func(callback_data, ....);
	...
	free(callback_data);
</verb>
	However, things become more interesting if we want or need
	to free the callback_data, or otherwise cancel the callback,
	before the operation completes.

	<P>
	The callback data database lets us do this in a uniform and
	safe manner.  Every callback_data pointer must be added to the
	database.  It is then locked while the blocking operation executes
	elsewhere, and is freed when the operation completes.  The normal
	sequence of events is:
<verb>
	callback_data = malloc(...);
	cbdataAdd(callback_data);
	...
	cbdataLock(callback_data);
	fooOperationStart(bar, callback_func, callback_data);
	...
	fooOperationComplete(...);
	if (cbdataValid(callback_data)) {
		callback_func(callback_data, ....);
	}
	cbdataUnlock(callback_data);
	cbdataFree(callback_data);
</verb>

	<P>
	With this scheme, nothing bad happens if <tt/cbdataFree/ gets called
	before <tt/cbdataUnlock/:
<verb>
	callback_data = malloc(...);
	cbdataAdd(callback_data);
	...
	cbdataLock(callback_data);
	fooOperationStart(bar, callback_func, callback_data);
	...
	cbdataFree(callback_data);
	...
	fooOperationComplete(...);
	if (cbdataValid(callback_data)) {
		callback_func(callback_data, ....);
	}
	cbdataUnlock(callback_data);
</verb>
	In this case, when <tt/cbdataFree/ is called before
	<tt/cbdataUnlock/, the callback_data gets marked as invalid.  When
	the callback is about to be executed, <tt/cbdataValid/ returns 0
	and callback_func is never called.  When <tt/cbdataUnlock/ gets
	called, it notices that the callback_data is invalid and
	then calls <tt/cbdataFree/.

<!-- %%%% Chapter : CACHE MANAGER %%%% -->
<sect>Cache Manager

	<P>
	To be written...

<!-- %%%% Chapter : HTTP Headers %%%% -->
<sect>HTTP Headers

	<P>
	<em/Files:/
        <tt/HttpHeader.c/,
        <tt/HttpHeaderTools.c/,
        <tt/HttpHdrCc.c/,
        <tt/HttpHdrContRange.c/,
        <tt/HttpHdrExtField.c/,
        <tt/HttpHdrRange.c/


	<P> 
	The <tt/HttpHeader/ class encapsulates methods and data for HTTP header
	manipulation.  <tt/HttpHeader/ can be viewed as a collection of HTTP
	header-fields with such common operations as add, delete, and find.
	Compared to an ascii "string" representation, <tt/HttpHeader/ performs
	those operations without rebuilding the underlying structures from
	scratch or searching through the entire "string".

<sect1>General remarks

	<P>
	<tt/HttpHeader/ is a collection (or array) of HTTP header-fields. A header
	field is represented by an <tt/HttpHeaderEntry/ object. <tt/HttpHeaderEntry/ is
	an (id, name, value) triplet.  Meaningful "Id"s are defined for
	"well-known" header-fields like "Connection" or "Content-Length".
	When Squid fails to recognize a field, it uses special "id",
	<em/HDR_OTHER/.  Ids are formed by capitalizing the corresponding HTTP
	header-field name and replacing dashes ('-') with underscores ('_').

	<P>
	Most operations on <tt/HttpHeader/ require a "known" id as a parameter. The
	rationale behind the latter restriction is that a Squid programmer should
	operate on "known" fields only. If a new field is being added to
	header processing, it must be given an id.
 
<sect1>Life cycle

	<P> 
	<tt/HttpHeader/ follows a common pattern for object initialization and
	cleaning:

<verb>
    /* declare */
    HttpHeader hdr;
    
    /* initialize (as an HTTP Request header) */
    httpHeaderInit(&amp;hdr, hoRequest);

    /* do something */
    ...

    /* cleanup */
    httpHeaderClean(&amp;hdr);
</verb>

	<P> 
	Prior to use, an <tt/HttpHeader/ must be initialized. A
	programmer must specify if a header belongs to a request
	or reply message. The "ownership" information is used mostly
	for statistical purposes.

	<P>
	Once initialized, the <tt/HttpHeader/ object <em/must/ be,
	eventually, cleaned.  Failure to do so will result in a
	memory leak.

	<P>
	Note that there are no methods for "creating" or "destroying"
	a "dynamic" <tt/HttpHeader/ object. Headers are apparently
	always stored as a part of another object or as a temporary
	variable, so dynamic allocation of headers is not needed.


<sect1>Header Manipulation.

	<P>
	The most common operations on HTTP headers are testing
	for a particular header-field (<tt/httpHeaderHas()/),
	extracting field-values (<tt/httpHeaderGet*()/), and adding
	new fields (<tt/httpHeaderPut*()/).

	<P>
	<tt/httpHeaderHas(hdr, id)/ returns true if at least one
	header-field specified by "id" is present in the header.
	Note that using <em/HDR_OTHER/ as an id is prohibited.
	There is usually no reason to know if there are "other"
	header-fields in a header.

	<P>
	<tt/httpHeaderGet&lt;Type&gt;(hdr, id)/ returns the value
	of the specified header-field.  The "Type" must match
	header-field type. If a header is not present a "null"
	value is returned. "Null" values depend on field-type, of
	course.
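	<P>
	For example, a single-valued integer field might be fetched
	like this, assuming -1 is the "null" value for integer
	fields:
<verb>
	int clen = httpHeaderGetInt(hdr, HDR_CONTENT_LENGTH);
	if (clen &lt; 0) {
	    /* no (valid) Content-Length field in this header */
	}
</verb>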

	<P>
	Special care must be taken when several header-fields with
	the same id are present in the header. If the HTTP protocol
	allows only one copy of the specified field per header
	(e.g. "Content-Length"), <tt/httpHeaderGet&lt;Type&gt;()/
	will return one of the field-values (chosen semi-randomly).
	If the HTTP protocol allows for several values (e.g. "Accept"),
	a "String List" will be returned.

	<P>
	It is prohibited to ask for a list of values when only one
	value is permitted, and vice versa. This restriction prevents
	a programmer from processing one value of a header-field
	while ignoring other valid values.

	<P>
	<tt/httpHeaderPut&lt;Type&gt;(hdr, id, value)/ will add a
	header-field with the specified field-name (based on "id")
	and field-value. The location of the newly added field in
	the header array is undefined, but it is guaranteed to be
	after all fields with the same "id", if any. Note that old
	header-fields with the same id (if any) are not altered in
	any way.

	<P>
	The value being put using one of the <tt/httpHeaderPut()/
	methods is converted to and stored as a String object.

	<P>
	Example:

<verb>
	    /* add our own Age field if none was added before */
	    int age = ...
	    if (!httpHeaderHas(hdr, HDR_AGE))
		httpHeaderPutInt(hdr, HDR_AGE, age);
</verb>

	<P>
	There are two ways to delete a field from a header. To
	delete a "known" field (a field with "id" other than
	<em/HDR_OTHER/), use <tt/httpHeaderDelById()/ function.
	Sometimes, it is convenient to delete all fields with a
	given name ("known" or not) using <tt/httpHeaderDelByName()/
	method. Both methods will delete <em/all/ fields specified.

	<P>
	The <em/httpHeaderGetEntry(hdr, pos)/ function can be used
	for iterating through all fields in a given header. Iteration
	is controlled by the <em/pos/ parameter. Thus, several
	concurrent iterations over one <em/hdr/ are possible. It
	is also safe to delete/add fields from/to <em/hdr/ while
	iteration is in progress.

<verb>
	/* delete all fields with a given name */
	HttpHeaderPos pos = HttpHeaderInitPos;
	HttpHeaderEntry *e;
	while ((e = httpHeaderGetEntry(hdr, &amp;pos))) {
		if (!strCaseCmp(e->name, name))
			... /* delete entry */
	}
</verb>

	Note that <em/httpHeaderGetEntry()/ is a low level function
	and must not be used if high level alternatives are available.
	For example, to delete an entry with a given name, use the
	<em/httpHeaderDelByName()/ function rather than the loop
	above.

<sect1>I/O and Headers.

	<P>
	To store a header in a file or socket, pack it using
	<tt/httpHeaderPackInto()/ method and a corresponding
	"Packer". Note that <tt/httpHeaderPackInto/ will pack only
	header-fields; request-lines and status-lines are not
	prepended, and CRLF is not appended. Remember that neither
	of them is a part of HTTP message header as defined by the
	HTTP protocol.
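	<P>
	For instance, packing a header into a MemBuf might look
	roughly like this (a sketch; see <tt/Packer.c/ for the
	actual Packer interface):
<verb>
	Packer p;
	MemBuf mb;
	memBufDefInit(&amp;mb);
	packerToMemInit(&amp;p, &amp;mb);
	httpHeaderPackInto(hdr, &amp;p);
	packerClean(&amp;p);
	/* mb.buf now holds the packed header-fields */
</verb>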


<sect1>Adding new header-field ids.

	<P> 
	Adding new ids is simple. First, add a new HDR_ entry to the
	http_hdr_type enumeration in enums.h. Then describe the new
	header-field's attributes in the HeadersAttrs array located
	in <tt/HttpHeader.c/. The last attribute specifies the field
	type. Six types are supported: integer (<em/ftInt/), string
	(<em/ftStr/), date in RFC 1123 format (<em/ftDate_1123/),
	cache control field (<em/ftPCc/), range field (<em/ftPRange/),
	and content range field (<em/ftPContRange/).  Squid uses the
	type information to convert the internal binary representation
	of fields to their string representation (<tt/httpHeaderPut/
	functions) and vice versa (<tt/httpHeaderGet/ functions).

	<P>
	Finally, add the new id to one of the following arrays:
	<em/GeneralHeadersArr/, <em/EntityHeadersArr/,
	<em/ReplyHeadersArr/, <em/RequestHeadersArr/.  Use HTTP
	specs to determine the applicable array.  If your header-field
	is an "extension-header", its place is in <em/ReplyHeadersArr/
	and/or in <em/RequestHeadersArr/. You can also use
	<em/EntityHeadersArr/ for "extension-header"s that can be
	used both in replies and requests.  Header fields other
	than "extension-header"s must go to one and only one of
	the arrays mentioned above.

	<P>
	Also, if the new field is a "list" header, add it to the
	<em/ListHeadersArr/ array.  A "list" field-header is the
	one that is defined (or can be defined) using "&num;" BNF
	construct described in the HTTP specs. Essentially, a field
	that may have more than one valid field-value in a single
	header is a "list" field.

	<P>
	In most cases, if you forget to include a new field id in
	one of the required arrays, you will get a run-time assertion.
	For rarely used fields, however, it may take a long time
	for an assertion to be triggered.

	<P>
	There is virtually no limit on the number of fields supported
	by Squid. If current mask sizes cannot fit all the ids (you
	will get an assertion if that happens), simply enlarge
	HttpHeaderMask type in <tt/typedefs.h/.


<sect1>A Word on Efficiency.

	<P>
	<tt/httpHeaderHas()/ is a very cheap (fast) operation
	implemented using a bit mask lookup.

	<P>
	Adding new fields is somewhat expensive if they require
	complex conversions to a string.

	<P>
	Deleting existing fields requires a scan of all the entries,
	comparing their "id"s (faster) or "names" (slower) with
	the one specified for deletion.

	<P>
	Most of the operations are faster than their "ascii string"
	equivalents.

<sect>File Formats

<sect1><em/swap.state/

<P>
NOTE: this information is current as of version 2.2.STABLE4.

<P>
A <em/swap.state/ entry is defined by the <em/storeSwapLogData/
structure, and has the following elements:
<verb>
struct _storeSwapLogData {
    char op;
    int swap_file_number;
    time_t timestamp;
    time_t lastref;
    time_t expires;
    time_t lastmod;
    size_t swap_file_sz;
    u_short refcount;
    u_short flags;
    unsigned char key[MD5_DIGEST_CHARS];
};
</verb>

<descrip>
<tag/op/
	Either SWAP_LOG_ADD (1) when an object is added to
	the disk storage, or SWAP_LOG_DEL (2) when an object is
	deleted.

<tag/swap_file_number/
	The 32-bit file number which maps to a pathname.  Only
	the low 24-bits are relevant.  The high 8-bits are
	used as an index to an array of storage directories, and
	are set at run time because the order of storage directories
	may change over time.
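	The split described above can be sketched as:
<verb>
	int dirn = (swap_file_number &gt;&gt; 24) &amp; 0xFF;  /* storage directory index */
	int filn = swap_file_number &amp; 0x00FFFFFF;    /* file number within it */
</verb>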

<tag/timestamp/
	A 32-bit Unix time value that represents the time when
	the origin server generated this response.  If the response
	has a valid <em/Date:/ header, this timestamp corresponds
	to that time.  Otherwise, it is set to the Squid process time
	when the response is read (as soon as the end of headers are
	found).

<tag/lastref/
	The last time that a client requested this object.
	Strictly speaking, this time is set whenever the StoreEntry
	is locked (via <em/storeLockObject()/).

<tag/expires/
	The value of the response's <em/Expires:/ header, if any.
	If the response does not have an <em/Expires:/ header, this
	is set to -1.  If the response has an invalid (unparseable)
	<em/Expires:/ header, it is also set to -1.  There are some cases
	where Squid sets <em/expires/ to -2.  This happens for the
	internal ``netdb'' object and for FTP URL responses.

<tag/lastmod/
	The value of the response's <em/Last-modified:/ header, if any.
	This is set to -1 if there is no <em/Last-modified:/ header,
	or if it is unparseable.

<tag/swap_file_sz/
	This is the number of bytes that the object occupies on
	disk.  It includes the Squid ``swap file header''.

<tag/refcount/
	The number of times that this object has been accessed (referenced).
	Since it is a 16-bit quantity, it is susceptible to overflow
	if a single object is accessed 65,536 times before being replaced.

<tag/flags/
	A copy of the <em/StoreEntry/ flags field.  Used as a sanity
	check when rebuilding the cache at startup.  Objects that
	have the KEY_PRIVATE flag set are not added back to the cache.

<tag/key/
	The 128-bit MD5 hash for this object.
	
</descrip>

Note that <em/storeSwapLogData/ entries are written in native machine
byte order.  They are not necessarily portable across architectures.

<sect>Store ``swap meta'' Description
<p>
``swap meta'' refers to a section of meta data stored at the beginning
of an object that is stored on disk.  This meta data includes information
such as the object's cache key (MD5), URL, and part of the StoreEntry
structure.

<p>
The meta data is stored using a TYPE-LENGTH-VALUE format.  That is,
each chunk of meta information consists of a TYPE identifier, a
LENGTH field, and then the VALUE (which is LENGTH octets long).
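<p>
In Squid's source, the chunks are represented in memory by the
<em/tlv/ type, roughly:
<verb>
	typedef struct _tlv {
	    char type;
	    int length;
	    void *value;
	    struct _tlv *next;
	} tlv;
</verb>
When serialized, each chunk is written as the single-octet TYPE,
then the LENGTH field, then LENGTH octets of VALUE, with the next
chunk immediately following.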

<sect1>Types

<p>
As of Squid-2.3, the following TYPES are defined (from <em/enums.h/):
<descrip>
<tag/STORE_META_VOID/
	Just a placeholder for the zeroth value.   It is never used
	on disk.

<tag/STORE_META_KEY_URL/
	This represents the case when we use the URL as the cache
	key, as Squid-1.1 does.  Currently we don't support using
	a URL as a cache key, so this is not used.

<tag/STORE_META_KEY_SHA/
	For a brief time we considered supporting SHA (secure
	hash algorithm) as a cache key.  Nobody liked it, and
	this type is not currently used.

<tag/STORE_META_KEY_MD5/
	This represents the MD5 cache key that Squid currently uses.
	When Squid opens a disk file for reading, it can check that
	this MD5 matches the MD5 of the user's request.  If not, then
	something went wrong and this is probably the wrong object.

<tag/STORE_META_URL/
	The object's URL.  This also may be matched against a user's
	request for cache hits to make sure we got the right object.

<tag/STORE_META_STD/
	This is the ``standard metadata'' for an object.  Really
	it is just this middle chunk of the StoreEntry structure:
<verb>
	time_t timestamp;
	time_t lastref;
	time_t expires;
	time_t lastmod;
	size_t swap_file_sz;
	u_short refcount;
	u_short flags;
</verb>

<tag/STORE_META_HITMETERING/
	Reserved for future hit-metering (RFC 2227) stuff.

<tag/STORE_META_VALID/
	?

<tag/STORE_META_END/
	Marks the last valid META type.

</descrip>


<sect1>Implementation Notes

<p>
When writing an object to disk, we must first write the meta data.
This is done with a couple of functions.  First, <tt/storeSwapMetaBuild()/
takes a <em/StoreEntry/ as a parameter and returns a <em/tlv/ linked
list.  Second, <tt/storeSwapMetaPack()/ converts the <em/tlv/ list
into a character buffer that we can write.

<p>
Note that the <em/MemObject/ has a member called <em/swap_hdr_sz/.
This value is the size of that character buffer; the size of the
swap file meta data.  The <em/StoreEntry/ has a member named
<em/swap_file_sz/ that represents the size of the disk file.
Thus, the size of the object ``content'' is
<verb>
	StoreEntry->swap_file_sz  - MemObject->swap_hdr_sz;
</verb>
Note that the swap file content includes the HTTP reply headers
and the HTTP reply body (if any).

<p>
When reading a swap file, there is a similar process to extract
the swap meta data.  First, <tt/storeSwapMetaUnpack()/ converts a
character buffer into a <em/tlv/ linked list.  It also tells us
the value for <em/MemObject->swap_hdr_sz/.

</article>