File: FAQ-11.html

<HTML>
<HEAD>
<TITLE>SQUID Frequently Asked Questions: How does Squid work?</TITLE>
</HEAD>
<BODY>
<A HREF="FAQ-10.html">Previous</A>
<A HREF="FAQ-12.html">Next</A>
<A HREF="FAQ.html#toc11">Table of Contents</A>
<HR>
<H2><A NAME="s11">11. How does Squid work?</A></H2>

<H2><A NAME="ss11.1">11.1 What are cachable objects?</A></H2>

<P>An Internet Object is a file, document, or response to a query for
an Internet service such as FTP, HTTP, or gopher.  A client requests
an Internet object from a caching proxy; the proxy server fetches
the object (either from the host specified in the URL or from a
parent or sibling cache) and delivers it to the client.</P>


<H2><A NAME="what-is-icp"></A> <A NAME="ss11.2">11.2 What is the ICP protocol?</A></H2>

<P>ICP is a protocol used for communication among squid caches.
ICP version 2 is defined in two Internet Drafts (soon
to be RFCs).  One document describes
<A HREF="http://ircache.nlanr.net/Cache/ICP/icpv2-protocol.txt">the protocol itself</A>
and another describes
<A HREF="http://ircache.nlanr.net/Cache/ICP/icpv2-application.txt">the application of ICP</A>
to hierarchical Web caching.</P>

<P>ICP is primarily used within a cache hierarchy to locate specific
objects in sibling caches.  If a squid cache does not have a
requested document, it sends an ICP query to its siblings, and the
siblings respond with ICP replies indicating a ``HIT'' or a ``MISS.''
The cache then uses the replies to choose from which cache to
resolve its own MISS.</P>
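
<P>As a rough illustration of what such a query looks like on the wire,
the sketch below packs and parses an ICP message in Python.  The 20-byte
header layout and the opcode values follow the ICP version 2 specification
as we understand it; check them against the protocol document linked above.
<CODE>build_icp_query</CODE> and <CODE>parse_icp_header</CODE> are
hypothetical helper names, not Squid functions.</P>

```python
import struct

# Opcode values as given in the ICP version 2 specification (assumed here).
ICP_OP_QUERY = 1
ICP_OP_HIT = 2
ICP_OP_MISS = 3
ICP_VERSION = 2

# Fixed 20-byte header in network byte order: opcode, version, message
# length, request number, options, option data, sender host address.
HEADER = struct.Struct("!BBHIII4s")


def build_icp_query(url, reqnum):
    """Return the bytes of an ICP query message asking about `url`."""
    # Query payload: 4-byte requester address (left as zero in this sketch)
    # followed by the NUL-terminated URL.
    payload = b"\x00" * 4 + url.encode("ascii") + b"\x00"
    length = HEADER.size + len(payload)
    header = HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length, reqnum,
                         0, 0, b"\x00" * 4)
    return header + payload


def parse_icp_header(packet):
    """Return (opcode, version, length, reqnum) from a received message."""
    opcode, version, length, reqnum, _opts, _optdata, _sender = \
        HEADER.unpack_from(packet)
    return opcode, version, length, reqnum
```

<P>A cache would send the result of <CODE>build_icp_query</CODE> to each
sibling in a single UDP datagram and match replies to queries by the
request number.</P>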

<P>ICP also supports multiplexed transmission of multiple object
streams over a single TCP connection.  ICP is currently implemented
on top of UDP.  Current versions of Squid also support ICP via
multicast.</P>


<H2><A NAME="ss11.3">11.3 What is the <EM>dnsserver</EM>?</A></H2>

<P>The <EM>dnsserver</EM> is a process forked by <EM>squid</EM> to
resolve IP addresses from domain names.  This is necessary because
the <CODE>gethostbyname(3)</CODE> function blocks the calling process
until the DNS query is completed.</P>
<P>Squid must use non-blocking I/O at all times, so DNS lookups are
implemented external to the main process.  The <EM>dnsserver</EM>
processes do not cache DNS lookups; that caching is implemented inside the
<EM>squid</EM> process.</P>
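
<P>The idea of pushing the blocking call into a helper can be sketched in
a few lines of Python.  This is not the actual <EM>dnsserver</EM> source:
the function name <CODE>dnsserver_loop</CODE>, the one-name-per-line request
format, and the <CODE>FAIL</CODE> reply are assumptions made for
illustration.</P>

```python
import io
import socket


def dnsserver_loop(requests, replies, resolve=socket.gethostbyname):
    """Read one hostname per line; write one '<name> <address>' line back."""
    for line in requests:
        name = line.strip()
        if not name:
            continue
        try:
            # This call may block for seconds.  Only the helper waits here;
            # the main cache process keeps serving other clients.
            address = resolve(name)
        except OSError:
            address = "FAIL"
        replies.write(f"{name} {address}\n")
        replies.flush()


# Demo with a stand-in resolver so the example runs without network access.
out = io.StringIO()
dnsserver_loop(["www.example.com\n"], out, resolve=lambda name: "192.0.2.1")
print(out.getvalue(), end="")
```

<P>In a real deployment the helper would read from and write to pipes or
sockets connected to the main process, which polls them with its usual
non-blocking I/O.</P>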



<H2><A NAME="ss11.4">11.4 What is the <EM>ftpget</EM> program for?</A></H2>

<P>The <EM>ftpget</EM> program is an FTP client used for retrieving
files from FTP servers.  Because the FTP protocol is complicated,
it is easier to implement it separately from the main <EM>squid</EM>
code.</P>



<H2><A NAME="ss11.5">11.5 FTP PUTs don't work</A></H2>

<P><EM>It seems that FTP puts don't work through squid.  Is there a fix 
and/or work-in-progress for this?</EM></P>

<P>Not at the moment; supporting this would require an <EM>ftpput</EM>
program.  </P>


<H2><A NAME="ss11.6">11.6 What is a cache hierarchy?  What are parents and siblings?</A></H2>


<P>A cache hierarchy is a collection of caching proxy servers organized
in a logical parent/child and sibling arrangement so that caches
closest to Internet gateways (closest to the backbone transit
entry-points) act as parents to caches at locations farther from
the backbone.  The parent caches resolve ``misses'' for their children.
In other words, when a cache requests an object from its parent,
and the parent does not have the object in its cache, the parent
fetches the object, caches it, and delivers it to the child.  This
ensures that the hierarchy achieves the maximum reduction in
bandwidth utilization on the backbone transit links, helps reduce
load on Internet information servers outside the network served by
the hierarchy, and builds a rich cache on the parents so that the
other child caches in the hierarchy will obtain better ``hit'' rates
against their parents.</P>

<P>In addition to the parent-child relationships, squid supports the
notion of siblings:  caches at the same level in the hierarchy,
provided to distribute cache server load.  Each cache in the
hierarchy independently decides whether to fetch the reference from
the object's home site or from parent or sibling caches, using a
simple resolution protocol.  Siblings will not fetch an object
for another sibling to resolve a cache ``miss.''</P>


<H2><A NAME="ss11.7">11.7 What is the Squid cache resolution algorithm?</A></H2>


<P>
<UL>
<LI>Send ICP queries to all appropriate siblings</LI>
<LI>Wait for all replies to arrive with a configurable timeout
(the default is two seconds).</LI>
<LI>Begin fetching the object upon receipt of the first HIT reply,
or</LI>
<LI>Fetch the object from the first parent which replied with MISS
(subject to weighting values), or</LI>
<LI>Fetch the object from the source</LI>
</UL>
</P>
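
<P>The steps above can be sketched as a small Python function.  This is a
simplified model, not Squid's code: it looks at the replies after the fact
rather than reacting as each one arrives, and it assumes that a parent's
weight divides its measured round-trip time so that a higher weight makes
the parent more attractive.  The name <CODE>choose_source</CODE> and the
reply tuples are hypothetical.</P>

```python
def choose_source(replies, parents):
    """Decide where to fetch an object from.

    replies: list of (neighbor, opcode, rtt_ms) tuples for the ICP replies
             that arrived before the timeout; opcode is "HIT" or "MISS".
    parents: dict mapping parent name -> weight (assumed semantics: the
             round-trip time is divided by the weight).  Neighbors that
             are not in this dict are treated as siblings.
    """
    # Step 1: the first HIT to arrive wins, whoever sent it.
    for name, opcode, _rtt in sorted(replies, key=lambda r: r[2]):
        if opcode == "HIT":
            return name

    # Step 2: otherwise use the "first" parent that replied MISS, where
    # "first" is judged by weighted round-trip time.
    parent_misses = [(rtt / parents[name], name)
                     for name, opcode, rtt in replies
                     if opcode == "MISS" and name in parents]
    if parent_misses:
        return min(parent_misses)[1]

    # Step 3: nobody suitable answered, so go to the source.
    return "DIRECT"
```

<P>For example, with parents <CODE>{"p1": 1, "p2": 4}</CODE>, a MISS from
<CODE>p1</CODE> after 20 ms and a MISS from <CODE>p2</CODE> after 40 ms
selects <CODE>p2</CODE> under this model, because its weighted time
(40 / 4) is lower.</P>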

<P>The algorithm is somewhat more complicated when firewalls
are involved.</P>

<P>The <CODE>single_parent_bypass</CODE> directive can be used to skip
the ICP queries if the only appropriate sibling is a parent cache
(i.e., if there's only one place you'd fetch the object from, why
bother querying?)</P>


<H2><A NAME="ss11.8">11.8 What features are Squid developers currently working on?</A></H2>


<P>There are several open issues for the caching project, namely
more automatic load balancing and (both configured and
dynamic) selection of parents, routing, multicast
cache-to-cache communication, and better recognition of URLs
that are not worth caching.</P>
<P>The current 
<A HREF="http://squid.nlanr.net/Squid/Devel/todo.html">Squid Developers to-do list</A>
is available for your reading enjoyment.</P>

<P>Prospective developers should review the resources available at the
<A HREF="http://squid.nlanr.net/Squid/Devel/">Squid developers corner</A>.</P>


<H2><A NAME="ss11.9">11.9 Tell me more about Internet traffic workloads</A></H2>


<P>Workload can be characterized as the burden a client or
group of clients imposes on a system.  Understanding the
nature of workloads is important to managing system
capacity.</P>
<P>If you are interested in Internet traffic workloads then NLANR's
<A HREF="http://www.nlanr.net/NA/">Network Analysis activities</A> are a good place to start.</P>


<H2><A NAME="ss11.10">11.10 What are the tradeoffs of caching with the NLANR cache system?</A></H2>


<P>The NLANR root caches are at the NSF supercomputer centers (SCCs),
which are interconnected via NSF's high speed backbone service
(vBNS).  So inter-cache communication between the NLANR root caches
does not cross the Internet.</P>

<P>The benefits of hierarchical caching (namely, reduced network
bandwidth consumption, reduced access latency, and improved
resiliency) come at a price.  Caches higher in the hierarchy must
field the misses of their descendants.  If the equilibrium hit rate
of a leaf cache is 50%, half of all leaf references have to be
resolved through a second level cache rather than directly from
the object's source.  If this second level cache has most of the
documents, it is usually still a win, but if higher level caches
often don't have the document, or become overloaded, then they
could actually increase access latency, rather than reduce it.</P>
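
<P>A back-of-the-envelope model makes the tradeoff concrete.  The Python
sketch below assumes a single leaf cache with one parent and fixed average
costs for each hop; the function names and all of the timing figures are
invented for illustration and are not measurements.</P>

```python
def latency_direct(leaf_hit, t_leaf, t_origin):
    """Mean latency when leaf misses go straight to the object's source."""
    return t_leaf + (1.0 - leaf_hit) * t_origin


def latency_via_parent(leaf_hit, parent_hit, t_leaf, t_parent, t_origin):
    """Mean latency when leaf misses are resolved through a parent cache."""
    # A leaf miss always pays the hop to the parent, and additionally pays
    # the origin fetch whenever the parent misses too.
    miss_cost = t_parent + (1.0 - parent_hit) * t_origin
    return t_leaf + (1.0 - leaf_hit) * miss_cost


# Assumed figures: 50% leaf hit rate, 10 ms leaf service time,
# 100 ms to reach the parent, 500 ms to reach the origin server.
direct = latency_direct(0.5, 10, 500)
good_parent = latency_via_parent(0.5, 0.75, 10, 100, 500)
poor_parent = latency_via_parent(0.5, 0.125, 10, 100, 500)
print(direct, good_parent, poor_parent)
```

<P>Under these assumptions a parent with a 75% hit rate roughly halves the
mean latency, while a parent with a 12.5% hit rate makes it worse than
going direct.  In this model the parent pays off only when its hit rate
exceeds the ratio of the parent hop time to the origin fetch time.</P>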



<H2><A NAME="ss11.11">11.11 Where can I find out more about firewalls?</A></H2>

<P>Please see the
<A HREF="http://www.greatcircle.com/firewalls/">Firewalls mailing list and FAQ</A>
information site.</P>


<H2><A NAME="ss11.12">11.12 What is the ``Storage LRU Expiration Age?''</A></H2>

<P>For example:
<PRE>
        Storage LRU Expiration Age:      4.31 days
</PRE>
</P>

<P>The LRU expiration age is a dynamically-calculated value.  Any objects
which have not been accessed for this amount of time will be removed from
the cache to make room for new, incoming objects.  Another way of looking
at this is that it would 
take your cache approximately this many days to go from empty to full at
your current traffic levels.</P>

<P>As your cache becomes more busy, the LRU age becomes lower so that more
objects will be removed to make room for the new ones.  Ideally, your
cache will have an LRU age of at least 3 days.  If the
LRU age is lower than 3 days, then your cache is probably not big enough
to handle the volume of requests it receives.  By adding more disk space
you could increase your cache hit ratio.</P>
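
<P>The "empty to full" reading of the LRU age gives a simple way to
estimate how much disk a target age needs.  The Python sketch below is only
that rule of thumb; the function names and the traffic figures are
assumptions for illustration, and Squid's own calculation is dynamic rather
than a single division.</P>

```python
def days_to_fill(cache_size_mb, new_mb_per_day):
    """Approximate LRU age: days needed to fill an empty cache."""
    return cache_size_mb / new_mb_per_day


def size_for_age(target_days, new_mb_per_day):
    """Disk space (MB) needed to hold `target_days` worth of new objects."""
    return target_days * new_mb_per_day


# Assumed example: a 1000 MB cache storing about 232 MB of new objects a day.
print(round(days_to_fill(1000, 232), 2))
# Assumed example: a busier cache storing 500 MB a day, aiming for 3 days.
print(size_for_age(3, 500))
```

<P>With these made-up numbers the first cache has an age of a little over
four days, and the second would need about 1500 MB to reach the three-day
guideline.</P>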

<P>The configuration parameter <EM>reference_age</EM> places an upper limit on 
your cache's LRU expiration age.</P>


<H2><A NAME="ss11.13">11.13 What is ``Failure Ratio at 1.01; Going into hit-only-mode for 5 minutes''?</A></H2>

<P>Consider a pair of caches named A and B.  It may be the case that A can
reach B, and vice-versa, but B has poor reachability to the rest of the
Internet.  
In this case, we would like B to recognize that it has poor reachability
and somehow convey this fact to its neighbor caches.</P>

<P>Squid will track the ratio of failed-to-successful requests over short
time periods.  A failed request is one which is logged as ERR_DNS_FAIL, ERR_CONNECT_FAIL, or ERR_READ_ERROR.  When the failed-to-successful ratio exceeds 1.0,
then Squid will return ICP_MISS_NOFETCH instead of ICP_MISS to neighbors.
Note, Squid will still return ICP_HIT for cache hits.</P>
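
<P>The decision described above can be sketched as follows.  This Python
fragment is illustrative only: <CODE>icp_reply</CODE> is a hypothetical
name, the counts are assumed to come from some recent time window, and
treating a window with no successes as having one success is our own choice
to avoid dividing by zero.</P>

```python
def icp_reply(is_hit, failed, succeeded, threshold=1.0):
    """Choose the ICP reply opcode for a neighbor's query.

    failed / succeeded are request counts from a recent time window
    (how that window is maintained is not modeled here).
    """
    if is_hit:
        # Cache hits are always reported, even with poor reachability.
        return "ICP_HIT"
    ratio = failed / max(succeeded, 1)
    if ratio > threshold:
        # Tell neighbors not to fetch misses through this cache.
        return "ICP_MISS_NOFETCH"
    return "ICP_MISS"
```

<P>For instance, 101 failures against 100 successes gives a ratio of 1.01,
so a query for an object that is not cached would be answered with
<CODE>ICP_MISS_NOFETCH</CODE>.</P>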


<H2><A NAME="ss11.14">11.14 Does squid periodically re-read its configuration file?</A></H2>

<P>No, you must send a HUP signal to have Squid re-read its configuration file,
including access control lists.  An easy way to do this is with the <EM>-k</EM>
command line option:
<PRE>
        squid -k reconfigure
</PRE>
</P>


<H2><A NAME="ss11.15">11.15 How does <EM>unlinkd</EM> work?</A></H2>

<P><EM>unlinkd</EM> is an external process used for unlinking unused cache files.
Performing the unlink operation in an external process opens up some 
race-condition problems for Squid.  If we are not careful, the following
sequence of events could occur:
<OL>
<LI>An object with swap file number <B>S</B> is removed from the cache.  </LI>
<LI>We want to unlink file <B>F</B> which corresponds to swap file number <B>S</B>,
so we write pathname <B>F</B> to the <EM>unlinkd</EM> socket.
We also mark <B>S</B> as available in the filemap.</LI>
<LI>We have a new object to swap out.  It is allocated to the first available
file number, which happens to be <B>S</B>.  Squid opens file <B>F</B> for writing.</LI>
<LI>The <EM>unlinkd</EM> process reads the request to unlink <B>F</B> and issues the
actual unlink call. </LI>
</OL>
</P>
<P>So, the problem is, how can we guarantee that <EM>unlinkd</EM> will not
remove a cache file that Squid has recently allocated to a new object?
The approach we have taken is to have Squid keep a stack of unused (but
not deleted!)  swap file numbers.  The stack size is hard-coded at 128
entries.  We only give unlink requests to <EM>unlinkd</EM> when the unused
file number stack is full.  Thus, if we ever have to start unlinking
files, we have a pool of 128 file numbers to choose from which we know
will not be removed by <EM>unlinkd</EM>.</P>
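
<P>A minimal Python model of the unused file number stack is shown below.
It is a sketch of the idea only, not a translation of Squid's C code: the
class name, the <CODE>send_to_unlinkd</CODE> callback, and the
<CODE>get</CODE> method are hypothetical, while the 128-entry limit comes
from the description above.</P>

```python
STACK_SIZE = 128  # hard-coded limit mentioned in the text


class UnusedFilenoStack:
    """Holds released swap file numbers whose files are still on disk."""

    def __init__(self, send_to_unlinkd, size=STACK_SIZE):
        self._stack = []
        self._size = size
        self._send = send_to_unlinkd  # callback that queues an unlink request

    def put(self, fileno):
        """Release a file number (the role played by storePutUnusedFileno)."""
        if len(self._stack) < self._size:
            # Keep the file on disk; the number can be reused safely because
            # no unlink request for it has been issued.
            self._stack.append(fileno)
        else:
            # Stack is full: only now is an unlink request handed over.
            self._send(fileno)

    def get(self):
        """Return a reusable file number, or None if the stack is empty."""
        return self._stack.pop() if self._stack else None
```

<P>Because a number on the stack has never been handed to
<EM>unlinkd</EM>, a new object that reuses it simply overwrites the old
file, and there is no pending unlink that could remove it afterwards.</P>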

<P>In terms of implementation, the only way to send unlink requests to
the <EM>unlinkd</EM> process is via the <EM>storePutUnusedFileno</EM> function.</P>

<P>Unfortunately, there are times when Squid cannot use the <EM>unlinkd</EM> process
but must call <EM>unlink(2)</EM> directly.  One of these times is when the cache
swap size is over the high water mark.  If we push the released file numbers
onto the unused file number stack, and the stack is not full, then no files
will be deleted, and the actual disk usage will remain unchanged.  So, when
we exceed the high water mark, we must call <EM>unlink(2)</EM> directly.</P>




<HR>
<A HREF="FAQ-10.html">Previous</A>
<A HREF="FAQ-12.html">Next</A>
<A HREF="FAQ.html#toc11">Table of Contents</A>
</BODY>
</HTML>