File: FAQ-4.html

squid 1.1.21-1
<HTML>
<HEAD>
<TITLE>SQUID Frequently Asked Questions: Configuration issues</TITLE>
</HEAD>
<BODY>
<A HREF="FAQ-3.html">Previous</A>
<A HREF="FAQ-5.html">Next</A>
<A HREF="FAQ.html#toc4">Table of Contents</A>
<HR>
<H2><A NAME="s4">4. Configuration issues</A></H2>

<H2><A NAME="ss4.1">4.1 How do I join a cache hierarchy?</A></H2>

<P>To place your cache in a hierarchy, use the <CODE>cache_host</CODE>
directive in <EM>squid.conf</EM> to specify the parent and sibling
nodes.</P>

<P>For example, the following <EM>squid.conf</EM> file on
<CODE>childcache.example.com</CODE> configures its cache to retrieve
data from one parent cache and two sibling caches:</P>
<P>
<PRE>
        #  squid.conf - On the host: childcache.example.com
        #
        #  Format is: hostname  type  http_port  udp_port
        #
        cache_host parentcache.example.com   parent  3128 3130
        cache_host childcache2.example.com   sibling 3128 3130
        cache_host childcache3.example.com   sibling 3128 3130
</PRE>
</P>
<P>The <CODE>cache_host_domain</CODE> directive allows you to specify that
certain caches are siblings or parents only for certain domains:</P>
<P>
<PRE>
        #  squid.conf - On the host: sv.cache.nlanr.net
        #
        #  Format is: hostname  type  http_port  udp_port
        #

        cache_host electraglide.geog.unsw.edu.au parent 3128 3130
        cache_host cache1.nzgate.net.nz          parent 3128 3130
        cache_host pb.cache.nlanr.net   parent 3128 3130
        cache_host it.cache.nlanr.net   parent 3128 3130
        cache_host sd.cache.nlanr.net   parent 3128 3130
        cache_host uc.cache.nlanr.net   sibling 3128 3130
        cache_host bo.cache.nlanr.net   sibling 3128 3130
        cache_host_domain electraglide.geog.unsw.edu.au .au
        cache_host_domain cache1.nzgate.net.nz   .au .aq .fj .nz
        cache_host_domain pb.cache.nlanr.net     .uk .de .fr .no .se .it
        cache_host_domain it.cache.nlanr.net     .uk .de .fr .no .se .it
        cache_host_domain sd.cache.nlanr.net     .mx .za .mu .zm
</PRE>
</P>
<P>The configuration above indicates that the cache will use
<CODE>pb.cache.nlanr.net</CODE> and <CODE>it.cache.nlanr.net</CODE>
for domains uk, de, fr, no, se and it, <CODE>sd.cache.nlanr.net</CODE>
for domains mx, za, mu and zm, <CODE>cache1.nzgate.net.nz</CODE>
for domains au, aq, fj and nz, and
<CODE>electraglide.geog.unsw.edu.au</CODE> for domain au.</P>
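<P>The per-domain selection can be pictured with a short Python sketch.
This is not Squid's own code: the table layout and the
<CODE>peers_for</CODE> helper are hypothetical, and the sketch assumes
that a neighbor without any <CODE>cache_host_domain</CODE> line is
eligible for every domain.</P>

```python
# Illustrative only: a hypothetical table mirroring the cache_host_domain
# lines from the example configuration above.
PEER_DOMAINS = {
    "electraglide.geog.unsw.edu.au": [".au"],
    "cache1.nzgate.net.nz": [".au", ".aq", ".fj", ".nz"],
    "pb.cache.nlanr.net": [".uk", ".de", ".fr", ".no", ".se", ".it"],
    "it.cache.nlanr.net": [".uk", ".de", ".fr", ".no", ".se", ".it"],
    "sd.cache.nlanr.net": [".mx", ".za", ".mu", ".zm"],
}

# Neighbors with no cache_host_domain line (assumed usable for any domain).
UNRESTRICTED = ["uc.cache.nlanr.net", "bo.cache.nlanr.net"]


def peers_for(host):
    """Return the neighbors that may be asked about a URL on this host."""
    host = host.lower()
    eligible = [
        peer
        for peer, domains in PEER_DOMAINS.items()
        if any(host.endswith(domain) for domain in domains)
    ]
    return eligible + UNRESTRICTED
```

<P>With this table, a request for a host under <CODE>.de</CODE> would be
eligible for <CODE>pb.cache.nlanr.net</CODE> and
<CODE>it.cache.nlanr.net</CODE> plus the two unrestricted siblings, while
a host under <CODE>.com</CODE> would be eligible for the siblings only.</P>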


<H2><A NAME="ss4.2">4.2 How do I join NLANR's cache hierarchy?</A></H2>

<P>We have a simple set of
<A HREF="http://ircache.nlanr.net/Cache/joining.html">guidelines for joining</A>
the NLANR cache hierarchy.</P>


<H2><A NAME="ss4.3">4.3 Why should I want to join NLANR's cache hierarchy?</A></H2>

<P>The NLANR hierarchy can provide you with an initial source for parent or
sibling caches.  Joining the NLANR global cache system will frequently
improve the performance of your caching service.</P>


<H2><A NAME="ss4.4">4.4 How do I register my cache with NLANR's registration service?</A></H2>

<P>Just enable these options in your <EM>squid.conf</EM> and you'll be
registered:
<PRE>
        cache_announce 24
        announce_to sd.cache.nlanr.net:3131
</PRE>
</P>
<P><EM>NOTE:</EM> announcing your cache <B>is not</B> the same thing as
joining the NLANR cache hierarchy.
You can join the NLANR cache hierarchy without registering, and
you can register without joining the NLANR cache hierarchy.</P>



<H2><A NAME="ss4.5">4.5 How do I find other caches close to me and arrange parent/child/sibling relationships with them?</A></H2>

<P>Visit the NLANR cache
<A HREF="http://ircache.nlanr.net/Cache/Tracker/">registration database</A>
to discover other caches near you.  Keep in mind that just because
a cache is registered in the database <B>does not</B> mean its operators
are willing to be your parent/sibling/child.  But it can't hurt to ask.</P>



<H2><A NAME="ss4.6">4.6 My cache registration is not appearing in the Tracker database.</A></H2>

<P>
<UL>
<LI>Your site will not be listed by name if your cache's IP address does
not have a DNS PTR record. If we can't map the IP address back to a domain
name, it will be listed as ``Unknown.''</LI>
<LI>The registration messages are sent with UDP. We may not be receiving
your announcement message due to firewalls which block UDP, or
dropped packets due to congestion.</LI>
</UL>
</P>


<H2><A NAME="ss4.7">4.7 What is the httpd-accelerator mode?</A></H2>

<P>Occasionally people have trouble understanding accelerators and
proxy caches, usually because of mixed-up interpretations of
``incoming'' and ``outgoing'' data.  I think in terms of requests (i.e.,
an outgoing request is from the local site out to the big bad
Internet).  The data received in reply is incoming, of course.
Others think in the opposite sense of ``a request for incoming data''.</P>

<P>An accelerator caches incoming requests for outgoing data (i.e.,
that which you publish to the world).  It takes load away from your
HTTP server and internal network.  You move the server away from
port 80 (or whatever your published port is), and substitute the
accelerator, which then pulls the HTTP data from the ``real''
HTTP server (only the accelerator needs to know where the real
server is).  The outside world sees no difference (apart from an
increase in speed, with luck).</P>

<P>Quite apart from taking the load off a site's normal web server,
accelerators can also sit outside firewalls or other network
bottlenecks and talk to HTTP servers inside, reducing traffic across
the bottleneck and simplifying the configuration.  Two or more
accelerators communicating via ICP can increase the speed of a web
service and its resilience to any single failure.</P>

<P>The Squid redirector can make one accelerator act as a single
front-end for multiple servers.  If you need to move parts of your
filesystem from one server to another, or if separately administered
HTTP servers should logically appear under a single URL hierarchy,
the accelerator makes the right thing happen.</P>

<P>If you wish only to cache the ``rest of the world'' to improve local users'
browsing performance, then accelerator mode is irrelevant.  Sites which
own and publish a URL hierarchy use an accelerator to improve other
sites' access to it.  Sites wishing to improve their local users' access
to other sites' URLs use proxy caches.  Many sites, like us, do both and
hence run both.</P>

<P>Measurements of the Squid cache and its Harvest counterpart suggest an
order of magnitude performance improvement over CERN or other widely
available caching software.  This order of magnitude performance
improvement on hits suggests that the cache can serve as an httpd
accelerator, a cache configured to act as a site's primary httpd server
(on port 80), forwarding references that miss to the site's real httpd
(on port 81).</P>

<P>In such a configuration, the web administrator renames all
non-cacheable URLs to the httpd's port (81).  The cache serves
references to cacheable objects, such as HTML pages and GIFs, and
the true httpd (on port 81) serves references to non-cacheable
objects, such as queries and cgi-bin programs.  If a site's usage
characteristics tend toward cacheable objects, this configuration
can dramatically reduce the site's web workload.</P>
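<P>The renaming step can be sketched in Python.  This is only an
illustration under stated assumptions: the rule used here for
``non-cacheable'' (a query string or a <CODE>/cgi-bin/</CODE> path) and
the helper names <CODE>is_cacheable</CODE> and <CODE>published_url</CODE>
are hypothetical, and a real site would apply its own rule.</P>

```python
from urllib.parse import urlsplit, urlunsplit


def is_cacheable(url):
    """Assumed rule: queries and cgi-bin programs are not cacheable."""
    parts = urlsplit(url)
    return not parts.query and "/cgi-bin/" not in parts.path


def published_url(url, httpd_port=81):
    """Return the URL to publish: cacheable URLs stay on the accelerator's
    port, non-cacheable ones are renamed to the real httpd's port."""
    if is_cacheable(url):
        return url
    parts = urlsplit(url)
    netloc = parts.hostname + ":" + str(httpd_port)
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

<P>Under these assumptions a static page keeps its published URL, while
a <CODE>cgi-bin</CODE> search URL is rewritten to name port 81 so that it
bypasses the accelerator.</P>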

<P>Note that it is best not to run a single <EM>squid</EM> process as
both an httpd-accelerator and a proxy cache, since these two modes
will have different working sets. You will get better performance
by running two separate caches on separate machines. However, for
compatibility with how administrators are accustomed to running
other servers that provide both proxy and Web serving capability
(e.g., CERN), Squid supports operation as both a proxy and
an accelerator if you set the <CODE>httpd_accel_with_proxy</CODE>
variable to <CODE>on</CODE> inside your <EM>squid.conf</EM>
configuration file.</P>


<H2><A NAME="ss4.8">4.8 How do I configure Squid to work behind a firewall?</A></H2>

<P>If you are behind a firewall then you can't make direct connections
to the outside world, so you <B>must</B> use a
parent cache.  Squid doesn't use ICP queries for a request if it's
behind a firewall or if there is only one parent.</P>

<P>You can use the <CODE>inside_firewall</CODE> directive in
<EM>squid.conf</EM> to specify a list of domains internal to your
Internet firewall.  For example:
<PRE>
        inside_firewall example.com
</PRE>

You can also specify multiple domains:
<PRE>
        inside_firewall example.com example.org example.net
</PRE>
</P>
<P>The use of <CODE>inside_firewall</CODE> affects the server selection
algorithm in two ways.  Objects not matching any of the listed
domains will be considered beyond the firewall. For these:</P>
<P>
<UL>
<LI>There will be no DNS lookups for the URL-host.</LI>
<LI>The cache will always fetch the object from one of the parent
or sibling caches.</LI>
</UL>
</P>
<P>As a special case you may specify the domain as <CODE>none</CODE> to
force all requests to be fetched from siblings and parents.</P>
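<P>The decision described above can be sketched in Python.  The function
names are hypothetical and the matching rule (exact domain or any host
beneath it) is an assumption made for illustration; Squid's own matching
may differ in detail.</P>

```python
def is_inside(host, inside_domains):
    """True if host falls within one of the inside_firewall domains."""
    host = host.lower().rstrip(".")
    for domain in inside_domains:
        if domain == "none":
            # Special case: "none" matches no host at all.
            continue
        domain = domain.lower().lstrip(".")
        if host == domain or host.endswith("." + domain):
            return True
    return False


def route(host, inside_domains):
    """Classify a request by where it is allowed to be fetched from."""
    if is_inside(host, inside_domains):
        return "direct_allowed"
    # Beyond the firewall: no DNS lookup, always use a parent or sibling.
    return "neighbor_only"
```

<P>With <CODE>inside_firewall example.com</CODE>, a request for
<CODE>www.example.com</CODE> is classed as <CODE>direct_allowed</CODE>,
and every other host is classed as <CODE>neighbor_only</CODE>.</P>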



<H2><A NAME="ss4.9">4.9 I have <EM>dnsserver</EM> processes that aren't being used, should I lower the number in <EM>squid.conf</EM>?</A></H2>

<P>The <EM>dnsserver</EM> processes are used by <EM>squid</EM> because the
<CODE>gethostbyname(3)</CODE> library routine, which converts web site
names to their Internet addresses, blocks until it returns (i.e., the
process that calls it has to wait for a reply). Since there is only one
<EM>squid</EM> process, everyone who uses the cache would have to wait each
time the routine was called.  This is why the <EM>dnsserver</EM> is
a separate process, so that these processes can block
without causing blocking in <EM>squid</EM>.</P>

<P>It's very important that there are enough <EM>dnsserver</EM>
processes to cope with every access you will need, otherwise
<EM>squid</EM> will stop occasionally.  A good rule of thumb is to
make sure you have at least the maximum number of dnsservers
<EM>squid</EM> has <B>ever</B> needed on your system,
and probably add two to be on the safe side. In other words, if
you have only ever seen at most three <EM>dnsserver</EM> processes
in use, make at least five.  Remember that a <EM>dnsserver</EM> is
small and, if unused, will be swapped out.</P>
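<P>As an analogy only, the following Python sketch hands blocking lookups
to a pool of helpers and applies the rule of thumb above.  It uses threads
rather than separate processes, and <CODE>slow_lookup</CODE> is a stand-in
that returns a made-up address instead of calling the real resolver; all
names here are hypothetical.</P>

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_lookup(name):
    """Stand-in for a blocking gethostbyname(3) call (fake address)."""
    time.sleep(0.05)
    return "192.0.2." + str(len(name))


def recommended_helpers(max_seen_busy, margin=2):
    """Rule of thumb: the most helpers ever seen busy, plus a margin."""
    return max_seen_busy + margin


def resolve_all(names, helpers):
    """Run the blocking lookups in helpers so the caller is not stalled
    by any single slow lookup."""
    with ThreadPoolExecutor(max_workers=helpers) as pool:
        return dict(zip(names, pool.map(slow_lookup, names)))
```

<P>If at most three helpers have ever been busy,
<CODE>recommended_helpers(3)</CODE> gives five, matching the example in
the text.</P>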


<H2><A NAME="ss4.10">4.10 Does Squid support Socks?</A></H2>

<P><I>We would like to use Squid, but we need it to use socks to connect to
the world outside our firewall.</I></P>
<P>No changes are necessary to use Squid with socks5.
Simply add the usual <CODE>-Dbind=SOCKSbind</CODE> etc., to the compile line and
<CODE>-lsocks</CODE> to the link line.</P>
<P>
<BLOCKQUOTE>
--- Carson Gaspar (carson@cugc.org)
</BLOCKQUOTE>
</P>


<H2><A NAME="ss4.11">4.11 How does Squid decide when to refresh a cached object?</A></H2>


<P>
<A HREF="mailto:bertold@tohotom.vein.hu">Kolics Bertold</A>
has made an excellent
<A HREF="http://squid.nlanr.net/Squid/FAQ/refresh-flowchart.gif">flow chart diagram</A> showing this process.</P>


<H2><A NAME="ss4.12">4.12 How can I easily change the default HTTP port?</A></H2>

<P>Before you run the configure script, simply set the <EM>CACHE_HTTP_PORT</EM>
environment variable.
<PRE>
        setenv CACHE_HTTP_PORT 8080
        ./configure
        make
        make install
</PRE>
</P>


<H2><A NAME="ss4.13">4.13 Is it possible to control how big each cache_dir is?</A></H2>

<P>With Squid-1.1 it is NOT possible.  Each <EM>cache_dir</EM> is assumed
to be the same size.  The <EM>cache_swap</EM> setting defines the size of
all <EM>cache_dir</EM>'s taken together.  If you have N <EM>cache_dir</EM>'s
then each one will hold <EM>cache_swap</EM> &divide; N Megabytes.</P>
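<P>As a worked example of this arithmetic (the function name is
hypothetical and the numbers are made up for illustration):</P>

```python
def per_dir_megabytes(cache_swap_mb, n_cache_dirs):
    """Megabytes held by each cache_dir when cache_swap is split evenly."""
    if n_cache_dirs < 1:
        raise ValueError("at least one cache_dir is required")
    return cache_swap_mb / n_cache_dirs
```

<P>With a <EM>cache_swap</EM> of 300 and three <EM>cache_dir</EM>'s,
<CODE>per_dir_megabytes(300, 3)</CODE> gives 100 Megabytes per
directory.</P>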


<H2><A NAME="ss4.14">4.14 Squid and <EM>http-gw</EM> from the TIS toolkit.</A></H2>

<P>Several people on both the <EM>fwtk-users</EM> and the
<EM>squid-users</EM> mailing lists have asked
about using Squid in combination with http-gw from the
<A HREF="http://www.tis.com/">TIS toolkit</A>.
The most elegant way in my opinion is to run an internal Squid caching
proxy server which handles client requests and let this server forward
its requests to the http-gw running on the firewall. Cache hits won't
need to be handled by the firewall.</P>

<P>In this example Squid runs on the same server as the http-gw.  Squid uses
port 8000 and http-gw uses port 8080 (web).  The local domain is <EM>home.nl</EM>.</P>

<H3>Firewall configuration:</H3>

<P>Either run http-gw as a daemon from the <EM>/etc/rc.d/rc.local</EM> (Linux
Slackware):
<PRE>
        exec /usr/local/fwtk/http-gw -daemon 8080
</PRE>

or run it from inetd like this:
<PRE>
        web stream      tcp      nowait.100  root /usr/local/fwtk/http-gw http-gw
</PRE>

I increased the watermark to 100 because a lot of people run into
problems with the default value.</P>

<P>Make sure you have at least the following line in
<EM>/usr/local/etc/netperm-table</EM>:
<PRE>
        http-gw:        hosts 127.0.0.1
</PRE>

You could add the IP address of your own workstation to this rule to
test that http-gw works by itself, like:
<PRE>
        http-gw:                hosts 127.0.0.1 10.0.0.1
</PRE>
</P>

<H3>Squid configuration:</H3>

<P>The following settings are important:</P>
<P>
<PRE>
        http_port       8000
        icp_port        0

        cache_host      localhost.home.nl parent 8080 0 default
        inside_firewall   home.nl
</PRE>

This tells Squid to use the parent for all domains other than <EM>home.nl</EM>.
Below, <EM>access.log</EM> entries show what happens if you do a reload on the
Squid-homepage:</P>
<P>
<PRE>
872739961.631   1566 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://squid.nlanr.net/ - DEFAULT_PARENT/localhost.home.nl -
872739962.976   1266 10.0.0.21 TCP_CLIENT_REFRESH/304 88 GET http://www.nlanr.net/Images/cache_now.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.007   1299 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://squid.nlanr.net/Squid/squidnow.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.061   1354 10.0.0.21 TCP_CLIENT_REFRESH/304 83 GET http://squid.nlanr.net/Squid/Squidlogo2.gif - DEFAULT_PARENT/localhost.home.nl 
</PRE>
</P>
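<P>Log lines like the ones above can be split into fields with a few
lines of Python.  This is a minimal sketch: the field names are labels
chosen for this example rather than official ones, and the second column
is assumed to be the elapsed time in milliseconds.</P>

```python
def parse_access_line(line):
    """Split one access.log line into a dict (field names are assumed)."""
    parts = line.split()
    action, _, status = parts[3].partition("/")
    how, _, peer = parts[8].partition("/")
    return {
        "timestamp": float(parts[0]),   # seconds since the Unix epoch
        "elapsed_ms": int(parts[1]),    # assumed to be milliseconds
        "client": parts[2],
        "action": action,
        "status": int(status),
        "bytes": int(parts[4]),
        "method": parts[5],
        "url": parts[6],
        "hierarchy": how,               # e.g. DEFAULT_PARENT
        "peer": peer,                   # host the request was sent to
    }
```

<P>Applied to the first line above, the <CODE>hierarchy</CODE> field is
<CODE>DEFAULT_PARENT</CODE> and the <CODE>peer</CODE> field is
<CODE>localhost.home.nl</CODE>, which confirms that the request went
through the http-gw parent.</P>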

<P>http-gw entries in syslog:</P>
<P>
<PRE>
Aug 28 02:46:00 memo http-gw[2052]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:00 memo http-gw[2052]: log host=localhost/127.0.0.1 protocol=HTTP cmd=dir dest=squid.nlanr.net path=/
Aug 28 02:46:01 memo http-gw[2052]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
Aug 28 02:46:01 memo http-gw[2053]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2053]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=squid.nlanr.net path=/Squid/Squidlogo2.gif
Aug 28 02:46:01 memo http-gw[2054]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2054]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=squid.nlanr.net path=/Squid/squidnow.gif
Aug 28 02:46:01 memo http-gw[2055]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2055]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.nlanr.net path=/Images/cache_now.gif
Aug 28 02:46:02 memo http-gw[2055]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
Aug 28 02:46:03 memo http-gw[2053]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=2
Aug 28 02:46:04 memo http-gw[2054]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=3
</PRE>
</P>


<P>To summarize:</P>

<P>Advantages:
<UL>
<LI>http-gw allows you to selectively block ActiveX and Java, and its
primary design goal is security.</LI>
<LI>The firewall doesn't need to run large applications like Squid.</LI>
<LI>The internal Squid-server still gives you the benefit of caching.</LI>
</UL>
</P>

<P>Disadvantages:
<UL>
<LI>The internal Squid proxy server can't (and shouldn't) work with other
parent or neighbor caches.</LI>
<LI>Initial requests are slower because they go through http-gw, which
also does reverse lookups. To reduce this delay, run a nameserver on the
firewall or use an internal nameserver.</LI>
</UL>
</P>

<P>
<BLOCKQUOTE>
--
<A HREF="mailto:RvdOever@baan.nl">Rodney van den Oever</A></BLOCKQUOTE>
</P>



<H2><A NAME="ss4.15">4.15 What is ``HTTP_X_FORWARDED_FOR''?  Why does squid provide it to WWW servers, and how can I stop it?</A></H2>

<P>When a proxy-cache is used, a server does not see the connection
coming from the originating client.  Many people like to implement
access controls based on the client address.
To accommodate these people, Squid adds its own request header
called ``X-Forwarded-For'' which looks like this:
<PRE>
        X-Forwarded-For: 128.138.243.150, unknown, 192.52.106.30
</PRE>

Entries are always IP addresses, or the word <EM>unknown</EM> if the address
could not be determined or if it has been disabled with the
<EM>forwarded_for</EM> configuration option.</P>

<P>We must note that access controls based on this header are extremely
weak and simple to fake.  Anyone may hand-enter a request with any IP
address whatsoever.  This is perhaps the reason why client IP addresses
have been omitted from the HTTP/1.1 specification.</P>
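<P>A server-side script that wants to read this header could split it as
in the following Python sketch.  The function names are hypothetical, and
the result must be treated as untrusted input for the reasons given
above.</P>

```python
def parse_forwarded_for(value):
    """Split an X-Forwarded-For value into its comma-separated entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def known_addresses(value):
    """Return only the entries that are not the word 'unknown'."""
    return [entry for entry in parse_forwarded_for(value) if entry != "unknown"]
```

<P>For the example header shown earlier,
<CODE>parse_forwarded_for</CODE> yields three entries and
<CODE>known_addresses</CODE> drops the <EM>unknown</EM> one, leaving the
two IP addresses.</P>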



<H2><A NAME="ss4.16">4.16 Can I use the redirector to return HTTP redirect messages?</A></H2>

<P>Normally, the <EM>redirector</EM> feature is used to rewrite requested URLs.
Squid then transparently requests the new URL.  However, in some situations,
it may be desirable to return an HTTP "301" or "302" redirect message
to the client.  This is now possible with Squid version 1.1.19.</P>

<P>Simply modify your redirector program to prepend either "301:" or "302:"
to the new URL.  For example, the following script might be used
to direct external clients to a secure Web server for internal documents:
<PRE>
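#!/usr/local/bin/perl
# Flush output after every line so Squid sees each reply immediately.
$|=1;
while (&lt;&gt;) {
        # The requested URL is the first whitespace-separated field.
        @X = split;
        $url = $X[0];
        if ($url =~ /^http:\/\/internal\.foo\.com/) {
                # Send the client to the secure server with a 302 redirect.
                $url =~ s/^http/https/;
                $url =~ s/internal/secure/;
                print &quot;302:$url\n&quot;;
        } else {
                # Leave every other URL unchanged.
                print &quot;$url\n&quot;;
        }
}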
</PRE>
</P>
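<P>The same redirector logic can be sketched in Python for sites that
prefer it.  This is an illustrative port of the script above, not part of
Squid; <CODE>rewrite</CODE> and <CODE>main</CODE> are names chosen for
this example.</P>

```python
import re
import sys


def rewrite(line):
    """Return the reply for one request line read from Squid."""
    fields = line.split()
    if not fields:
        return ""
    url = fields[0]  # the requested URL is the first field
    if re.match(r"http://internal\.foo\.com", url):
        url = re.sub(r"^http", "https", url, count=1)
        url = url.replace("internal", "secure", 1)
        return "302:" + url  # ask Squid to send an HTTP 302 redirect
    return url  # unchanged URL


def main():
    """Answer each request line as it arrives, flushing every reply."""
    for line in sys.stdin:
        sys.stdout.write(rewrite(line) + "\n")
        sys.stdout.flush()
```

<P>To use it as a redirector, call <CODE>main()</CODE> at the end of the
script.  Flushing after every line matters for the same reason the Perl
version sets <CODE>$|=1</CODE>: Squid waits for each reply before it
continues with the request.</P>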

<P>Please see sections 10.3.2 and 10.3.3 of
<A HREF="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2068.txt">RFC 2068</A>
for an explanation of the 301 and 302 HTTP reply codes.</P>



<HR>
<A HREF="FAQ-3.html">Previous</A>
<A HREF="FAQ-5.html">Next</A>
<A HREF="FAQ.html#toc4">Table of Contents</A>
</BODY>
</HTML>