File: FAQ-12.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.21">
 <TITLE>SQUID Frequently Asked Questions: How does Squid work?</TITLE>
 <LINK HREF="FAQ-13.html" REL=next>
 <LINK HREF="FAQ-11.html" REL=previous>
 <LINK HREF="FAQ.html#toc12" REL=contents>
</HEAD>
<BODY>
<A HREF="FAQ-13.html">Next</A>
<A HREF="FAQ-11.html">Previous</A>
<A HREF="FAQ.html#toc12">Contents</A>
<HR>
<H2><A NAME="memory"></A> <A NAME="s12">12.</A> <A HREF="FAQ.html#toc12">How does Squid work?</A></H2>

<H2><A NAME="ss12.1">12.1</A> <A HREF="FAQ.html#toc12.1">What are cachable objects?</A>
</H2>

<P>An Internet Object is a file, document or response to a query for
an Internet service such as FTP, HTTP, or gopher.  A client requests
an Internet object from a caching proxy; if the object
is not already cached, the proxy server fetches
the object (either from the host specified in the URL or from a
parent or sibling cache) and delivers it to the client.</P>

<H2><A NAME="what-is-icp"></A> <A NAME="ss12.2">12.2</A> <A HREF="FAQ.html#toc12.2">What is the ICP protocol?</A>
</H2>

<P>ICP is a protocol used for communication among squid caches.
The ICP protocol is defined in two Internet RFCs.
<A HREF="http://www.ircache.net/Cache/ICP/rfc2186.txt">RFC 2186</A>
describes the protocol itself, while
<A HREF="http://www.ircache.net/Cache/ICP/rfc2187.txt">RFC 2187</A>
describes the application of ICP to hierarchical Web caching.</P>

<P>ICP is primarily used within a cache hierarchy to locate specific
objects in sibling caches.  If a squid cache does not have a
requested document, it sends an ICP query to its siblings, and the
siblings respond with ICP replies indicating a ``HIT'' or a ``MISS.''
The cache then uses the replies to choose from which cache to
resolve its own MISS.</P>

<P>ICP also supports multiplexed transmission of multiple object
streams over a single TCP connection.  ICP is currently implemented
on top of UDP.  Current versions of Squid also support ICP via
multicast.</P>
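<P>As a rough illustration of what goes over the wire, the following Python sketch packs an ICP query and unpacks a reply using the 20-byte header layout from RFC 2186. It is a minimal sketch, not Squid's code: the function names <CODE>build_icp_query</CODE> and <CODE>parse_icp_reply</CODE> are hypothetical, only three opcodes are listed, and the option fields are left at zero.</P>

```python
import socket
import struct

# Opcode and version values from RFC 2186 (only the ones used here).
ICP_OP_QUERY = 1
ICP_OP_HIT = 2
ICP_OP_MISS = 3
ICP_VERSION_2 = 2

# Fixed 20-byte header in network byte order: opcode, version,
# message length, request number, options, option data, sender host address.
ICP_HEADER = struct.Struct("!BBHIII4s")


def build_icp_query(reqnum: int, url: str, sender_ip: str = "0.0.0.0") -> bytes:
    """Build an ICP_OP_QUERY datagram asking a neighbor whether it holds `url`."""
    # Query payload: 4-byte requester host address, then the NUL-terminated URL.
    payload = socket.inet_aton("0.0.0.0") + url.encode("ascii") + b"\x00"
    length = ICP_HEADER.size + len(payload)
    header = ICP_HEADER.pack(ICP_OP_QUERY, ICP_VERSION_2, length, reqnum,
                             0, 0, socket.inet_aton(sender_ip))
    return header + payload


def parse_icp_reply(data: bytes):
    """Return (opcode, request number, URL) from an ICP reply datagram."""
    opcode, _version, length, reqnum, _opts, _optdata, _sender = ICP_HEADER.unpack_from(data)
    # Replies carry only the NUL-terminated URL after the header.
    url = data[ICP_HEADER.size:length].split(b"\x00", 1)[0].decode("ascii")
    return opcode, reqnum, url
```

<P>A cache would send such a datagram to each neighbor's ICP port (3130 by convention) and match the replies to its queries by request number.</P>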

<H2><A NAME="ss12.3">12.3</A> <A HREF="FAQ.html#toc12.3">What is the <EM>dnsserver</EM>?</A>
</H2>

<P>The <EM>dnsserver</EM> is a process forked by <EM>squid</EM> to
resolve domain names to IP addresses.  This is necessary because
the <CODE>gethostbyname(3)</CODE> function blocks the calling process
until the DNS query is completed.</P>
<P>Squid must use non-blocking I/O at all times, so DNS lookups are
implemented external to the main process.  The <EM>dnsserver</EM>
processes do not cache DNS lookups; caching is implemented inside the
<EM>squid</EM> process.</P>
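<P>The split between blocking helpers and an in-process cache can be sketched in Python. This is an analogy under stated assumptions, not Squid's implementation: worker threads stand in for the forked <EM>dnsserver</EM> processes, <CODE>NonBlockingResolver</CODE> is a hypothetical name, and the blocking resolver is passed in (for example <CODE>socket.gethostbyname</CODE>). Cache hits are answered immediately; misses are handed to a helper and reported through a callback, so the caller never waits.</P>

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Callback = Callable[[str, str], None]


class NonBlockingResolver:
    """Answers from an in-process cache; cache misses go to blocking helpers."""

    def __init__(self, lookup: Callable[[str], str], helpers: int = 5) -> None:
        self._lookup = lookup  # a blocking call, e.g. socket.gethostbyname
        self._pool = ThreadPoolExecutor(max_workers=helpers)
        self._cache: Dict[str, str] = {}
        self._lock = threading.Lock()

    def resolve(self, name: str, callback: Callback) -> None:
        """Never blocks: hits call back at once, misses call back from a helper."""
        with self._lock:
            cached = self._cache.get(name)
        if cached is not None:
            callback(name, cached)
            return
        self._pool.submit(self._lookup_and_remember, name, callback)

    def _lookup_and_remember(self, name: str, callback: Callback) -> None:
        address = self._lookup(name)  # the part that blocks
        with self._lock:
            self._cache[name] = address
        callback(name, address)

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

<P>In Squid the answer would come back to the main process over a pipe and be cached there; the sketch records it from the worker thread for brevity, and error handling is omitted.</P>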


<H2><A NAME="ss12.4">12.4</A> <A HREF="FAQ.html#toc12.4">What is the <EM>ftpget</EM> program for?</A>
</H2>

<P><EM>ftpget</EM> exists only in Squid 1.1 and Squid 1.0 versions.</P>
<P>The <EM>ftpget</EM> program is an FTP client used for retrieving
files from FTP servers.  Because the FTP protocol is complicated,
it is easier to implement it separately from the main <EM>squid</EM>
code.</P>


<H2><A NAME="ss12.5">12.5</A> <A HREF="FAQ.html#toc12.5">FTP PUTs don't work!</A>
</H2>

<P>FTP PUT should work with Squid-2.0 and later versions.  If you
are using Squid-1.1, then you need to upgrade before PUT will work.</P>

<H2><A NAME="ss12.6">12.6</A> <A HREF="FAQ.html#toc12.6">What is a cache hierarchy?  What are parents and siblings?</A>
</H2>


<P>A cache hierarchy is a collection of caching proxy servers organized
in a logical parent/child and sibling arrangement so that caches
closest to Internet gateways (closest to the backbone transit
entry-points) act as parents to caches at locations farther from
the backbone.  The parent caches resolve ``misses'' for their children.
In other words, when a cache requests an object from its parent,
and the parent does not have the object in its cache, the parent
fetches the object, caches it, and delivers it to the child.  This
ensures that the hierarchy achieves the maximum reduction in
bandwidth utilization on the backbone transit links, helps reduce
load on Internet information servers outside the network served by
the hierarchy, and builds a rich cache on the parents so that the
other child caches in the hierarchy will obtain better ``hit'' rates
against their parents.</P>

<P>In addition to the parent-child relationships, squid supports the
notion of siblings:  caches at the same level in the hierarchy,
provided to distribute cache server load.  Each cache in the
hierarchy independently decides whether to fetch the reference from
the object's home site or from parent or sibling caches, using a
simple resolution protocol.  Siblings will not fetch an object
for another sibling to resolve a cache ``miss.''</P>
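<P>The parent/sibling distinction can be modeled with a toy Python class. This sketch is hypothetical (the <CODE>Cache</CODE> class and its methods are illustrative only) and ignores ICP, timeouts and the choice among several parents; it shows just the rule that a parent fetches a miss on behalf of its child, while a sibling answers only from what it already holds.</P>

```python
from typing import Dict, List, Optional


class Cache:
    """Toy cache node: parents fetch misses for children; siblings only answer hits."""

    def __init__(self, name: str, parent: Optional["Cache"] = None,
                 origin: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        self.parent = parent
        self.siblings: List["Cache"] = []
        self.origin = origin or {}  # stands in for the objects' home sites
        self.store: Dict[str, str] = {}

    def sibling_lookup(self, url: str) -> Optional[str]:
        # A sibling answers from its own store only; it never fetches for another cache.
        return self.store.get(url)

    def get(self, url: str) -> str:
        if url in self.store:
            return self.store[url]
        body = None
        for sibling in self.siblings:
            body = sibling.sibling_lookup(url)
            if body is not None:
                break
        if body is None:
            # The parent resolves the miss (and caches it); the top level goes to the source.
            body = self.parent.get(url) if self.parent else self.origin[url]
        self.store[url] = body
        return body
```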

<H2><A NAME="ss12.7">12.7</A> <A HREF="FAQ.html#toc12.7">What is the Squid cache resolution algorithm?</A>
</H2>


<P>
<UL>
<LI>Send ICP queries to all appropriate siblings</LI>
<LI>Wait for all replies to arrive, up to a configurable timeout
(the default is two seconds).</LI>
<LI>Begin fetching the object upon receipt of the first HIT reply,
or</LI>
<LI>Fetch the object from the first parent which replied with MISS
(subject to weighting values), or</LI>
<LI>Fetch the object from the source</LI>
</UL>
</P>
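<P>The selection step can be written as a short Python function. This is a simplified sketch under several assumptions: replies are given as <CODE>(peer, reply)</CODE> pairs in arrival order, already cut off at the timeout; peer weighting and firewall handling are ignored; and <CODE>"DIRECT"</CODE> is a hypothetical marker for fetching from the source.</P>

```python
from typing import Dict, Iterable, Tuple


def select_source(replies: Iterable[Tuple[str, str]], peer_type: Dict[str, str]) -> str:
    """Pick where to fetch from, given ICP replies in arrival order.

    `replies` holds (peer, "HIT" or "MISS") pairs that arrived before the timeout;
    `peer_type` maps each peer to "parent" or "sibling".
    """
    first_parent_miss = None
    for peer, reply in replies:
        if reply == "HIT":
            return peer  # fetch from the first HIT right away
        if reply == "MISS" and peer_type[peer] == "parent" and first_parent_miss is None:
            first_parent_miss = peer
    if first_parent_miss is not None:
        return first_parent_miss  # no HIT: first parent that answered MISS
    return "DIRECT"  # no HIT and no parent MISS: go to the source
```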

<P>The algorithm is somewhat more complicated when firewalls
are involved.</P>

<P>The <CODE>single_parent_bypass</CODE> directive can be used to skip
the ICP queries if the only appropriate neighbor is a parent cache
(i.e., if there's only one place you'd fetch the object from, why
bother querying?)</P>

<H2><A NAME="ss12.8">12.8</A> <A HREF="FAQ.html#toc12.8">What features are Squid developers currently working on?</A>
</H2>


<P>There are several open issues for the caching project, namely
more automatic load balancing and (both configured and
dynamic) selection of parents, routing, multicast
cache-to-cache communication, and better recognition of URLs
that are not worth caching.</P>
<P>For our other to-do list items, please
see our ``TODO'' file in the recent source distributions.</P>

<P>Prospective developers should review the resources available at the
<A HREF="http://www.squid-cache.org/Devel/">Squid developers corner</A>.</P>

<H2><A NAME="ss12.9">12.9</A> <A HREF="FAQ.html#toc12.9">Tell me more about Internet traffic workloads</A>
</H2>


<P>Workload can be characterized as the burden a client or
group of clients imposes on a system.  Understanding the
nature of workloads is important to managing system
capacity.</P>
<P>If you are interested in Internet traffic workloads then NLANR's
<A HREF="http://www.nlanr.net/NA/">Network Analysis activities</A> are a good place to start.</P>

<H2><A NAME="ss12.10">12.10</A> <A HREF="FAQ.html#toc12.10">What are the tradeoffs of caching with the NLANR cache system?</A>
</H2>


<P>The NLANR root caches are at the NSF supercomputer centers (SCCs),
which are interconnected via NSF's high speed backbone service
(vBNS).  So inter-cache communication between the NLANR root caches
does not cross the Internet.</P>

<P>The benefits of hierarchical caching (namely, reduced network
bandwidth consumption, reduced access latency, and improved
resiliency) come at a price.  Caches higher in the hierarchy must
field the misses of their descendants. If the equilibrium hit rate
of a leaf cache is 50%, half of all leaf references have to be
resolved through a second level cache rather than directly from
the object's source.  If this second level cache has most of the
documents, it is usually still a win, but if higher level caches
often don't have the document, or become overloaded, then they
could actually increase access latency, rather than reduce it.</P>
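<P>A back-of-the-envelope model makes the tradeoff concrete. The Python sketch below assumes a fixed cost per hop (all figures hypothetical) and that fetching from the origin costs the same whether the leaf or the parent does it; it compares the mean latency seen by the leaf's clients with and without a parent.</P>

```python
def latency_via_parent(leaf_hit: float, parent_hit: float,
                       t_leaf: float, t_parent: float, t_origin: float) -> float:
    """Mean latency when leaf misses are resolved through a parent cache."""
    # Every request pays the leaf hop; leaf misses add the parent hop,
    # and parent misses additionally pay the fetch from the origin.
    return t_leaf + (1.0 - leaf_hit) * (t_parent + (1.0 - parent_hit) * t_origin)


def latency_direct(leaf_hit: float, t_leaf: float, t_origin: float) -> float:
    """Mean latency when leaf misses go straight to the origin server."""
    return t_leaf + (1.0 - leaf_hit) * t_origin
```

<P>With hypothetical costs of 10 ms at the leaf, 50 ms to the parent and 300 ms to the origin, and a 50% leaf hit rate, going direct averages 160 ms. A parent with a 60% hit rate brings this down to 95 ms, but a parent that hits only 10% of the time raises it to 170 ms, which is the situation the previous paragraph warns about.</P>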


<H2><A NAME="ss12.11">12.11</A> <A HREF="FAQ.html#toc12.11">Where can I find out more about firewalls?</A>
</H2>

<P>Please see the
<A HREF="http://www.faqs.org/faqs/firewalls-faq/">Firewalls FAQ</A>
information site.</P>

<H2><A NAME="ss12.12">12.12</A> <A HREF="FAQ.html#toc12.12">What is the ``Storage LRU Expiration Age?''</A>
</H2>

<P>For example:
<PRE>
        Storage LRU Expiration Age:      4.31 days
</PRE>
</P>

<P>The LRU expiration age is a dynamically-calculated value.  Any objects
which have not been accessed for this amount of time will be removed from
the cache to make room for new, incoming objects.  Another way of looking
at this is that it would
take your cache approximately this many days to go from empty to full at
your current traffic levels.</P>

<P>As your cache becomes busier, the LRU age becomes lower so that more
objects will be removed to make room for the new ones.  Ideally, your
cache will have an LRU age value of at least 3 days.  If the
LRU age is lower than 3 days, then your cache is probably not big enough
to handle the volume of requests it receives.  By adding more disk space
you could increase your cache hit ratio.</P>

<P>The configuration parameter <EM>reference_age</EM> places an upper limit on
your cache's LRU expiration age.</P>
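<P>The ``empty to full'' interpretation gives a quick way to estimate the LRU age from the cache size and the rate at which new cachable data arrives. The following is a rough Python sketch; the function names and the 3-day check are illustrative, and real traffic is not a constant rate.</P>

```python
from typing import Optional

SECONDS_PER_DAY = 86400.0


def estimated_lru_age_days(cache_bytes: float, new_bytes_per_second: float,
                           reference_age_days: Optional[float] = None) -> float:
    """Days needed to fill (or fully replace) the cache at a steady intake rate."""
    days = cache_bytes / new_bytes_per_second / SECONDS_PER_DAY
    if reference_age_days is not None:
        days = min(days, reference_age_days)  # reference_age caps the LRU age
    return days


def cache_big_enough(lru_age_days: float, minimum_days: float = 3.0) -> bool:
    return lru_age_days >= minimum_days
```

<P>For example, an 8.64 GB cache receiving 20 KB/s of new objects would turn over in about 5 days; at 40 KB/s that drops to 2.5 days, below the 3-day guideline.</P>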

<H2><A NAME="ss12.13">12.13</A> <A HREF="FAQ.html#toc12.13">What is ``Failure Ratio at 1.01; Going into hit-only-mode for 5 minutes''?</A>
</H2>

<P>Consider a pair of caches named A and B.  It may be the case that A can
reach B, and vice-versa, but B has poor reachability to the rest of the
Internet.
In this case, we would like B to recognize that it has poor reachability
and somehow convey this fact to its neighbor caches.</P>

<P>Squid will track the ratio of failed-to-successful requests over short
time periods.  A failed request is one which is logged as ERR_DNS_FAIL, ERR_CONNECT_FAIL, or ERR_READ_ERROR.  When the failed-to-successful ratio exceeds 1.0,
then Squid will return ICP_MISS_NOFETCH instead of ICP_MISS to neighbors.
Note, Squid will still return ICP_HIT for cache hits.</P>
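<P>The behavior can be sketched in Python as follows. This is a simplification: the counters are simply reset after each decision instead of being tracked over Squid's ``short time periods,'' and the five-minute duration is taken from the log message in the heading. <CODE>HitOnlyMode</CODE> and its methods are hypothetical names.</P>

```python
import time
from typing import Callable

FAILURE_CODES = {"ERR_DNS_FAIL", "ERR_CONNECT_FAIL", "ERR_READ_ERROR"}
HIT_ONLY_SECONDS = 5 * 60  # "Going into hit-only-mode for 5 minutes"


class HitOnlyMode:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.failed = 0
        self.succeeded = 0
        self.hit_only_until = 0.0

    def record(self, log_code: str) -> None:
        """Count one finished request by the code it was logged with."""
        if log_code in FAILURE_CODES:
            self.failed += 1
        else:
            self.succeeded += 1
        if self.failed / max(self.succeeded, 1) > 1.0:
            self.hit_only_until = self.clock() + HIT_ONLY_SECONDS
            self.failed = self.succeeded = 0  # begin a fresh measurement period

    def icp_reply(self, have_object: bool) -> str:
        if have_object:
            return "ICP_HIT"  # hits are still reported in hit-only mode
        if self.clock() < self.hit_only_until:
            return "ICP_MISS_NOFETCH"  # a miss, and please don't ask us to fetch it
        return "ICP_MISS"
```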

<H2><A NAME="ss12.14">12.14</A> <A HREF="FAQ.html#toc12.14">Does squid periodically re-read its configuration file?</A>
</H2>

<P>No, you must send a HUP signal to have Squid re-read its configuration file,
including access control lists.  An easy way to do this is with the <EM>-k</EM>
command line option:
<PRE>
        squid -k reconfigure
</PRE>
</P>

<H2><A NAME="ss12.15">12.15</A> <A HREF="FAQ.html#toc12.15">How does <EM>unlinkd</EM> work?</A>
</H2>

<P><EM>unlinkd</EM> is an external process used for unlinking unused cache files.
Performing the unlink operation in an external process opens up some
race-condition problems for Squid.  If we are not careful, the following
sequence of events could occur:
<OL>
<LI>An object with swap file number <B>S</B> is removed from the cache.</LI>
<LI>We want to unlink file <B>F</B> which corresponds to swap file number <B>S</B>,
so we write pathname <B>F</B> to the <EM>unlinkd</EM> socket.
We also mark <B>S</B> as available in the filemap.</LI>
<LI>We have a new object to swap out.  It is allocated to the first available
file number, which happens to be <B>S</B>.  Squid opens file <B>F</B> for writing.</LI>
<LI>The <EM>unlinkd</EM> process reads the request to unlink <B>F</B> and issues the
actual unlink call.</LI>
</OL>
</P>
<P>So, the problem is, how can we guarantee that <EM>unlinkd</EM> will not
remove a cache file that Squid has recently allocated to a new object?
The approach we have taken is to have Squid keep a stack of unused (but
not deleted!)  swap file numbers.  The stack size is hard-coded at 128
entries.  We only give unlink requests to <EM>unlinkd</EM> when the unused
file number stack is full.  Thus, if we ever have to start unlinking
files, we have a pool of 128 file numbers to choose from which we know
will not be removed by <EM>unlinkd</EM>.</P>

<P>In terms of implementation, the only way to send unlink requests to
the <EM>unlinkd</EM> process is via the <EM>storePutUnusedFileno</EM> function.</P>

<P>Unfortunately there are times when Squid cannot use the <EM>unlinkd</EM> process
but must call <EM>unlink(2)</EM> directly.  One of these times is when the cache
swap size is over the high water mark.  If we push the released file numbers
onto the unused file number stack, and the stack is not full, then no files
will be deleted, and the actual disk usage will remain unchanged.  So, when
we exceed the high water mark, we must call <EM>unlink(2)</EM> directly.</P>
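<P>The bookkeeping above can be modeled in a few lines of Python. This is a simplified sketch, not Squid's store code: <CODE>SwapFilenos</CODE> and its methods are hypothetical, <CODE>put_unused</CODE> only loosely mirrors <EM>storePutUnusedFileno</EM>, and unlink requests are recorded in a list instead of being written to the <EM>unlinkd</EM> socket.</P>

```python
from typing import List, Set

UNUSED_STACK_SIZE = 128  # the hard-coded stack size described above


class SwapFilenos:
    """Swap file number bookkeeping with an unused-but-not-deleted stack (sketch)."""

    def __init__(self, total: int) -> None:
        self.free_in_filemap: Set[int] = set(range(total))
        self.unused_stack: List[int] = []       # released, file still on disk
        self.unlinkd_requests: List[int] = []   # numbers handed to unlinkd
        self.unlinked_directly: List[int] = []  # numbers removed with unlink(2)

    def allocate(self) -> int:
        # Prefer the stack: unlinkd was never asked to remove these files,
        # so reusing one is safe (the old file is simply overwritten).
        if self.unused_stack:
            return self.unused_stack.pop()
        fileno = min(self.free_in_filemap)  # first available file number
        self.free_in_filemap.remove(fileno)
        return fileno

    def put_unused(self, fileno: int, over_high_water: bool = False) -> None:
        if over_high_water:
            # Disk usage must really drop, so unlink synchronously.
            self.unlinked_directly.append(fileno)
            self.free_in_filemap.add(fileno)
        elif len(self.unused_stack) < UNUSED_STACK_SIZE:
            self.unused_stack.append(fileno)  # keep the file; no unlink yet
        else:
            # Stack full: only now ask unlinkd. The next allocations come
            # from the stack, which gives unlinkd time to finish.
            self.unlinkd_requests.append(fileno)
            self.free_in_filemap.add(fileno)
```

<P>Because a number is handed to <EM>unlinkd</EM> only when 128 reusable numbers are already waiting on the stack, the next 128 allocations never touch a file that <EM>unlinkd</EM> is about to remove.</P>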

<H2><A NAME="ss12.16">12.16</A> <A HREF="FAQ.html#toc12.16">What is an icon URL?</A>
</H2>

<P>One of the most unpleasant things Squid must do is generate HTML
pages of Gopher and FTP directory listings.  For some strange
reason, people like to have little <EM>icons</EM> next to each
listing entry, denoting the type of object to which the
link refers (image, text file, etc.).</P>

<P>In Squid 1.0 and 1.1, we used internal browser icons with names
like <EM>gopher-internal-image</EM>.  Unfortunately, these were
not very portable.  Not all browsers had internal icons, or
even used the same names.  Perhaps only Netscape and Mosaic
used these names.</P>

<P>For Squid 2 we include a set of icons in the source distribution.
These icon files are loaded by Squid as cached objects at runtime.
Thus, every Squid cache now has its own icons to use in Gopher and FTP
listings.  Just like other objects available on the web, we refer to
the icons with
<A HREF="ftp://ftp.isi.edu/in-notes/rfc1738.txt">Uniform Resource Locators</A>, or <EM>URLs</EM>.</P>

<H2><A NAME="ss12.17">12.17</A> <A HREF="FAQ.html#toc12.17">Can I make my regular FTP clients use a Squid cache?</A>
</H2>

<P>Nope, it's not possible.  Squid only accepts HTTP requests.  It speaks
FTP on the <EM>server-side</EM>, but <B>not</B> on the <EM>client-side</EM>.</P>

<P>The very cool
<A HREF="ftp://gnjilux.cc.fer.hr/pub/unix/util/wget/">wget</A>
will download FTP URLs via Squid (and probably any other proxy cache).</P>
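<P>What such a client sends to Squid is an ordinary HTTP request whose request line carries the full <CODE>ftp://</CODE> URL. The Python helper below only builds those bytes; its name and the <CODE>User-Agent</CODE> value are hypothetical.</P>

```python
def proxy_request_for(url: str, user_agent: str = "example-client/0.1") -> bytes:
    """Build the HTTP/1.0 request a client sends to Squid to fetch `url`."""
    # When talking to a proxy, the request line carries the absolute URL,
    # including its scheme, so ftp:// URLs travel inside ordinary HTTP.
    lines = [
        f"GET {url} HTTP/1.0",
        f"User-Agent: {user_agent}",
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

<P>A client would write these bytes to a TCP connection to Squid's HTTP port (3128 unless configured otherwise); Squid then performs the FTP transfer itself and returns the result as an HTTP reply.</P>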

<H2><A NAME="ss12.18">12.18</A> <A HREF="FAQ.html#toc12.18">Why is the select loop average time so high?</A>
</H2>


<P><I>Is there any way to speed up the time spent dealing with select? Cachemgr
shows:</I>
<PRE>
        Select loop called: 885025 times, 714.176 ms avg
</PRE>
</P>

<P>This number is NOT how much time it takes to handle filedescriptor I/O.
We simply count the number of times select was called, and divide the
total process running time by the number of select calls.</P>

<P>This means that, on average, it takes your cache 0.714 seconds to check all
the open file descriptors once.  But this also includes time select()
spends in a wait state when there is no I/O on any file descriptor.
My relatively idle workstation cache has similar numbers:
<PRE>
        Select loop called: 336782 times, 715.938 ms avg
</PRE>

But my busy caches have much lower times:
<PRE>
        Select loop called: 16940436 times, 10.427 ms avg
        Select loop called: 80524058 times, 10.030 ms avg
        Select loop called: 10590369 times, 8.675 ms avg
        Select loop called: 84319441 times, 9.578 ms avg
</PRE>
</P>
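<P>Since the figure is just total running time divided by the number of select calls, it can be inverted to see how long the cache has been running. A small Python sketch (function names hypothetical):</P>

```python
SECONDS_PER_DAY = 86400.0


def select_loop_avg_ms(running_seconds: float, select_calls: int) -> float:
    """Average ms per select() call: total running time / number of calls."""
    return running_seconds * 1000.0 / select_calls


def implied_running_days(select_calls: int, avg_ms: float) -> float:
    """Invert the average to recover the running time it was computed from."""
    return select_calls * avg_ms / 1000.0 / SECONDS_PER_DAY
```

<P>For the first example above, 885025 calls at 714.176 ms correspond to roughly 7.3 days of running time, most of it spent waiting. The busy cache with 80524058 calls at 10.030 ms has been up about 9.3 days; its low average simply reflects far more select calls.</P>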

<H2><A NAME="ss12.19">12.19</A> <A HREF="FAQ.html#toc12.19">How does Squid deal with Cookies?</A>
</H2>

<P>The presence of <EM>Cookie</EM> headers in <B>requests</B> does not affect whether
or not an HTTP reply can be cached.  Similarly, the presence of
<EM>Set-Cookie</EM> headers in <B>replies</B> does not affect whether
the reply can be cached.</P>

<P>The proper way to deal with <EM>Set-Cookie</EM> reply headers, according
to 
<A HREF="ftp://ftp.isi.edu/in-notes/rfc2109.txt">RFC 2109</A>
is to cache the whole object, <EM>EXCEPT</EM> the <EM>Set-Cookie</EM> header lines.</P>


<P>With Squid-1.1, we cannot filter out specific HTTP headers, so
Squid-1.1 does not cache any response which contains a <EM>Set-Cookie</EM>
header.</P>

<P>With Squid-2, however, we can filter out specific HTTP headers.  But instead
of filtering them on the receiving-side, we filter them on the sending-side.
Thus, Squid-2 does cache replies with <EM>Set-Cookie</EM> headers, but
it filters out the <EM>Set-Cookie</EM> header itself for cache hits.</P>
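<P>The sending-side filter amounts to something like the following Python sketch. It assumes headers are stored as <CODE>(name, value)</CODE> pairs and that the client whose request caused the fetch still receives the cookie; <CODE>headers_to_send</CODE> is a hypothetical name.</P>

```python
from typing import List, Tuple

Header = Tuple[str, str]


def headers_to_send(stored: List[Header], is_cache_hit: bool) -> List[Header]:
    """Choose the reply headers sent to a client for a stored reply."""
    if not is_cache_hit:
        # The client whose request fetched the reply gets its own cookie.
        return list(stored)
    # Cache hits: everything except Set-Cookie (header names are case-insensitive).
    return [(name, value) for name, value in stored if name.lower() != "set-cookie"]
```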

<H2><A NAME="ss12.20">12.20</A> <A HREF="FAQ.html#toc12.20">How does Squid decide when to refresh a cached object?</A>
</H2>

<P>When checking the object freshness, we calculate these values:
<UL>
<LI><EM>OBJ_DATE</EM> is the time when the object was given out by the
origin server.  This is taken from the HTTP Date reply header.</LI>
<LI><EM>OBJ_LASTMOD</EM> is the time when the object was last modified,
given by the HTTP Last-Modified reply header.</LI>
<LI><EM>OBJ_AGE</EM> is how much the object has aged <EM>since</EM> it was retrieved:
<PRE>
        OBJ_AGE = NOW - OBJ_DATE
</PRE>
</LI>
<LI><EM>LM_AGE</EM> is how old the object was <EM>when</EM> it was retrieved:
<PRE>
        LM_AGE = OBJ_DATE - OBJ_LASTMOD
</PRE>
</LI>
<LI><EM>LM_FACTOR</EM> is the ratio of <EM>OBJ_AGE</EM> to <EM>LM_AGE</EM>:
<PRE>
        LM_FACTOR = OBJ_AGE / LM_AGE
</PRE>
</LI>
<LI><EM>CLIENT_MAX_AGE</EM> is the (optional) maximum object age the client will
accept as taken from the HTTP/1.1 Cache-Control request header.</LI>
<LI><EM>EXPIRES</EM> is the (optional) expiry time from the server reply headers.</LI>
</UL>
</P>

<P>These values are compared with the parameters of the <EM>refresh_pattern</EM>
rules.  The refresh parameters are:
<UL>
<LI>URL regular expression</LI>
<LI><EM>CONF_MIN</EM>:
The time (in minutes) an object without an explicit expiry
time should be considered fresh. The recommended value is 0; any higher
values may cause dynamic applications to be erroneously cached unless the
application designer has taken the appropriate actions.
</LI>
<LI><EM>CONF_PERCENT</EM>:
A percentage of the object's age (time since last
modification) for which an object without an explicit expiry time will be
considered fresh.
</LI>
<LI><EM>CONF_MAX</EM>:
An upper limit on how long objects without an explicit
expiry time will be considered fresh.
</LI>
</UL>
</P>

<P>The URL regular expressions are checked in the order listed until a
match is found.  Then the algorithms below are applied for determining
if an object is fresh or stale.</P>

<H3>Squid-1.1 and Squid-1.NOVM algorithm</H3>

<P>
<PRE>
    if (CLIENT_MAX_AGE)
        if (OBJ_AGE > CLIENT_MAX_AGE)
            return STALE
    if (OBJ_AGE &lt;= CONF_MIN)
        return FRESH
    if (EXPIRES) {
        if (EXPIRES &lt;= NOW)
            return STALE
        else
            return FRESH
    }
    if (OBJ_AGE > CONF_MAX)
        return STALE
    if (LM_FACTOR &lt; CONF_PERCENT)
        return FRESH
    return STALE
</PRE>
</P>

<P>
<A HREF="mailto:bertold@tohotom.vein.hu">Kolics Bertold</A>
has made an excellent
<A HREF="http://www.squid-cache.org/Doc/FAQ/refresh-flowchart.gif">flow chart diagram</A> showing this process.</P>

<H3>Squid-2 algorithm</H3>

<P>For Squid-2 the refresh algorithm has been slightly modified to give the
<EM>EXPIRES</EM> value a higher precedence, and the <EM>CONF_MIN</EM> value
lower precedence:
<PRE>
    if (EXPIRES) {
        if (EXPIRES &lt;= NOW)
            return STALE
        else
            return FRESH
    }
    if (CLIENT_MAX_AGE)
        if (OBJ_AGE > CLIENT_MAX_AGE)
            return STALE
    if (OBJ_AGE > CONF_MAX)
        return STALE
    if (OBJ_DATE > OBJ_LASTMOD) {
        if (LM_FACTOR &lt; CONF_PERCENT)
            return FRESH
        else
            return STALE
    }
    if (OBJ_AGE &lt;= CONF_MIN)
        return FRESH
    return STALE
</PRE>
</P>
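<P>The Squid-2 pseudocode translates almost line for line into Python. In this sketch, which is an illustration rather than Squid's source, all times are in seconds, <CODE>conf_percent</CODE> is a fraction (20% is 0.2), and a missing <EM>Last-Modified</EM> header (<CODE>None</CODE>) is assumed to skip the <EM>LM_FACTOR</EM> test.</P>

```python
from typing import Optional


def squid2_refresh_is_fresh(now: float, obj_date: float, obj_lastmod: Optional[float],
                            conf_min: float, conf_percent: float, conf_max: float,
                            expires: Optional[float] = None,
                            client_max_age: Optional[float] = None) -> bool:
    """Return True for FRESH, False for STALE. All times are in seconds."""
    obj_age = now - obj_date                      # OBJ_AGE = NOW - OBJ_DATE
    if expires is not None:                       # EXPIRES takes precedence
        return expires > now
    if client_max_age is not None and obj_age > client_max_age:
        return False
    if obj_age > conf_max:
        return False
    if obj_lastmod is not None and obj_date > obj_lastmod:
        lm_age = obj_date - obj_lastmod           # LM_AGE = OBJ_DATE - OBJ_LASTMOD
        lm_factor = obj_age / lm_age              # LM_FACTOR = OBJ_AGE / LM_AGE
        return lm_factor < conf_percent           # conf_percent as a fraction
    return obj_age <= conf_min
```

<P>A rule such as <CODE>refresh_pattern . 0 20% 4320</CODE> would map to <CODE>conf_min=0</CODE>, <CODE>conf_percent=0.2</CODE> and <CODE>conf_max=4320*60</CODE>, since the configuration gives the limits in minutes.</P>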


<H2><A NAME="ss12.21">12.21</A> <A HREF="FAQ.html#toc12.21">What exactly is a <EM>deferred read</EM>?</A>
</H2>

<P>The cache manager I/O page lists <EM>deferred reads</EM> for various
server-side protocols.</P>
<P>Sometimes reading on the server-side gets ahead of writing to the
client-side, especially if your cache is on a fast network and your
clients are connected at modem speeds.  Squid-1.1 will read up to 256k
(per request) ahead before it starts to defer the server-side reads.</P>
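<P>The deferral decision reduces to comparing how far the server side is ahead of the client side. A minimal Python sketch, with a hypothetical function name and assuming ``256k'' means 256 &times; 1024 bytes:</P>

```python
READ_AHEAD_LIMIT = 256 * 1024  # bytes; the Squid-1.1 figure quoted above


def should_defer_server_read(bytes_from_server: int, bytes_to_client: int,
                             limit: int = READ_AHEAD_LIMIT) -> bool:
    """True when the server side is `limit` or more bytes ahead of the client."""
    return bytes_from_server - bytes_to_client >= limit
```

<P>While this returns True the server socket is simply left unread, so in effect TCP flow control slows the origin server down to the client's pace.</P>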

<H2><A NAME="ss12.22">12.22</A> <A HREF="FAQ.html#toc12.22">Why is my cache's inbound traffic equal to the outbound traffic?</A>
</H2>

<P><I>I've been monitoring
the traffic on my cache's ethernet adapter and found a behavior I can't explain:
the inbound traffic is equal to the outbound traffic. The differences are
negligible. The hit ratio reports 40%.
Shouldn't the outbound be at least 40% greater than the inbound?</I></P>
<P>by 
<A HREF="mailto:david@avarice.nepean.uws.edu.au">David J N Begley</A></P>
<P>I can't account for the exact behavior you're seeing, but I can offer this
advice:  whenever you start measuring raw Ethernet or IP traffic on
interfaces, you can forget about getting all the numbers to exactly match what
Squid reports as the amount of traffic it has sent/received.</P>

<P>Why?</P>

<P>Squid is an application - it counts whatever data is sent to, or received
from, the lower-level networking functions;  at each successively lower layer,
additional traffic is involved (such as header overhead, retransmits and
fragmentation, unrelated broadcasts/traffic, etc.).  The additional traffic is
never seen by Squid and thus isn't counted - but if you run MRTG (or any
SNMP/RMON measurement tool) against a specific interface, all this additional
traffic will "magically appear".</P>

<P>Also remember that an interface has no concept of upper-layer networking (so
an Ethernet interface doesn't distinguish between IP traffic that's entirely
internal to your organization, and traffic that's to/from the Internet);  this
means that when you start measuring an interface, you have to be aware of
*what* you are measuring before you can start comparing numbers elsewhere.</P>

<P>It is possible (though by no means guaranteed) that you are seeing roughly
equivalent input/output because you're measuring an interface that both
retrieves data from the outside world (Internet), *and* serves it to end users
(internal clients).  That wouldn't be the whole answer, but hopefully it gives
you a few ideas to start applying to your own circumstance.</P>

<P>To interpret any statistic, you have to first know what you are measuring;
for example, an interface counts inbound and outbound bytes - that's it.  The
interface doesn't distinguish between inbound bytes from external Internet
sites or from internal (to the organization) clients (making requests).  If
you want that, try looking at RMON2.</P>

<P>Also, if you're talking about a 40% hit rate in terms of object
requests/counts then there's absolutely no reason why you should expect a 40%
reduction in traffic;  after all, not every request/object is going to be the
same size so you may be saving a lot in terms of requests but very little in
terms of actual traffic.</P>
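<P>The difference between request and byte hit ratios is easy to see with a few made-up log records. The Python sketch below (hypothetical function and data) computes both.</P>

```python
from typing import Iterable, Tuple


def hit_ratios(log: Iterable[Tuple[int, bool]]) -> Tuple[float, float]:
    """Return (request hit ratio, byte hit ratio) from (size, was_hit) records."""
    entries = list(log)
    hit_sizes = [size for size, hit in entries if hit]
    total_bytes = sum(size for size, _ in entries)
    request_ratio = len(hit_sizes) / len(entries)
    byte_ratio = sum(hit_sizes) / total_bytes
    return request_ratio, byte_ratio
```

<P>With four hits of 2,000 bytes and six misses of 100,000 bytes, the request hit ratio is 40% while the byte hit ratio is only about 1.3%, so the saving on the wire is tiny.</P>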

<H2><A NAME="ss12.23">12.23</A> <A HREF="FAQ.html#toc12.23">How come some objects do not get cached?</A>
</H2>

<P>To determine whether a given object may be cached, Squid takes many
things into consideration.  The current algorithm (for Squid-2)
goes something like this:</P>
<P>
<OL>
<LI>Responses with <EM>Cache-Control: Private</EM> are NOT cachable.</LI>
<LI>Responses with <EM>Cache-Control: No-Cache</EM> are NOT cachable.</LI>
<LI>Responses with <EM>Cache-Control: No-Store</EM> are NOT cachable.</LI>
<LI>Responses for requests with an <EM>Authorization</EM> header
are cachable ONLY if the response includes <EM>Cache-Control: Public</EM>.</LI>
<LI>Responses with <EM>Vary</EM> headers are NOT cachable because Squid
does not yet support Vary features.</LI>
<LI>The following HTTP status codes are cachable:
<UL>
<LI>200 OK</LI>
<LI>203 Non-Authoritative Information</LI>
<LI>300 Multiple Choices</LI>
<LI>301 Moved Permanently</LI>
<LI>410 Gone</LI>
</UL>

However, if Squid receives one of these responses from a neighbor
cache, it will NOT be cached if ALL of the <EM>Date</EM>, <EM>Last-Modified</EM>,
and <EM>Expires</EM> reply headers are missing.  This prevents such objects
from bouncing back-and-forth between siblings forever.</LI>
<LI>A 302 Moved Temporarily response is cachable ONLY if the response
also includes an <EM>Expires</EM> header.</LI>
<LI>The following HTTP status codes are ``negatively cached'' for
a short amount of time (configurable):
<UL>
<LI>204 No Content</LI>
<LI>305 Use Proxy</LI>
<LI>400 Bad Request</LI>
<LI>403 Forbidden</LI>
<LI>404 Not Found</LI>
<LI>405 Method Not Allowed</LI>
<LI>414 Request-URI Too Large</LI>
<LI>500 Internal Server Error</LI>
<LI>501 Not Implemented</LI>
<LI>502 Bad Gateway</LI>
<LI>503 Service Unavailable</LI>
<LI>504 Gateway Time-out</LI>
</UL>
</LI>
<LI>All other HTTP status codes are NOT cachable, including:
<UL>
<LI>206 Partial Content</LI>
<LI>303 See Other</LI>
<LI>304 Not Modified</LI>
<LI>401 Unauthorized</LI>
<LI>407 Proxy Authentication Required</LI>
</UL>
</LI>
</OL>
</P>
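<P>The rules above can be condensed into a Python function. This is a simplified sketch, not Squid's code: <CODE>cachability</CODE> is a hypothetical name, headers are plain dictionaries, Cache-Control is parsed only by splitting on commas, and the checks are applied in the order listed.</P>

```python
from typing import Dict

CACHABLE = {200, 203, 300, 301, 410}
NEGATIVE = {204, 305, 400, 403, 404, 405, 414, 500, 501, 502, 503, 504}


def cachability(status: int, reply_headers: Dict[str, str],
                request_headers: Dict[str, str], from_neighbor: bool = False) -> str:
    """Return "cache", "negative" or "no" following the list above (simplified)."""
    # Header names are compared case-insensitively.
    reply = {k.lower(): v for k, v in reply_headers.items()}
    request = {k.lower(): v for k, v in request_headers.items()}
    cc = {d.strip().lower() for d in reply.get("cache-control", "").split(",") if d.strip()}
    if cc.intersection({"private", "no-cache", "no-store"}):
        return "no"
    if "authorization" in request and "public" not in cc:
        return "no"
    if "vary" in reply:
        return "no"
    if status in CACHABLE:
        # From a neighbor, require at least one of Date, Last-Modified, Expires.
        if from_neighbor and not any(h in reply for h in ("date", "last-modified", "expires")):
            return "no"
        return "cache"
    if status == 302:
        return "cache" if "expires" in reply else "no"
    if status in NEGATIVE:
        return "negative"  # cached only for a short, configurable time
    return "no"
```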

<H2><A NAME="ss12.24">12.24</A> <A HREF="FAQ.html#toc12.24">What does <EM>keep-alive ratio</EM> mean?</A>
</H2>

<P>The <EM>keep-alive ratio</EM> shows up in the <EM>server_list</EM>
cache manager page for Squid 2.</P>
<P>This is a mechanism to try to detect neighbor caches which might
not be able to deal with persistent connections.  Every
time we send a <EM>proxy-connection: keep-alive</EM> request header
to a neighbor, we count how many times the neighbor sent us
a <EM>proxy-connection: keep-alive</EM> reply header.  Thus, the
<EM>keep-alive ratio</EM> is the ratio of these two counters.</P>

<P>If the ratio stays above 0.5, then we continue to assume the neighbor
properly implements persistent connections.  Otherwise, we will stop
sending the keep-alive request header to that neighbor.</P>
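<P>The counters can be sketched in Python as below. The <CODE>min_samples</CODE> warm-up is a hypothetical addition (without some minimum, a single reply lacking the header would disable keep-alive immediately); the class and method names are illustrative.</P>

```python
class KeepAliveTracker:
    """Per-neighbor keep-alive bookkeeping (simplified)."""

    THRESHOLD = 0.5

    def __init__(self, min_samples: int = 10) -> None:
        self.min_samples = min_samples  # hypothetical warm-up before judging
        self.sent = 0       # requests sent with "Proxy-Connection: keep-alive"
        self.received = 0   # replies that came back with "Proxy-Connection: keep-alive"
        self.send_keepalive = True

    def ratio(self) -> float:
        return self.received / self.sent if self.sent else 1.0

    def record_reply(self, reply_had_keepalive: bool) -> None:
        if not self.send_keepalive:
            return  # we no longer ask this neighbor for persistent connections
        self.sent += 1
        if reply_had_keepalive:
            self.received += 1
        if self.sent >= self.min_samples and self.ratio() <= self.THRESHOLD:
            self.send_keepalive = False
```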

<H2><A NAME="ss12.25">12.25</A> <A HREF="FAQ.html#toc12.25">How does Squid's cache replacement algorithm work?</A>
</H2>

<P>Squid uses an LRU (least recently used) algorithm to replace old cache
objects.  This means objects which have not been accessed for the
longest time are removed first.  In the source code, the
StoreEntry->lastref value is updated every time an object is accessed.</P>

<P>Objects are not necessarily removed ``on-demand.''  Instead, a regularly
scheduled event runs to periodically remove objects.  Normally this
event runs every second.</P>

<P>Squid keeps the cache disk usage between the low and high water marks.
By default the low mark is 90%, and the high mark is 95% of the total
configured cache size.  When the disk usage is close to the low mark,
the replacement is less aggressive (fewer objects removed).  When the
usage is close to the high mark, the replacement is more aggressive
(more objects removed).</P>

<P>When selecting objects for removal, Squid examines some number of objects
and determines which can be removed and which cannot.
A number of factors determine whether or not any given object can be
removed.  If the object is currently being requested, or retrieved
from an upstream site, it will not be removed.   If the object is
``negatively-cached'' it will be removed.  If the object has a private
cache key, it will be removed (there would be no reason to keep it --
because the key is private, it can never be ``found'' by subsequent requests).
Finally, if the time since last access is greater than the LRU threshold,
the object is removed.</P>

<P>The LRU threshold value is dynamically calculated based on the current
cache size and the low and high marks.  The LRU threshold scales
exponentially between the high and low water marks.  When the store swap
size is near the low water mark, the LRU threshold is large.  When the
store swap size is near the high water mark, the LRU threshold is small.
The threshold automatically adjusts to the rate of incoming requests.
In fact, when your cache size has stabilized, the LRU threshold
represents how long it takes to fill (or fully replace) your cache at
the current request rate.  Typical values for the LRU threshold are 1 to
10 days.</P>
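<P>One way to picture this exponential scaling is the formula below.  It is an illustrative sketch, not Squid's exact code: <I>S</I> is the store swap size, <I>L</I> and <I>H</I> are the low and high water marks, and <I>T</I><SUB>min</SUB> and <I>T</I><SUB>max</SUB> are assumed lower and upper bounds on the threshold.</P>

```latex
f = \min\!\left(1,\ \max\!\left(0,\ \frac{H - S}{H - L}\right)\right),
\qquad
T_{\mathrm{LRU}} = T_{\min} \left(\frac{T_{\max}}{T_{\min}}\right)^{f}
```

<P>When <I>S</I> is at the low mark, <I>f</I> = 1 and the threshold is <I>T</I><SUB>max</SUB>; at the high mark, <I>f</I> = 0 and it drops to <I>T</I><SUB>min</SUB>.  In between, each equal step in <I>S</I> multiplies the threshold by the same factor, which is what makes the scaling exponential.</P>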

<P>Back to selecting objects for removal.  Obviously it is not possible to
check every object in the cache every time we need to remove some of them.
We can only check a small subset each time.  The way in which
this is implemented is very different between Squid-1.1 and Squid-2.</P>

<H3>Squid 1.1</H3>

<P>The Squid cache storage is implemented as a hash table with some number
of "hash buckets."  Squid-1.1 scans one bucket at a time and sorts all the
objects in the bucket by their LRU age.  Objects with an LRU age
over the threshold are removed.  The scan rate is adjusted so that
it takes approximately 24 hours to scan the entire cache.  The
store buckets are randomized so that we don't always scan the same
buckets at the same time of the day.</P>
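<P>The per-bucket scan can be sketched in C as follows.  This is a simplified illustration under stated assumptions, not the Squid-1.1 source: the <EM>struct entry</EM> type, its <EM>removed</EM> flag, and <EM>scan_bucket()</EM> are hypothetical, and the randomized bucket order and the 24-hour pacing are left out.</P>

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical, simplified cache entry: only what the scan needs. */
struct entry {
    time_t lastref;   /* time of last access */
    int removed;      /* set to 1 when the scan evicts this entry */
};

/* qsort() comparator on an array of entry pointers: oldest lastref first. */
static int by_lastref(const void *a, const void *b)
{
    const struct entry *const *x = a;
    const struct entry *const *y = b;
    if ((*x)->lastref < (*y)->lastref) return -1;
    if ((*x)->lastref > (*y)->lastref) return 1;
    return 0;
}

/* Scan one hash bucket: sort its entries by LRU age, then evict every
 * entry whose age exceeds the threshold. Returns the number removed. */
static int scan_bucket(struct entry **bucket, size_t n,
                       time_t now, time_t lru_threshold)
{
    int removed = 0;
    qsort(bucket, n, sizeof bucket[0], by_lastref);
    for (size_t i = 0; i < n; i++) {
        if (now - bucket[i]->lastref <= lru_threshold)
            break;  /* sorted oldest-first, so the rest are younger */
        bucket[i]->removed = 1;
        removed++;
    }
    return removed;
}
```

<P>The call to <EM>qsort()</EM> sorts the whole bucket on every scan, even when only a few entries are old enough to remove; that is where the CPU cost comes from.</P>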

<P>This algorithm has some flaws.  Because we only scan one bucket,
there are going to be better candidates for removal in some of
the other 16,000 or so buckets.  Also, the qsort() function
might take a non-trivial amount of CPU time, depending on how many
entries are in each bucket.</P>

<H3>Squid 2</H3>

<P>For Squid-2 we eliminated the need to use qsort() by indexing
cached objects into an automatically sorted linked list.  Every time
an object is accessed, it gets moved to the top of the list.  Over time,
the least used objects migrate to the bottom of the list.  When looking
for objects to remove, we only need to check the last 100 or so objects
in the list.  Unfortunately this approach increases our memory usage
because of the need to store three additional pointers per cache object.
But for Squid-2 we're still ahead of the game because we also replaced
plain-text cache keys with 16-byte MD5 hashes, which are usually much
smaller than the URLs they replace.</P>
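<P>The move-to-front list can be sketched in C as below.  The names (<EM>struct lru_node</EM>, <EM>lru_touch()</EM>, <EM>lru_candidates()</EM>) are hypothetical and Squid's own list code differs in detail; the node's <EM>data</EM>, <EM>prev</EM>, and <EM>next</EM> fields illustrate the three extra pointers per object.</P>

```c
#include <stddef.h>

/* Hypothetical list node: object pointer plus prev/next links. */
struct lru_node {
    void *data;
    struct lru_node *prev, *next;
};

struct lru_list {
    struct lru_node *head;   /* most recently used */
    struct lru_node *tail;   /* least recently used */
};

static void lru_unlink(struct lru_list *l, struct lru_node *n)
{
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    n->prev = n->next = NULL;
}

static void lru_push_head(struct lru_list *l, struct lru_node *n)
{
    n->prev = NULL;
    n->next = l->head;
    if (l->head) l->head->prev = n; else l->tail = n;
    l->head = n;
}

/* Called on every access: move the object to the top of the list. */
static void lru_touch(struct lru_list *l, struct lru_node *n)
{
    if (l->head == n)
        return;
    lru_unlink(l, n);
    lru_push_head(l, n);
}

/* Collect removal candidates: at most max_scan nodes, walking up from
 * the tail (least recently used first). Returns how many were stored. */
static size_t lru_candidates(const struct lru_list *l,
                             struct lru_node **out, size_t max_scan)
{
    size_t k = 0;
    for (struct lru_node *n = l->tail; n && k < max_scan; n = n->prev)
        out[k++] = n;
    return k;
}
```

<P>Both the move-to-front step and each candidate check take constant time, so no sorting is needed; a caller following the text would pass a <EM>max_scan</EM> of around 100.</P>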

<H2><A NAME="pub-priv-keys"></A> <A NAME="ss12.26">12.26</A> <A HREF="FAQ.html#toc12.26">What are private and public keys?</A>
</H2>

<P><EM>keys</EM> refers to the database keys which Squid uses to index
cache objects.  Every object in the cache--whether saved on disk
or currently being downloaded--has a cache key.  For Squid-1.0 and
Squid-1.1 the cache key was basically the URL.  Squid-2 uses
MD5 checksums for cache keys.</P>

<P>The Squid cache uses the notions of <EM>private</EM> and <EM>public</EM>
cache keys.  An object can start out as being private, but may later be
changed to public status.  Private objects are associated with only a single
client whereas a public object may be sent to multiple clients at the
same time.  In other words, public objects can be located by any cache
client.  Private objects can only be located by a single client: the one
that requested them.</P>
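<P>The difference between the two kinds of keys can be sketched as follows.  This is only an illustration: Squid-2 uses MD5, which is replaced here by 64-bit FNV-1a to keep the example short, and <EM>key_public()</EM>, <EM>key_private()</EM>, and the <EM>request_id</EM> counter are hypothetical names.</P>

```c
#include <stdint.h>
#include <string.h>

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME  1099511628211ULL

/* Stand-in for MD5: 64-bit FNV-1a, continued from hash value h. */
static uint64_t fnv1a(uint64_t h, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Public key: depends only on the method and URL, so every client
 * asking for the same URL computes the same key. */
static uint64_t key_public(const char *method, const char *url)
{
    uint64_t h = fnv1a(FNV_OFFSET, method, strlen(method));
    return fnv1a(h, url, strlen(url));
}

/* Private key: also mixes in a per-request number, so another request
 * for the same URL computes a different key and cannot find it. */
static uint64_t key_private(unsigned int request_id,
                            const char *method, const char *url)
{
    uint64_t h = fnv1a(FNV_OFFSET, &request_id, sizeof request_id);
    h = fnv1a(h, method, strlen(method));
    return fnv1a(h, url, strlen(url));
}
```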

<P>Objects are changed from private to public after all of the HTTP
reply headers have been received and parsed.  In some cases, the
reply headers indicate that the object should not be made public,
for example when the reply carries the <EM>no-cache</EM> Cache-Control directive.</P>

<H2><A NAME="ss12.27">12.27</A> <A HREF="FAQ.html#toc12.27">What is FORW_VIA_DB for?</A>
</H2>

<P>We use it to collect data for 
<A HREF="http://www.ircache.net/Cache/Plankton/">Plankton</A>.</P>

<H2><A NAME="ss12.28">12.28</A> <A HREF="FAQ.html#toc12.28">Does Squid send packets to port 7 (echo)?  If so, why?</A>
</H2>

<P>It may.  This is an old feature from the Harvest cache software.
The cache would send an ICP ``SECHO'' message to the echo ports of
origin servers.  If the SECHO message came back before any of the
other ICP replies, then it meant the origin server was probably
closer than any neighbor cache.  In that case Harvest/Squid sent
the request directly to the origin server.</P>

<P>With more attention focused on security, many administrators filter
UDP packets to port 7.  The Computer Emergency Response Team (CERT)
once issued an advisory
(<A HREF="http://www.cert.org/advisories/CA-96.01.UDP_service_denial.html">CA-96.01: UDP Port Denial-of-Service Attack</A>) stating that UDP
echo and chargen services can be used for a denial-of-service
attack.  This made administrators nervous about any packets
hitting port 7 on their systems, and many of them complained.</P>

<P>The <EM>source_ping</EM> feature has been disabled in Squid-2. 
If you're seeing packets to port 7 that are coming from a
Squid cache (source port 3130), then it's probably coming from a
very old version of Squid.</P>

<H2><A NAME="ss12.29">12.29</A> <A HREF="FAQ.html#toc12.29">What does ``WARNING: Reply from unknown nameserver [a.b.c.d]'' mean?</A>
</H2>

<P>It means Squid sent a DNS query to one IP address, but the response 
came back from a different IP address.  By default Squid checks that
the addresses match.  If not, Squid ignores the response.</P>

<P>There are a number of reasons why this would happen:
<OL>
<LI>Your DNS name server just works this way, either because
it has been configured to, or because it is broken and doesn't
know any better.</LI>
<LI>You have a weird broadcast address, like 0.0.0.0, in
your <EM>/etc/resolv.conf</EM> file.</LI>
<LI>Somebody is trying to send spoofed DNS responses to
your cache.</LI>
</OL>
</P>

<P>If you recognize the IP address in the warning as one of your
name server hosts, then it's probably reason (1) or (2).</P>

<P>You can make these warnings stop, and allow responses from
``unknown'' name servers by setting this configuration option:
<PRE>
        ignore_unknown_nameservers off
</PRE>
</P>
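<P>The check amounts to comparing the reply's source address with the configured name servers.  The sketch below assumes IPv4 and uses a hypothetical function name; its <EM>ignore_unknown_nameservers</EM> argument mirrors the configuration option of the same name.</P>

```c
#include <netinet/in.h>
#include <stddef.h>

/* Return 1 if a DNS reply from `from` should be accepted, 0 if it
 * should be ignored (with a "Reply from unknown nameserver" warning). */
static int accept_dns_reply(struct in_addr from,
                            const struct in_addr *servers, size_t nservers,
                            int ignore_unknown_nameservers)
{
    if (!ignore_unknown_nameservers)
        return 1;                  /* option off: accept any source */
    for (size_t i = 0; i < nservers; i++)
        if (servers[i].s_addr == from.s_addr)
            return 1;              /* reply came from a known server */
    return 0;                      /* unknown source: ignore the reply */
}
```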

<H2><A NAME="ss12.30">12.30</A> <A HREF="FAQ.html#toc12.30">How does Squid distribute cache files among the available directories?</A>
</H2>

<P><EM>Note: The information here is current for version 2.2.</EM></P>
<P>See <EM>storeDirMapAllocate()</EM> in the source code.</P>

<P>When Squid wants to create a new disk file for storing an object, it
first selects which <EM>cache_dir</EM> the object will go into.  This is done
with the <EM>storeDirSelectSwapDir()</EM> function.  If you have <EM>N</EM>
cache directories, the function identifies the <EM>3N/4</EM> (75%)
of them with the most available space.  These directories are
then used in order of available space, largest first.  When Squid has
stored one object in each of the
<EM>3N/4</EM> <EM>cache_dir</EM>s, the process repeats and
<EM>storeDirSelectSwapDir()</EM> finds a new set of <EM>3N/4</EM>
cache directories with the most available space.</P>
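<P>The selection step can be sketched in C as follows.  This is not the actual <EM>storeDirSelectSwapDir()</EM> code: the structure and function names are hypothetical, free space is passed in as a plain array, and rounding <EM>3N/4</EM> down (with a minimum of one directory) is an assumption.</P>

```c
#define MAX_DIRS 64   /* arbitrary limit for this sketch */

struct swapdir_set {
    int order[MAX_DIRS]; /* chosen dir indexes, most free space first */
    int count;           /* size of the current set, about 3N/4 */
    int next;            /* next position in order[] to hand out */
};

/* Build a new set: the 3N/4 dirs (at least one) with the most free
 * space, largest first. n must be between 1 and MAX_DIRS. */
static void build_set(struct swapdir_set *s, const long *free_kb, int n)
{
    int used[MAX_DIRS] = {0};
    s->count = (3 * n) / 4;
    if (s->count < 1)
        s->count = 1;
    for (int k = 0; k < s->count; k++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (!used[i] && (best < 0 || free_kb[i] > free_kb[best]))
                best = i;
        used[best] = 1;
        s->order[k] = best;
    }
    s->next = 0;
}

/* Pick the cache_dir for the next object. Once every dir in the set
 * has received one object, rebuild the set from current free space. */
static int select_swapdir(struct swapdir_set *s, const long *free_kb, int n)
{
    if (s->count == 0 || s->next >= s->count)
        build_set(s, free_kb, n);
    return s->order[s->next++];
}
```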

<P>Once the <EM>cache_dir</EM> has been selected, the next step is to find
an available <EM>swap file number</EM>.  This is accomplished
by checking the <EM>file map</EM>, with the <EM>file_map_allocate()</EM>
function.  Essentially the swap file numbers are allocated
sequentially.  For example, if the last number allocated 
happens to be 1000, then the next one will be the first
number after 1000 that is not already being used.</P>
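<P>Sequential allocation from a bitmap can be sketched like this.  The <EM>struct file_map</EM> layout and <EM>fm_allocate()</EM> are hypothetical stand-ins for <EM>file_map_allocate()</EM>, and wrapping around to zero at the end of the map is an assumption.</P>

```c
#include <limits.h>

/* Hypothetical file map: one bit per swap file number. */
struct file_map {
    unsigned char *bits;  /* bit i set => swap file number i in use */
    int max_n;            /* number of swap file numbers available */
    int last;             /* last number handed out */
};

static int fm_in_use(const struct file_map *fm, int i)
{
    return (fm->bits[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

/* Return the first free number after fm->last, wrapping around at the
 * end of the map, or -1 if every number is in use. */
static int fm_allocate(struct file_map *fm)
{
    for (int step = 1; step <= fm->max_n; step++) {
        int i = (fm->last + step) % fm->max_n;
        if (!fm_in_use(fm, i)) {
            fm->bits[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
            fm->last = i;
            return i;
        }
    }
    return -1;
}
```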

<H2><A NAME="ss12.31">12.31</A> <A HREF="FAQ.html#toc12.31">Why do I see negative byte hit ratio?</A>
</H2>

<P>Byte hit ratio is calculated a bit differently than
request hit ratio.  Squid counts the number of bytes read
from the network on the server-side, and the number of bytes written to
the client-side.  The byte hit ratio is calculated as
<PRE>
        (client_bytes - server_bytes) / client_bytes
</PRE>

If server_bytes is greater than client_bytes, you end up
with a negative value.</P>
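<P>As a worked example of the formula, the small C function below computes the ratio; returning 0 when no bytes have been sent to clients is an assumption of this sketch, made to avoid dividing by zero.</P>

```c
/* Byte hit ratio: (client_bytes - server_bytes) / client_bytes. */
static double byte_hit_ratio(long long client_bytes, long long server_bytes)
{
    if (client_bytes <= 0)
        return 0.0;
    return (double)(client_bytes - server_bytes) / (double)client_bytes;
}
```

<P>With 1,000 bytes written to clients and 250 bytes read from servers, the ratio is 0.75.  With 1,200 bytes read from servers instead, it is (1000 - 1200) / 1000 = -0.2, a byte hit ratio of -20%.</P>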

<P>The server_bytes may be greater than client_bytes for a number
of reasons, including:
<UL>
<LI>Cache Digests and other internally generated requests.
Cache Digest messages are quite large.  They are counted
in the server_bytes, but since they are consumed internally,
they do not count in client_bytes.</LI>
<LI>User-aborted requests.  If your <EM>quick_abort</EM> setting
allows it, Squid sometimes continues to fetch aborted
requests from the server-side, without sending any
data to the client-side.</LI>
<LI>Some range requests, in combination with Squid bugs, can
consume more bandwidth on the server-side than on the
client-side.  In a range request, the client is asking for
only some part of the object.  Squid may decide to retrieve
the whole object anyway, so that it can be used later on.
This means downloading more from the server than sending
to the client.  You can affect this behavior with
the <EM>range_offset_limit</EM> option.</LI>
</UL>
</P>

<H2><A NAME="ss12.32">12.32</A> <A HREF="FAQ.html#toc12.32">What does ``Disabling use of private keys'' mean?</A>
</H2>

<P>First you need to understand the 
<A HREF="#pub-priv-keys">difference between public and private keys</A>.</P>

<P>When Squid sends ICP queries, it uses the ICP <EM>reqnum</EM> field
to hold the private key data.  In other words, when Squid gets an
ICP reply, it uses the <EM>reqnum</EM> value to build the private cache key for
the pending object.</P>


<P>Some ICP implementations always set the <EM>reqnum</EM> field to zero
when they send a reply.   Squid cannot use private cache keys with
such neighbor caches because Squid will not be able to
locate cache keys for those ICP replies.  Thus, if Squid detects a neighbor
cache that sends zero reqnums, it
disables the use of private cache keys.</P>

<P>Not having private cache keys has some important privacy
implications.  Two users could receive the same response even though it was
meant for only one of them.  This response could contain
personal, confidential information.  You will need to disable
the ``zero reqnum'' neighbor if you want Squid to use private
cache keys.</P>

<H2><A NAME="ss12.33">12.33</A> <A HREF="FAQ.html#toc12.33">What is a half-closed filedescriptor?</A>
</H2>

<P>TCP allows connections to be in a ``half-closed'' state.   This
is accomplished with the <EM>shutdown(2)</EM> system call.  In Squid,
this means that a client has closed its side of the connection for
writing, but leaves it open for reading.  Half-closed connections
are tricky because Squid can't tell the difference between a
half-closed connection and a fully closed one.</P>
<P>If Squid tries to read a connection, and <EM>read()</EM> returns
0, and Squid knows that the client doesn't have the whole
response yet, Squid marks the file descriptor as half-closed.
Most likely the client has aborted the request and the connection
is really closed.  However, there is a slight chance that
the client is using the <EM>shutdown()</EM> call, and that it
can still read the response.</P>
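<P>The ambiguity is easy to reproduce locally.  The sketch below uses a Unix-domain <EM>socketpair()</EM> as a stand-in for a TCP connection, and <EM>half_close_demo()</EM> is a hypothetical name.  After one end calls <EM>shutdown(fd, SHUT_WR)</EM>, the other end's <EM>read()</EM> returns 0, just as it would after a full close, yet data can still flow in the opposite direction.</P>

```c
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if, after the "client" end shuts down its writing side,
 * the "server" end reads EOF (0) but can still send a reply that the
 * client reads successfully; returns 0 otherwise. */
static int half_close_demo(void)
{
    int sv[2];
    char buf[8];
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 0;

    /* Client (sv[0]) closes its side for writing only. */
    shutdown(sv[0], SHUT_WR);

    /* Server (sv[1]) sees read() == 0, exactly as for a full close. */
    ok = read(sv[1], buf, sizeof buf) == 0;

    /* ...but the client can still receive the response. */
    ok = ok && write(sv[1], "reply", 5) == 5;
    ok = ok && read(sv[0], buf, sizeof buf) == 5;

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```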
<P>To disable half-closed connections, simply put this in
squid.conf:
<PRE>
        half_closed_clients off
</PRE>

Then, Squid will always close its side of the connection
instead of marking it as half-closed.</P>

<H2><A NAME="ss12.34">12.34</A> <A HREF="FAQ.html#toc12.34">What does --enable-heap-replacement do?</A>
</H2>

<P>Squid has traditionally used an LRU replacement algorithm.  As of
<A HREF="/Versions/v2/2.3/">version 2.3</A>, you can use
other replacement algorithms by using the <EM>--enable-heap-replacement</EM>
configure option.  Currently, the heap replacement code supports two
additional algorithms: LFUDA and GDS.</P>
<P>With Squid version 2.4 and later you should use this configure option:
<PRE>
./configure --enable-removal-policies=heap
</PRE>
</P>
<P>Then, in <EM>squid.conf</EM>, you can select different policies with the
<EM>cache_replacement_policy</EM> option.  See the <EM>squid.conf</EM> comments
for details.</P>
<P>The LFUDA and GDS replacement code was contributed by John Dilley and others
from Hewlett-Packard.  Their work is described in these papers:
<OL>
<LI>
<A HREF="http://www.hpl.hp.com/techreports/1999/HPL-1999-69.html">Enhancement and Validation of Squid's Cache Replacement Policy</A>
(HP Tech Report).</LI>
<LI>
<A HREF="http://workshop.ircache.net/Papers/dilley-abstract.html">Enhancement and Validation of the Squid Cache Replacement Policy</A>
(WCW 1999 paper).</LI>
</OL>
</P>

<H2><A NAME="ss12.35">12.35</A> <A HREF="FAQ.html#toc12.35">Why is actual filesystem space used greater than what Squid thinks?</A>
</H2>

<P>If you compare <EM>df</EM> output and cachemgr <EM>storedir</EM> output,
you will notice that actual disk usage is greater than what Squid
reports.  This may be due to a number of reasons:
<UL>
<LI>Squid doesn't keep track of the size of the <EM>swap.state</EM>
file, which normally resides on each <EM>cache_dir</EM>.</LI>
<LI>Directory entries also take up filesystem space.</LI>
<LI>Other applications might be using the same disk partition.</LI>
<LI>Your filesystem block size might be larger than what Squid
thinks.  When calculating total disk usage, Squid rounds
file sizes up to a whole number of 1024 byte blocks.  If
your filesystem uses larger blocks, then some ``wasted'' space
is not accounted for.</LI>
</UL>
</P>
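<P>The block-size effect is simple arithmetic.  The sketch below rounds a file size up to whole blocks; the 4096-byte filesystem block in the example is an assumed value, and real filesystems may add further overhead of their own.</P>

```c
/* Space used by a file of `size` bytes when storage is allocated in
 * whole blocks of `block` bytes (size rounded up). */
static long long round_up_blocks(long long size, long long block)
{
    return (size + block - 1) / block * block;
}

/* Bytes the filesystem allocates beyond what 1 KB accounting reports,
 * assuming fs_block is a multiple of 1024. */
static long long unaccounted_bytes(long long size, long long fs_block)
{
    return round_up_blocks(size, fs_block) - round_up_blocks(size, 1024);
}
```

<P>For a 1500-byte object, Squid counts 2048 bytes, while a filesystem with 4096-byte blocks allocates 4096, so 2048 bytes of that file go unaccounted.</P>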

<H2><A NAME="ss12.36">12.36</A> <A HREF="FAQ.html#toc12.36">How do <EM>positive_dns_ttl</EM> and <EM>negative_dns_ttl</EM> work?</A>
</H2>

<P><EM>positive_dns_ttl</EM> is how long Squid caches a successful DNS
lookup. Similarly, <EM>negative_dns_ttl</EM> is how long Squid caches
a failed DNS lookup.</P>
<P><EM>positive_dns_ttl</EM> is not always used.  It is NOT used in the following
cases:
<UL>
<LI>Squid-2.3 and later versions with internal DNS lookups.  Internal
lookups are the default for Squid-2.3 and later.</LI>
<LI>If you applied the ``DNS TTL'' 
<A HREF="FAQ-2.html#dns-ttl-hack">patch</A>
for BIND.</LI>
<LI>If you are using FreeBSD, then it already has the DNS TTL patch
built in.</LI>
</UL>
</P>

<P>Let's say you have the following settings:
<PRE>
positive_dns_ttl 1 hours
negative_dns_ttl 1 minutes
</PRE>

When Squid looks up a name like <EM>www.squid-cache.org</EM>, it gets back
an IP address like 204.144.128.89.  The address is cached for the
next hour.  That means, when Squid needs to know the address for 
<EM>www.squid-cache.org</EM> again, it uses the cached answer for the
next hour.  After one hour, the cached information expires, and Squid
makes a new query for the address of <EM>www.squid-cache.org</EM>.</P>

<P>If you have the DNS TTL patch, or are using internal lookups, then
each hostname has its own TTL value, which was set by the domain
name administrator.  You can see these values in the 'ipcache'
cache manager page.  For example:
<PRE>
 Hostname                      Flags lstref    TTL N
 www.squid-cache.org               C   73043  12784  1( 0)  204.144.128.89-OK 
 www.ircache.net                   C   73812  10891  1( 0)   192.52.106.12-OK 
 polygraph.ircache.net             C  241768 -181261  1( 0)   192.52.106.12-OK 
</PRE>

The TTL field shows how many seconds until the entry expires.
Negative values mean the entry is already expired, and will be refreshed
upon next use.</P>

<P>The <EM>negative_dns_ttl</EM> specifies how long to cache failed DNS lookups.
When Squid fails to resolve a hostname, you can be pretty sure that
it is a real failure, and you are not likely to get a successful
answer within a short time period.  Squid retries its lookups 
many times before declaring a lookup has failed.
If you like, you can set <EM>negative_dns_ttl</EM> to zero.</P>
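<P>The bookkeeping described in this section can be sketched as follows.  The <EM>struct ipcache_entry</EM> type and its functions are hypothetical, and passing a negative <EM>dns_ttl</EM> to mean ``no per-record TTL available'' is an assumption of this sketch.</P>

```c
#include <time.h>

/* Hypothetical ipcache entry for this sketch. */
struct ipcache_entry {
    int ok;          /* 1 = lookup succeeded, 0 = lookup failed */
    time_t expires;  /* absolute expiry time */
};

/* Record a lookup result. A success uses the record's own TTL when one
 * is known (internal DNS or the BIND TTL patch), otherwise
 * positive_dns_ttl. A failure is cached for negative_dns_ttl. */
static void ipcache_store(struct ipcache_entry *e, int ok, time_t now,
                          long dns_ttl, long positive_dns_ttl,
                          long negative_dns_ttl)
{
    e->ok = ok;
    if (!ok)
        e->expires = now + negative_dns_ttl;
    else if (dns_ttl >= 0)
        e->expires = now + dns_ttl;
    else
        e->expires = now + positive_dns_ttl;
}

/* Seconds until expiry, like the TTL column of the ipcache page;
 * a negative value means the entry has already expired. */
static long ipcache_ttl_left(const struct ipcache_entry *e, time_t now)
{
    return (long)(e->expires - now);
}
```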

<H2><A NAME="ss12.37">12.37</A> <A HREF="FAQ.html#toc12.37">What does <EM>swapin MD5 mismatch</EM> mean?</A>
</H2>

<P>It means that Squid opened up a disk file to serve a cache hit, but
it found that the stored object doesn't match the user's request.
Squid stores the MD5 digest of the URL at the start of each disk file.
When the file is opened, Squid checks that the disk file MD5 matches the
MD5 of the URL requested by the user.  If they don't match, the warning
is printed and Squid forwards the request to the origin server.</P>
<P>You do not need to worry about this warning.  It means that Squid is 
recovering from a corrupted cache directory.</P>
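<P>The comparison can be sketched in a few lines of C.  The <EM>swapin_check()</EM> function and its result codes are hypothetical; the sketch only assumes that the file stores a 16-byte MD5 digest that is compared with the digest of the requested URL.</P>

```c
#include <stdio.h>
#include <string.h>

#define KEY_LEN 16  /* size of an MD5 digest */

enum swapin_result { SWAPIN_HIT, SWAPIN_MISS };

/* On a mismatch, warn and report a miss so that the request is
 * forwarded to the origin server instead of served from disk. */
static enum swapin_result swapin_check(const unsigned char stored[KEY_LEN],
                                       const unsigned char wanted[KEY_LEN])
{
    if (memcmp(stored, wanted, KEY_LEN) != 0) {
        fprintf(stderr, "swapin MD5 mismatch\n");
        return SWAPIN_MISS;
    }
    return SWAPIN_HIT;
}
```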

<H2><A NAME="ss12.38">12.38</A> <A HREF="FAQ.html#toc12.38">What does <EM>failed to unpack swapfile meta data</EM> mean?</A>
</H2>

<P>Each of Squid's disk cache files has a metadata section at the beginning.
This header is used to store the URL MD5, some StoreEntry data, and more.
When Squid opens a disk file for reading, it looks for the meta data
header and unpacks it.</P>
<P>This warning means that Squid couldn't unpack the meta data.  This is
a non-fatal error from which Squid can recover.  Perhaps
the meta data was just missing, or perhaps the file got corrupted.</P>
<P>You do not need to worry about this warning.  It means that Squid is 
double-checking that the disk file matches what Squid thinks should
be there, and the check failed.  Squid recovers and generates
a cache miss in this case.</P>
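<P>Unpacking such a header, and failing cleanly when it is malformed, can be sketched as below.  The layout (a marker byte, an <EM>int</EM> header length, then type/length/value records) and the type codes are assumptions for illustration, not a description of Squid's on-disk format.</P>

```c
#include <stddef.h>
#include <string.h>

#define META_MAGIC   0x03  /* hypothetical "header present" marker */
#define META_KEY_MD5 0x01  /* hypothetical record type for the URL key */

/* Parse: marker byte, int header length, then records of
 * (1-byte type, int length, value). On success, point *key at the MD5
 * record and return the header length; return -1 if the header is
 * missing, truncated, or has no key (the "failed to unpack" case). */
static int meta_unpack(const unsigned char *buf, size_t len,
                       const unsigned char **key, int *key_len)
{
    const int rec_hdr = 1 + (int)sizeof(int);
    int hdr_len, pos;

    if (len < (size_t)rec_hdr || buf[0] != META_MAGIC)
        return -1;
    memcpy(&hdr_len, buf + 1, sizeof(int));
    if (hdr_len < rec_hdr || (size_t)hdr_len > len)
        return -1;

    *key = NULL;
    pos = rec_hdr;
    while (pos < hdr_len) {
        unsigned char type;
        int vlen;
        if (hdr_len - pos < rec_hdr)
            return -1;                  /* truncated record header */
        type = buf[pos];
        memcpy(&vlen, buf + pos + 1, sizeof(int));
        pos += rec_hdr;
        if (vlen < 0 || vlen > hdr_len - pos)
            return -1;                  /* value runs past the header */
        if (type == META_KEY_MD5) {
            *key = buf + pos;
            *key_len = vlen;
        }
        pos += vlen;
    }
    return *key ? hdr_len : -1;
}
```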

<H2><A NAME="ss12.39">12.39</A> <A HREF="FAQ.html#toc12.39">Why doesn't Squid make <EM>ident</EM> lookups in interception mode?</A>
</H2>

<P>It's a side-effect of the way interception proxying works.</P>
<P>When Squid is configured for interception proxying, the operating system
pretends that it is the origin server.  That means that the "local" socket
address for intercepted TCP
connections is really the origin server's IP address.  If you run
<EM>netstat -n</EM> on your interception proxy, you'll see a lot of 
foreign IP addresses in the <EM>Local Address</EM> column.</P>
<P>When Squid wants to make an ident query, it creates a new TCP socket
and <EM>binds</EM> the local endpoint to the same IP address as the
local end of the client's TCP connection.  Since the local address
isn't really local (it's some faraway origin server's IP address),
the <EM>bind()</EM> system call fails.  Squid handles this as a failed
ident lookup.</P>
<P><I>So why bind in that way? If you know you are interception proxying, then why
not bind the local endpoint to the host's (intranet) IP address? Why make
the masses suffer needlessly?</I></P>
<P>Because that's just how ident works.
Please read 
<A HREF="ftp://ftp.isi.edu/in-notes/rfc931.txt">RFC 931</A>,
in particular the RESTRICTIONS section.</P>

<H2><A NAME="ss12.40">12.40</A> <A HREF="FAQ.html#toc12.40">dnsSubmit: queue overload, rejecting blah</A>
</H2>

<P>This means that you are using external <EM>dnsserver</EM> processes
for lookups, and all processes are busy, and Squid's pending queue
is full.  Each <EM>dnsserver</EM> program can only handle one request
at a time.  When all <EM>dnsserver</EM> processes are busy, Squid queues
up requests, but only to a certain point.</P>
<P>To alleviate this condition, you need to either (1) increase the number
of <EM>dnsserver</EM> processes by changing the value for <EM>dns_children</EM>
in your config file, or (2) switch to using Squid's internal DNS client
code.</P>
<P>Note that in some versions, Squid limits <EM>dns_children</EM> to 32.  To
increase it beyond that value, you would have to edit the source code.</P>
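<P>The queueing behavior can be modeled with a few counters.  This is a hypothetical sketch: the structure, the function, and the <EM>queue_limit</EM> field are illustrative, and Squid's real queue limit may be computed differently.</P>

```c
/* Hypothetical model of the dnsserver pool: each child handles one
 * request at a time; extra requests wait in a bounded queue. */
struct dns_pool {
    int children;     /* plays the role of dns_children */
    int busy;         /* children currently working on a request */
    int queued;       /* requests waiting for a free child */
    int queue_limit;  /* maximum queue length (assumed value) */
};

enum submit_result { SENT_TO_CHILD, QUEUED, REJECTED };

static enum submit_result dns_submit(struct dns_pool *p)
{
    if (p->busy < p->children) {
        p->busy++;
        return SENT_TO_CHILD;
    }
    if (p->queued < p->queue_limit) {
        p->queued++;
        return QUEUED;
    }
    return REJECTED;  /* the "queue overload, rejecting" case */
}
```

<P>Raising <EM>children</EM> lets more lookups run in parallel, so the queue fills, and requests start being rejected, only under a heavier load.</P>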

<H2><A NAME="ss12.41">12.41</A> <A HREF="FAQ.html#toc12.41">What are FTP passive connections?</A>
</H2>

<P>by Colin Campbell</P>
<P>FTP uses two connections, one for passing commands around, the other for
moving data. The command channel is handled by the ftpd listening on port
21.</P>
<P>The data channel varies depending on whether you ask for passive FTP or
not. When you request data in a non-passive (active) environment, your client tells
the server ``I am listening on &lt;ip-address&gt; &lt;port&gt;.'' The server then
connects FROM port 20 to the IP address and port specified by your client.
This requires your ``security device'' to permit connections from port 20 on any
outside host to any port &gt; 1023 on any inside host. Somewhat of a hole.</P>
<P>In passive mode, when you request a data transfer, the server tells the
client ``I am listening on &lt;ip address&gt; &lt;port&gt;.'' Your client then connects
to the server on that IP and port and data flows.</P>
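<P>In passive mode, the server's ``I am listening on'' message is the <EM>227</EM> reply to the <EM>PASV</EM> command, whose six comma-separated numbers encode the address and the port (RFC 959; port = p1 * 256 + p2).  The parser below is a minimal sketch with a hypothetical name; it assumes the common <EM>(h1,h2,h3,h4,p1,p2)</EM> form, which some servers do not follow exactly.</P>

```c
#include <stdio.h>
#include <string.h>

/* Parse a reply such as "227 Entering Passive Mode (192,0,2,10,19,137)"
 * into a dotted IPv4 string and a port number.
 * Returns 0 on success, -1 if the reply cannot be parsed. */
static int parse_pasv(const char *reply, char ip[16], int *port)
{
    int h[4], p1, p2;
    const char *args;

    if (strncmp(reply, "227", 3) != 0 || (args = strchr(reply, '(')) == NULL)
        return -1;
    if (sscanf(args, "(%d,%d,%d,%d,%d,%d",
               &h[0], &h[1], &h[2], &h[3], &p1, &p2) != 6)
        return -1;
    for (int i = 0; i < 4; i++)
        if (h[i] < 0 || h[i] > 255)
            return -1;
    if (p1 < 0 || p1 > 255 || p2 < 0 || p2 > 255)
        return -1;
    snprintf(ip, 16, "%d.%d.%d.%d", h[0], h[1], h[2], h[3]);
    *port = p1 * 256 + p2;
    return 0;
}
```

<P>For the example reply above, the client would open its data connection to 192.0.2.10, port 19 * 256 + 137 = 5001.</P>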




<HR>
<A HREF="FAQ-13.html">Next</A>
<A HREF="FAQ-11.html">Previous</A>
<A HREF="FAQ.html#toc12">Contents</A>
</BODY>
</HTML>