File: TUNING_NOTES


			KERNEL_NOTES FOR DIABLO

(0) Location of options

    Diablo compilation options mainly appear in two files: lib/config.h and
    lib/vendor.h.  lib/config.h is supposed to hold only permanent 
    configuration options.  The more advanced options are usually disabled
    unless it is possible to do preprocessor conditionals on the OS version.

    Generally speaking, any option overrides that you do should be done in
    lib/vendor.h
 
(I) Use of mmap()

    Diablo requires at least shared read-only file maps to work properly.
    This is known to work on Sun, Solaris, IRIX, and FreeBSD.  

    BSDI releases, including 3.0, are known to have serious problems with
    mmap(), and running diablo on BSDI is not recommended.

    Once you get past shared read-only file maps, you get into shared
    read-write file maps, shared read-write anonymous maps, and sys-v
    shared memory maps.  These are optional.  I believe SunOS, Solaris,
    IRIX, and FreeBSD support shared r/w maps, but SunOS does not support
    anonymous maps (Solaris does).  Most systems support sys-v shared memory.
    I have only tested the advanced mmap features on FreeBSD.

    Diablo works fine on systems which do not have a unified buffer
    cache for read+write mmaps; in particular, all mmap features work
    just fine with FreeBSD 2.2.x or greater.

    Memory allocation features:

    USE_ANON_MMAP	Allows diablo to use an anonymous private r/w mmap
			to allocate memory.  This will cause the least 
			memory fragmentation.

    USE_FALL_MMAP	Diablo uses a temporary file private mmap which it
			then remove()s to allocate memory.  May or may not
			work well depending on how the filesystem works.

			If neither option is set, diablo simply
			uses malloc().

    USE_SPAM_RW_MAP	Use a read+write mmap() for the spam cache file, 
			otherwise uses a read-only mmap and seek+write to
			write.

    USE_SPAM_SHM	Use sysv-shared memory to map the spam cache.  The
			spam cache will be read from its file into shared
			memory on diablo startup and written back on final
			exit.  This is the most efficient spam-cache memory
			option in diablo and should be used whenever possible.

    USE_PCOMMIT_RW_MAP	Use a read+write mmap() for the precommit cache,
			otherwise uses a read-only mmap and seek+write to
			write.

    USE_PCOMMIT_SHM	Use sysv-shared memory to map the precommit cache.
			This can double dhistory lookup performance and lead
			to better stability under extreme loading conditions
			when used with DO_PCOMMIT_POSTCACHE.  This option is
			recommended.

    DO_PCOMMIT_POSTCACHE Use the precommit cache to hold recent dhistory file
			hits.  Recommended only if USE_PCOMMIT_RW_MAP or
			USE_PCOMMIT_SHM is set.
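
    Putting the above together, a vendor.h override block enabling the
    recommended shared-memory options might look like this.  This is a
    sketch only -- whether each option takes a bare #define or a value is
    a detail of your diablo version, so check lib/config.h for the exact
    form before copying it:

```c
/* lib/vendor.h -- local option overrides (sketch) */
#define USE_SPAM_SHM         1  /* sysv shm for the spam cache (most efficient) */
#define USE_PCOMMIT_SHM      1  /* sysv shm for the precommit cache */
#define DO_PCOMMIT_POSTCACHE 1  /* cache recent dhistory hits in the precommit cache */
```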

(II) memory, disk, and cpu

    A 100 MIPS class cpu is suggested for up to 40 feeds, a 200 MIPS class cpu
    is suggested otherwise.  Nominally, a pentium-pro 200 running Linux or
    FreeBSD, a Sun-ultra running solaris, or a 150MHz R4400 or better SGI box
    running IRIX is suggested.  I use FreeBSD boxes.

    A minimum of 128MB of ram is required (mainly to maintain the dhistory 
    file efficiently).  If you have more than 30 feeds, 192MB of ram is
    suggested.  If you have more than 70 feeds, 256MB of ram is suggested.
    The more memory the merrier.

    The minimum recommended disk configuration is three fast 4G disks.
    sd0 would be used as the root disk, but half of it (2G) would be the
    /news partition.  sd1 and sd2 would be striped together to make an 8G 
    spool.  A stripe size of 2048 sectors (1 MByte) is suggested.  NOTE that
    a large /news partition is required.  It must not only hold the dhistory
    file and a backup of the dhistory file, it must also hold outgoing queue
    files and not blow up if outgoing feeds have problems and start to
    back up.  /news/dqueue can easily take a gig all by itself.

    The nominally recommended disk configuration is two fast 2G disks and
    two or more fast 4G disks, with /news striped on the first two disks and
    the spool striped on the second two.

    An ultra-wide SCSI controller is recommended.  One will generally be
    sufficient, but if you intend to run more than 80 feeds you should
    consider having two.  UW is suggested for the transaction rate, not
    the disk throughput.  Well-cooled Seagate drives are recommended.

    The machine should not ever have to swap, but swap should be configured
    to allow the machine to retire idle processes.  I suggest configuring
    128MB of swap on every disk to spread any swap activity around.
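
    On FreeBSD of that era this would be expressed in /etc/fstab roughly
    as follows (a sketch -- the device names and 'b' swap partitions are
    illustrative; adjust to your own disk layout):

```
# /etc/fstab (sketch): a 128MB swap partition on each of the three disks
/dev/sd0b    none    swap    sw    0    0
/dev/sd1b    none    swap    sw    0    0
/dev/sd2b    none    swap    sw    0    0
```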

(III) file descriptors, process limits, datasize resource limits

    Configure the system to support a minimum of 512 descriptors per process
    and at least 4096 descriptors for the system as a whole.  The system
    must support at least 512 processes per user and 1024 total processes.
    This may involve both kernel configuration and resource limit settings.

    The datasize limit should be at least 128MB.

(IV) NBUFs - kernel filesystem buffers

    On kernels for which filesystem buffers are static, configure a large
    number of buffers.  If you have 256MB of ram, I would dedicate half
    of it to filesystem buffers.

    On kernels which have a dynamic buffer cache (FreeBSD, for example), but
    do not have a unified buffer cache, NBUF should be configured to at least
    6144 (around 24 MBytes of KVM) because it is implemented on top of the
    primary buffer cache, which is dynamic.  If you configure too much, you
    will reduce the system's ability to manage its memory.

    The typical FreeBSD kernel config line is:

	options "NBUF=6144"

(V) DHistory file tuning

    Diablo should be able to handle upwards of 3000 accepted articles/min
    and message-id history lookups (check/ihave) rates between 40,000 and
    100,000 lookups/minute.  The actual performance depends heavily on
    the amount of memory you have and the number of diablo processes 
    in contention with each other.

    Many kernels will bog down on internal filesystem locks as the number
    of incoming feeds rises.  You need to worry once you get over 35 or so
    simultaneous diablo processes.  Adding memory or reducing the size of
    the dhistory file will help here.

    The dhistory file defaults to a 14 day retention and will stabilize
    at between 350 and 400 MBytes given an article rate of 800,000 articles/day
    (a full feed as of this writing).  You can compile diablo with a lower
    retention by setting either USE_SHORT_REMEMBER in lib/vendor.h or
    setting a specific REMEMBERDAYS in lib/vendor.h.  USE_SHORT_REMEMBER
    sets the retention to 7 days and the dhistory file will stabilize at
    between 175 and 200 MBytes.

    The DHistory file hash table size is programmable, but not dynamic.
    The default is 4 million entries.  You can change this with the -h 
    option in diload.  For example, '-h 8m'.  The hash table size must
    be a power of 2.  The new hash table size will then take effect when
    you next run biweekly.trim.  Either 4m or 8m is recommended.  NOTE:
    if you make a mistake specifying the hash table size, you can blow 
    up the news system so be careful.

(VI) Tuning outgoing feeds to INN

    Please examine the samples/dnewsfeeds file.  Generally speaking, you need
    to tune any outgoing feeds to INN reader boxes.

    You want to do two things:  First, you want to make sure the spam filter
    is configured properly and turned on.  The spam filter is turned on by
    default in Diablo 1.12 or greater.  The sample dnewsfeeds file contains a
    spam filter starter which you should use.

    Second, you should consider splitting control messages out ahead of
    articles and delaying non-control articles by 5 minutes.  This allows
    cancel controls to leap ahead of articles and reduces INN's article write
    overhead (which is usually the big bottleneck in INN).

    Typically, you separate control messages out by creating two separate
    feeds to your reader box.  The first one has a 'delgroupany control.*',
    and the second one has a 'requiregroup control.*'.  Taking the example
    from the sample dnewsfeeds file:

	# dnewsfeeds
	#
	label   nntp2a
	    alias       nntp2.best.com
	    ... other add and delgroups ...
	    delgroupany control.*
	end

	label   nntp2c
	    alias       nntp2.best.com
	    ... other add and delgroups ...
	    requiregroup control.*
	end

    Then, in dnntpspool.ctl you program the normal feed for queue-delayed,
    to delay it by 5 minutes (assuming you run dspoolout from cron every 5
    minutes), and you program the control feed as realtime.  Also, if you
    don't mind slightly longer delays, q2 may be a better choice than q1.

	# dnntpspool.ctl
	#
	nntp2a          oldnntp.best.com                500     n4 q1
	nntp2c          nntp1x.ba.best.com              500     n4 realtime

(VII) Tuning dexpire

    There are two cron jobs that deal with dexpire.  The first is called
    quadhr.expire and nominally runs dexpire every four hours (6 times a day).
    The second is called hourly.expire and attempts to rerun dexpire if
    the quadhr cron fails.

    DExpire in Diablo is very fast.  Since diablo stores multiple articles
    per spool file, DExpire is able to free up disk space very quickly and
    you should not be scared of running it often.  DExpire's biggest cost
    is that it must scan the dhistory file.  Unlike INN's expire, dexpire
    does not rewrite the dhistory file.  Instead, it expires entries in
    place, which is considerably faster.

    The sample expiration cron jobs adm/quadhr.expire and adm/hourly.expire
    set a free space target of 2 gigabytes.  This is the suggested free space
    target if you run expire every 4 hours and is designed to deal with
    large influxes of data that may occur in a 4 hour period.  You can run
    a tighter free space target if you run dexpire more often.  You can
    probably get away with a 1 gigabyte (1000 megabyte) free space target
    if you run dexpire every 2 or 3 hours, but I suggest leaving the free
    space target alone.
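
    The two cron entries might look roughly like this (a sketch only --
    the install paths shown are illustrative; use the actual locations of
    your adm/quadhr.expire and adm/hourly.expire scripts):

```
# news user crontab (sketch)
0 */4 * * *     /news/adm/quadhr.expire
30 * * * *      /news/adm/hourly.expire
```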

(VIII) Typical Performance from news1.best.com

    news1.best.com is a FreeBSD 2.2.x box running on a PPro 200 with 192 MB
    of ram, one 2940UW SCSI controller, and three 4G Seagate ST34371W's.
    One 4G disk holds the system and /news; the other two are striped into
    the spool.  It is partitioned as follows:

	Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
	/dev/sd0a      127151    49473    67506    42%    /
	/dev/sd0d       63567     1369    57113     2%    /var
	/dev/sd0e      465940    43490   385175    10%    /var/log
	/dev/sd0f      232474        5   213872     0%    /var/tmp
	/dev/sd0g     1017327   432274   503667    46%    /usr
	/dev/sd0h     1705391   720650   848310    46%    /news	<--- too small
	/dev/ccd0c    8176355  5596859  1925387    74%    /news/spool/news
	procfs              4        4        0   100%    /proc

    The ccd partition is configured with a 4M stripe, designed to
    optimally handle a large number of diablo processes each writing
    to its own private, but large, spool file. 

	ccd0    8192    none    /dev/sd1d       /dev/sd2d

    The machine is currently configured with 95 feeds, of which 15 are
    'official' fully transited backbone feeds and another 10 are
    fully transited backup backbone feeds.  Another 20 send me message-id's
    equivalent to nearly full feeds.  Most of the remainder are mainly 
    outgoing feeds to T1 customers and their incoming component is minor.

    When news1.best.com is taken down for 30 minutes, then brought
    up again, it gets pounded by about half of its feeds and is 
    able to put away around 25 articles/sec and around 500 
    message-id lookups/sec.  What this means, basically, is that
    although the machine is able to catch up in real articles,
    many of the feeds continue to get behind for a short period of time
    (500/45 = 11 checks/sec/full-incoming-feed, not quite enough).

    The reason the check rate is so low is basically due to the load on
    the system.  90+ diablo processes all pounding away on the caches
    and the disks reduces efficiency all around.  Half the feeds would
    result in almost triple the efficiency due not only to the lower
    level of pounding, but also due to the greater amount of memory available.
    The real issue is one of message-id load.  I run news1.best.com
    with a high message-id load on purpose... most news admins do not need
    45 full incoming message-id loads to get good news coverage... four
    or five will do just as well. 

    A simple rule of thumb for a news admin is to take full feeds only
    from half a dozen or so sources and ask the remaining sources to only
    send you locally posted articles.

    In any case, with news1.best.com, the caches start to recover once the
    articles have begun to catch up and get back in sync.  The message-id
    lookup capability increases from 500/sec to 10000/sec and the incoming
    feeds catch up very quickly after that.

    Disk I/O is limited by seeking, so the transfers/sec statistic is often
    more useful than throughput statistics.  Once caught up, news1.best.com
    stabilizes at around 30 tps on each of its three disks.  When catching up,
    under its heaviest load, sd0 hits around 90 tps which is basically 
    saturation, while sd1 and sd2 stabilize at between 60 and 70 tps.  The
    disks are theoretically capable of around 100 tps (averaged).

    There are a number of ways to reduce the dhistory file load.  Reducing the
    number of full incoming feeds to a reasonable number (4 or 5) is one
    way.  Another way is to stripe /news AND the spool rather than just the
    spool.  A third way is to simply pack in more memory for better caching.
    A fourth way is to reduce the default history retention (see the release
    notes for setting REMEMBERDAYS) from 14 days to 9, which significantly
    reduces the size of the dhistory file.  Probably the best way to reduce
    the dhistory file load is with better management of incoming feeds; only
    a few actually need to be full feeds.

    After you handle the dhistory file load, tuning realtime vs non-realtime
    feeds comes next.  realtime feeds should only be used under certain
    conditions.  If you are a large ISP providing feeds to your T1 customers,
    making those feeds realtime gets news to them rather than them getting it
    over your internet backhaul from someone else.  If you peer at a MAE,
    where you do not pay on a bandwidth basis, making feeds that go over that 
    link in realtime will reduce the load on other feeds that go over more
    expensive links, especially if your MAE peers return the favor.  Local
    feeds to newsreader boxes do not have to be realtime, nor do most other
    feeds.  Why make a feed over a costly internet backhaul realtime when all
    it does is increase your outgoing bandwidth?