README FILE FOR DIABLO RELEASE 1.12
**** NOTICE NOTICE NOTICE  DHISTORY FILE RELOAD REQUIRED  ****
**** IF UPGRADING FROM 1.07 OR LOWER.  NO RELOAD IS       ****
**** REQUIRED IF UPGRADING FROM 1.08 OR HIGHER            ****
---
** UPGRADING TO 1.xx FROM 1.07 or earlier requires the dhistory file to be
reloaded. Upgrading from 1.08 or later to 1.xx does not require the dhistory
file to be reloaded.
READ THE INSTALL INSTRUCTIONS *CAREFULLY* AND ALSO READ THE SECTION AT THE
END OF THE **INSTALL** FILE ENTITLED: 'UPGRADING TO 1.xx' if you are
upgrading from 1.07 or earlier to 1.08 or later.
---
DIABLO is a news transit server. It is designed to replace INN on a
newsfeeds machine. It is NOT currently designed to replace INN on a
newsreader machine. That is, Diablo only understands ihave,
mode stream, and related commands. It does not understand mode
reader NNTP commands, and its spool file format is not compatible with
INN. Diablo stores files in a spool, expires them, and maintains a
dhistory file, but has no concept of an active file. Articles are
named by message-id, so there is no link() problem.
Typically, anyone taking a full feed these days must dedicate a
machine to it that is separate from the newsreader machine that your
users use to read news. DIABLO is designed to replace the dedicated
newsfeeds machine.
DIABLO solves most of the problems INN has dealing with multiple
incoming and outgoing feeds and will typically increase the performance
of your newsfeeds machine by 5x. Diablo has been successfully deployed
at BEST Internet, whose newsfeeds machine is configured as follows:
* Diablo
* 128MB ram
* pentium pro 200 running FreeBSD
* 10BaseT into a FDDI-backed etherswitch
* three 4G ultra-wide barracuda disks (two striped to make the spool)
* 9+ fully transited full feeds
* 60+ outgoing feeds, including two to local newsreader machines
Switching from INN1.4unoff4 to Diablo yielded a 5x to 10x performance
increase in everything except disk I/O. Disk I/O performance increased
about 4x, mainly due to the forking nature of the Diablo server. At
peak, the news machine runs with a typical cpu load of 0.20, a
typical network aggregate of 300 to 700 KBytes/sec (that's BYTES/sec),
and a typical (estimated) I/O saturation of 20%, assuming the
system is seek-limited, which it largely is. At peak,
FreeBSD typically uses around half the available memory for its buffer
cache.
OS REQUIREMENTS
You must be running a UNIX-compatible operating system that
supports shared read-only mmap()s and either flock() or POSIX
fcntl locks. Diablo will not compile otherwise.
As a matter of principle, I am requiring a minimum of ANSI-C
level compilation (e.g. prototypes must be supported), flock()/fcntl()
on local filesystems, and shared read-only mmap()s.
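A quick way to check a candidate machine for the primitives listed above is
a small probe script. The following is only a sketch and is not part of the
Diablo distribution: the function name probe_os_features is made up for
illustration, and it tests the features through Python's standard mmap and
fcntl modules on a temporary file rather than through Diablo's own build.

```python
import fcntl
import mmap
import os
import tempfile


def probe_os_features():
    """Return a dict saying which of the required primitives work here.

    Hypothetical helper: it only exercises the calls on a temporary file
    on whatever filesystem the default temp directory lives on.
    """
    results = {"shared_ro_mmap": False, "fcntl_lock": False, "flock": False}
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * mmap.PAGESIZE)  # mmap needs a non-empty file

        # Shared, read-only mapping of a local file.
        try:
            with mmap.mmap(fd, 0, flags=mmap.MAP_SHARED,
                           prot=mmap.PROT_READ) as m:
                results["shared_ro_mmap"] = m[:1] == b"x"
        except (OSError, ValueError):
            pass

        # POSIX fcntl() record lock (Python's lockf() wraps fcntl locking).
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(fd, fcntl.LOCK_UN)
            results["fcntl_lock"] = True
        except OSError:
            pass

        # BSD-style flock() whole-file lock.
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.flock(fd, fcntl.LOCK_UN)
            results["flock"] = True
        except OSError:
            pass
    finally:
        os.close(fd)
        os.unlink(path)
    return results


if __name__ == "__main__":
    for feature, ok in probe_os_features().items():
        print(f"{feature}: {'ok' if ok else 'MISSING'}")
```

A passing probe does not guarantee that Diablo will compile or run well. It
only shows that the calls exist and succeed on a local file.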
Systems known to work: Linux, FreeBSD (2.2.2 or greater suggested),
SunOS, and Solaris. AIX will probably work.
Systems with problems: Alphas (porting issues) and BSDI (mmap() issues).
BSDI releases prior to 3.0 are known not to work, and I am also getting
problem reports with BSDI 3.0, so for now Diablo does not work with BSDI.
WHO SHOULD RUN DIABLO
If you are running separate newsfeed and newsreader machines, then
diablo is for you. Even more so if you are dealing with a lot of
feeds.
Note that Diablo is not able to supply slave feeds as INN can, so you
cannot use diablo to slave multiple INN boxes.
WHERE TO GET DIABLO
http://www.backplane.com/diablo/
REPORTING BUGS
send the bug to: diablo-bugs@backplane.com
send non-bug stuff to: diablo@backplane.com
email to the author: dillon@backplane.com (Matthew Dillon)
DISCUSSION
I was thinking news.software.nntp for now. I do not personally
like mailing lists as they are too difficult to read when the
posting rate goes up, even when digested.
MACHINE CONFIGURATION AND LOADING CONSIDERATIONS
Load point: network
A full feed runs about 45KBytes/sec. 25 full outgoing feeds, or 70
mixed feeds will fill up a 10BaseT ethernet. The number of incoming
feeds is usually irrelevant, but each full incoming feed should
be assumed to generate around 10 history file hits/sec.
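The arithmetic behind these figures can be sketched as follows. This is a
back-of-the-envelope estimate only: the function names are made up for
illustration, 1 KByte is taken as 1000 bytes, and protocol overhead is
ignored. The per-feed numbers are the ones quoted above.

```python
FULL_FEED_KBYTES_PER_SEC = 45   # one full feed, from this README
HISTORY_HITS_PER_FEED = 10      # history file hits/sec per full incoming feed
LINK_MBITS = 10                 # nominal 10BaseT line rate


def outgoing_mbits(full_feeds, kbytes_per_feed=FULL_FEED_KBYTES_PER_SEC):
    """Aggregate outgoing bandwidth in Mbit/s (1 KByte assumed = 1000 bytes)."""
    return full_feeds * kbytes_per_feed * 8 / 1000.0


def link_utilisation(full_feeds, link_mbits=LINK_MBITS):
    """Fraction of the nominal link rate used by the outgoing feeds."""
    return outgoing_mbits(full_feeds) / link_mbits


def history_hits_per_sec(incoming_full_feeds):
    """Estimated history file hits/sec generated by the incoming feeds."""
    return incoming_full_feeds * HISTORY_HITS_PER_FEED


if __name__ == "__main__":
    print(f"25 full outgoing feeds: {outgoing_mbits(25):.1f} Mbit/s, "
          f"{link_utilisation(25):.0%} of 10BaseT")
    print(f"9 full incoming feeds: {history_hits_per_sec(9)} history hits/sec")
```

With 25 full outgoing feeds the estimate comes to 9 Mbit/s, which is why
that number of feeds is described as filling a 10BaseT link.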
You MUST switch your ethernet, whether it be 10 or 100BaseT, and *NOT*
hub it. This is absolutely necessary. Due to the streaming nature
of the connections and largish packets, you can physically max out
the wire over a switched connection without encountering collision
problems.
Load point: cpu
A pentium-pro 200 on a box running FreeBSD will run out of suds at
around the 150th feed (mixed feeds). A pentium 90 can support around 60
feeds.
Load point: memory & I/O
                      (typical processes)                # of striped 4G disks
# of (mixed) feeds    diablo      dnewslink    memory    /news      spool
1-3                   6           4            92 MB     1          2
4-30                  60          60           128 MB    1          2
31-60                 90 (1)      90           192 MB    1 (2)      2
61-100                140         140          256 MB    2          3
note (1): When a news box is in catchup mode after being down for
a while, the number of incoming diablo server processes
will usually bloat due to remote feeders running more
connections in parallel. In this case, as many as 120
diablo processes may end up running.
If you have a lot of incoming feeds, take note that diablo
can typically take a full feed from each with only a single
connection per feed. A lot of bloat can be gotten rid of
by asking your feeds to make only one or two simultaneous
connections to you, rather than the 5 or 6 some feeds like
to make.
note (2): As you approach the top-end of 60 feeds for this
configuration, I/O on the dhistory file may start to
saturate a non-striped disk when catching up.
Disk I/O, in general, is going to be seek-limited. Diablo can handle
up to around 60 feeds with a single fast 4G /news disk and an 8G
spool made up of two 4G drives striped together. After that point,
you may need a striped /news disk (two 2G disks striped together).
If you require longer term spool storage, I recommend you stick with
4G disks until you have four or more spindles, then go with fast 9G
disks (not hulkers... e.g. use something like the new Seagate 9G
disks). For example, four 4G disks striped together to make one
16G spool, or four 9G disks striped together to make one 32G spool.
The ultimate bottleneck will almost certainly be history file lookups
and appends, where one has many, many processes trying to access the
same inode. I already see the FreeBSD kernel begin to hit its limit with
120 incoming connections. Memory and buffer cache tuning will help
this situation to a degree, and striping /news will also help. If you
actually approach this limit, you may want to consider increasing
the history file hash table size from 4 million entries (16 MByte
memory map) to 8 million entries (32 MByte memory map), which will
shift more of the burden to the VM system and away from the I/O
syscalls.
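As a rough check on the figures above, 4 million entries in a 16 MByte map
works out to 4 bytes per entry. The sketch below uses that inferred entry
size to estimate the map size for other table sizes. The entry size is an
assumption drawn from the numbers in this README, not from the Diablo
source, and the function name is made up for illustration.

```python
BYTES_PER_ENTRY = 4  # assumption: inferred from 16 MByte / 4 million entries


def hash_map_mbytes(entries_millions, bytes_per_entry=BYTES_PER_ENTRY):
    """Approximate size of the history hash table memory map, in MBytes."""
    return entries_millions * bytes_per_entry


if __name__ == "__main__":
    for entries in (4, 8):
        print(f"{entries} million entries -> "
              f"{hash_map_mbytes(entries)} MByte map")
```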
Diablo does not support NFS. It uses fcntl() locks very heavily and
NFS is simply too slow. However, diablo will generally work well on
hardware SCSI-based RAID systems. It might even work well on modern
RAID 5 systems, but I would be extremely careful with anything beyond
RAID 1.
USE OF REALTIME FEEDS AND FEED DELAYS
If you have several outgoing feeds, you should consider using the
realtime and delay ( d# ) options in dnntpspool.ctl. All of your
local and internal feeds should be realtime. Cheap external paths
to the internet can also be realtime. To reduce the cost of running
outgoing feeds over your internet transit, you may wish to weight
the feeds according to cost. For example, our MAE-WEST connection
is a lot cheaper than our MCI T3, so I run outgoing feeds with
MAE-WEST destinations in realtime and run outgoing feeds which go
via MCI in batch mode with a 10 second delay. This way the articles
may actually propagate to the more expensive destinations via other
means prior to my actually attempting to send them direct.
Likewise, if you have T1 and frame customers, it is usually cheaper
to supply them with a newsfeed yourself rather than force them to
go to someone over the internet. This way they are not eating your
transit bandwidth on newsfeeds. A realtime feed to those people is
best.
CATCHING UP AFTER BEING DOWN
The key item to monitor when catching up on incoming feeds after
being down for a while is the incoming article rate. Diablo will
generate a log line for every 1024 articles received that looks like
this:
Jun 24 11:03:59 news1 diablo[18153]: DIABLO uptime=7:46 arts=241.000K tested=0 bytes=1.842G fed=12.613M
You can calculate the article rate by looking at the delta activity
between two log lines that are around an hour apart.
If the article rate is above 9 articles/sec, diablo is catching up
reasonably well. As of today, a full feed is around 5 articles/sec.
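The calculation described above can be sketched as a small script. This is
only an illustration and rests on several assumptions: the arts= counter is
taken to be cumulative for one running diablo process, the K/M/G suffixes
are assumed to be decimal multipliers, both lines are assumed to fall in the
same year without the counter restarting in between, and the second log line
in the example is made up (only the first appears in this README). The
function names are hypothetical.

```python
import re
from datetime import datetime

# Assumption: suffixes are decimal multipliers (K = 1000, and so on).
SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000, "G": 1_000_000_000}

LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s"
    r".*\bDIABLO\b.*\barts=(?P<num>[\d.]+)(?P<suf>[KMG]?)"
)


def parse_line(line, year=2000):
    """Return (timestamp, article count) from one DIABLO status log line.

    Syslog stamps carry no year, so one is supplied by the caller.
    """
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError("not a DIABLO status line: %r" % line)
    stamp = datetime.strptime(f"{year} {m.group('stamp')}",
                              "%Y %b %d %H:%M:%S")
    arts = float(m.group("num")) * SUFFIX[m.group("suf")]
    return stamp, arts


def article_rate(earlier, later, year=2000):
    """Articles/sec between two status lines from the same diablo process."""
    t0, a0 = parse_line(earlier, year)
    t1, a1 = parse_line(later, year)
    seconds = (t1 - t0).total_seconds()
    if seconds <= 0:
        raise ValueError("second line must be later than the first")
    return (a1 - a0) / seconds


if __name__ == "__main__":
    first = ("Jun 24 11:03:59 news1 diablo[18153]: DIABLO uptime=7:46 "
             "arts=241.000K tested=0 bytes=1.842G fed=12.613M")
    # Made-up second line, one hour later, for illustration only.
    second = ("Jun 24 12:03:59 news1 diablo[18153]: DIABLO uptime=8:46 "
              "arts=277.000K tested=0 bytes=2.100G fed=14.000M")
    print(f"{article_rate(first, second):.1f} articles/sec")
```

With these two lines the delta is 36,000 articles over 3,600 seconds, or 10
articles/sec, which is above the 9 articles/sec guideline.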
With a moderate number of incoming feeds, diablo can do around 30
articles/sec. If you have a huge number of incoming feeds that are
all in catchup, in-kernel filesystem locking will begin to interfere
with the history file lookups and updates. Diablo will be able to
maintain a reasonable history file write transaction rate, but the
lookup rate will suffer.
Because of the slow check responses, diablo at first catches up on
articles without appreciably reducing the backlog at remote sites.
Once it passes a certain threshold, however, and the load on the
history file turns to mostly-read rather than read/write, the
transaction rate will increase dramatically and diablo will generally
be able to clean up the backlogs very quickly after that.