Bug in spam cache file descriptor handling fixed... fcntl locks
require us to obtain discrete descriptors by calling open() after
any fork rather than before or they will not work properly if
a process gets killed. Due to the stability of diablo, this should
not have caused any problems in 1.12, but needed to be fixed anyway.
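A minimal sketch of the open-after-fork pattern, not the actual Diablo
code. fcntl() record locks belong to a process, are not inherited across
fork(), and are all dropped when the process closes any descriptor for
the file, so each child opens its own descriptor before locking. The
helper name open_and_lock() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Open a private descriptor on `path` and take an exclusive fcntl()
 * record lock on bytes [off, off+len).  Intended to be called in the
 * child AFTER fork().  Returns the descriptor, or -1 on failure.
 * (Hypothetical helper, for illustration only.)
 */
int open_and_lock(const char *path, off_t off, off_t len)
{
    struct flock fl;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;        /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;     /* l_start is an absolute offset */
    fl.l_start = off;
    fl.l_len = len;
    if (fcntl(fd, F_SETLKW, &fl) < 0) {   /* block until granted */
        close(fd);
        return -1;
    }
    return fd;                  /* closing fd releases the lock */
}
```

A child would call open_and_lock() once fork() has returned 0, and the
kernel releases the lock automatically if that child is killed.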
Bug in dnewslink quit code.. it would send the quit twice at the
end which would result in an <interrupt/error> in the logs. Apart
from the bogus log message, the bug had no other effect. This threw
a few people off who were looking for errors when, in fact, there
Bug in dnewslink reconnect code. When moving on to a new batchfile,
it did not fix StreamPend when pending streaming objects were refiled
due to a timeout or remote close. This would result in dnewslink exiting
after processing one queue file rather than going on to the next
Bug in dnewslink close/reopen (for logging) code... it would again
lose track of StreamPend. It would also lose track of the number
of potential receive bytes (which it calculates to guarantee that the
server does not block writing responses back to the dnewslink client).
Iain's latest diablo-stats included
The spam cache is turned on by default, for real this time.
Added USE_PCOMMIT_SHM, USE_PCOMMIT_RW_MAP, DO_PCOMMIT_POSTCACHE,
USE_SPAM_SHM, and USE_SPAM_RW_MAP. *_RW_MAP causes diablo to
use a shared r/w map rather than a read-only/lseek+write() mmap for
the precommit and/or spam caches. << this will significantly improve
the performance and stability of diablo >>. If you set
USE_PCOMMIT_SHM, Diablo will use SYSV shared memory rather than a file
mmap for the precommit cache. If you set USE_SPAM_SHM, Diablo will
use SYSV shared memory rather than a file mmap for the spam cache, but
it will not be non-volatile storage so if you kill and restart diablo,
the spam cache will get reset. If you have sysv shared memory, you
want to set USE_PCOMMIT_SHM and DO_PCOMMIT_POSTCACHE at the very least.
When shared memory is used for the spam filter, the spam.cache file
(if it exists) will be loaded into shared memory when diablo is started
and written out when the master diablo server exits.
DO_PCOMMIT_POSTCACHE tells diablo to use the precommit cache to also
cache dhistory lookups and commits. It is not suggested that you
use this feature unless you also set USE_PCOMMIT_SHM or at least
set USE_PCOMMIT_RW_MAP. Many lookups will hit the precommit cache
and thus avoid hitting the dhistory file, greatly reducing internal
kernel filesystem lock contention and disk I/O on the dhistory file
The FreeBSD 2.2.x, IRIX, and Solaris config automatically
turn on the new precommit features.
Casts the pointer argument to setsockopt() to void * to avoid
compiler warnings on solaris (which still uses arcane 'char *' in
its prototypes rather than 'void *').
The bytes= logging for the marks by dnewslink was incorrect: it was
logging cumulative bytes rather than deltas for the marks.
Added 'rtflush' to dnewsfeeds. This option causes the queue file to
be flushed on every line rather than buffered. Useful when used with
'realtime' in dnntpspool.ctl.
Added Path: name checking. If the first element of the Path:
header of a received article does not match any 'alias' statement
for the incoming connection, the IP address is prepended
to the Path: with .MISMATCH appended.
* >>> NOTE <<< you should grep through newly created spool
directories every so often looking for .MISMATCH in the spool
files to locate incoming feeds with improperly configured
'alias's (in dnewsfeeds). When I turned this feature on,
I found that four of my 80+ feeds were misconfigured.
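The Path: check described above can be sketched roughly as follows.
This is an illustration, not the Diablo source: the function name
check_path(), the use of '!' as the element separator after the
prepended entry, and the case-insensitive comparison are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/*
 * If the first '!'-separated element of `path` matches one of `aliases`,
 * copy the path to `out` unchanged.  Otherwise write
 * "<ip>.MISMATCH!<path>" to `out`.
 * Returns 1 if a mismatch was flagged, 0 if not, -1 if `out` is too small.
 * (Hypothetical helper; separator and case rules are assumed.)
 */
int check_path(const char *path, const char *const *aliases, size_t nalias,
               const char *ip, char *out, size_t outsz)
{
    size_t first, i;
    int n;

    assert(path != NULL && ip != NULL && out != NULL);
    first = strcspn(path, "!");          /* length of first element */

    for (i = 0; i < nalias; i++) {
        if (strlen(aliases[i]) == first &&
            strncasecmp(path, aliases[i], first) == 0) {
            n = snprintf(out, outsz, "%s", path);
            return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
        }
    }
    n = snprintf(out, outsz, "%s.MISMATCH!%s", ip, path);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 1;
}
```

With an alias of news.example.com, a Path: starting with
other.example.net arriving from 192.0.2.7 would come out as
192.0.2.7.MISMATCH!other.example.net!... which is the string the grep
suggested above would find.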
DNewsfeeds file processing moved to the main diablo server and
removed from the children, saving parsing and memory overhead
of around 45K per child in heavily loaded diablo systems (x 100
processes = 4.5 MBytes saved).
Queue-delay 'q#' option added to dnntpspool.ctl. This allows you
to tell diablo to purposefully delay N queue files before transferring
a feed to a destination, thus introducing a feed delay on purpose.
This feature can be used to hold articles while allowing control
messages to propagate, getting the cancels in front of the articles
Diablo now uses its own memory allocation code in order to better
manage memory. This has simplified certain memory management
operations, such as when the parent forks and needs to deallocate
memory pools that the child will not use.
Minor fix to dnewslink: Now exits if it gets a 400 return
code (ERR_GOODBYE) rather than retrying the connection.
Reverse dns lookup now uses case insensitive compare against forward
The post-fork openlog() for syslog was being called prior to the
mass file descriptor closures. Moved so it is called after. Doh!
Beta 64 bit support (e.g. linux on 64 bit alphas and such)
Diablo V1.11 -- NOTE: new pseudo groups control.* added to the group list
for control messages. This requires some minor
modifications to your dnewsfeeds file so you do not get
all control messages when taking a partial feed. See
Added control.* pseudo groups. If an article is a control message,
the control message type is appended to the list of newsgroups
as 'control.MESSAGETYPE'. For example, a cancel is appended to
the list of newsgroups as 'control.cancel'.
This means that if you do an 'addgroup *' then use delgroup to remove
the groups you don't want, YOU WILL STILL RECEIVE CONTROL MESSAGES
FOR ALL GROUPS. The solution is to put a 'delgroup control.*' after
the 'addgroup *' for all of your normal feeds. If you do this,
only control messages for the groups that pass the filter will be
If you use the more standard 'delgroup *' followed by 'addgroup ...'
lines for the groups you want, the delgroup covers it and no
modifications are required for that feed.
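To make the pseudo-group behaviour concrete, here is a small sketch of
how the group list used for filtering might be built. The function name
filter_groups() and the comma-joined representation are assumptions
made for illustration; only the 'control.MESSAGETYPE' naming comes from
the notes above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the group list that feed filters are matched against: the
 * article's Newsgroups value, plus "control.<type>" when the article is
 * a control message.  `ctl_type` is NULL for ordinary articles.
 * Returns 0 on success, -1 if `out` is too small.
 * (Hypothetical helper, not the Diablo implementation.)
 */
int filter_groups(const char *newsgroups, const char *ctl_type,
                  char *out, size_t outsz)
{
    int n;

    assert(newsgroups != NULL && out != NULL);
    if (ctl_type != NULL)
        n = snprintf(out, outsz, "%s,control.%s", newsgroups, ctl_type);
    else
        n = snprintf(out, outsz, "%s", newsgroups);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

A cancel posted to alt.test is then filtered as if its group list were
alt.test,control.cancel, which is why 'addgroup *' alone matches every
control message.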
I have added another filter command called 'requiregroup'. It is
similar to addgroup, delgroup, and delgroupany. What it does,
however, is require that the specified group BE in the Newsgroups
list. This allows you to create a secondary feed to your newsreader
boxes containing ONLY the control messages that also pass your normal
group filters by appending 'requiregroup control.*' to the end
of your addgroup/delgroup filter commands. Please see the example in
the samples/dnewsfeeds file for more information. Note that control
bypasses generally require two dnewsfeeds labels, one for non-control
messages and one for control messages.
Changed a bunch of printf's for stdout to fprintf's for stderr.
samples/dnewsfeeds incorrectly described the 'alias' command in
the comments, fixed.
Added Iain's bytes= stats to diablo and his latest stats stuff
to contrib. Also added -h option to diablo (see man page).
'T' and 'R' parameters in dnntpspool.ctl now work, allowing you to
set the transmit and receive buffer sizes on a connection-by-connection
dnewslink now calls setsockopt() to set the transmit/receive buffer
sizes after the connect as well as before, in case the connect()
call overrides the first ones (which it does on FreeBSD and linux
boxes, since the route table dictates nominal buffering parameters)
Ability to set the dhistory hash table size in the diload command.
The default is 4 million entries, equivalent to the '-h 4m' option
to diload. Each hash table entry is 4 bytes, so 4 million entries
results in a 16MByte hash table. The hash table size must be a power
of 2, so the next logical step is -h 8m (32 MBytes) or
-h 16m (64 MBytes). If your news box has a lot of memory, changing
your biweekly.atrim script (see the adm directory for a sample) to
generate a larger hash table will greatly reduce the load on the
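The sizing arithmetic above works out as in this sketch. It assumes
that 'm' in '-h 4m' means 2^20 entries, which is what the power-of-2
rule and the 16 MByte figure imply. The names are hypothetical.

```c
#include <assert.h>

#define HASH_ENTRY_BYTES 4UL   /* 4 bytes per hash table entry */

/* Return nonzero if n is a power of two. */
int is_power_of_two(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Bytes needed for a hash table of `entries` entries. */
unsigned long hash_table_bytes(unsigned long entries)
{
    assert(is_power_of_two(entries));   /* table size must be 2^k */
    return entries * HASH_ENTRY_BYTES;
}
```

So 4m entries (4 x 2^20) need 16 MBytes, 8m need 32 MBytes and 16m need
64 MBytes, matching the figures quoted above.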
'-n' and '-f configfile' options added to dspoolout. Minor adjustments
made to realtime code to handle a race condition between diablo
creating the realtime spoolfile and dspoolout trying to open it.
Minor adjustments made to realtime code in dnewslink to handle a
race condition between diablo creating the realtime spoolfile and
dnewslink closing out the previous one and opening the next one.
DNewslink now sends the 'quit' command at the end of the session
and waits for a response.
Fixed another session reporting bug, diablo was not logging the
correct number of elapsed seconds (non critical).
New options added for lib/vendor.h. You can set the USE_SHORT_REMEMBER
define to 1 to use a shorter history retention for didump -x & diablo,
or you can leave that option commented out and specify a specific
REMEMBERDAYS. The default remember is 14 days; the default short
remember is 7 days. Diablo adds a bit of slop internally to deal with
Date: conversion errors. Usually the bottleneck that develops
first is in access to the dhistory file. A shorter remember/retention
can significantly improve access times.
/* #undef USE_SHORT_REMEMBER */
/* #define USE_SHORT_REMEMBER 1 */
/* #define REMEMBERDAYS 9 */
NOTE(!) always be sure to recompile diablo completely when making
#define option changes. 'xmake clean; xmake'. The history retention
is especially sensitive since it must be properly compiled into both
the DIDUMP and DIABLO binaries. If you update one but not the other
it is possible to get into article transfer loops with your peers.
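One way the two options could resolve to a single retention value is
sketched here. This is hypothetical preprocessor logic written for
illustration, not a copy of lib/vendor.h; only the 14-day and 7-day
defaults come from the notes above.

```c
#include <assert.h>

/* Hypothetical resolution logic; not copied from lib/vendor.h. */
#ifndef USE_SHORT_REMEMBER
#define USE_SHORT_REMEMBER 0
#endif

#ifndef REMEMBERDAYS
#if USE_SHORT_REMEMBER
#define REMEMBERDAYS 7      /* default short remember */
#else
#define REMEMBERDAYS 14     /* default remember */
#endif
#endif

/* Seconds of history retained, before any slop Diablo adds internally. */
long remember_seconds(void)
{
    assert(REMEMBERDAYS > 0);
    return (long)REMEMBERDAYS * 24L * 60L * 60L;
}
```

Because the value is fixed at compile time, every binary that consults
it has to be rebuilt together, which is the reason for the
'xmake clean; xmake' warning.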
Diablo now identifies itself on 502's rather than using the same
error message as INN.
The ME label in dnewsfeeds must be renamed to DEFAULT, which is
more appropriate to how it works. A new GLOBAL label is now optional.
See samples/dnewsfeeds for more information.
SPAM FILTER - based on NNTP-Posting-Host: header frequency in messages.
Defaults to ON. See the sample dnewsfeeds file for examples of
how to program the spam filter. The -S0 option to diablo will turn
it off, or you can simply 'delspam *' in the GLOBAL label to turn it
off. To use this filter, you need to specify 'delspam' lines in
the GLOBAL filter to handle non-spam sources that exceed the rate
limit (which defaults to 16 postings within 16 minutes). The
samples/dnewsfeeds file contains a simple filter set. You will
also want to 'delspam *' for the incoming feeds from your leaf
node shell machines.
However, since this sort of rate filter may become more common-place
on the internet, another solution will have to be found for
shell-based news readers if the posting rate exceeds 16 postings
per minute. This spam filter is NOT PERFECT. There are plenty of
cases where it can potentially filter non-spam, but it works well
enough for us that I've installed it on BEST's newsfeeds box.
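A rate filter of this general kind can be sketched as a per-host
counter over a fixed window. The table size, hash function, window
handling and all names below are assumptions; only the default of 16
postings within 16 minutes is taken from the notes above, and the real
filter may count differently.

```c
#include <assert.h>
#include <time.h>

#define SPAM_SLOTS  1024        /* hypothetical table size */
#define SPAM_LIMIT  16          /* postings allowed ... */
#define SPAM_WINDOW (16 * 60)   /* ... within 16 minutes */

struct spam_slot {
    unsigned long hash;   /* hash of the NNTP-Posting-Host value, 0 = empty */
    time_t start;         /* start of the current counting window */
    int count;            /* postings seen in this window */
};

static struct spam_slot spam_table[SPAM_SLOTS];

/* Simple string hash; never returns 0 so that 0 can mean "empty". */
static unsigned long host_hash(const char *host)
{
    unsigned long h = 5381;
    while (*host)
        h = h * 33 + (unsigned char)*host++;
    return h ? h : 1;
}

/* Count one posting from `host`; return 1 if it exceeds the rate limit. */
int spam_check(const char *host, time_t now)
{
    unsigned long h;
    struct spam_slot *s;

    assert(host != NULL);
    h = host_hash(host);
    s = &spam_table[h % SPAM_SLOTS];

    /* New host in this slot, or window expired: start a fresh window. */
    if (s->hash != h || now - s->start >= SPAM_WINDOW) {
        s->hash = h;
        s->start = now;
        s->count = 0;
    }
    s->count++;
    return s->count > SPAM_LIMIT;
}
```

In this sketch the 17th posting from one host inside a window is the
first to be flagged, and a busy legitimate source trips the same limit,
which is what the 'delspam' exemptions are for.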
NEAR-REALTIME OUTGOING FEEDS - dspoolout/dnewslink now have the
capability to 'hang' on the diablo's outgoing feed file, initiating
transactions with the destination host as the diablo server makes
data available. Since the diablo server buffers feed data, this
is considered to be only near-realtime. The lag is around 5 seconds
for a full feed. You can also explicitly flush diablo's queue files
with 'dicmd flush' from a cron job to support slower n.r.t. feeds.
'dicmd flush' is considerably cheaper to execute than 'dspoolout'.
dspoolout would then only need to be run once every 30 minutes or so.
The original queue file mechanism still works and runs in parallel
as a failsafe, resulting in double the message-id load on the
destinations designated for realtime operation. See the dspoolout
manual page for more information.
dspoolout was not handling the min-flush-seconds option exactly right.
It should now do a better job with it, only using it if the queue
files are truly backed up.
dnewslink now turns on SO_KEEPALIVE to prevent infinite hangs.
Diablo already did this.
Fixed minor bug in Date: parsing on incoming feeds. It prevented
the article-too-old stuff from working properly.
Fixed minor bug in header scanning - Diablo was not detecting the
end-of-headers blank line properly and scanning headers on into
the body of the article.
A bottleneck for dhistory file appends has been removed. Previously,
an exclusive lock was used to append to the file (O_APPEND is not
dependable). Now we 'allocate' space with a record lock in a
non-blocking fashion. History adds are now much faster under extreme
diload throws away history file entries with zero'd gm timestamps or
zero'd hash codes.
The size of the send and receive tcp socket buffers can be specified
in dspoolout, dnewslink, and diablo now with command line options
(-T and -R)
Diablo now properly logs the elapsed time on disconnect.
NOTE: addition to adm/biweekly.atrim sample, the dhistory.bak
file is now removed before the new dhistory file is generated to
make more space on the disk. This allows a 1.5G /news partition to
continue to be used in the face of growing history files.
'dicmd status' added, returns diablo's current status
The unix domain socket is created after diablo switches to the 'news'
user rather than before.
-M option added to diablo. This limits the maximum number of
simultaneous connections allowed from EACH remote host. The limit
cannot yet be set on a per-feed basis. This option is designed
to prevent system failures from out of control remotes.
Major bug in dnewslink found by Michael S. McMahon. When dnewslink
gets interrupted and refiles pending stream transactions back to
the batch file, it was not refiling the offset,size part! A rerun
would thus result in the entire article spool file (encompassing many
articles) being sent as a single article. Ouch.
Precommit caching added. Diablo now generates and maintains a
pcommit.cache file which it mmap's. This file contains a
static 4096-entry hash table. Message-id's for check and ihave
commands are entered into the hash table and time out after 30 seconds.
Hash table collisions are simply overwritten. While a message-id is
active, other check/ihave commands for the id will
return a DUPlicate response code. The precommit caching will get
rid of 98% of the article collisions when you have several incoming
feeds that are mostly synchronized with each other. In this situation,
the precommit caching will reduce your incoming network load by at
least a factor of 2 as well as reduce the disk write load.
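The precommit cache logic can be sketched in memory as below. The real
cache lives in the mmap'd pcommit.cache file; the entry layout, hash
function and names here are assumptions. The 4096 entries, the 30
second timeout and the overwrite-on-collision rule come from the notes
above.

```c
#include <assert.h>
#include <time.h>

#define PC_ENTRIES 4096   /* static table size */
#define PC_TIMEOUT 30     /* seconds an entry stays active */

struct pc_entry {
    unsigned long hash;   /* hash of the message-id, 0 = empty */
    time_t stamp;         /* when the id was entered */
};

static struct pc_entry pc_table[PC_ENTRIES];

/* Simple string hash; never returns 0 so that 0 can mean "empty". */
static unsigned long msgid_hash(const char *id)
{
    unsigned long h = 5381;
    while (*id)
        h = h * 33 + (unsigned char)*id++;
    return h ? h : 1;
}

/*
 * Called for a check/ihave.  Returns 1 (duplicate) if the message-id is
 * still active in the cache, otherwise records it and returns 0.
 */
int pc_check(const char *msgid, time_t now)
{
    unsigned long h;
    struct pc_entry *e;

    assert(msgid != NULL);
    h = msgid_hash(msgid);
    e = &pc_table[h % PC_ENTRIES];

    if (e->hash == h && now - e->stamp < PC_TIMEOUT)
        return 1;             /* another feed is already offering it */
    e->hash = h;              /* empty, expired or colliding: overwrite */
    e->stamp = now;
    return 0;
}
```

Since only a hash is compared, two different message-ids with the same
hash would read as duplicates in this sketch. The short timeout bounds
the effect of such a false hit.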
Three new statistical elements have been added: predup, posdup, and
pcoll. (predup and posdup were actually added in V1.08). predup
counts the number of history collisions that occur as of the
beginning of a 'takethis' command. posdup counts the number of history
collisions that occur as of the article commit after a 'takethis'
command, and pcoll counts the number of pcommit cache collisions
which cause check/ihave to return a DUPlicated status.
The random seed used to generate file names was not being re-randomized
after a fork. Oops! It is now.
Replaced stdio routines used for socket input by diablo with custom
routines, removing the fileno(fi) = -1 problem and working around
apparent stdio bugs in SunOS, Solaris, and IRIX.
Article reception can deal with 8-bit-clean data now, as long as it
is properly '.' escaped. This includes removing the CR before an LF
for storage and adding it back in for retransmission.
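The reception side of this conversion might look like the following
sketch, which works on raw bytes so 8-bit data passes through
untouched. The function name is hypothetical, and the caller is assumed
to have already recognised the lone "." line that ends the article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert one NNTP wire line (normally ending in "\r\n") to storage form:
 * remove the dot-stuffing (a single leading '.') and drop the CR that
 * precedes the LF.  `out` must hold at least `len` bytes.
 * Returns the number of bytes written to `out`.
 * (Hypothetical helper; the terminating "." line is handled elsewhere.)
 */
size_t wire_to_store(const char *line, size_t len, char *out)
{
    size_t n = 0, i = 0;

    assert(line != NULL && out != NULL);
    if (len > 0 && line[0] == '.')
        i = 1;                            /* undo '.' escaping */
    for (; i < len; i++) {
        if (line[i] == '\r' && i + 1 < len && line[i + 1] == '\n')
            continue;                     /* drop CR before LF */
        out[n++] = line[i];
    }
    return n;
}
```

Retransmission does the reverse: put the CR back before each LF and
prefix any line that starts with '.' with an extra '.'.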
Added 'd' option to dnntpspool.ctl, allowing one to specify a startup
delay for the dnewslink's related to a particular site.
statvfs fixes made for sun/solaris. I got several diffs and chose
the easiest one. It may still not be entirely compatible with
all sunos/solaris implementations.
The control file dspoolout uses (dnntpspool.ctl) can now be overridden
on the command line.
You can now specify the outbound (bind) ip address for dnewslink...
useful if you have multiple interfaces and want to split outgoing
traffic amongst them, or if you have multiple interfaces and your
peers expect the news to come in from a particular IP. This can
also be specified in dnntpspool.ctl. DSpoolout will pass it on to
DNewslink. See samples/dnntpspool.ctl for an example.
New version of Iain's stats package.
Diablo V1.08 *** NOTE ** DHISTORY FILE RELOAD REQUIRED IF UPGRADING ***
** HISTORY AND SPOOL FILE FORMAT CHANGE. The format change requires
that you ensure that ALL of your diablo binaries are replaced. You
must regenerate your dhistory file
Major stability fixes to dhistory file operation... I have found that
if the /news partition runs out of space or many processes are
write()ing to dhistory in O_APPEND mode, O_APPEND writes aren't
as atomic as I thought they were. dhistory appends are now
serialized and *ACTIVELY* realigned if their alignment gets screwed
up. This should make the dhistory file much, much, much more robust.
Major stability fixes to dqueue files. Feeds no longer must be
flushed on fork, and files are written in multiples of one line.
If a write fails, the file is truncated to remove the partial line
in order to prevent corruption if a disk fills up.
Stability fixes to article files... now does a sanity check (looks
for the \0 terminator for the article in a multi-article file) to
catch corrupted files.
didump -x does a better job filtering out bogus dhistory entries.
Major, Major, Major changes in the spool file format, dhistory file
format, and queue file format. Read the section at the end of the
INSTALL file for instructions on how to upgrade without blowing up
your news server.
* spool files may now contain multiple articles separated by \0.
spool files are now named solely based on their gmtime and
* queue files include offset/length pairs to allow dnewslink to
access the new spool files.
* The dhistory file now has a proper header structure, and History
entries now contain two additional fields (offset, bytes) .. used
to store the offset,length information for article lookups.
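Given an offset/length pair from a queue file or history entry, the
\0 sanity check on a multi-article spool file could be sketched like
this. It assumes the terminator sits immediately after the `len`
article bytes. The exact on-disk layout is not spelled out in these
notes, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if the article at [off, off+len) inside a spool buffer of
 * `buflen` bytes is in range and is followed by a '\0' terminator,
 * otherwise 0.  (Assumed layout: article bytes, then one '\0'.)
 */
int article_is_sane(const char *buf, size_t buflen, size_t off, size_t len)
{
    assert(buf != NULL);
    if (len == 0 || off >= buflen || len >= buflen - off)
        return 0;             /* no room for article plus terminator */
    return buf[off + len] == '\0';
}
```

A stale or corrupted offset,length pair then fails the check instead of
causing part of a neighbouring article to be sent.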
The dynamic pipe sizing has been removed from dnewslink. It
doesn't work very well when feeds get behind and just slows
things down more.
dexpire is temporarily braindead... it now expires in straight FIFO
fashion, so most of dexpire.ctl is no longer relevant (except for
expirations of 0 which cause an article to be rejected)... I have
a medium term plan to make expiration specifications work again, but
it will be a few releases at the very least.
diablo, dnewslink, and dexpire now use chdir() caching to reduce
the path lengths for remove(), open(), etc... which is a big win
for the kernel.
diablo now records rejection statistics properly. It previously
did not count.
diablo now records and returns the errno error string on fatal
addition of 'delgroupany' command to dnewsfeeds file. Similar
to 'delgroup', except acts the same as INN's '@group'... i.e.
if the group appears in the Newsgroups line at all, the article
is rejected, even if there are other groups in the line that pass
addition of 'maxpath' command to dnewsfeeds file. Allows you
to filter feed based on the number of elements in the Path:
addition of the 'groupdef' command and 'groupref' command (see the
sample dnewsfeeds file). Rather than repeating the same group access
list over and over again in each feed, you can now collect them
together in one place and reference them from your feeds.
statvfs support added for SUN.
Diablo V1.07 - ** NOTE ** YOU MUST READ THE 'INSTALL' FILE IF UPGRADING
TO THIS RELEASE FROM 1.05 OR LOWER !!!
The diablo server now has the ability to bind to a specific host and/or
diablo converts the reverse-lookup hostname to lowercase before
comparing against hosts in the diablo.hosts file.
'nostream' option now available for dnntpspool.ctl for dnewslink runs.
Diablo now rejects articles when the message-id in the article
body does not match the message-id in the ihave or takethis header,
and logs the mismatch. Diablo previously stored the ihave/takethis
message-id into the history file without checking it against the
message-body. If a mismatch occurs, the article is rejected.
(should I do a 400+exit instead?)
Misc small bug fixes to diablo
Diablo now returns a 400 error code and exits if a file error occurs.
It previously returned a 431 error code, which was incorrect. 400
isn't exactly correct either, since it requires diablo to exit. We
need another return code to allow the server to indicate a problem yet
not exit (i.e. let the client decide to exit).
Diablo V1.06 - ** NOTE ** YOU MUST READ THE 'INSTALL' FILE IF UPGRADING
TO THIS RELEASE!!!!
The default hash table size for dhistory will become USE_LARGE_HASH
in this upcoming release. If you wish to maintain a small hash table
size (e.g. you are taking a partial feed), you must set the
USE_SMALL_HASH define to 1 in lib/vendor.h
dnewslink will still attempt to send news if the (nntp) server
responds with code 201 rather than code 200.
** It is suggested that the dclean admin script be run twice a week
rather than once a day due to its load on the history file
dnewslink now dynamically adjusts the size of the pipe for streaming
feeds. If dnewslink detects an inordinate number of
check/ok/takethis/reject collisions, it reduces the size of the
pipeline. The pipeline is also dynamically adjusted back up when
the collisions subside.
More /bin/sh script fixes in contrib/XMakefile... improper semicolons
Fixed bug in dexpire that caused it to loop endlessly without getting
much work done.
Added parallel-tasking option, -p# (as in -p4) to dexpire, causing it
to fork N times in order to run multiple unlink()'s in parallel
without the processes stepping over each other's feet.
Switched from whole-file locks on dhistory to record locks. Diablo
was having lock starvation problems due to the large number of
processes accessing the history file.
By using record locks, we lock a particular hash chain rather than
the entire file. This should make machines with a large number of
feeds more efficient.
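Locking one hash chain rather than the whole file comes down to an
fcntl() lock on the byte range of that chain's table slot. The sketch
below assumes 4-byte slots starting at offset `base`; the layout and
the function name are illustrative, not taken from the Diablo source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define HASH_SLOT_BYTES 4   /* assumed size of one hash table slot */

/*
 * Lock (type = F_RDLCK or F_WRLCK) or unlock (type = F_UNLCK) the single
 * hash slot `slot` of the table that starts at offset `base` in `fd`.
 * Blocks until the lock is granted.  Returns 0 on success, -1 on error.
 */
int lock_hash_slot(int fd, off_t base, unsigned long slot, short type)
{
    struct flock fl;

    assert(fd >= 0);
    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = base + (off_t)slot * HASH_SLOT_BYTES;
    fl.l_len = HASH_SLOT_BYTES;           /* only this slot's bytes */
    return fcntl(fd, F_SETLKW, &fl) < 0 ? -1 : 0;
}
```

Two processes working on different chains lock disjoint byte ranges and
never wait on each other, which is where the reduction in lock
starvation comes from.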
EFFICIENCY NOTICE: We strongly suggest that you set the
USE_LARGE_HASH define to 1 in lib/vendor.h and follow the
instructions at the end of the INSTALL file to increase the
size of the dhistory file hash table. Doing so will yield a
major increase in efficiency on machines with limited buffer caches.
Specifically, a 128MB machine may not have enough free space for
a good buffer cache and the large number of seek/reads caused
by the smaller hash table can result in seek starvation on /news.
Fixed a 436 return code into a 431 return code for streaming
'try it again' responses.
Now uses floating point to store and log byte and article totals
Fixed bug in the article totaling (for logs) - Diablo was reporting
more received articles than were actually received in the DIABLO
Diablo incoming now logs every 1024 articles rather then every 1000
for a better looking K/M/G log line.
Added 'maxcross' and 'maxsize' filters for outgoing feeds.
Uses mmap() to allocate memory, either via MAP_ANON or MAP_PRIVATE,
allowing us to easily deallocate it from the page table on fork.
We use this for stdio buffers as well as for pipe line buffers.
More portability fixes: directory stuff, signal stuff, etc...
Fixed two file descriptor leaks in the main server. There are not
likely to be any more.
Made portability changes: uses sigaction() rather than signal(),
and uses fcntl() rather than flock() (though SUN should get its
act together and implement flock().. it's trivial).
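For reference, installing a handler with sigaction() instead of
signal() looks roughly like this. The handler and function names are
hypothetical, and the choice of SA_RESTART is an assumption rather than
something stated in these notes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal = 0;

/* Handler: only sets a flag, which is all that is async-signal-safe. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install on_signal() for `signo`; returns 0 on success, -1 on error. */
int install_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);     /* block no extra signals in the handler */
    sa.sa_flags = SA_RESTART;     /* restart interrupted system calls */
    return sigaction(signo, &sa, NULL);
}
```

Unlike signal(), whose semantics differ between BSD and SysV derived
systems, sigaction() keeps the handler installed after delivery on every
POSIX platform.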
Moved some .SET's from XMakefile to XMakefile.inc
The sample crontab in adm/ ran the once-every-four-hours
expire once a minute instead, and I didn't notice!
dnewslink now reports local/remote latency properly. Well,
as well as can be done, anyway, and also reports the size
of the article in debug mode.