File: README.tcl_hook

Note: you need Tcl 7.4.  Rumour has it that 7.5 won't work.
---------------------------------------------------------------------------
Subject: TCL-based Filtering for INN 1.5
Date: Mon, 07 Feb 94 12:36:47 -0800
From: Bob Heiney <heiney@pa.dec.com>


Several times in the past few months, a site or two has started posting
the same article over and over again, but with a different message id. 
Usually this is caused by broken software (e.g. mail <-> news gateways,
which many have written, but few have written correctly). 
Occasionally, however, the reposting is intentional.  A recent example
would be the "Global Alert: Jesus Is Coming" message which was posted
to over 2200 newsgroups (each copy with its own message id).

I expect this to happen more often as the Internet continues its explosive
growth.  Although my site (decwrl) usually has enough excess capacity to
weather these problems, many other sites cannot.  One problem on
comp.sys.sgi.misc several months ago spewed 40MB of duplicate articles
before the offending sites were fixed, and this overflowed the spool at
many sites.  Even for sites with lots of resources, there's still no need
to propagate erroneous or malicious duplicates.

I wanted a way to protect my site that was highly specific, flexible, and
quick.

Examination of duplicated articles showed that although the message ids
were different, it was usually easy for a news admin to come up with a
few rules based on the headers of the article that could be used to
differentiate the duplicates from other articles.  (E.g. "from
John.Doe@foo.com to comp.sys.sgi.misc with 'foobar' in the subject".)
I concluded that modifying innd to let me say "kill things that look
like _this_" would solve my problem.

I also wanted to allow enough flexibility in the design that I could
later work on automatic detection and elimination of excessive
duplicates (using a body checksum instead of headers).

Since I needed a fairly powerful language to do all this, and since the
world doesn't need yet another special language, my solution was to add TCL
support to INN.  I then modified "ARTpost" to call a TCL procedure which
could then accept or reject the article.  The TCL code has access to an
associative array called "Headers", which contains all of the article's
headers.  The TCL code may also call a 32-bit article-body checksum
procedure (this is to aid in future automatic detection of duplicates).

Here's what a sample TCL filter procedure looks like:

proc filter_news {} {
  global o Headers

  # Log the article's message ID and 32-bit body checksum to the
  # channel held in the global variable o (opened elsewhere).
  set sum [checksum_article]
  puts $o "$Headers(Message-ID) $sum"

  # Reject postings from heiney@pa.dec.com to alt.test by returning a
  # reason string instead of "accept".
  set newsgroups [split $Headers(Newsgroups) ,]
  foreach i $newsgroups {
    if {$i=="alt.test" && [string match "*heiney@pa.dec.com*" $Headers(From)]} {
      return "dont like alt.test from heiney"
    }
  }
  return "accept"
}

The above TCL code does a few things.  First it computes a 32-bit
checksum and writes it and the message ID to a file.  It then rejects
articles from me to alt.test.
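
The checksum hook is what makes the automatic-duplicate-detection idea
mentioned earlier possible.  As a rough sketch (not part of the shipped
sample), a filter that rejects exact body duplicates could keep the
checksums it has already seen in a global Tcl array.  The array name
seen_sums is invented here for illustration, and a real filter would
have to expire old entries so the array doesn't grow forever:

proc filter_news {} {
  global seen_sums

  # checksum_article is the 32-bit body checksum procedure the hook
  # provides; identical bodies produce identical checksums.
  set sum [checksum_article]
  if {[info exists seen_sums($sum)]} {
    incr seen_sums($sum)
    return "duplicate body (checksum $sum, seen $seen_sums($sum) times)"
  }
  set seen_sums($sum) 1
  return "accept"
}

As in the sample above, the filter signals rejection by returning a
string other than "accept".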

The work I've done is totally integrated into the INN build and runtime
environments.  For example, to turn filtering off, you'd just type

	ctlinnd filter n

To reload the TCL code that does the filtering, you just say

	ctlinnd reload filter.tcl 'your comment here'

(You may specify TCL callbacks to be executed right before and/or right
after reloading, in case your filter is doing fancy stuff.)  See the
ctlinnd man page for more info.
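
For instance, if your filter logs through a global channel like the $o
variable in the sample above, the before/after callbacks are a natural
place to close and reopen that channel across a reload.  The sketch
below assumes the callbacks are ordinary Tcl procedures named
filter_before_reload and filter_after_reload; those names and the log
path are assumptions made for illustration, not something this note
specifies, so check the hook documentation that ships with your INN for
the names it actually calls:

# Assumed callback names -- verify against your INN's TCL hook docs.
proc filter_before_reload {} {
  global o
  # Close the log channel so the reloaded filter can reopen it cleanly.
  if {[info exists o]} {
    close $o
    unset o
  }
}

proc filter_after_reload {} {
  global o
  # Reopen the message-ID/checksum log used by filter_news; the path
  # here is only an example.
  set o [open /var/log/news/filter.log a]
}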

Filtering capability that's this powerful can be used for many
purposes, some benign and useful (excessive duplicate detection,
on-the-fly statistics), others abusive.  I would ask that news admins
think carefully about any filtering they do.

/Bob