1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271
|
filtering in tin
0. Status
This is an overview of the new filtering capabilities of tin. This
document should be absorbed in the main documentation at some time, when
the rewrite is finished.
1. Introduction
Tin's filter mechanism has changed. Originally there were only two
possibilities:
1) kill an article matching a rule.
2) mark an article matching a rule.
This led (me) to constant confusion, as it seemed important which rule
came first in the filter file, but it wasn't. Then if an article was
selected for whatever reason it couldn't be killed even if it was Craig
Shergold telling you how to make money fast in a crosspost to alt.test.
This binary concept isn't modern anyway, so a much more up-to-date fuzzy
mechanism was necessary: scoring.
When using tin's new scoring mechanism you assign a "score" to each
filter rule. The scores of rules matching the current article are added
and the final score of the article decides if it is regular, marked hot
or killed. All in all very simple, one wonders why this hasn't already
been done.
The standard "kill" and standard "select" already in your filter-file have
the score SCORE_KILL and SCORE_SELECT (-> 4) respectively.
2. Changes to the filter-file format
Tin understands the additional "score" command in the filter-file now.
Old:
scope=*
type=0
case=0
subj=*$$$*
New:
group=*
type=0
case=0
score=-100
subj=*$$$*
#####
So you can give the individual rule a weight, based on your opinion
about the rule. E.g. if you want to be sure to never read a certain
individual again, you may give the rule a score of (-)9000. The sign is
not important by now, the direction (+ or -) is still determined by the
"type" of a rule. So the following
group=*
type=0
case=0
score=100
subj=*$$$*
#####
will be equivalent with the rule above for now. When tin writes it's
filter-file the signs are rewritten to match the types. This is intended
for backwards *and* future compatibility. A future version of tin will
delete the "type"-line and write only the score.
If you want only "classical" filtering and don't want to mess around
with score values, you can use the magic words "kill" and "hot" as score
values in your filter file. Example:
group=*
type=0
case=0
score=kill
subj=*$$$*
#####
These are handled as default values at program initialization time and
may be somewhat easier to remember.
You might have noticed by the examples above that tin inserts a line of
hashes between two rules now. This is *not* required, it just improves
human readability of the filter file.
3. Changes in the filter menu
The filter menu got more compact, it fits easily on small terminals like
a small xterm or a 640x200 CON: window now. Additionally you can enter a
score for the rule you are adding. It should be in the range from 1 to
SCORE_MAX. (-> 4.) otherwise it will be defaulted to SCORE_DEFAULT.
(-> 4.)
4. Internal defaults
There are some constants defined in tin.h, search for SCORE in your
favorite editor.
SCORE_MAX is the maximum the scoring of an article can reach. Any value
in excess is cut to SCORE_MAX, equally with negative sign.
recommended: 10000
SCORE_DEFAULT is the default score, also used for quick kills and
selects.
recommended: 100
SCORE_KILL is the default score given for any kill rule, if no other
is specified.
recommended: -SCORE_DEFAULT
SCORE_SELECT is the default score for any auto-selection rule, if no
other is specified.
recommended: SCORE_DEFAULT
SCORE_LIM_KILL and SCORE_LIM_SEL are the limits that must be crossed to
mark an article as killed or selected.
WARNING: These will only be used as default in future releases, the
limits will be configurable at runtime.
recommended when used with values given above: -50/+50.
5. Overview of "filter"-commands
Everything here is also described in the file ~/.tin/filter, but
some things are more concise there.
command=value
possible "command"s sorted by field of use:
scope selection:
scope=pattern
group=newsgroup
was the "old" format. It is still understood, when tin reads the
filter-file. The subdivision in local and global filter was removed, so
when tin writes it, it is rewritten to:
group=newsgroup_pattern_list
newsgroup_pattern_list is a comma-separated list of newsgroup_patterns
newsgroup_patterns can be a pattern (wildmat-style) or !pattern,
negating the match of pattern. Yes, this is the same format as for the
AUTO(UN)SUBSCRIBE environment variable
Tin doesn't rework your filter file, the new pattern matching is only
used when you enter new entries by hand.
additional info:
type=num num: 0=kill, 1=autoselect
case=num num: 0=casesensitive, 1=caseinsensitive
score=num num: score value of rule, can now also be one of the magic words
"kill" or "hot", which are equivalent to
SCORE_KILL and SCORE_SELECT respectively.
time=num num: time_t value; when rule expires.
matches: matched to:
subj=pattern Subject:
from=pattern From:
msgid=pattern Message-Id: *AND* full References:
msgid_last=pattern Message-Id: and last last Reference:s entry only
msgid_only=pattern Message-Id:
lines=num Lines: ; <num matches less than, >num matches more than.
xref=pattern Xref: ; filter crossposts to groups matching pattern
xref_max=num } Xref: ; potentially obsolete, collects
xref_score=num,pattern } scores for crossposting in other
} newsgroups
When you are using wildmat pattern-matching, patterns in ~/.tin/filter
should be delimited with "*", verbatim wildcards in patterns must be
escaped with "\". When using the built-in filter-file functions, tin tries
to take care of it for itself, except when you are entering text in the
builtin kill/hot-menu. Then you have to quote manually because tin
doesn't know if e.g. "\[" is already quoted or not.
6. EXAMPLES
6.1 WILDMAT EXAMPLES
none given, too simple, find out yourself ,-)
6.2 REGEXP EXAMPLES
### this kills all articles about CNews, DNEWS or diablo
### in news.software.* but not in news.software.readers
group=news.software.*,!news.software.readers
type=0
case=1
score=kill
subj=([cd]news|diablo)
### this should mark all articles about tin, rtin, tind, ktin or cdtin as hot
group=*
type=1
case=1
score=hot
subj=\b((cd)|[rk]?)?tin(d|pre)?[-\.0-9]*\b
### mark own articles and followups to own articles as hot in all groups
### except local ones
### match From: (a bit complex) and/or
### Message-ID: (I'm the only user who's posting on this server)
group=*,!akk.*,!tin.*
type=1
case=1
score=hot
from=urs@(.*\.)?((akk\.uni-karlsruhe|dana)\.de|(karlsruhe|tin|akk)\.org|ka\.nu|argh\.net)
msgid=@akk3\.akk\.uni-karlsruhe\.de>
### stupid ppl. sometimes read control.cancel to see if there are any
### forged cancels around... the next rule helps you a bit
### ignore know despammers and net.* cancels
group=control.cancel
type=0
case=1
score=kill
from=(news@news\.msfc\.nasa\.gov|clewis@ferret\.ocunix\.on\.ca|jem@xpat\.com|(jeremy|lysander)@exit109\.com|howardk@iswest\.com|cosmo.roadkill.*rauug\.mil\.wi\.us|spamless@pacbell\.net|cwilkins@.*\.clark\.net)
msgid_only=<net-monitor-cancel
### this might help when reading alt.*
### ignore all postings with $$$ or *** or !!!
### ignore all postings shorter then 3 lines
### ignore all postings crossposted into more then 10 groups
group=alt.*,
type=0
case=1
score=-200
subj=[\$\*\!]{3,}
lines=<3
xref_max=10
#####
### mark own articles and direct replies based on message-id
### use 2*hot as score to unkill otherwise killed articles
group=*
type=1
case=1
score=200
msgid_last=doeblitz\.ts\.rz\.tu-bs\.de
#####
### unmark own articles based on message-id
### -> only f'ups to own articles keep marked hot
group=*
type=0
case=1
score=-200
msgid_only=doeblitz\.ts\.rz\.tu-bs\.de
7. TODO
- make the time value in the filter file more human readable.
- convert SCORE_LIM_* to global variables
- rewrite filtering order to get optimal performance
- filtering on arbitrary header lines
- convert .tin/filter, make "type" obsolete, maybe completely new file
format
- throw out "xref*" when it is obsolete for sure
- maybe autoconf'able values?
|