1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364
|
* How to use scoring with the slrn newsreader
** Origin of this file
This text was retrieved from [[https://web.archive.org/web/20210731115453/https://www.slrn.org/docs/score.html][wayback machine]] as the slrn site appears to be
defunct, and reformatted (not very well) with gitlab markdown. This
describes the format of the score file as used by that newsreader and
may not be entirely correct for pan.
Original authors:
- John E. Davis [[mailto:davis@space.mit.edu][davis@space.mit.edu]]
- Thomas Schultz [[mailto:tststs@gmx.de][tststs@gmx.de]]
- Peter J Ross [[mailto:peadar.ruadh@gmail.com][peadar.ruadh@gmail.com]]
Version 0.9.9p1, October 2008.
* An introduction to scoring
:PROPERTIES:
:CUSTOM_ID: an-introduction-to-scoring
:END:
** General description of scoring
:PROPERTIES:
:CUSTOM_ID: general-description-of-scoring
:END:
slrn awards an article points by giving it a score. If the score for the
article is less than max_low_score (zero by default), the article is
marked as read. If the score is less than or equal to kill_score (-9999
by default), the article is killed. If the score is greater than or
equal to min_high_score (1 by default), the article is marked as
important. The purpose of the score file is to define the set of tests
that an article must go through to determine the score.
Although the score may be based on ANY header item, it is recommended
that one stick with the information found in the news overview data when
scoring in slrn (slrnpull gets full headers anyways, so it does not make
a difference there). The news overview file typically contains:
- Subject
- From
- Date
- Message-ID
- References
- Bytes
- Lines
plus any header that your news admin decided to include here (usually,
=Xref= is one of them). slrn offers two special keywords that also allow
efficient scoring: =Newsgroup= (the newsgroup that the article is part
of) and =Age= (the age of the article in days).
** Location of the scorefile
:PROPERTIES:
:CUSTOM_ID: location-of-the-scorefile
:END:
The location of the scorefile is specified with the scorefile
configuration variable. E.g.,
#+begin_example
set scorefile "scores/default.score"
#+end_example
Note: The file must be created before it can be used.
Tip: Using the =.score= filename suffix may enable appropriate syntax
highlighting in some editors, such as vim.
** Editing the scorefile
:PROPERTIES:
:CUSTOM_ID: editing-the-scorefile
:END:
The scorefile may be edited while slrn isn't running, or from within
slrn with the create_score interactive function in article mode. By
default, this function is bound to the =K= key. Using =K= alone gives an
interactive choice of popular scoring options. To edit the scorefile
with a text editor use a prefix argument (=ESC 1 K=); you'll be
presented with a commented-out scorefile entry for editing.
Tip: When learning scoring, reading the commented-out lines that slrn
adds to the scorefile when creating scores interactively is likely to be
as helpful as reading documentation.
* An explanation of the scorefile format
:PROPERTIES:
:CUSTOM_ID: an-explanation-of-the-scorefile-format
:END:
The format of the file is very simple (See below for an explicit
example). The file is divided into sections delimited by a newsgroup or
newsgroups enclosed in square brackets, e.g.,
#+begin_example
[rec.crafts.*, rec.hobbies.*]
#+end_example
The name may contain the =*= wild card character.
Comments begin with the =%= character. Leading whitespace is ignored.
Each section consists of comment lines, empty lines or keyword lines.
Only the keyword lines are meaningful and all leading whitespace is
ignored. A keyword line begins with the name of the keyword followed
immediately by one or two colons and one space. The rest of the line
usually consists of a regular expression. The keyword may be prefixed by
the =~= character to signify that the regular expression should not
match the object specified by the keyword.
A group of keywords defines a test that is given to the header of the
article. The =Score= keyword is used to assign a score to the header. If
it is followed by a single colon, the score is only given if all tests
are passed (logical AND); two colons indicate that the score should be
awarded if any of the tests are passed (logical OR). The score can be
any positive or negative integer. If the numerical value of the score is
prefixed by an equals sign, score processing for the header is stopped
and the header will be given the score for that test.
Note: The =Score= keyword also serves to delimit tests. You can
optionally add a comment behind the score, which will then be used as
the name of the scorefile entry and displayed when using view_scores
(=v=) in article mode. Here is an example of this:
#+begin_example
Score: 100 % optional name here
#+end_example
All keywords except for =Score= and =Expires= may be prefixed by the =~=
character. If the =Expires= keyword appears, it must immediately follow
the =Score= keyword. The =Expires= keyword may be used to indicate that
the test is no longer to be applied on the date specified by the
keyword. For example,
#+begin_example
Expires: 4/1/2010 (or: 1-4-2010)
#+end_example
implies that the given test is no longer valid on or after April
first 2010. As the example indicates, the date must be specified using
either the format MM/DD/YYYY or DD-MM-YYYY. Note: DO NOT CONFUSE THIS
WITH THE EXPIRES HEADER KEYWORD.
The =Lines=, =Bytes=, =Age= and =Has-Body= keywords are also special.
Their value is not a regular expression, rather, a simple integer.
=Lines= and =Bytes= may be used to kill articles which contain too many
or too few lines / bytes. For example,
#+begin_example
Score: -100
Bytes: 20480
#+end_example
assigns a score of -100 to articles that are larger than 20 kB. Please
keep in mind that =Bytes:= is only available when getting overview data
and will otherwise (e.g. in slrnpull) be set to 0.
Similarly, the test
#+begin_example
Score: -100
~Lines: 3
#+end_example
assigns a score to articles that have less than or equal to 3 lines.
=Age= can be used to score articles which are younger than N days. For
example:
#+begin_example
Score: 10
Age: 7
#+end_example
adds 10 points to the score of each article that is at most one week
old. You can use negation (=~=) to score articles that are older than N
days.
=Has-Body= can be used when reading offline in combination with
slrnpull: You can tell slrnpull to download only article headers by
default and fetch article bodies on request. In this case, you can use a
rule like
#+begin_example
Score: 20
Has-Body: 1
#+end_example
to give each article that does have a body 20 points. You can invert
this (i.e. score articles without bodies) either by using negation (=~=)
or by writing =Has-Body: 0=. Values other than 0 or 1 have no meaning.
Finally a score file may include other score files via the =include=
statement. The syntax is simple:
#+begin_example
include FILE
#+end_example
The name of the file is considered to be relative to the directory of
the file including it, unless an absolute path is specified. For
instance, suppose =/home/john/News/Score= contains
#+begin_example
include /usr/local/share/slrn/score
include score_spam
#+end_example
and =/usr/local/share/slrn/score= contains the line:
#+begin_example
include score_spam
#+end_example
In the first instance, =score_spam= will be read from the directory
=/home/john/News= but in the second instance it will be read from
=/usr/local/share/slrn=.
* A sample slrn score file
:PROPERTIES:
:CUSTOM_ID: a-sample-slrn-score-file
:END:
#+begin_example
[news.software.readers]
Score: =1000
% All slrn articles are good
Subject: slrn
Score: 1000
% This is someone I want to hear from
From: davis@space.mit.edu
Score: -9999
Subject: <agent>
[comp.os.linux.*]
Score: -10
Expires: 1/1/2010
Subject: swap
Score: 20
Subject: SunOS
Score: 50
From: Linus
% Kill all articles cross posted to an advocacy group
Score: -9999
Xref: advocacy
~From: Linus
% This person I want nothing to do with unless he posts about
% 'gizmos' but only in comp.os.linux.development.*
Score: -9999
From: someone@who.knows.where
~Subject: gizmo
~Newsgroup: development
[~misc.invest.*, misc.taxes]
Score:: -9999
Subject: Earn Money
Subject: Earn $
#+end_example
* Explanatory notes for the sample scorefile
:PROPERTIES:
:CUSTOM_ID: explanatory-notes-for-the-sample-scorefile
:END:
This file consists of three sections. The first section defines a set of
tests applied to the news.software.readers newsgroups. The second
section applies to the comp.os.linux newsgroups. The final section
applies to ALL newsgroups EXCEPT misc.invest.* and misc.taxes (see
below).
The first section consists of three tests. The first test applies a
score of 1000 to any subject that contains the string =slrn=. The second
test applies to the =From=. It says that any article from
davis@space.mit.edu has its score increased by 1000. The third test
reduces by -9999 the score of any article whose subject contains the
word =agent=. Since tests are applied in order, if an article contains
both =slrn= and =agent=, it will be given a score of 1000 since the
value is prefixed with an equal sign.
The second section is more complex. It applies to the comp.os.linux
newsgroups and consists of 5 tests. The first three are simple: -10
points are given if the subject contains =swap=, 20 if it contains
=SunOS=, and 50 if the article is from someone named =Linus=. This means
that if Bill@Somewhere writes an article whose subject is
=Swap, Swap, Swap=, the article is given -10 points. However, if Linus
writes an article with the same title, it is given -10 + 50 = 40 points.
Note that the first test expires at the beginning of 2010.
The fourth test kills all articles that were cross-posted to an advocacy
newsgroup UNLESS they were posted by Linus. Note that if a keyword
begins with the =~= character, the effect of the regular expression is
reversed.
The fifth test serves to filter out posts from someone@who.knows.where
unless he posts about 'gizmos' in one of the comp.os.development
newsgroups. Again note the =~= character.
The final section of the score file begins with the line
#+begin_example
[~ misc.invest.*, misc.taxes]
#+end_example
If the first character following the opening square bracket is =~=, then
the newsgroup or newsgroups contained in the brackets are NOT to be
matched. That is, the =~= character is used to denote the boolean NOT
operation.
For writing even more complex entries, slrn now allows the grouping of
scorefile rules. Here is a simple example:
#+begin_example
Score:: -1000
~Subject: c[a-z]
{:
Subject: ^Re:
~Subject: ^Re:.*c[a-z]
}
#+end_example
Lines enclosed in curly braces are grouped; the initial brace is
followed by one or two colons that indicate whether only one (=::=) or
all of the lines (=:=) inside the group need to match for the group to
pass.
As the result, the example kills subject header lines that do not
contain lowercase characters, not counting an initial =Re:=.
* IMPORTANT - pan extensions/clarifications
:PROPERTIES:
:CUSTOM_ID: important---pan-extensionsclarifications
:END:
** Comments
:PROPERTIES:
:CUSTOM_ID: comments
:END:
=#= is allowed as a comment line marker.
** Expiry
:PROPERTIES:
:CUSTOM_ID: expiry
:END:
Rule expiry is evaluated at the time the scorefile is read, not at the
time the rules are evaluated. this may be of importance if you leave pan
running for a long time
** Keywords
:PROPERTIES:
:CUSTOM_ID: keywords
:END:
Keywords are not case sensitive, so =Score: =1000= and =score: =1000=
will both set articles matching the following rule to a score of 1000.
** Keyword matches
:PROPERTIES:
:CUSTOM_ID: keyword-matches
:END:
For keyword matches (such as Subject:), it is possible to specify the
rule with =.
- With a =:= rule (e.g. =Subject: ^Re=), the match is case insensitive.
- With a === rule (e.g. =Subject= ^Re=), the match is case sensitive.
Historical note: Compatible with XNews
|