<html lang="en">
<head>
<title>Junk - The MH-E Manual</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="The MH-E Manual">
<meta name="generator" content="makeinfo 4.8">
<link title="Top" rel="start" href="index.html#Top">
<link rel="prev" href="Sequences.html#Sequences" title="Sequences">
<link rel="next" href="Miscellaneous.html#Miscellaneous" title="Miscellaneous">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<!--
This is version 8.0.3 of `The MH-E
Manual', last updated 2006-11-12.

Copyright (C) 1995, 2001, 2002, 2003, 2005, 2006 Free
Software Foundation, Inc.

     The MH-E manual is free documentation; you can redistribute it
     and/or modify it under the terms of either:

       a. the GNU Free Documentation License, Version 1.2 or any later
          version published by the Free Software Foundation; with no
          Invariant Sections, no Front-Cover Texts, and no Back-Cover
          Texts.

       b. the GNU General Public License as published by the Free
          Software Foundation; either version 2, or (at your option)
          any later version.
     The MH-E manual is distributed in the hope that it will be useful,
     but WITHOUT ANY WARRANTY; without even the implied warranty of
     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
     General Public License or GNU Free Documentation License for more
     details.

     The GNU General Public License and the GNU Free Documentation
     License appear as appendices to this document. You may also
     request copies by writing to the Free Software Foundation, Inc.,
     51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
   -->
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
  pre.display { font-family:inherit }
  pre.format  { font-family:inherit }
  pre.smalldisplay { font-family:inherit; font-size:smaller }
  pre.smallformat  { font-family:inherit; font-size:smaller }
  pre.smallexample { font-size:smaller }
  pre.smalllisp    { font-size:smaller }
  span.sc    { font-variant:small-caps }
  span.roman { font-family:serif; font-weight:normal; } 
  span.sansserif { font-family:sans-serif; font-weight:normal; } 
--></style>
</head>
<body>
<div class="node">
<p>
<a name="Junk"></a>
Next:&nbsp;<a rel="next" accesskey="n" href="Miscellaneous.html#Miscellaneous">Miscellaneous</a>,
Previous:&nbsp;<a rel="previous" accesskey="p" href="Sequences.html#Sequences">Sequences</a>,
Up:&nbsp;<a rel="up" accesskey="u" href="index.html#Top">Top</a>
<hr>
</div>

<h2 class="chapter">19 Dealing With Junk Mail</h2>

<p><a name="index-Marshall-Rose-1972"></a><a name="index-junk-mail-1973"></a><a name="index-spam-1974"></a>
Marshall Rose once wrote a paper on MH entitled, <cite>How to process
200 messages a day and still get some real work done</cite>. This chapter
could be entitled, <cite>How to process 1000 spams a day and still get
some real work done</cite>.

   <p><a name="index-blacklisting-1975"></a><a name="index-ham-1976"></a><a name="index-viruses-1977"></a><a name="index-whitelisting-1978"></a><a name="index-worms-1979"></a>
We use the terms <dfn>junk mail</dfn> and <dfn>spam</dfn> interchangeably for
any unwanted message, including spam, <dfn>viruses</dfn>, and
<dfn>worms</dfn>. The opposite of spam is <dfn>ham</dfn>. The act of classifying
a sender as one who sends junk mail is called <dfn>blacklisting</dfn>; the
opposite is called <dfn>whitelisting</dfn>.

     
<a name="index-J-_003f-1980"></a>
<a name="index-mh_002dprefix_002dhelp-1981"></a>
<dl><dt><kbd>J ?</kbd><dd>Display cheat sheet for the commands of the current prefix in
the minibuffer (<code>mh-prefix-help</code>). 
<!--  -->
<a name="index-J-b-1982"></a><a name="index-mh_002djunk_002dblacklist-1983"></a><br><dt><kbd>J b</kbd><dd>Blacklist range as spam (<code>mh-junk-blacklist</code>). 
<!--  -->
<a name="index-J-w-1984"></a><a name="index-mh_002djunk_002dwhitelist-1985"></a><br><dt><kbd>J w</kbd><dd>Whitelist range as ham (<code>mh-junk-whitelist</code>). 
<!--  -->
<br><dt><code>mh-spamassassin-identify-spammers</code><dd>Identify spammers who are repeat offenders. 
</dl>

   <p><a name="index-g_t_0040samp_007bmh_002djunk_007d-customization-group-1986"></a><a name="index-customization-group_002c-_0040samp_007bmh_002djunk_007d-1987"></a>
The following table lists the options from the `<samp><span class="samp">mh-junk</span></samp>'
customization group.

     <dl>
<dt><code>mh-junk-background</code><a name="index-mh_002djunk_002dbackground-1988"></a><dd>If on, spam programs are run in background (default: `<samp><span class="samp">off</span></samp>'). 
<!--  -->
<br><dt><code>mh-junk-disposition</code><a name="index-mh_002djunk_002ddisposition-1989"></a><dd>Disposition of junk mail (default: `<samp><span class="samp">Delete Spam</span></samp>'). 
<!--  -->
<br><dt><code>mh-junk-program</code><a name="index-mh_002djunk_002dprogram-1990"></a><dd>Spam program that MH-E should use (default: `<samp><span class="samp">Auto-detect</span></samp>'). 
</dl>

   <p><a name="index-SpamProbe-1991"></a><a name="index-Spamassassin-1992"></a><a name="index-bogofilter-1993"></a><a name="index-spam-filters_002c-SpamProbe-1994"></a><a name="index-spam-filters_002c-Spamassassin-1995"></a><a name="index-spam-filters_002c-bogofilter-1996"></a>
MH-E depends on <a href="http://spamassassin.apache.org/">SpamAssassin</a>,
<a href="http://bogofilter.sourceforge.net/">bogofilter</a>, or
<a href="http://spamprobe.sourceforge.net/">SpamProbe</a> to throw the dreck
away. This chapter describes briefly how to configure these programs
to work well with MH-E and how to use MH-E's interface that provides
continuing education for these programs.

   <p><a name="index-mh_002djunk_002dprogram-1997"></a>
The default setting of the option <code>mh-junk-program</code> is
`<samp><span class="samp">Auto-detect</span></samp>' which means that MH-E will automatically choose one
of SpamAssassin, bogofilter, or SpamProbe in that order. If, for
example, you have both SpamAssassin and bogofilter installed and you
want to use bogofilter, then you can set this option to
`<samp><span class="samp">Bogofilter</span></samp>'.
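
   <p>As a minimal sketch, the same choice can be made from your Emacs
init file. This assumes that the `<samp><span class="samp">Bogofilter</span></samp>' setting
corresponds to the symbol <code>bogofilter</code>, which is an assumption
about the option's underlying value rather than something stated
above; the customization interface remains the safer route.

<pre class="smalllisp">     ;; Assumption: the symbol `bogofilter' is the value behind the
     ;; "Bogofilter" choice. customize-set-variable is used instead of
     ;; setq so that any setter attached to the option still runs.
     (customize-set-variable 'mh-junk-program 'bogofilter)
</pre>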

   <p><a name="index-mh_002djunk_002dblacklist-1998"></a><a name="index-J-b-1999"></a><a name="index-mh_002djunk_002ddisposition-2000"></a>
The command <kbd>J b</kbd> (<code>mh-junk-blacklist</code>) trains the spam
program in use with the content of the range (see <a href="Ranges.html#Ranges">Ranges</a>) and then
handles the message(s) as specified by the option
<code>mh-junk-disposition</code>. By default, this option is set to
`<samp><span class="samp">Delete Spam</span></samp>', but you can also specify the name of a folder,
which is useful for building a corpus of spam for training purposes.
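
   <p>For example, the following sketch keeps blacklisted messages in a
folder instead of deleting them. The folder name
<samp><span class="file">+spam</span></samp> is only an illustration, and the sketch
assumes that the option accepts the folder name as a string.

<pre class="smalllisp">     ;; Assumption: a string value names the folder that receives
     ;; blacklisted messages; "+spam" is a hypothetical folder name.
     (customize-set-variable 'mh-junk-disposition "+spam")
</pre>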

   <p><a name="index-mh_002djunk_002dwhitelist-2001"></a><a name="index-J-w-2002"></a>
In contrast, the command <kbd>J w</kbd> (<code>mh-junk-whitelist</code>)
reclassifies a range of messages (see <a href="Ranges.html#Ranges">Ranges</a>) as ham if they were
incorrectly classified as spam. It then refiles the messages into the
<samp><span class="file">+inbox</span></samp> folder.

   <p><a name="index-g_t_0040samp_007b_002aMH_002dE-Log_002a_007d-2003"></a><a name="index-buffers_002c-_0040samp_007b_002aMH_002dE-Log_002a_007d-2004"></a><a name="index-call_002dprocess-2005"></a><a name="index-mh_002djunk_002dbackground-2006"></a>
By default, the programs are run in the foreground, but this can be
slow when junking large numbers of messages. If you have enough memory
or don't junk that many messages at the same time, you might try
turning on the option <code>mh-junk-background</code>. <a rel="footnote" href="#fn-1" name="fnd-1"><sup>1</sup></a>
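
   <p>A minimal sketch of turning the option on from your init file
follows. The values are the ones given in the footnote to this
paragraph: <code>0</code> runs the programs in the background and
<code>t</code> sends their output to the log buffer.

<pre class="smalllisp">     ;; Run the spam programs in the background.
     (customize-set-variable 'mh-junk-background 0)
     
     ;; Alternatively, while debugging, send the programs' output to
     ;; the *MH-E Log* buffer:
     ;; (customize-set-variable 'mh-junk-background t)
</pre>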

   <p>The following sections discuss the various counter-spam measures that
MH-E can work with.

   <p><a name="index-g_t_0040file_007b_002eprocmailrc_007d-2007"></a><a name="index-files_002c-_0040file_007b_002eprocmailrc_007d-2008"></a>

<h4 class="subheading">SpamAssassin</h4>

<p><a name="index-Spamassassin-2009"></a><a name="index-spam-filters_002c-Spamassassin-2010"></a>
SpamAssassin is one of the more popular spam filtering programs. Get
it from your local distribution or from the
<a href="http://spamassassin.apache.org/">SpamAssassin web site</a>.

   <p>To use SpamAssassin, add the following recipes to <samp><span class="file">~/.procmailrc</span></samp>:

   <p><a name="index-g_t_0040command_007bspamc_007d-2011"></a><a name="index-g_t_0040samp_007bX_002dSpam_002dLevel_003a_007d-header-field-2012"></a><a name="index-g_t_0040samp_007bX_002dSpam_002dStatus_003a_007d-header-field-2013"></a><a name="index-header-field_002c-_0040samp_007bX_002dSpam_002dLevel_003a_007d-2014"></a><a name="index-header-field_002c-_0040samp_007bX_002dSpam_002dStatus_003a_007d-2015"></a>
<pre class="smallexample">     PATH=$PATH:/usr/bin/mh
     MAILDIR=$HOME/`mhparam Path`
     
     # Fight spam with SpamAssassin.
     :0fw
     | spamc
     
     # Anything with a spam level of 10 or more is junked immediately.
     :0:
     * ^X-Spam-Level: ..........
     /dev/null
     
     :0:
     * ^X-Spam-Status: Yes
     spam/.
</pre>
   <p>If you don't use <samp><span class="command">spamc</span></samp>, use `<samp><span class="samp">spamassassin -P -a</span></samp>'.

   <p>Note that one of the recipes above throws away messages with a score
greater than or equal to 10. Here's how you can determine a value that
works best for you.

   <p>First, run `<samp><span class="samp">spamassassin -t</span></samp>' on every mail message in your
archive and use <samp><span class="command">gnumeric</span></samp> to verify that the average plus the
standard deviation of good mail is under 5, the SpamAssassin default
for &ldquo;spam&rdquo;.

   <p>Using <samp><span class="command">gnumeric</span></samp>, sort the messages by score and view the
messages with the highest score. Determine the score which encompasses
all of your interesting messages and add a couple of points to be
conservative. Add that many dots to the `<samp><span class="samp">X-Spam-Level:</span></samp>' header
field above to send messages with that score down the drain.

   <p>In the example above, messages with a score of 5-9 are set aside in
the `<samp><span class="samp">+spam</span></samp>' folder for later review. The major weakness of
rules-based filters is a plethora of false positives so it is
worthwhile to check.

   <p><a name="index-mh_002djunk_002dblacklist-2016"></a><a name="index-mh_002djunk_002dwhitelist-2017"></a><a name="index-J-b-2018"></a><a name="index-J-w-2019"></a>
If SpamAssassin classifies a message incorrectly, or is unsure, you can
use the MH-E commands <kbd>J b</kbd> (<code>mh-junk-blacklist</code>) and
<kbd>J w</kbd> (<code>mh-junk-whitelist</code>).

   <p><a name="index-g_t_0040command_007bsa_002dlearn_007d-2020"></a><a name="index-g_t_0040file_007b_002espamassassin_002fuser_005fprefs_007d-2021"></a><a name="index-files_002c-_0040file_007b_002espamassassin_002fuser_005fprefs_007d-2022"></a>
The command <kbd>J b</kbd> (<code>mh-junk-blacklist</code>) adds a
`<samp><span class="samp">blacklist_from</span></samp>' entry to <samp><span class="file">~/.spamassassin/user_prefs</span></samp>,
deletes the message, and sends the message to Razor, so that
others might not see this spam. If the <samp><span class="command">sa-learn</span></samp> command is
available, the message is also recategorized as spam.

   <p>The command <kbd>J w</kbd> (<code>mh-junk-whitelist</code>) adds a
`<samp><span class="samp">whitelist_from</span></samp>' rule to `<samp><span class="samp">~/.spamassassin/user_prefs</span></samp>'. If
the <samp><span class="command">sa-learn</span></samp> command is available, the message is also
recategorized as ham.

   <p>Over time, you'll observe that the same host or domain occurs
repeatedly in the `<samp><span class="samp">blacklist_from</span></samp>' entries, so you might think
that you could avoid future spam by blacklisting all mail from a
particular domain. The utility function
<code>mh-spamassassin-identify-spammers</code> helps you do precisely that. 
This function displays a frequency count of the hosts and domains in
the `<samp><span class="samp">blacklist_from</span></samp>' entries from the last blank line in
<samp><span class="file">~/.spamassassin/user_prefs</span></samp> to the end of the file. You can use
this information to replace multiple
`<samp><span class="samp">blacklist_from</span></samp>' entries with a single wildcard entry such as:

<pre class="smallexample">     blacklist_from *@*amazingoffersdirect2u.com
</pre>
   <p>In versions of SpamAssassin (2.50 and on) that support a Bayesian
classifier, <kbd>J b</kbd> (<code>mh-junk-blacklist</code>) uses the program
<samp><span class="command">sa-learn</span></samp> to recategorize the message as spam. Neither MH-E,
nor SpamAssassin, rebuilds the database after adding words, so you
will need to run `<samp><span class="samp">sa-learn --rebuild</span></samp>' periodically. This can be
done by adding the following to your <samp><span class="file">crontab</span></samp>:

<pre class="smallexample">     0 * * * *	sa-learn --rebuild &gt; /dev/null 2&gt;&amp;1
</pre>
   <h4 class="subheading">Bogofilter</h4>

<p><a name="index-bogofilter-2023"></a><a name="index-spam-filters_002c-bogofilter-2024"></a>
Bogofilter is a Bayesian spam filtering program. Get it from your
local distribution or from the
<a href="http://bogofilter.sourceforge.net/">bogofilter web site</a>.

   <p>Bogofilter is taught by running:

<pre class="smallexample">     bogofilter -n &lt; good-message
</pre>
   <p>on every good message, and

<pre class="smallexample">     bogofilter -s &lt; spam-message
</pre>
   <p><a name="index-full-training-2025"></a>
on every spam message. This is called a <dfn>full training</dfn>; three
other training methods are described in the FAQ that is distributed
with bogofilter. Note that most Bayesian filters need 1000 to 5000 of
each type of message to start doing a good job.

   <p>To use bogofilter, add the following recipes to <samp><span class="file">~/.procmailrc</span></samp>:

   <p><a name="index-g_t_0040samp_007bX_002dBogosity_003a_007d-header-field-2026"></a><a name="index-header-field_002c-_0040samp_007bX_002dBogosity_003a_007d-2027"></a>
<pre class="smallexample">     PATH=$PATH:/usr/bin/mh
     MAILDIR=$HOME/`mhparam Path`
     
     # Fight spam with Bogofilter.
     :0fw
     | bogofilter -3 -e -p
     
     :0:
     * ^X-Bogosity: Yes, tests=bogofilter
     spam/.
     
     :0:
     * ^X-Bogosity: Unsure, tests=bogofilter
     spam/unsure/.
</pre>
   <p><a name="index-mh_002djunk_002dblacklist-2028"></a><a name="index-mh_002djunk_002dwhitelist-2029"></a><a name="index-J-b-2030"></a><a name="index-J-w-2031"></a>
If bogofilter classifies a message incorrectly, or is unsure, you can
use the MH-E commands <kbd>J b</kbd> (<code>mh-junk-blacklist</code>) and <kbd>J
w</kbd> (<code>mh-junk-whitelist</code>) to update bogofilter's training.

   <p>The <cite>Bogofilter FAQ</cite> suggests that you run the following
occasionally to shrink the database:

<pre class="smallexample">     bogoutil -d wordlist.db | bogoutil -l wordlist.db.new
     mv wordlist.db wordlist.db.prv
     mv wordlist.db.new wordlist.db
</pre>
   <p>The <cite>Bogofilter tuning HOWTO</cite> describes how you can fine-tune
bogofilter.

<h4 class="subheading">SpamProbe</h4>

<p><a name="index-SpamProbe-2032"></a><a name="index-spam-filters_002c-SpamProbe-2033"></a>
SpamProbe is a Bayesian spam filtering program. Get it from your local
distribution or from the <a href="http://spamprobe.sourceforge.net">SpamProbe web site</a>.

   <p>To use SpamProbe, add the following recipes to <samp><span class="file">~/.procmailrc</span></samp>:

   <p><a name="index-g_t_0040command_007bformail_007d-2034"></a><a name="index-g_t_0040samp_007bX_002dSpamProbe_003a_007d-header-field-2035"></a><a name="index-header-field_002c-_0040samp_007bX_002dSpamProbe_003a_007d-2036"></a>
<pre class="smallexample">     PATH=$PATH:/usr/bin/mh
     MAILDIR=$HOME/`mhparam Path`
     
     # Fight spam with SpamProbe.
     :0
     SCORE=| spamprobe receive
     
     :0 wf
     | formail -I "X-SpamProbe: $SCORE"
     
     :0:
     * ^X-SpamProbe: SPAM
     spam/.
</pre>
   <p><a name="index-mh_002djunk_002dblacklist-2037"></a><a name="index-mh_002djunk_002dwhitelist-2038"></a><a name="index-J-b-2039"></a><a name="index-J-w-2040"></a>
If SpamProbe classifies a message incorrectly, you can use the MH-E
commands <kbd>J b</kbd> (<code>mh-junk-blacklist</code>) and <kbd>J w</kbd>
(<code>mh-junk-whitelist</code>) to update SpamProbe's training.

<h4 class="subheading">Other Things You Can Do</h4>

<p>There are a couple of things that you can add to <samp><span class="file">~/.procmailrc</span></samp>
in order to filter out a lot of spam and viruses. The first is to
eliminate any message with a Windows executable (which is most likely
a virus). The second is to eliminate mail in character sets that you
can't read.

   <p><a name="index-g_t_0040samp_007bContent_002dTransfer_002dEncoding_003a_007d-header-field-2041"></a><a name="index-g_t_0040samp_007bContent_002dType_003a_007d-header-field-2042"></a><a name="index-g_t_0040samp_007bSubject_003a_007d-header-field-2043"></a><a name="index-header-field_002c-_0040samp_007bContent_002dTransfer_002dEncoding_003a_007d-2044"></a><a name="index-header-field_002c-_0040samp_007bContent_002dType_003a_007d-2045"></a><a name="index-header-field_002c-_0040samp_007bSubject_003a_007d-2046"></a>
<pre class="smallexample">     PATH=$PATH:/usr/bin/mh
     MAILDIR=$HOME/`mhparam Path`
     
     #
     # Filter messages with win32 executables/viruses.
     #
     # These attachments are base64 and have a TVqQAAMAAAAEAAAA//8AALg
     # pattern. The string "this program cannot be run in MS-DOS mode"
     # encoded in base64 is 4fug4AtAnNIbg and helps to avoid false
     # positives (Roland Smith via Pete from the bogofilter mailing list).
     #
     :0 B:
     * ^Content-Transfer-Encoding:.*base64
     * ^TVqQAAMAAAAEAAAA//8AALg
     * 4fug4AtAnNIbg
     spam/exe/.
     
     #
     # Filter mail in unreadable character sets (from the Bogofilter FAQ).
     #
     UNREADABLE='[^?"]*big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987'
     
     :0:
     * 1^0 $ ^Subject:.*=\?($UNREADABLE)
     * 1^0 $ ^Content-Type:.*charset="?($UNREADABLE)
     spam/unreadable/.
     
     :0:
     * ^Content-Type:.*multipart
     * B ?? $ ^Content-Type:.*^?.*charset="?($UNREADABLE)
     spam/unreadable/.
</pre>
   <div class="footnote">
<hr>
<h4>Footnotes</h4><p class="footnote"><small>[<a name="fn-1" href="#fnd-1">1</a>]</small> Note that
the option <code>mh-junk-background</code> is used as the <code>display</code>
argument in the call to <code>call-process</code>. Therefore, turning on
this option means setting its value to `<samp><span class="samp">0</span></samp>'. You can also set its
value to `<samp><span class="samp">t</span></samp>' to direct the programs' output to the `<samp><span class="samp">*MH-E
Log*</span></samp>' buffer; this may be useful for debugging.</p>

   <p><hr></div>

   </body></html>