		bmf -- Bayesian Mail Filter

About bmf
=========

This is a mail filter which uses the Bayes algorithm as explained in Paul
Graham's article "A Plan for Spam".  It aims to be faster, smaller, and more
versatile than similar applications.  The implementation is ANSI C and uses
POSIX functions.  Supported platforms are (in theory) all POSIX systems.
Support for win32 is undecided.

This project provides features which are not available in other filters:

(1) Independence from external programs and libraries.  Tokens are stored in
memory using simple vectors which require no heavyweight external data
structure libraries.  Multiple token database formats are supported,
including flat files, libdb, and mysql.  Conversion between formats will
always be possible with the included import/export utility and flat files
will always remain an option.
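
To make "simple vectors" concrete, here is a minimal sketch, not taken from
the bmf source, of a token list held in one sorted array.  Lookups use the
standard bsearch() and new tokens are inserted in order so the array stays
sorted.  The names token_t, tokvec_t, tokvec_find, tokvec_add and tokvec_free
are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token record: a word and how often it has been seen. */
typedef struct {
    char *word;
    unsigned long count;
} token_t;

/* Hypothetical growable vector of tokens, always kept sorted by word. */
typedef struct {
    token_t *items;
    size_t len;
    size_t cap;
} tokvec_t;

/* bsearch() comparator: key is a C string, elem is a token_t. */
static int cmp_token(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const token_t *)elem)->word);
}

/* Binary search for word; returns NULL if it is not in the vector. */
token_t *tokvec_find(const tokvec_t *v, const char *word)
{
    if (v->len == 0)
        return NULL;
    return bsearch(word, v->items, v->len, sizeof *v->items, cmp_token);
}

/* Count one occurrence of word, inserting it in sorted position if new.
   Returns 0 on success, -1 if memory could not be allocated. */
int tokvec_add(tokvec_t *v, const char *word)
{
    size_t lo = 0, hi = v->len;
    while (lo < hi) {                 /* find first item >= word */
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(v->items[mid].word, word) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < v->len && strcmp(v->items[lo].word, word) == 0) {
        v->items[lo].count++;         /* already present */
        return 0;
    }
    if (v->len == v->cap) {           /* grow geometrically, not per token */
        size_t ncap = v->cap ? v->cap * 2 : 256;
        token_t *p = realloc(v->items, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        v->items = p;
        v->cap = ncap;
    }
    char *copy = malloc(strlen(word) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, word);
    /* shift the tail up one slot to keep the array sorted */
    memmove(&v->items[lo + 1], &v->items[lo], (v->len - lo) * sizeof *v->items);
    v->items[lo].word = copy;
    v->items[lo].count = 1;
    v->len++;
    return 0;
}

/* Release all memory owned by the vector. */
void tokvec_free(tokvec_t *v)
{
    for (size_t i = 0; i < v->len; i++)
        free(v->items[i].word);
    free(v->items);
    v->items = NULL;
    v->len = v->cap = 0;
}
```

Keeping the array sorted also means two token lists can be merged in a single
linear pass, which fits the one-step merge described in (2).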

(2) Efficient processing.  Input data is parsed by a handcrafted parser
whose code is under 3% of the size of the equivalent code generated by flex.
No portion of the input is ever copied, and all I/O and memory allocation
are done in large chunks.  Updated token lists are merged and written in one
step.  Hashing is being considered for the next version to improve lookup
speed.
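
The zero-copy approach described above can be illustrated with a tokenizer
that never copies bytes: each token is recorded as a pointer into the input
buffer plus a length.  This is a minimal sketch, not bmf's handcrafted
parser; the token alphabet (letters, digits, '-', '\'' and '$') and the names
span_t and tokenize are assumptions made for the example.

```c
#include <ctype.h>
#include <stddef.h>

/* A token is a view into the input buffer: no bytes are copied. */
typedef struct {
    const char *start;
    size_t len;
} span_t;

/* Assumed token alphabet for this sketch only. */
static int is_tokchar(unsigned char c)
{
    return isalnum(c) || c == '-' || c == '\'' || c == '$';
}

/* Scan buf[0..buflen) and store up to max tokens in out.  Returns the
   total number of tokens found; tokens beyond max are counted but not
   stored. */
size_t tokenize(const char *buf, size_t buflen, span_t *out, size_t max)
{
    size_t i = 0, n = 0;
    while (i < buflen) {
        while (i < buflen && !is_tokchar((unsigned char)buf[i]))
            i++;                      /* skip separators */
        size_t begin = i;
        while (i < buflen && is_tokchar((unsigned char)buf[i]))
            i++;                      /* consume token characters */
        if (i > begin) {
            if (n < max) {
                out[n].start = buf + begin;   /* points into buf */
                out[n].len = i - begin;
            }
            n++;
        }
    }
    return n;
}
```

Because tokens only reference the buffer, the buffer must stay alive and
unmodified for as long as the tokens are in use.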

(3) Simple and elegant implementation.  No heavyweight, copy-intensive MIME
decoding routines are used.  Decoding of quoted-printable text for selected
MIME types is being considered for the next version.

Note: the core filter function is from esr's bogofilter v0.6 (available at
http://sourceforge.net/projects/bogofilter/) with bugfix updates.
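
For reference, the combining step described in Graham's article merges the
spam probabilities p1..pn of a message's most significant tokens into one
score, P = (p1 p2 ... pn) / (p1 p2 ... pn + (1-p1)(1-p2)...(1-pn)).  A
minimal sketch of just that formula follows.  How bmf and bogofilter select
tokens, and any numeric safeguards they use, are not shown; combine_probs is
a hypothetical name.

```c
#include <stddef.h>

/* Graham-style combination of per-token spam probabilities (sketch).
   Each p[i] must lie strictly between 0 and 1.  With no tokens the
   result is the neutral value 0.5. */
double combine_probs(const double *p, size_t n)
{
    double prod = 1.0;      /* product of p_i */
    double inv = 1.0;       /* product of (1 - p_i) */
    for (size_t i = 0; i < n; i++) {
        prod *= p[i];
        inv *= 1.0 - p[i];
    }
    return prod / (prod + inv);
}
```

Graham combines only the 15 most interesting tokens, which keeps these plain
products from underflowing; combining many more tokens would normally call
for sums of logarithms instead.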

For the most recent version of this software, see: 

	http://sourceforge.net/projects/bmf/

How to integrate bmf
====================

The following procmail recipes will invoke bmf for each incoming email and
place spam into $MAILDIR/spam.  The first sample invokes bmf in its normal
mode of operation and the second invokes bmf as a filter.

	### begin sample one ###
	# Invoke bmf and use return code to filter spam in one step
	:0HB
	* ? bmf
	| formail -A"X-Spam-Status: Yes, tests=bmf" >>$MAILDIR/spam

	### begin sample two ###
	# Invoke bmf as a filter
	:0 fw
	| bmf -p

	# Filter spam
	:0:
	* ^X-Spam-Status: Yes
	$MAILDIR/spam

The following maildrop equivalents are suggested by Christian Kurz.

	### begin sample one ###
	# Invoke bmf and use return code to filter spam in one step
	exception {
		`bmf`
		if ( $RETURNCODE == 0 )
			to $MAILDIR/spam
	}

	### begin sample two ###
	# Invoke bmf as a filter
	exception {
		xfilter "bmf -p"
		if (/^X-Spam-Status: Yes/)
			to $MAILDIR/spam
	}
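
Both filter-mode samples select spam by looking for a header line that
begins with "X-Spam-Status: Yes".  As a rough illustration of what that
condition checks, the hypothetical function below scans only the header
(everything before the first empty line).  Unlike procmail, which matches
case-insensitively by default, it compares case-sensitively, and it ignores
folded continuation lines and CRLF line endings.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if some header line (before the first empty line) begins
   with prefix, else 0.  Hypothetical helper for illustration only. */
int header_has_prefix(const char *msg, const char *prefix)
{
    size_t plen = strlen(prefix);
    const char *line = msg;
    while (*line != '\0' && *line != '\n') {   /* stop at the blank line */
        if (strncmp(line, prefix, plen) == 0)
            return 1;
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;                             /* header ended without body */
        line = nl + 1;
    }
    return 0;
}
```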


If you put bmf in your procmail or maildrop scripts as suggested above, it
will always register an email as either spam or non-spam.  To reverse this
registration and train bmf, the following mutt macros may be useful:

  macro index \ed "<enter-command>unset wait_key\n<pipe-entry>bmf -S\n<enter-command>set wait_key\n<save-message>=spam\n"
  macro index \et "<enter-command>unset wait_key\n<pipe-entry>bmf -t\n<enter-command>set wait_key\n"
  macro index \eu "<enter-command>unset wait_key\n<pipe-entry>bmf -N\n<enter-command>set wait_key\n<save-message>=inbox\n"

These macros bind the following keys, replacing any existing bindings:

  <Esc>d = de-register as non-spam, register as spam, and move to spam folder.
  <Esc>t = test for spamicity.
  <Esc>u = de-register as spam, register as non-spam, and move to inbox folder.
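
"De-register as non-spam, register as spam" amounts to moving one message's
contribution from one set of counts to the other.  The sketch below assumes a
hypothetical layout (per-token good/bad counters plus totals of registered
messages) and counts each distinct token once per message.  It is not bmf's
database code; counts_t, totals_t and reregister are invented names.

```c
#include <stddef.h>

/* Hypothetical per-token counters. */
typedef struct {
    unsigned long good;   /* times seen in non-spam */
    unsigned long bad;    /* times seen in spam */
} counts_t;

/* Hypothetical totals of registered messages. */
typedef struct {
    unsigned long ngood;
    unsigned long nbad;
} totals_t;

/* Move one unit from *from to *to, never underflowing the counter. */
static void move_one(unsigned long *from, unsigned long *to)
{
    if (*from > 0)
        (*from)--;
    (*to)++;
}

/* Re-register one message's distinct tokens: to_spam != 0 moves them
   from non-spam to spam (like <Esc>d), to_spam == 0 the reverse
   (like <Esc>u). */
void reregister(counts_t *tok, size_t ntok, totals_t *tot, int to_spam)
{
    for (size_t i = 0; i < ntok; i++) {
        if (to_spam)
            move_one(&tok[i].good, &tok[i].bad);
        else
            move_one(&tok[i].bad, &tok[i].good);
    }
    if (to_spam)
        move_one(&tot->ngood, &tot->nbad);
    else
        move_one(&tot->nbad, &tot->ngood);
}
```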

How to train bmf
================

First, please keep in mind that bmf "learns" how to recognize spam from the
input that you give it.  It works best if you give it exactly the email that
you receive, or have received in the recent past.

Here are some good techniques for training bmf:

  - If you keep a history of email that you have received, use your current
    and/or saved emails.  It is fairly easy to create a small shell script
    that will pass all of your normal email to "bmf -n" and all of your spam
    to "bmf -s".  Note that if you do not use the mbox storage format, you
    MUST invoke bmf exactly once per email.  Using "cat * | bmf -n" will NOT
    work properly because bmf sees the entire input as one big email.

  - If you already use spamassassin, you can use it to train bmf for a
    couple of days or weeks.  If spamassassin tags a message as spam, run it
    through "bmf -s"; if not, run it through "bmf -n".  This can be
    automated with procmail or maildrop recipes.
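
The once-per-email rule follows from how messages are delimited.  In mbox
format each message begins with a line starting with "From " (the "From_"
line), so one stream can carry many messages.  Individual files from maildir
or MH folders normally lack that line, so concatenating them produces what
looks like a single message.  The hypothetical helper below counts messages
the way an mbox reader would; it is an illustration, not bmf's parser.

```c
#include <stddef.h>
#include <string.h>

/* Count messages in an mbox buffer by counting lines that start with
   "From ".  Body lines beginning with "From " are assumed to have been
   escaped (e.g. as ">From ") by whatever wrote the mbox. */
size_t count_mbox_messages(const char *buf, size_t len)
{
    size_t n = 0;
    int at_line_start = 1;
    for (size_t i = 0; i < len; i++) {
        if (at_line_start && len - i >= 5 && memcmp(buf + i, "From ", 5) == 0)
            n++;
        at_line_start = (buf[i] == '\n');
    }
    return n;
}
```

A result of 0 for the concatenated maildir files is exactly the case where a
reader would see one big email.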

Here are some things that you should NOT do:

  - Get impatient with the training process and repeatedly pass one email
    through "bmf -s".

  - Manually move words around between lists and/or adjust the word counts.
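
Why repeated training hurts can be seen in the per-token formula from
Graham's "A Plan for Spam", sketched below on the assumption that bmf's core
(inherited from bogofilter) behaves similarly; its exact constants may
differ.  Graham doubles the non-spam count, gives tokens seen fewer than 5
times a default of 0.4, and clamps results to [0.01, 0.99].  The function
name token_prob is hypothetical.

```c
/* Graham-style spam probability for one token (a sketch after
   "A Plan for Spam"; bmf's exact constants may differ).
   good/bad: times the token was seen in non-spam/spam;
   ngood/nbad: number of non-spam/spam messages registered. */
double token_prob(unsigned long good, unsigned long bad,
                  unsigned long ngood, unsigned long nbad)
{
    double g = 2.0 * good;            /* bias against false positives */
    double b = (double)bad;
    double fg, fb, p;

    /* Rare tokens get Graham's default of 0.4.  Returning 0.4 for an
       empty corpus is an added assumption that avoids division by 0. */
    if (g + b < 5.0 || ngood == 0 || nbad == 0)
        return 0.4;

    fg = g / ngood;
    if (fg > 1.0)
        fg = 1.0;
    fb = b / nbad;
    if (fb > 1.0)
        fb = 1.0;

    p = fb / (fg + fb);
    if (p < 0.01)
        p = 0.01;
    if (p > 0.99)
        p = 0.99;
    return p;
}
```

For example, with 100 messages of each kind registered, a word that appears
in one spam gets the default 0.4.  Passing that same spam through "bmf -s"
ten times gives the word 10 spam hits and the maximum 0.99, so every word of
that one message now looks like a strong spam indicator.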

Final words
===========

Thanks for trying bmf.  If you have any problems, comments, or suggestions,
please direct them to the bmf mailing list, bmf-user@lists.sourceforge.net. 

							Tom Marshall
							20 Oct 2002