1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188
|
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
<meta http-equiv="adalign" content="right" />
<link rel="stylesheet" href="bsfilter.css" type="text/css" />
<title>bsfilter / bayesian spam filter</title>
</head>
<body>
<h1>bsfilter / bayesian spam filter</h1>
<p class="icon">
<a href="index.html">japanses</a>
<a href="index-e.html">english</a>
</p>
<h2>0. What is bsfilter ?</h2>
<ul>
<li>a filter which distinguishes spam and non-spam(called "clean" in this page) mail</li>
<li>support mails written in japanese language</li>
<li>written in Ruby</li>
<li>support 3 methods for access<ul>
<li>traditional unix-style filter. study and judge local files or pipe</li>
<li>IMAP. study and judge mails in an IMAP server. <strong>IMAP over SSL supported</strong></li>
<li>POP proxy. run between POP server and MUA. <strong>POP over SSL supported</strong></li>
</ul></li>
<li>basic concepts come from
<a href="http://www.paulgraham.com/spam.html">A Plan for Spam</a>,
<a href="http://www.paulgraham.com/better.html">Better Bayesian Filtering</a>,
<a href="http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html">Spam Detection</a>
</li>
<li>distributed under GPL</li>
</ul>
<h2><a id="toc">1. Contents</a></h2>
<ul>
<li><a href="#toc">1. Contents</a></li>
<li><a href="#download">2. Download</a></li>
<li><a href="#concept">3. How bsfilter works?</a></li>
<li><a href="#started">4. Lets' get started</a></li>
<li><a href="#help">5. Help</a></li>
<li><a href="#usage">6. Usage</a></li>
<li><a href="#imap">7. Usage(IMAP)</a></li>
<li><a href="#pop">8. Usage(POP proxy)</a></li>
</ul>
<h2><a id="download">2. Download/Install</a></h2>
<ul>
<li><a href="https://github.com/nbkenichi/bsfilter/releases"><strong>Releases</strong></a></li>
<li><a href="https://github.com/nbkenichi/bsfilter"><strong>Sources</strong></a></li>
</ul>
<h3>2.1. UNIX</h3>
<p>Install ruby interpreter. Put bsfilter/bsfilter at a directory in your executable path.
On some OSs or distributions, you may use a package like ports or ebild.</p>
<h2><a id="concept">3. How bsfilter works?</a></h2>
<h3>3.1. using spam proability of each token</h3>
<p class="fig"><img src="judge.png" alt="using spam proability of each token" /></p>
<h3>3.2. need to prepare</h3>
<p class="fig"><img src="prepare.png" alt="need to prepare" /></p>
<h2><a id="started">4. Let's get started</a></h2>
<h3>preprare</h3>
<p>It is necessary to prepare databases before filtering</p>
<p>1. count tokens in clean mails</p>
<pre>
% bsfilter --add-clean ~/Mail/inbox/*
</pre>
<p>2. count tokens in spam</p>
<pre>
% bsfilter --add-spam ~/Mail/spam/*
</pre>
<p>3. calculate spam probability for each token</p>
<pre>
% bsfilter --update
</pre>
<h3>filtering</h3>
<p>example: specify filenames for filtering as command line argumetns. spam probability numbers(between 0 and 1) are displayed.</p>
<pre>
% bsfilter ~/Mail/inbox/1
combined probability /home/nabeken/Mail/inbox/1 1 0.012701
</pre>
<p>example: feed mail for filtering through stdin pipe. exit status is 0 in case of spam</p>
<pre>
~% bsfilter < ~/Mail/inbox/1 ; echo $status
1
~% bsfilter < ~/Mail/spam/1 ; echo $status
0
</pre>
<p>procmail sample recipe 1:
move spams to spam folder using exit status</p>
<pre>
:0 HB:
* ? bsfilter -a
spam/.
</pre>
<p>procmail sample recipe 1:
add <code>X-Spam-Flag:</code>, <code>X-Spam-Probability:</code> headers and
move spams to black or gray folder based on spam probability at <code>X-Spam-Probability:</code> header</p>
<pre>
:0 fw
| /home/nabeken/bin/bsfilter --pipe --insert-flag --insert-probability
:0
* ^X-Spam-Probability: *(1|0\.[89])
black/.
:0
* ^X-Spam-Probability: *0\.[67]
gray/.
</pre>
<h2><a id="help">5. Help</a></h2>
<p><a href="https://github.com/nbkenichi/bsfilter/issues">issues</a></p>
<h2><a id="usage">6. Usage</a></h2>
<h3>formats of command line</h3>
<p>there are 2 formats.</p>
<ol>
<li>bsfilter [options] [commands] < MAIL</li>
<li>bsfilter [options] [commands] MAIL ...</li>
</ol>
<p>There are maintenance mode and filtering mode.</p>
<ul>
<li>When commands are spcecified, bsfilter is under maintenance mode. It updates databases, but doesn't judge mails.</li>
<li>When commands aren't specified, bsfilter is under filtering mode. It judges mails, but doen't update databases.</li>
<li>There is an exception. When "auto-update" is specified in filtering mode, bsfilter updates databases and judge mails also.</li>
</ul>
<p>
Use format 1 in filtering mode in order to feed mail from stdin and judge it. Exit status becomes 0 in case of spam.
If bsfilter invoked by MDA(procmailrc and etc), this style are used.
</p>
<p>
Use format 2 in filtering mode in order to specify multiple mails at command line and judge at once.
Results are displayed at stdout.
</p>
<p><strong>type "bsfilter --help" to see all commands and options.</strong></p>
<h2><a id="imap">7. Usage(IMAP)</a></h2>
<p>bsfilter is able to communicate server by IMAP and study or judge mails stored in it.
bsfilter is able to insert headers or move mails to a specified folder
</p>
<p class="fig"><img src="imap.png" alt="communicate with IAMP server" /></p>
<h3>example</h3>
<p>sample of bsfilter.conf</p>
<pre>
imap-server server.example.com
imap-auth login
imap-user hanako
imap-password open_sesame
</pre>
<p>judge mails without X-Spam-Flag in "inbox", insert X-Spam-Probability header and move spams into "inbox.spam"</p>
<pre>
% bsfilter --imap --imap-fetch-unflagged --insert-flag --insert-probability --imap-folder-spam inbox.spam inbox
</pre>
<h2><a id="pop">8. Usage(POP proxy)</a></h2>
<p>bsfilter ia able to work as POP proxy and judge mails and insert headers on a path from POP server to MUA. "--auto-update" is a valid option but "--add-clean" and "--add-spam" are not.</p>
<p>Let's assume that POP server is running on pop.example.com using port 110.</p>
<p class="fig"><img src="pop-without-bsfilter.png" alt="POP without bsfilter" /></p>
<p>
bsfilter is runing as POP. POP is used between the server and bsfilter and between bsfilter and MUA. bsfilter judge mails and insert headers. In this case, use the following options. Because default of --pop-port and --pop-poryx-port are 110 and 10110, they are able to be omitted.
</p>
<pre>
% bsfilter --pop --auto-update --insert-flag --insert-probability --pop-server pop.example.com --pop-port 110 --pop-proxy-port 10110
</pre>
<p class="fig"><img src="pop-with-bsfilter.png" alt="POP with bsfilter" /></p>
<p>
When pops.exmaple.com uses POP over SSL, POP over SSL is used between the server and bsfilter, POP is used between bsfilter and MUA.<strong>(after 1.67.2.*)</strong>
</p>
<pre>
% bsfilter --ssl --pop --auto-update --insert-flag --insert-probability --pop-server pops.example.com --pop-port 995 --pop-proxy-port 10110
</pre>
<p class="fig"><img src="pops-with-bsfilter.png" alt="POP with bsfilter(SSL)" /></p>
</body>
</html>
|