File: README

  SylFilter - a message filter

  Copyright (C) 2011-2013 Hiroyuki Yamamoto <hiro-y@kcn.ne.jp>
  Copyright (C) 2011-2013 Sylpheed Development Team


About This Program
==================

This is SylFilter, a generic message filter library with command-line tools.
SylFilter provides a Bayesian filter, a very popular spam filtering algorithm.
SylFilter is also internationalized and can be applied to any language.

The SylFilter library provides a simple but powerful C API and can be used
from C programs.

The SylFilter command-line tool can be used as a junk filter program, like
other major tools such as bogofilter and bsfilter.

SylFilter is free software distributed under a BSD-like license.
See COPYING for details.


Install
=======

This program requires GLib and a key-value store engine. Install them before
building. Currently SQLite (enabled by default), QDBM and GDBM are supported
as key-value store engines.

  $ ./configure
  ( $ ./configure --disable-sqlite --enable-qdbm (enables QDBM) )
  ( $ ./configure --disable-sqlite --enable-gdbm (enables GDBM) )

  $ make
  $ sudo make install

By default, a built-in subset of libsylph is used for message parsing.
To use the libsylph installed on your system, specify the --with-libsylph
option.

  ./configure --with-libsylph=builtin     use built-in LibSylph (default)
  ./configure --with-libsylph=standalone  use standalone version of LibSylph
  ./configure --with-libsylph=sylpheed    use Sylpheed's LibSylph

If libsylph is installed in a non-standard location, also use the
--with-libsylph-dir option.


Usage
=====

SylFilter accepts rfc822 message files (for example: MH, Maildir, eml).

Learning junk mail

  $ sylfilter -j ~/Mail/junk/*

Learning clean mail

  $ sylfilter -c ~/Mail/clean/*

Classifying mail

  $ sylfilter ~/Mail/inbox/1234

Showing the learning status

  $ sylfilter -s

Showing the learning status and all learned tokens

  $ sylfilter -s -v

Showing the help message

  $ sylfilter -h
  $ sylfilter --help


Usage with Sylpheed
===================

In 'Common preferences... - Junk mail - Learning command:', set each command
manually as follows:

Junk                : sylfilter -j
Not Junk            : sylfilter -c
Classifying command : sylfilter


Other information
=================

Token database files are created under ~/.sylfilter/ .
(On Windows: %APPDATA%\SylFilter\)


Library Design
==============

Filtering in SylFilter is performed by a chain of simple filter modules.

         (Learning)                   (Classifying)

        rfc822 message                rfc822 message
              |                             |
   [ text content filter ]       [ text content filter ]
              |                             |
  [ word separator filter ]       [ blacklist filter ]  --> spam
              |                             |
      [ n-gram filter ]         [ word separator filter ]
              |                             |
     [ learning filter ]            [ n-gram filter ]
                                            |
                                   [ bayesian filter ]  --> spam
                                            |
                                         non-spam

Library users can combine the provided filters arbitrarily.
Users can also add their own custom filters.

Please read the source of src/sylfilter.c for library usage.
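
The chain-of-filters idea in the diagram above can be sketched in plain C as
follows. This is only an illustration under stated assumptions: every name
here (demo_status, demo_filter_fn, demo_run_chain and the two toy filters) is
hypothetical and is not part of the SylFilter API. The real calls are in
src/sylfilter.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch; not the SylFilter API. */
typedef enum { DEMO_NEXT, DEMO_JUNK, DEMO_CLEAN } demo_status;

/* A filter inspects the message text and either decides or passes it on. */
typedef demo_status (*demo_filter_fn)(const char *text);

/* Toy blacklist filter: flags a message that contains a fixed phrase. */
demo_status demo_blacklist_filter(const char *text)
{
    return strstr(text, "buy now") != NULL ? DEMO_JUNK : DEMO_NEXT;
}

/* Toy final filter: stands in for a real classifier and accepts everything. */
demo_status demo_accept_filter(const char *text)
{
    (void)text;
    return DEMO_CLEAN;
}

/* Run the filters in order; the first one that decides ends the chain. */
demo_status demo_run_chain(const demo_filter_fn *filters, size_t n,
                           const char *text)
{
    size_t i;

    assert(filters != NULL && text != NULL);
    for (i = 0; i < n; i++) {
        demo_status st = filters[i](text);
        if (st != DEMO_NEXT)
            return st;
    }
    return DEMO_CLEAN; /* no filter objected */
}
```

Because each filter has the same signature, a custom filter in this sketch is
just another function placed into the array passed to demo_run_chain().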


Algorithm of Bayesian Filter
============================

SylFilter implements Fisher's method as described by Gary Robinson.
This method is also implemented by bogofilter and bsfilter.

  http://radio-weblogs.com/0101454/stories/2002/09/16/spamDetection.html
  http://www.bgl.nu/bogofilter/fisher.html

SylFilter initially implemented a customized version of the algorithm
described by Paul Graham.

  http://paulgraham.com/spam.html
  http://paulgraham.com/better.html
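
For comparison, the combining rule from Graham's essays can be sketched as
below. It shows only the textbook formula P = prod(p) / (prod(p) + prod(1-p));
the customizations SylFilter made are not described here, so this is not
SylFilter's code, and the function name demo_graham_combine is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Combine per-word spam probabilities p[0..n-1], each strictly between
 * 0 and 1, into one message probability using Graham's formula:
 *   P = prod(p) / (prod(p) + prod(1 - p))
 * Illustrative sketch only; hypothetical name, not SylFilter code. */
double demo_graham_combine(const double *p, size_t n)
{
    double spam = 1.0; /* running product of p[i] */
    double ham = 1.0;  /* running product of 1 - p[i] */
    size_t i;

    assert(p != NULL && n > 0);
    for (i = 0; i < n; i++) {
        assert(p[i] > 0.0 && p[i] < 1.0);
        spam *= p[i];
        ham *= 1.0 - p[i];
    }
    return spam / (spam + ham);
}
```

Plain products like these can underflow for long word lists, so a practical
implementation would limit the number of words or work with logarithms.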

The Robinson-Fisher method is used by default.

Basically, the algorithm can be described as follows:

1. Count the number of occurrences of each word in spam and non-spam
   messages.
2. For each word in a message, calculate the probability that a message
   containing that word is spam.
3. Calculate the combined probability using the important words in the
   message.

See the Web pages above for details.