About
======

History
--------

Pyzor started out as merely a Python implementation of Razor. However, because
of the Razor protocol and because Razor's server is not Open Source or software
libre, Frank Tobin decided to implement Pyzor with a new protocol and to release
the entire system as Open Source and software libre.

Protocol
----------

The central premise of Pyzor is that it converts an email message to a short
digest that uniquely identifies the message. Simply hashing the entire message
is an ineffective way to generate a digest, because message headers differ even
when the content does not, and because spammers often try to make each message
unique by injecting random or unrelated text.
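
The header problem can be illustrated with a minimal sketch. The two sample
messages, the helper names, and the choice of SHA-1 are assumptions made for
this illustration; they are not taken from Pyzor's code.

```python
import hashlib

# Two hypothetical copies of the same spam, delivered to different recipients.
BODY = "Buy cheap watches today, limited offer!\n"
msg_a = "To: alice@example.com\nMessage-ID: <1@example.net>\n\n" + BODY
msg_b = "To: bob@example.com\nMessage-ID: <2@example.net>\n\n" + BODY


def naive_digest(text: str) -> str:
    """Hash the given text as-is (illustrative helper)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def body_of(message: str) -> str:
    """Return everything after the first blank line (the header/body split)."""
    return message.split("\n\n", 1)[1]


# The headers differ, so the whole-message hashes differ ...
headers_break_match = naive_digest(msg_a) != naive_digest(msg_b)
# ... even though the bodies are byte-for-byte identical.
bodies_match = naive_digest(body_of(msg_a)) == naive_digest(body_of(msg_b))
```

Dropping the headers fixes only the first problem. Random text injected into
the body would still change a plain hash of the body.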

To generate a digest, the 2.0 version of the Pyzor protocol:

 * Discards all message headers.
 * If the message is longer than 4 lines:

   * Discards the first 20% of the message.
   * Uses the next 3 lines.
   * Discards the next 40% of the message.
   * Uses the next 3 lines.
   * Discards the remainder of the message.

 * Removes any 'words' (sequences of characters separated by whitespace)
   that are 10 or more characters long.
 * Removes anything that looks like an email address (X@Y).
 * Removes anything that looks like a URL.
 * Removes anything that looks like HTML tags.
 * Removes any whitespace.
 * Discards any lines that are fewer than 8 characters in length.
 
This is intended as an easy-to-understand explanation, rather than a technical one.
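
Taken literally, and in the order listed, the steps above can be sketched in
Python as follows. This is only an illustration under stated assumptions, not
Pyzor's actual implementation. The function names and regular expressions are
hypothetical, and so are the SHA-1 hash, the header split on the first blank
line (LF line endings, no MIME handling), and the rounding of the percentages.

```python
import hashlib
import re

# Hypothetical patterns; the description above only says "looks like".
LONG_WORD = re.compile(r"\S{10,}")       # 'words' of 10 or more characters
EMAIL = re.compile(r"\S+@\S+")           # X@Y
URL = re.compile(r"[a-z]+:\S+", re.I)    # scheme:rest
HTML_TAG = re.compile(r"<[^>]*>")
WHITESPACE = re.compile(r"\s+")


def select_lines(lines):
    """Keep two 3-line samples if the message is longer than 4 lines."""
    if len(lines) <= 4:
        return list(lines)
    first = len(lines) * 20 // 100               # skip the first 20%
    second = first + 3 + len(lines) * 40 // 100  # skip a further 40%
    return lines[first:first + 3] + lines[second:second + 3]


def normalize(line):
    """Apply the removal steps in the order they are listed."""
    for pattern in (LONG_WORD, EMAIL, URL, HTML_TAG, WHITESPACE):
        line = pattern.sub("", line)
    return line


def sketch_digest(raw_message):
    """Return a hex digest for a raw message string (headers plus body)."""
    # Discard headers: everything up to and including the first blank line.
    _, _, body = raw_message.partition("\n\n")
    kept = []
    for line in select_lines(body.splitlines()):
        line = normalize(line)
        if len(line) >= 8:                       # drop lines shorter than 8
            kept.append(line)
    return hashlib.sha1("".join(kept).encode("utf-8")).hexdigest()
```

With this sketch, two copies of a message that differ only in their headers,
or only in an injected 10-character-or-longer random token, produce the same
digest, which is the property the protocol description aims for.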