About
======
History
--------
Pyzor started out as merely a Python implementation of Razor. However, because
of Razor's protocol and because Razor's server is not Open Source or software
libre, Frank Tobin decided to implement Pyzor with a new protocol and release
the entire system as Open Source and software libre.
Protocol
----------
The central premise of Pyzor is that it converts an email message to a short
digest that uniquely identifies the message. Simply hashing the entire message
is an ineffective method of generating a digest, because message headers will
differ when the content does not, and because spammers will often try to make
a message unique by injecting random/unrelated text into their messages.
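The problem with hashing the whole message can be illustrated with a short sketch. The ``naive_digest`` helper, the sample messages, and the use of SHA-1 are assumptions made for this example only; they are not taken from Pyzor itself.

```python
import hashlib


def naive_digest(raw_message):
    """Hash the complete message, headers included (the ineffective approach)."""
    return hashlib.sha1(raw_message.encode("utf-8")).hexdigest()


# Two deliveries of the same body that differ only in their headers.
body = "Buy our product today!\nLimited offer.\n"
copy_1 = "Received: from mx1.example.com\nMessage-ID: <1@example.com>\n\n" + body
copy_2 = "Received: from mx2.example.net\nMessage-ID: <2@example.net>\n\n" + body

print(naive_digest(copy_1) == naive_digest(copy_2))  # False
```

Although the content is identical, the two digests differ, so a server comparing them would treat the copies as unrelated messages.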
To generate a digest, the 2.0 version of the Pyzor protocol:
* Discards all message headers.
* If the message is greater than 4 lines in length:

  * Discards the first 20% of the message.
  * Uses the next 3 lines.
  * Discards the next 40% of the message.
  * Uses the next 3 lines.
  * Discards the remainder of the message.

* Removes any 'words' (sequences of characters separated by whitespace) that are 10 or more characters long.
* Removes anything that looks like an email address (X@Y).
* Removes anything that looks like a URL.
* Removes anything that looks like HTML tags.
* Removes any whitespace.
* Discards any lines that are fewer than 8 characters in length.
This is intended as an easy-to-understand explanation, rather than a technical one.
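As a rough illustration, the steps listed above could be sketched in Python as follows. This is one literal reading of the list, not Pyzor's actual implementation. The function names, the regular expressions, the way headers are split off at the first blank line, and the choice of SHA-1 as the hash are all assumptions made for the example.

```python
import hashlib
import re

# Illustrative patterns; the exact rules Pyzor applies are not given above.
LONG_WORD = re.compile(r"\S{10,}")          # 'words' of 10+ characters
EMAIL = re.compile(r"\S+@\S+")              # anything shaped like X@Y
URL = re.compile(r"[a-z]+://\S+", re.I)     # anything shaped like scheme://...
HTML_TAG = re.compile(r"<[^>]*>")           # anything shaped like <tag>
WHITESPACE = re.compile(r"\s+")
MIN_LINE_LENGTH = 8


def strip_headers(raw_message):
    """Drop the headers: keep what follows the first blank line (simplified)."""
    _, _, body = raw_message.partition("\n\n")
    return body


def select_lines(lines):
    """Choose which body lines feed the digest."""
    total = len(lines)
    if total <= 4:
        return list(lines)
    first = total * 20 // 100                 # discard the first 20%
    second = first + 3 + total * 40 // 100    # use 3 lines, discard the next 40%
    return lines[first:first + 3] + lines[second:second + 3]


def normalize_line(line):
    """Apply the removal steps to one line, in the order listed."""
    for pattern in (LONG_WORD, EMAIL, URL, HTML_TAG, WHITESPACE):
        line = pattern.sub("", line)
    return line


def sketch_digest(raw_message):
    """Return a hex digest for a raw message (SHA-1 is an assumption here)."""
    lines = select_lines(strip_headers(raw_message).splitlines())
    cleaned = [normalize_line(line) for line in lines]
    kept = [line for line in cleaned if len(line) >= MIN_LINE_LENGTH]
    return hashlib.sha1("".join(kept).encode("utf-8")).hexdigest()


body = "\n".join("This is line number %d of the advert" % i for i in range(10))
copy_1 = "Message-ID: <1@example.com>\n\n" + body
copy_2 = "Message-ID: <2@example.net>\n\n" + body
print(sketch_digest(copy_1) == sketch_digest(copy_2))  # True: headers are ignored
```

Because the headers are discarded before hashing, the two copies produce the same digest even though their ``Message-ID`` values differ.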