<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Spam Assassin Heuristic Email Address Tracker Utility sa-heatu 3.01 100904</title>
<meta name="author" content="Dennis German">
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
<meta name="description" content="Utility to display and clean up spamassassin's HEAT entries (auto-whitelist)">
<meta name="keywords" content="spam email ham auto white list spamassassin">
<meta name="robots" content="index,follow">
<meta name="distribution" content="EST">
<link rel=stylesheet href=style.css type=text/css>
<style>
td.rpad {padding-right:50px}
td.o {font-family:monospace;vertical-align:top;font-size:130%}
code {font-size:120%}
</style> 
</head>
<body> 
<h1 align=center>sa-heatu</h1>
<h3 align=center>version 3.01</h3>
<p>
A utility to display, edit and <i>age</i> the Spam Assassin <b>H</b>euristic <b>E</b>mail <b>A</b>ddress <b>T</b>racker database <code>~/.spamassassin/auto-whitelist</code>.
<p>
<a href=#backgrnd>HEAT Background</a><sub style=font-size:140%><a href=#backgrnd style=text-decoration:none><sub>&#x261F;</sub></a></sub><p>
<table bgcolor=mintCream><tr><td>

<h2  style=font-family:cursive align=center id=top>This version contains a significant enhancement</h2>
<p>
There is now a parallel hash file containing date entries.  
<p>
<code>sa-heatu</code> matches each <code>auto-whitelist</code> entry with its <code>timestamps</code> entry.
If the count in the <code>auto-whitelist</code> entry is greater than the count in the <code>timestamps</code> entry,
 the <code>timestamps</code> entry is updated with the current time and count.
 If there is no <code>timestamps</code> entry, one is created.<br>
Entries that have not been updated in 183 days are expired.<p>
This utility now operates in a "current file in, new file out" mode, as opposed to the previous <nobr>"update in place"</nobr>
 mode. Because <b>expired</b> entries are not written to the new files, the hash files may shrink; deleting entries from a hash file in place does not reduce its size.
<p>

This very simple-minded approach to aging permits expiring old entries without any impact on spamassassin's operation.

<br>
</table>
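<p>The matching and expiry rules above can be sketched in a few lines. This is only an illustration, not the utility's Perl code: plain dictionaries stand in for the two hash files, and the key format, the tuple layouts and the name <code>age_entries</code> are assumptions made for the example.

```python
EXPIRE_DAYS = 183          # default of --expireOlderThan
SECONDS_PER_DAY = 86400


def age_entries(awl, timestamps, now, expire_days=EXPIRE_DAYS):
    """Return (new_awl, new_timestamps) following the rules described above.

    awl:        {key: (total_score, count)}  stand-in for auto-whitelist
    timestamps: {key: (last_update, count)}  stand-in for timestamps
    """
    cutoff = now - expire_days * SECONDS_PER_DAY
    new_awl, new_timestamps = {}, {}
    for key, (total, count) in awl.items():
        stamp = timestamps.get(key)
        if stamp is None or count > stamp[1]:
            # No stamp yet, or the count has grown: stamp with the current time.
            stamp = (now, count)
        if cutoff > stamp[0]:
            # Not updated within expire_days: expire it by not copying it.
            continue
        new_awl[key] = (total, count)
        new_timestamps[key] = stamp
    return new_awl, new_timestamps


# Hypothetical data: "address|network" keys are made up for this sketch.
now = 1_300_000_000
stale = now - 200 * SECONDS_PER_DAY
awl = {
    "a@example.com|10.1": (-96.3, 5),   # count grew since the last run
    "b@example.com|10.2": (60.8, 1),    # unchanged for 200 days
    "c@example.com|10.3": (4.0, 3),     # never stamped before
}
timestamps = {
    "a@example.com|10.1": (stale, 4),
    "b@example.com|10.2": (stale, 1),
}
new_awl, new_timestamps = age_entries(awl, timestamps, now)
print(sorted(new_awl))
```

Writing the survivors to fresh dictionaries mirrors the "current file in, new file out" mode: an expired entry is simply never copied.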

<hr>
<p>
<table bgcolor=azure><tr><td>
<p>


<code>sa-heatu</code> is used to maintain the database<b style=color:green;cursor:help title="only if it is stored in a hash file, not an SQL database">&dagger;</b>
<p>

<table>
<tr><td><code>sa-heatu</code> <td><code>--quiet | --showUpdates | --verbose</code>
<tr><td><td><code>--firstTimes | --DONTupdateTimestamps | --noTimestamps</code>
<tr><td><td><code>--expireOlderThan <var>days</var></code>
<tr><td><td><code>--remove nnnnnn@dddddd.xxx</code> <td>[<code><var>dbFile</var></code>
[<code><var>timeStampFile</var></code>]]
</table>
<p>
<table border=1 cellpadding=10 cellspacing=0>

<tr><td><code>--quiet</code> <td> Don't output anything.
<tr><td><code>--showUpdates</code> <td> Output entries updated, added or removed, in addition to the summary.
<tr><td><code>--verbose</code> <td> Output every entry. <br>
                Warning: the output should be piped to a filter or redirected to a file, as it can be very long.
<tr><td><code>--firstTimes</code> <td> Use this for the first run to avoid reading <code>timestamps</code>.
<tr><td><code><nobr>--DONTupdateTimestamps</nobr></code> <td> &nbsp;
<tr><td><code>--noTimestamps</code> <td> No timestamps processing is done.
<tr><td><code>--expireOlderThan <var>days</var></code> <td> Expires entries older than <code><var>days</var></code>,
i.e. they are not written to the output files. <br>
                                Default: 183
<tr><td><nobr><code>--remove <var>nnnnnn@dddddd.xxx</var></code></nobr> <td> Remove entries with this <nobr>email address</nobr> and any IP address.
<tr><td><code><var>dbfile</var></code> <td> Score and count database <small>(Perl hash file)</small>. Default: <code>auto-whitelist</code>
<tr><td><code><var>timestamps</var></code> <td> Timestamps database <small>(Perl hash file)</small>. Default: <code>timestamps</code>

<tr><td><code><nobr>-h</nobr></code><td><pre>sa-heatu Spam Assassin - Heuristic Email Address Tracker Utility  3.02 100817  
                    DGermansa@Real-world-Systems.com (c)2010 Dennis G German  

 usage: sa-heatu --quiet --showUpdates --verbose  
                 --firstTimes --DONTupdateTimestamps  --noTimestamps   
                 --expireOlderThan days 
                 --remove nnnnnn@dddddd.xxx                          dbfile timestamps 

</pre>
</table>

<p>
<h3>Recommended Operation</h3>
Run daily from cron. Suggested script:<p>
<table bgcolor=azure ><tr><td><p><b><pre style=font-size:112%>
# Assumes the databases live in ~/.spamassassin
cd ~/.spamassassin || exit 1
# Remove output files left over from a previous run, before sa-heatu writes new ones.
rm -f auto-whitelisto timestampso
/usr/local/bin/sa-heatu --showUpdates &gt;&gt; ~/.spamassassin/sa-heatu.log
if [ $? -ne 0 ]; then echo "sa-heatu failed, databases unmoved!"; exit 1; fi
#
# Now save the old files to  xxx-1 and install the ones output by sa-heatu.
mv -f auto-whitelist  auto-whitelist-1 ; if [ $? -ne 0 ]; then echo "mv auto-whitelist  failed, I quit!";exit 1; fi
mv auto-whitelisto auto-whitelist;       if [ $? -ne 0 ]; then echo "mv auto-whitelisto failed, I quit!";exit 1; fi
mv -f timestamps  timestamps-1;          if [ $? -ne 0 ]; then echo "mv timestamps      failed, I quit!";exit 1; fi
mv timestampso timestamps;               if [ $? -ne 0 ]; then echo "mv timestampso     failed, I quit!";exit 1; fi
echo "auto-whitelist, timestamps updated"
</pre></b><p></table>

<p>
Output from <code>sa-heatu</code> can be sorted to display frequent (or rare) senders of spam (or ham).<p>

Running <code>sa-heatu --verbose</code> should be avoided unless the output is redirected to a file or
piped to a filter, since the database contains a (surprisingly) large number of entries.
            <p>
<hr>
Display ham senders: <p id=remember>
<small>(Remember the date and time stamp is the time sa-heatu was run, not the time the email was received).</small>
<pre style=font-family:sans-serif>  average            total     count      email address                                               ip network address    last time updated</pre> 
  <code>  sa-heatu --verbose --DONTupdateTimestamps |sort -n  | head -5</code>
<pre>   -19.3     -96.3   5   jason.haar@trimble.co.nz                    222.154; kept, Aug 20 21:24 2010<sup
 style=cursor:help;color:green;text-decoration:none title="Remember the date and time stamp is the time sa-heatu was run, not the time the email was received.">&dagger;</sup>
   -19.3     -96.3   5   karliak@ajetaci.cz                          77.48; kept, Aug 20 21:24 2010
   -19.3    -115.6   6   scheidell@secnap.net                        204.89; new,
   -19.3    -115.6   6   si@yacc.co.uk                               62.232; new, Aug 27 21:59 2010
   -19.3    -134.9   7   mkitchin.public@gmail.com                   66.238; kept, Aug 20 21:24 2010
</pre>
<hr>
Display spammers:<p>
<code> sa-heatu --verbose --DONTupdateTimestamps |sort -rn | head -4</code>
<pre>    61.8     123.5   2   claims_office001@kimo.com                    221.2; kept, Aug 20 21:24 2010
    60.8      60.8   1   mr.williams.wright@gmail.com                 82.128; kept, Aug 20 21:24 2010
    56.2     112.4   2   danjos_01@yahoo.com                          41.26; kept, Aug 20 21:24 2010
    55.2     110.5   2   danjos_01@yahoo.com                          67.205; kept, Aug 20 21:24 2010
</pre> 
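<p>For more than a quick <code>sort | head</code>, the verbose output can be parsed into records. The sketch below assumes every line has the shape of the samples above (average, total, count, email address, IP network, then a status and an optional date after the semicolon). The names <code>HeatLine</code> and <code>parse_line</code> and the example.com addresses are made up for illustration.

```python
from typing import NamedTuple


class HeatLine(NamedTuple):
    average: float
    total: float
    count: int
    email: str
    network: str
    status: str
    updated: str


def parse_line(line):
    """Parse one output line shaped like the samples above."""
    fields, _, tail = line.partition(";")
    average, total, count, email, network = fields.split()
    status, _, updated = tail.partition(",")
    return HeatLine(float(average), float(total), int(count),
                    email, network, status.strip(), updated.strip())


# Numbers copied from the sample output; the addresses are placeholders.
sample = [
    "   -19.3     -96.3   5   friend@example.com      77.48; kept, Aug 20 21:24 2010",
    "    61.8     123.5   2   spammer@example.com     221.2; kept,Aug 20 21:24 2010",
    "   -19.3    -115.6   6   newcomer@example.com    204.89; new,",
]
entries = [parse_line(line) for line in sample]

# The entry with the highest average score is the most spam-like sender.
worst = max(entries, key=lambda entry: entry.average)
print(worst.email, worst.average)
```

The average column is the total divided by the count, so either column can serve as the sort key depending on whether frequent or extreme senders are of interest.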

<hr>
<p>
Find senders whose messages are incorrectly adjusted.  
<p>
To display a single sender's record:<br>
&nbsp; &nbsp;<code>     sa-heatu --noTimestamps --verbose | grep -i Spammer@example.com</code>
<hr>
<p>
Remove the entries for a particular email address, for all IP networks:<br>
<code>&nbsp; &nbsp;sa-heatu --noTimestamps --remove spammer@example.com<br>
&nbsp; &nbsp;mv -f ~/.spamassassin/auto-whitelisto ~/.spamassassin/auto-whitelist    <br><br></code>

<hr>
<p>
The tar file includes <code>64c.hexdump</code>, a format specification file for <code>hexdump</code> that
can be used to display the <code>timestamps</code> file and other Perl hash files.<p>
<table align=center cellspacing=0 ><tr><td>

<code>hexdump -f 64c.hexdump timestamps</code>
<p>
</table>
See <a href=http://www.Real-World-Systems.com/docs/hexdump.1.html>www.Real-World-Systems.com/docs/hexdump.1.html</a><p>
<p>
</table>




<p id=backgrnd>
<a href=#top>top</a><p>
<table bgcolor=mistyrose><tr><td>
<h3>HEAT Background</h3>
The Heuristic Email Address Tracker feature in <code>spamassassin</code> retains a summary of
scores from messages received by <nobr>email address</nobr> and <nobr>IP network address</nobr>.<br>
When a new message is received, the final score is adjusted as a function of the previous average value resulting in a:
<ul>
<li>boost for senders who have sent ham (nice messages), or a<br>
<li>penalty for senders who have sent spam <br>
</ul>
<p>
The final SCORE of a message is calculated as follows, where MEAN is the sender's previous average score:
<ol>
<li>Compute SCORE based on rules
<li>Compute DELTA as (MEAN-SCORE)*auto_whitelist_factor (from configuration)
<li>Bump SCORE by DELTA
</ol>
The result is compared against <code>required_score</code> and if the message score is greater, it is considered spam.  
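<p>As a worked example of this adjustment, the sketch below applies the DELTA formula to two made-up messages. The factor of 0.5 and the <code>required_score</code> of 5.0 are assumed values chosen for the example, and <code>adjust_score</code> is a hypothetical helper, not part of spamassassin.

```python
def adjust_score(score, mean, factor=0.5):
    """Bump SCORE by DELTA = (MEAN - SCORE) * factor."""
    delta = (mean - score) * factor
    return score + delta


REQUIRED_SCORE = 5.0   # assumed threshold for this example

# Rules score 7.0, but the sender's previous average is -19.0 (ham history).
neutralized = adjust_score(7.0, -19.0)
# Rules score 3.0, but the sender's previous average is 21.0 (spam history).
penalized = adjust_score(3.0, 21.0)

for final in (neutralized, penalized):
    print(final, "spam" if final > REQUIRED_SCORE else "ham")
```

With these numbers the first message falls well below the threshold despite scoring 7.0 on rules alone, while the second is pushed above it.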
<p>
Negative values indicate senders of ham, positive values senders of spam.
<p>
The sender's email address, the IP address, the accumulated score, and the number of emails received are stored in a Perl hash.
<p>
Spammers have been known to use this to their advantage by sending a benign email which scores high as ham. 
They then send spam which has its score "neutralized" by the Heuristic Email Address Tracker scheme,
 and the message will be, falsely, considered ham!<br>
If you receive a message that is clearly spam, check X-Spam-Report in the message header
for the string:<br>
         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<code>AWL: From: address</code>
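<p>This check can be scripted. The sketch below uses Python's standard <code>email</code> module to look for the AWL rule in a message's <code>X-Spam-Report</code> header. The sample message, including the wording and score of its report line, is invented for the example, and <code>awl_fired</code> is a hypothetical helper name.

```python
from email import message_from_string


def awl_fired(raw_message):
    """Return True if any X-Spam-Report header mentions the AWL rule."""
    msg = message_from_string(raw_message)
    reports = msg.get_all("X-Spam-Report", [])
    return any("AWL" in str(report) for report in reports)


# Invented message for illustration; real reports list many more rules.
raw = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "X-Spam-Report: * -6.5 AWL AWL: From: address is in the auto white-list\n"
    "\n"
    "body text\n"
)
print(awl_fired(raw))
```

Only the headers are searched, so the string AWL appearing in a message body does not count as a hit.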

<p>
There is no mechanism   within <code>spamassassin</code> to remove incorrect entries from the database.<p>

Although this is a small amount of data, no  mechanism is provided within <code>spamassassin</code> to expire old entries.
<br><br>
There has long been a discussion regarding the significantly <i>misnamed</i> AWL (Auto White List).<br>
<p>

</table>
<p>
<h3>Next revision</h3>
After significant analysis and thought, it seems that giving a bonus to users who send ham is not necessary.
People who usually send you nice emails will continue to send nice emails and don't need any help in scoring. <br>
Giving hammers a bonus has the unfortunate consequence that if ham is received from a spammer, a subsequent message is
given a little slack, i.e. a bonus, for having sent ham previously. Spammers have taken advantage of this.  <br>
The PLANNED future option <code>--dehammer</code> deletes all entries with a negative score, i.e. senders who previously sent ham.<br>
Please send me a message if you have comments on this, or anything else about <code>sa-heatu</code>.
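<p>Since <code>--dehammer</code> is only planned, the following is a guess at its effect rather than a description of existing behaviour. A dictionary stands in for the hash file; the key format and the function name <code>dehammer</code> are assumptions.

```python
def dehammer(entries):
    """Keep only entries whose accumulated score is not negative."""
    return {key: (total, count)
            for key, (total, count) in entries.items()
            if total >= 0}


# Hypothetical entries: {key: (accumulated_score, count)}
entries = {
    "friend@example.com|77.48": (-96.3, 5),
    "spammer@example.com|221.2": (123.5, 2),
}
kept = dehammer(entries)
print(sorted(kept))
```

Dropping an entry removes the sender's bonus; spamassassin would start a fresh record the next time that sender is seen.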
<p>
<p>
This is an enhanced version of the original tool. 

<p>This document and the current version of sa-heatu can be downloaded at: 
<a href=http://www.Real-World-Systems.com/mail/sa-heatu.3.02.tar>sa-heatu.3.02.tar</a>
<p><br>
<p>
<table bgcolor=snow><tr><td>
<br>
Previous versions of this utility included <code>--prune</code>, which has been deprecated. The idea was that:<br>
"The size of the database can be significantly reduced by using:
 &nbsp; &nbsp;    <code>sa-heat --prune</code>
This caused any entry with only 1 entry to be removed on the (somewhat mistaken) assumption  that
an emailer that has only sent 1 email isn't worth remembering."  <p>
This is definitely mistaken: a spammer who has recently sent one message may soon send another.
<br><br>
</table>
<p>

<script src=http://www.Real-World-Systems.com/mail/endodocsa.js></script>


</body>
</html>