Maildrop tips and tricks
Last update: 10/11/1998.
--------------------------------------------------------------------------
The include statement
The $LINES and $SIZE environment variables
Partial matches using the ! operator
Short-circuit evaluation using logical operators
The foreach statement
The $HOME/.tmp directory
--------------------------------------------------------------------------
Here's a list of tips and tricks to write efficient mail filters using
maildrop. Maildrop - just like any programming language - can do certain
things better than others.
The include statement
When maildrop is started, it reads $HOME/.mailfilter, or some other file
containing mail filtering instructions. Before doing anything else, this
entire file is read and semi-compiled. Any grammatical or syntax errors
will result in maildrop terminating with a temporary error code, without
doing anything with the message.
The entire file must be "semi-compiled". The semi-compilation process
consists of building a logical representation of the filtering statements
in memory.
If you have a long and complicated set of instructions, it will be
semi-compiled whether or not it will actually be executed. For example,
consider the following code:
if ( /^Subject: rosebud/ )
{
...
some
really
long
set
of
instructions
...
}
If the contents of the "if" construct are large, maildrop may spend a
considerable amount of time and memory semi-compiling filtering
instructions that may never be used. Consider putting all these
instructions into a separate file, and using the include statement to
execute them.
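As a sketch of that suggestion, the long block can be moved into its own
file and pulled in only when the pattern matches. The file name
$HOME/.mailfilter-rosebud is a hypothetical choice for illustration, not
something maildrop requires:
if ( /^Subject: rosebud/ )
{
# This file is read and semi-compiled only if the Subject: header matches.
include "$HOME/.mailfilter-rosebud"
}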
The include statement is processed only when maildrop actually executes
it, and that has a side effect. Semi-compilation normally weeds out errors
in the filtering recipe file before anything is done with the message, so
that all mail is held until the errors are fixed. A file referenced by an
include statement does not get this check up front: maildrop will not know
about errors in it until it executes the include statement. When maildrop
detects the error, it will terminate with a temporary error code, holding
all mail. However, any filtering instructions before the include statement
will already have been executed.
The $LINES and $SIZE environment variables
These variables are initialized to the number of lines in the message, and
the size of the message in bytes. They may not be exact, due to the
presence of the From_ line, or -A arguments on the command line, but they
will always be a reasonable approximation.
Scanning headers for patterns will take the same amount of time whether
the body of the message has 100 or 1,000 lines. However, the "b" option
causes the message body to be searched. If someone sends you a 5 megabyte
binary file, each pattern matched against the message body will take a
significant amount of time to complete.
Whenever possible, the first thing you should do is check both variables
to see if the message is excessively large. If so, divert the message to
a separate mailbox, or reject it.
Weighted scoring is especially sensitive to the size of the message being
delivered. Normally, as soon as a pattern match is found, maildrop
terminates the pattern scan. If weighted scoring is used, the pattern scan
is immediately restarted after each match, until the end of the message is
reached. Starting a pattern scan carries a fixed overhead, so weighted
scoring that matches a very short pattern, such as /[:lower:]/:Dbw,1,1,
will take a significant amount of CPU time to complete. Maildrop is simply
not optimized for these kinds of situations. Nevertheless, on a Pentium
200, /[:lower:]/:Dbw,1,1 performs reasonably well for messages less than
60K long.
Here's an example of using $LINES and $SIZE to divert large messages to a
separate mailbox:
if ($LINES > 1000 || $SIZE > 60000)
{
to "mail/IN.large"
}
.
.
.
Partial matches using the ! operator
Maildrop implements the ! operator in patterns by breaking up the pattern
into two separate patterns, which are compiled and executed separately.
Whenever the first pattern is successfully matched, maildrop runs the
second pattern. If the second pattern does not match, maildrop resumes
matching the first pattern until it matches again.
Since starting a pattern match involves some fixed overhead, minimizing
the number of times the first pattern will match will reduce the amount of
time it takes to match the entire pattern.
For example: you're trying to extract an IP address from the first
Received header. Here's your first attempt at doing so:
/^Received:.*![0-9\.]+/
IP=$MATCH2
This will not work at all. From the manual page for maildropfilter:
When there is more than one way to match a string,
maildrop favors matching as much as possible initially.
So this fragment will set IP to the last digit or period found in the
first Received: header, because the first half of the pattern will match
as much as possible.
Here's your second attempt, then:
/^Received:.*[^0-9\.]![0-9\.]+/
IP=$MATCH2
This will work, except that the first half of the pattern can match at
every character in the Received: header that's not a digit or a period.
Each time it does, maildrop will stop and try to run the second half of
the pattern.
Your mail server will probably put a left bracket before the IP address of
the relay. Noting that, the following code picks up the IP address as
efficiently as possible:
/^Received:.*\[![0-9\.]+/
IP=$MATCH2
Short-circuit evaluation using logical operators
The logical operators || and && work in maildrop just like they work in C.
Consider the following expression:
if (/ ... some pattern ... / || / ... some other pattern ... /)
{
.
.
.
}
If the first pattern is found, maildrop will not execute the second
pattern, since the logical expression will be true no matter what. You can
use that to your advantage. If you have two patterns, and one of them
takes more time to process, put the other first. Even if both patterns are
relatively quick, if you expect one of them to be found almost every time,
put that one first so that it will not be necessary to run the second
pattern most of the time.
The same concept works for the && operator:
if (/ ... some pattern ... / && / ... some other pattern ... /)
{
.
.
.
}
If the first pattern is not found, maildrop will not execute the second
pattern, since the logical expression will be false no matter what.
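Combining this with the $SIZE check described earlier, a cheap test can
guard an expensive one. The following is a minimal sketch; the 60000 byte
limit, the rosebud pattern and the mail/IN.rosebud mailbox are placeholders
chosen for illustration:
# The header pattern is tried first. The body is searched only if the
# header pattern fails and the message is small.
if ( /^Subject:.*rosebud/ || ( $SIZE < 60000 && /rosebud/:b ) )
{
to "mail/IN.rosebud"
}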
The foreach statement
The foreach statement is implemented as follows. Maildrop will execute the
regular expression, and compile a list of all the strings in the message
matched by the regular expression. Afterwards, maildrop will execute the
body of the foreach statement, once for each matched string.
It is important to note that maildrop will create a list in memory
containing every string matched by the regular expression. If the regular
expression is found many times in the message, this will require a lot of
memory. For example, the following is a Very Bad Thing[tm]:
foreach /[:alpha:]/
{
.
.
.
}
Maildrop does not include any built-in limits on resource usage, except
for the watchdog timer. Therefore, as a system administrator, it is your
responsibility to limit resources used by the maildrop process.
This implementation of the foreach statement is the one that yields the
fewest surprises. Observe that the contents of the foreach
construct can modify the message using the xfilter statement. Furthermore,
the regular expression may include variable substitution, and those
variables could be modified by the foreach statement. If the foreach
statement were to be executed "on the fly", these factors would yield
unpredictable - and rather messy - results.
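By contrast, a pattern that can match only a handful of times keeps the
list small. The following is a minimal sketch, assuming that logging has
been enabled with the logfile statement (the log file name is a
hypothetical choice). Without the "b" option only the headers are
searched, and according to the maildropfilter manual page each iteration
sets $MATCH to the matched string:
logfile "$HOME/.maildrop.log"
# One iteration per Received: header, not one per character of the message.
foreach /^Received:.*/
{
log "Relay hop: $MATCH"
}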
The $HOME/.tmp directory
Under certain conditions, maildrop will need to use a temporary file.
Temporary files are created in the $HOME/.tmp directory. Temporary files
are used to store large messages (instead of storing them in memory).
Temporary files may also be used by the xfilter command, or in other
situations.
Maildrop will automatically delete all temporary files before terminating.
However, if the maildrop process is killed, or terminates abnormally, a
temporary file may remain and take up disk space. To have those leftover
files automatically cleaned up, put the following code in /etc/profile, or
some other file that all users run after logging in:
find "$HOME/.tmp" -depth -mtime +7 \( \( \
-type d -exec rmdir {} \; \) -o -exec rm -f {} \; \) 2>/dev/null