<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52
     from spec on 25 November 2000 -->

<TITLE>Exim Specification - 33. Retry configuration</TITLE>
</HEAD>
<body bgcolor="#FFFFFF" text="#00005A" link="#FF6600" alink="#FF9933" vlink="#990000">
Go to the <A HREF="spec_1.html">first</A>, <A HREF="spec_32.html">previous</A>, <A HREF="spec_34.html">next</A>, <A HREF="spec_59.html">last</A> section, <A HREF="spec_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC734" HREF="spec_toc.html#TOC734">33. Retry configuration</A></H1>
<P>
<A NAME="IDX1656"></A>
<A NAME="IDX1657"></A>
The fifth part of the configuration file contains a list of retry rules which
control how often Exim tries to deliver messages that cannot be delivered at
the first attempt. If there are no retry rules, Exim gives up after the first
failure. The -<EM>brt</EM> command line option can be used to test which retry rule
will be used for a given address or domain.

</P>
<P>
The most common cause of retries is temporary failure to deliver to a remote
host. Exim's retry processing in this case is applied on a per-host (strictly,
per IP address) basis, not on a per-message basis. Thus, if one message has
recently been delayed, a new message to the same host does not immediately get
tried, but waits for the host's retry time to arrive. If the value of
<EM>log_level</EM> is greater than 4, the message
<A NAME="IDX1658"></A>
`retry time not reached' is written to the main log whenever a delivery is
skipped for this reason. Section 48.2 contains more details of the
handling of errors during remote deliveries.

</P>
<P>
Retry processing applies to directing and routing as well as to delivering,
except as covered in the next paragraph. The retry rules do not distinguish
between these three actions, so it is not possible, for example, to specify
different behaviour for failures to route the domain <EM>snark.fict.book</EM> and
failures to deliver to the host <EM>snark.fict.book</EM>. I didn't think anyone would
ever need this added complication, so did not implement it.
However, although they share the same retry rule, the actual retry times for
routing, directing, and transporting a given domain are maintained
independently.

</P>
<P>
When a delivery is not part of a queue run (typically an immediate delivery
on receipt of a message), the directors are always run for local addresses, and
local deliveries are always attempted, even if retry times are set for them.
This makes for better behaviour if one particular message is causing problems
(for example, causing quota overflow, or provoking an error in a filter file).
If such a delivery suffers a temporary failure, the retry data gets updated as
normal, and subsequent delivery attempts from queue runs occur only when the
retry time for the local address is reached.

</P>

<P>



<H2><A NAME="SEC735" HREF="spec_toc.html#TOC735">33.1 Retry rules</A></H2>

<P>
<A NAME="IDX1659"></A>
Each retry rule occupies one line and consists of three parts, separated by
white space: a pattern, an error name, and a list of retry parameters. The
rules are searched in order until one is found whose pattern matches the
failing host or address.

</P>
<P>
The pattern may be a complete address (<EM>local_part@domain</EM>), a plain domain,
a wildcarded domain (that is, starting with an asterisk), a domain lookup (as
in a domain list), or a regular expression. The first form must be used with
local domains only; in this case the local part may begin with an asterisk.

</P>
<P>
After a directing or local delivery failure, regular expressions and patterns
containing local parts are normally matched against the complete address
(<EM>local_part@domain</EM>). However, if there is no local part in a pattern that
is not a regular expression, the local part of the address isn't used in
the matching. Thus an entry such as

<PRE>
lookingglass.fict.book        *  F,24h,30m;
</PRE>

<P>
matches any address whose domain is <EM>lookingglass.fict.book</EM>, whether this is
a local or a remote domain, whereas

<PRE>
alice@lookingglass.fict.book  *  F,24h,30m;
</PRE>

<P>
can be used only if <EM>lookingglass.fict.book</EM> is a local domain. It applies to
temporary failures involving the local part <EM>alice</EM>, but not to any other
local parts.

</P>
<P>
If a local delivery is being used to collect messages for onward transmission
by some other means (for example, as batched SMTP), a temporary failure may not
be dependent on the local part at all. Both the <EM>appendfile</EM> and <EM>pipe</EM>
transports have an option called <EM>retry_use_local_part</EM> which can be set
false in order to suppress the inclusion of local parts when matching retry
patterns for those transport instances. When this option is set, patterns
containing local parts are skipped, and regular expressions are matched against
the domain only.

</P>
<P>
For remote domains, when looking for a retry rule after a routing attempt has
failed (for example, after a DNS timeout), each line in the retry configuration
is tested only against the domain in the address. However, when looking for a
retry rule after a remote delivery attempt has failed (for example, a
connection timeout), each line in the retry configuration is first tested
against the remote host name, and then against the domain name in the address.
For example, if the MX records for <EM>a.b.c.d</EM> are

<PRE>
a.b.c.d  MX  5  x.y.z
         MX  6  p.q.r
         MX  7  m.n.o
</PRE>

<P>
and the retry rules are

<PRE>
p.q.r    *      F,24h,30m;
a.b.c.d  *      F,4d,45m;
</PRE>

<P>
then failures to deliver to host <EM>p.q.r</EM> use the first rule to determine retry
times, but for all the other hosts for the domain <EM>a.b.c.d</EM>, the second rule is
used, and that rule would also be used if routing to <EM>a.b.c.d</EM> suffers a
temporary failure.

</P>
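<P>
The lookup order described above can be sketched roughly as follows. This is
an illustration only, not Exim's code: the names <EM>pattern_matches</EM> and
<EM>find_rule</EM> are hypothetical, only plain and wildcarded domain patterns
are handled, and the error name field is ignored.

</P>

```python
def pattern_matches(pattern, name):
    """Match a plain domain, or a wildcarded domain starting with an asterisk."""
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    return pattern == name


def find_rule(rules, domain, host=None):
    """Return the first (pattern, parameters) rule that matches, or None.

    After a remote delivery failure, each rule is tested against the host
    name first and then against the domain. After a routing failure there
    is no host (host is None), so only the domain is tested.
    """
    for pattern, parameters in rules:
        if host is not None and pattern_matches(pattern, host):
            return pattern, parameters
        if pattern_matches(pattern, domain):
            return pattern, parameters
    return None


# The two rules from the example above, in configuration order.
rules = [("p.q.r", "F,24h,30m"), ("a.b.c.d", "F,4d,45m")]
print(find_rule(rules, "a.b.c.d", host="p.q.r"))  # delivery to p.q.r failed
print(find_rule(rules, "a.b.c.d", host="x.y.z"))  # delivery to x.y.z failed
print(find_rule(rules, "a.b.c.d"))                # routing a.b.c.d failed
```

<P>
With the rules from the example, a failure at host <EM>p.q.r</EM> selects the
first rule, while failures at the other hosts and routing failures for
<EM>a.b.c.d</EM> select the second.

</P>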

<P>
The second field in a retry rule is the name of a particular error, or an
asterisk, which matches any error. The errors that can be tested for are:

</P>

<UL>

<LI>

<EM>refused_MX</EM>: connection refused from a host obtained from an MX record

<LI>

<EM>refused_A</EM>: connection refused from a host not obtained from an MX record

<LI>

<EM>refused</EM>: any connection refusal

<LI>

<EM>timeout_connect</EM>: connection timed out

<LI>

<EM>timeout_DNS</EM>: DNS lookup timed out

<LI>

<EM>timeout</EM>: any timeout

<LI>

<EM>quota</EM>: quota exceeded in local delivery

<LI>

<EM>quota_&#60;<EM>time</EM>&#62;</EM>: quota exceeded in local delivery, and the mailbox has not
been read for &#60;<EM>time</EM>&#62;.
</UL>

<P>
The quota errors apply both to system-enforced quotas and to Exim's own quota
mechanism in the <EM>appendfile</EM> transport.
<font color=green>
They also apply when a local delivery is deferred because a partition is full
(the ENOSPC error).
</font>

</P>
<P>
The third field in a retry rule is a sequence of retry parameter sets,
separated by semicolons. Each set consists of

<PRE>
&#60;<EM>letter</EM>&#62;,&#60;<EM>cutoff time</EM>&#62;,&#60;<EM>arguments</EM>&#62;
</PRE>

<P>
The letter identifies the algorithm for computing a new retry time; the cutoff
time is the time beyond which this algorithm no longer applies, and the
arguments vary the algorithm's action. The cutoff time is measured from the
time that the first failure for the domain (combined with the local part if
relevant) was detected, not from the time the message was received.
<A NAME="IDX1660"></A>
The available algorithms are:

</P>

<UL>

<LI>

<EM>F</EM>: retry at fixed intervals. The single argument specifies the
interval.

<LI>

<EM>G</EM>: retry at geometrically increasing intervals. The first argument specifies
a starting value for the interval, and the second a multiplier.
</UL>

<P>
When computing the next retry time, the algorithm definitions are scanned in
order until one whose cutoff time has not yet passed is reached. This is then
used to compute a new retry time that is later than the current time. In the
case of fixed interval retries, this simply means adding the interval to the
current time. For geometrically increasing intervals, retry intervals are
computed from the rule's parameters until one that is greater than the previous
interval is found. The main configuration variable
<A NAME="IDX1661"></A>
<A NAME="IDX1662"></A>
<A NAME="IDX1663"></A>
<EM>retry_interval_max</EM> limits the maximum interval between retries.

</P>
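<P>
As a minimal sketch of the computation just described, the following code
parses the third field of a rule and works out the next retry interval. It
follows the wording of this section rather than Exim's source, so treat it as
an approximation. The function names are hypothetical, only the time units
used in this chapter (<EM>m</EM>, <EM>h</EM>, <EM>d</EM>) are handled, and the
<EM>interval_max</EM> argument merely stands in for <EM>retry_interval_max</EM>.

</P>

```python
import re

# Seconds per unit; only the units that appear in this chapter are handled.
UNITS = {"m": 60, "h": 3600, "d": 86400}


def parse_time(text):
    """Convert a single-unit time such as '15m' or '2h' into seconds."""
    match = re.fullmatch(r"(\d+)([mhd])", text.strip())
    if match is None:
        raise ValueError("unsupported time value: " + text)
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_parameters(field):
    """Split 'F,2h,15m; G,16h,1h,1.5' into (letter, cutoff, start, factor)."""
    sets = []
    for chunk in field.split(";"):
        parts = [part.strip() for part in chunk.split(",")]
        if parts == [""]:
            continue  # tolerate a trailing semicolon
        letter, cutoff = parts[0], parse_time(parts[1])
        if letter == "F":
            sets.append(("F", cutoff, parse_time(parts[2]), None))
        elif letter == "G":
            factor = float(parts[3])
            if factor <= 1:
                raise ValueError("multiplier must be greater than 1")
            sets.append(("G", cutoff, parse_time(parts[2]), factor))
        else:
            raise ValueError("unknown algorithm: " + letter)
    return sets


def next_interval(sets, failing_for, previous_interval, interval_max=None):
    """Return the next retry interval in seconds.

    failing_for is the time since the first failure was detected. The
    parameter sets are scanned in order and the first one whose cutoff has
    not yet passed is used. None is returned once every cutoff has passed.
    """
    for letter, cutoff, start, factor in sets:
        if failing_for >= cutoff:
            continue
        interval = start
        if letter == "G":
            # Grow from the starting value until the previous interval is exceeded.
            while interval <= previous_interval:
                interval *= factor
        interval = int(interval)
        if interval_max is not None:
            interval = min(interval, interval_max)
        return interval
    return None
```

<P>
For the parameters <EM>F,2h,15m; G,16h,1h,1.5; F,5d,8h</EM>, an address that
has been failing for three hours and was last retried after 15 minutes gets a
next interval of one hour under this sketch, and the interval after that is 90
minutes.

</P>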
<P>
A single remote domain may have a number of hosts associated with it, and each
host may have more than one IP address. Retry algorithms are selected on the
basis of the domain name, but are applied to each IP address independently. If,
for example, a host has two IP addresses and one is broken, Exim will generate
retry times for it and will not try to use it until its next retry time comes.
Thus the good IP address is likely to be tried first most of the time.

</P>
<P>
Retry times are hints rather than promises. Exim does not make any attempt to
run deliveries exactly at the computed times. Instead, a queue-running process
starts delivery processes for delayed messages periodically, and these attempt
new deliveries only for those addresses that have passed their next retry time.
If a new message arrives for a deferred address, an immediate delivery attempt
occurs only if the address has passed its retry time. In the absence of new
messages, the minimum time between retries is the interval between
queue-running processes. There is not much point in setting retry times of five
minutes if your queue-runners happen only once an hour, unless there are a
significant number of incoming messages (which might be the case on a system
that is sending everything to a smart host, for example).

</P>
<P>
The data in the retry hints database can be inspected by using the
<EM>exim_dumpdb</EM> or <EM>exim_fixdb</EM> utility programs (see chapter 53). The
latter utility can also be used to change the data. The <EM>exinext</EM> utility
script can be used to find out what the next retry times are for the hosts
associated with a particular mail domain, and also for local deliveries that
have been deferred.

</P>


<H2><A NAME="SEC736" HREF="spec_toc.html#TOC736">33.2 Retry rule examples</A></H2>

<P>
Here are some example retry rules suitable for use when <EM>wonderland.fict.book</EM>
is a local domain:

<PRE>
alice@wonderland.fict.book quota_5d  F,7d,3h
wonderland.fict.book       quota_5d
wonderland.fict.book       *         F,1h,15m; G,2d,1h,2;
lookingglass.fict.book     *         F,24h,30m;
*                          refused_A F,2h,20m;
*                          *         F,2h,15m; G,16h,1h,1.5; F,5d,8h
</PRE>

<P>
The first rule sets up special handling for mail to
<EM>alice@wonderland.fict.book</EM> when there is an over-quota error and the mailbox
hasn't been read for at least 5 days. Retries continue every three hours for 7
days. The second rule handles over-quota errors for all other local parts at
<EM>wonderland.fict.book</EM>; the absence of a local part has the same effect as
supplying `*@'. As no retry algorithms are supplied, messages that fail are
bounced immediately if the mailbox hasn't been read for at least 5 days.

</P>
<P>
The third rule handles all other errors at <EM>wonderland.fict.book</EM>; retries
happen every 15 minutes for an hour, then with geometrically increasing
intervals until two days have passed since a delivery first failed. The fourth
rule controls retries for the domain <EM>lookingglass.fict.book</EM>, whether it is
local or remote, and the remaining two rules handle all other domains, with
special action for connection refusal from hosts that were not obtained from an
MX record.

</P>
<P>
The final rule in a retry configuration should always have asterisks in the
first two fields so as to provide a general catch-all for any addresses that do
not have their own special handling. This example tries every 15 minutes for 2
hours, then with intervals starting at one hour and increasing by a factor of
1.5 up to 16 hours, then every 8 hours up to 5 days.

</P>



<H2><A NAME="SEC737" HREF="spec_toc.html#TOC737">33.3 Timeout of retry data</A></H2>

<P>
<A NAME="IDX1664"></A>
<A NAME="IDX1665"></A>
Exim timestamps the data that it writes to its retry hints database. When it
consults the data during a delivery it ignores any that is older than the value
set in <EM>retry_data_expire</EM> (default 7 days). If, for example, a host hasn't
been tried for 7 days, Exim will try to deliver to it as soon as a message
arrives, and if that fails, it will calculate a retry time as if the host were
failing for the first time.

</P>
<P>
This improves the behaviour for messages routed to rarely-used hosts such as MX
backups. If such a host was down at one time, and happens to be down again when
Exim tries a month later, using the old retry data would imply that it had been
down all the time, which is not a justified assumption.

</P>
<P>
If a host really is permanently dead, this behaviour causes a burst of retries
every now and again, but only if messages routed to it are rare. If there is a
message at least once every 7 days the retry data never expires.

</P>
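<P>
The expiry test amounts to a timestamp comparison, which might be sketched as
below. The record layout and the field name <EM>time_stamp</EM> are
assumptions made for the illustration and are not the format of Exim's hints
database.

</P>

```python
DAY = 86400  # seconds


def usable_retry_record(record, now, retry_data_expire=7 * DAY):
    """Return the record if it is fresh enough to honour, otherwise None."""
    if record is None:
        return None
    if now - record["time_stamp"] > retry_data_expire:
        # Stale hint: behave as if the host had never failed.
        return None
    return record
```

<P>
A caller that gets <EM>None</EM> back would attempt delivery at once and, on
failure, start a fresh retry record.

</P>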



<H2><A NAME="SEC738" HREF="spec_toc.html#TOC738">33.4 Long-term failures</A></H2>

<P>
<A NAME="IDX1666"></A>
Special processing happens when an address has been failing for so long that
the cutoff time for the last algorithm has been reached. This is independent of
how long any specific message has been failing; it is the length of continuous
failure for the address that counts. When this is the case for a local
delivery, or for all IP addresses associated with a remote delivery, a
subsequent delivery failure causes Exim to give up on the address, and a
delivery error message is generated. In order to cater for new messages that
may use the failing address, a next retry time is still computed from the final
algorithm, and is used as follows:

</P>
<P>
If the delivery is a local one, one delivery attempt is always made for
any subsequent messages. If it fails, the address fails immediately. The
post-cutoff retry time is not used.

</P>
<P>
If the delivery is remote, there are two possibilities, controlled by the
<A NAME="IDX1667"></A>
<EM>delay_after_cutoff</EM> option of the <EM>smtp</EM> transport. The option is true by
default and in that case:

</P>

<UL>

<LI>

Until the post-cutoff retry time for one of the IP addresses is reached, any
attempt to deliver to the failing address is bounced immediately. After that
time, one new delivery attempt is made to those IP addresses that are past
their retry times, and if that still fails, the address is bounced and new
retry times are computed.
</UL>

<P>
In other words, Exim delays retrying an IP address after the final cutoff time
until a new retry time is reached, and can therefore bounce an email address
without ever trying a delivery when machines have been down for a long time.
This ensures that few resources are wasted in repeatedly trying to deliver to
a broken destination, but if it does recover, Exim will eventually notice.

</P>
<P>
If <EM>delay_after_cutoff</EM> is set false, Exim behaves differently. If all IP
addresses are past their final cutoff time, Exim tries to deliver to those IP
addresses that have not been tried since the message arrived. If there are
none, or if they all fail, the address is bounced. In other words, it does not
delay when a new message arrives, but tries the expired addresses immediately,
unless they have been tried since the message arrived. If there is a continuous
stream of messages for the failing domains, setting
<EM>delay_after_cutoff</EM> false means that there will be many more attempts to deliver
to failing IP addresses than when <EM>delay_after_cutoff</EM> is true.

</P>
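<P>
The two behaviours can be contrasted in a short sketch. It assumes that every
IP address for the domain is already past its final cutoff, and the record
fields <EM>last_try</EM> and <EM>next_try</EM> are hypothetical names chosen
for the illustration. An empty result means the address is bounced without a
delivery attempt.

</P>

```python
def addresses_to_try(ip_records, now, message_arrived, delay_after_cutoff=True):
    """Pick the IP addresses to attempt for a new message.

    ip_records maps each IP address to a dict holding 'last_try' and
    'next_try' times. All addresses are assumed to be past the final cutoff.
    """
    if delay_after_cutoff:
        # Only addresses whose post-cutoff retry time has been reached.
        return [ip for ip, rec in ip_records.items() if now >= rec["next_try"]]
    # Otherwise, any address not yet tried since this message arrived.
    return [ip for ip, rec in ip_records.items()
            if rec["last_try"] < message_arrived]


records = {
    "192.0.2.1": {"last_try": 100, "next_try": 500},
    "192.0.2.2": {"last_try": 250, "next_try": 900},
}
# A message arrives at time 200 and is processed at time 300.
print(addresses_to_try(records, now=300, message_arrived=200))
print(addresses_to_try(records, now=300, message_arrived=200,
                       delay_after_cutoff=False))
```

<P>
With the option true, nothing is tried at time 300 because neither retry time
has been reached. With it false, the address that has not been tried since the
message arrived is attempted immediately.

</P>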


<H2><A NAME="SEC739" HREF="spec_toc.html#TOC739">33.5 Ultimate address timeout</A></H2>

<P>
An additional rule is needed to cope with cases where a host is intermittently
available, or when a message has some attribute that prevents its delivery when
others to the same address get through. In this situation, because some
messages are successfully delivered, the `retry clock' for the address keeps
getting restarted, and so a message could remain on the queue for ever. To
prevent this, if a message has been on the queue for longer than the cutoff
time of any applicable retry rule
<font color=green>
for a given address, a delivery is attempted for that address, even if it is
not yet time, and if this delivery fails, the address is timed out. A new retry
time is not computed in this case, so that other messages for the same address
are considered immediately.
</font>

</P>

</BODY>
</HTML>