(This document was generated from design-notes.html)
_________________________________________________________________
Design Notes On Fetchmail
These notes are for the benefit of future hackers and maintainers. The
following sections are both functional and narrative; read them from
beginning to end.
History
A direct ancestor of the fetchmail program was originally authored (under
the name popclient) by Carl Harris <ceharris@mal.com>. I took over
development in June 1996 and subsequently renamed the program `fetchmail' to
reflect the addition of IMAP support and SMTP delivery. In early November
1996 Carl officially ended support for the last popclient versions.
Before accepting responsibility for the popclient sources from Carl, I had
investigated and used and tinkered with every other UNIX remote-mail
forwarder I could find, including fetchpop1.9, PopTart-0.9.3, get-mail,
gwpop, pimp-1.0, pop-perl5-1.2, popc, popmail-1.6 and upop. My major goal
was to get a header-rewrite feature like fetchmail's working so I wouldn't
have reply problems anymore.
Despite having done a good bit of work on fetchpop1.9, when I found
popclient I quickly concluded that it offered the solidest base for future
development. I was convinced of this primarily by the presence of
multiple-protocol support. The competition didn't do POP2/RPOP/APOP, and I
was already having vague thoughts of maybe adding IMAP. (This would advance
two other goals: learn IMAP and get comfortable writing TCP/IP client
software.)
Until popclient 3.05 I was simply following out the implications of Carl's
basic design. He already had daemon.c in the distribution, and I wanted
daemon mode almost as badly as I wanted the header rewrite feature. The
other things I added were bug fixes or minor extensions.
After 3.1, when I put in SMTP-forwarding support (more about this below) the
nature of the project changed -- it became a carefully-thought-out attempt
to render obsolete every other program in its class. The name change quickly
followed.
The rewrite option
MTAs ought to canonicalize the addresses of outgoing non-local mail so that
From:, To:, Cc:, Bcc: and other address headers contain only fully qualified
domain names. Failure to do so can break the reply function on many mailers.
(Sendmail has an option to do this.)
This problem only becomes obvious when a reply is generated on a machine
different from where the message was delivered. The two machines will have
different local username spaces, potentially leading to misrouted mail.
Most MTAs (and sendmail in particular) do not canonicalize address headers
in this way by default, in violation of RFC 1123. Fetchmail therefore has to
do it. This is the first feature I added to the ancestral popclient.
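As a rough illustration of what the rewrite amounts to (a sketch in Python, not fetchmail's C code; the function name and the example host are made up for the illustration), an unqualified local name in an address header gets the mailserver's host name appended, while addresses that already carry a domain are left alone:

```python
from email.utils import formataddr, getaddresses


def qualify_addresses(header_value, mailserver):
    """Append @mailserver to every address in header_value that lacks a domain."""
    rewritten = []
    for display_name, address in getaddresses([header_value]):
        if address and "@" not in address:
            # A bare local name only makes sense on the mailserver itself,
            # so qualify it before the message leaves that context.
            address = f"{address}@{mailserver}"
        rewritten.append(formataddr((display_name, address)))
    return ", ".join(rewritten)
```

With this, a reply composed on the client machine goes back to the user on the mailserver rather than to whoever happens to own the same local name on the client.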
Reorganization
The second thing I did was reorganize and simplify popclient a lot. Carl
Harris's implementation was very sound, but exhibited a kind of unnecessary
complexity common to many C programmers. He treated the code as central and
the data structures as support for the code. As a result, the code was
beautiful but the data structure design ad-hoc and rather ugly (at least to
this old LISP hacker).
I was able to improve matters significantly by reorganizing most of the
program around the `query' data structure and eliminating a bunch of global
context. This especially simplified the main sequence in fetchmail.c and was
critical in enabling the daemon mode changes.
IMAP support and the method table
The next step was IMAP support. I initially wrote the IMAP code as a generic
query driver and a method table. The idea was to have all the
protocol-independent setup logic and flow of control in the driver, and the
protocol-specific stuff in the method table.
Once this worked, I rewrote the POP3 code to use the same organization. The
POP2 code kept its own driver for a couple more releases, until I found
sources of a POP2 server to test against (the breed seems to be nearly
extinct).
The purpose of this reorganization, of course, is to trivialize the
development of support for future protocols as much as possible. All
mail-retrieval protocols have to have pretty similar logical design by the
nature of the task. By abstracting out that common logic and its interface
to the rest of the program, both the common and protocol-specific parts
become easier to understand.
Furthermore, many kinds of new features can instantly be supported across
all protocols by modifying the one driver module.
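The driver/method-table split can be pictured with a small sketch. This is Python rather than fetchmail's C, and the Method fields, the drive() function and the simplified command strings (IMAP tags and EXPUNGE are omitted) are assumptions made for the illustration, not the real interface:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Method:
    """Protocol-specific pieces; one instance per supported protocol."""
    name: str
    login: Callable[[Dict[str, str]], List[str]]
    fetch: Callable[[int], str]
    delete: Callable[[int], str]
    logout: Callable[[], str]


POP3 = Method(
    name="POP3",
    login=lambda q: [f"USER {q['user']}", f"PASS {q['password']}"],
    fetch=lambda n: f"RETR {n}",
    delete=lambda n: f"DELE {n}",
    logout=lambda: "QUIT",
)

IMAP = Method(
    name="IMAP",
    login=lambda q: [f"LOGIN {q['user']} {q['password']}"],
    fetch=lambda n: f"FETCH {n} RFC822",
    delete=lambda n: f"STORE {n} +FLAGS (\\Deleted)",
    logout=lambda: "LOGOUT",
)


def drive(method, query, message_count, keep=False):
    """Protocol-independent flow of control; returns the commands it would send."""
    commands = list(method.login(query))
    for number in range(1, message_count + 1):
        commands.append(method.fetch(number))
        if not keep:
            commands.append(method.delete(number))
    commands.append(method.logout())
    return commands
```

A feature such as the keep flag lives in drive() alone, so every protocol in the table picks it up at once; supporting another protocol means filling in one more Method.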
Implications of SMTP forwarding
The direction of the project changed radically when Harry Hochheiser sent me
his scratch code for forwarding fetched mail to the SMTP port. I realized
almost immediately that a reliable implementation of this feature would make
all the other delivery modes obsolete.
Why mess with all the complexity of configuring an MDA or setting up
lock-and-append on a mailbox when port 25 is guaranteed to be there on any
platform with TCP/IP support in the first place? Especially when this means
retrieved mail is guaranteed to look like normal sender-initiated SMTP
mail, which is really what we want anyway.
Clearly, the right thing to do was (1) hack SMTP forwarding support into the
generic driver, (2) make it the default mode, and (3) eventually throw out
all the other delivery modes.
I hesitated over step 3 for some time, fearing to upset long-time popclient
users dependent on the alternate delivery mechanisms. In theory, they could
immediately switch to .forward files or their non-sendmail equivalents to
get the same effects. In practice the transition might have been messy.
But when I did it (see the NEWS note on the great options massacre) the
benefits proved huge. The cruftiest parts of the driver code vanished.
Configuration got radically simpler -- no more grovelling around for the
system MDA and user's mailbox, no more worries about whether the underlying
OS supports file locking.
Also, the only way to lose mail vanished. If you specified localfolder and
the disk got full, your mail got lost. This can't happen with SMTP
forwarding because your SMTP listener won't return OK unless the message can
be spooled or processed.
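The safety argument can be sketched as follows. This is a Python illustration, not fetchmail's code: forward_all(), smtp_deliver() and the tuple layout of the messages are hypothetical. A message becomes eligible for deletion on the server only after the local listener has accepted it:

```python
import smtplib


def smtp_deliver(sender, recipients, raw_message, host="localhost", port=25):
    """Hand one message to the local SMTP listener; raises if it is refused."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.sendmail(sender, recipients, raw_message)


def forward_all(messages, deliver=smtp_deliver):
    """Return the 1-based numbers of the messages that are safe to delete.

    messages is a sequence of (sender, recipients, raw_message) tuples.
    """
    deletable = []
    for number, (sender, recipients, raw) in enumerate(messages, start=1):
        try:
            deliver(sender, recipients, raw)
        except (smtplib.SMTPException, OSError):
            # The listener did not accept the message (full disk, connection
            # refused, ...), so it stays on the server for the next poll.
            continue
        deletable.append(number)
    return deletable
```

The deliver parameter exists only so the logic can be exercised without a running listener.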
Also, performance improved (though not so you'd notice it in a single run).
Another not insignificant benefit of this change was that the manual page
got a lot simpler.
Later, I had to bring --mda back in order to allow handling of some obscure
situations involving dynamic SLIP. But I found a much simpler way to do it.
The moral? Don't hesitate to throw away superannuated features when you can
do it without loss of effectiveness. I tanked a couple I'd added myself and
have no regrets at all. As Saint-Exupery said, "Perfection [in design] is
achieved not when there is nothing more to add, but rather when there is
nothing more to take away." This program isn't perfect, but it's trying.
The most-requested features that I will never add, and why not:
Password encryption in .fetchmailrc
There's no facility to store passwords encrypted in the .fetchmailrc file
because this wouldn't actually add protection.
Anyone who's acquired the 0600 permissions needed to read your .fetchmailrc
file will be able to run fetchmail as you anyway -- and if it's your
password they're after, they'd be able to rip the necessary decoder out of
the fetchmail code itself to get it.
All .fetchmailrc encryption would do is give a false sense of security to
people who don't think very hard.
Truly concurrent queries to multiple hosts
Occasionally I get a request for this on "efficiency" grounds. These people
aren't thinking either. True concurrency would do nothing to lessen
fetchmail's total IP volume. The best it could possibly do is change the
usage profile to shorten the duration of the active part of a poll cycle at
the cost of increasing its demand on IP volume per unit time.
If one could thread the protocol code so that fetchmail didn't block on
waiting for a protocol response, but rather switched to trying to process
another host query, one might get an efficiency gain (close to constant
loading at the single-host level).
Fortunately, I've only seldom seen a server that incurred significant wait
time on an individual response. I judge the gain from this not worth the
hideous complexity increase it would require in the code.
Multiple concurrent instances of fetchmail
Fetchmail locking is on a per-invoking-user basis because finer-grained locks
would be really hard to implement in a portable way. The problem is that you
don't want two fetchmails querying the same site for the same remote user at
the same time.
To handle this optimally, multiple fetchmails would have to associate a
system-wide semaphore with each active pair of a remote user and host
canonical address. A fetchmail would have to block until getting this
semaphore at the start of a query, and release it at the end of a query.
This would be way too complicated to do just for an "it might be nice"
feature. Instead, you can run a single root fetchmail polling for multiple
users in either single-drop or multidrop mode.
The fundamental problem here is how an instance of fetchmail polling host
foo can assert that it's doing so in a way visible to all other fetchmails.
System V semaphores would be ideal for this purpose, but they're not
portable.
I've thought about this a lot and roughed up several designs. All are
complicated and fragile, with a bunch of the standard problems (what happens
if a fetchmail aborts before clearing its semaphore, and how do we recover
reliably?).
I'm just not satisfied that there's enough functional gain here to pay for
the large increase in complexity that adding these semaphores would entail.
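To make the difficulty concrete, here is a hedged sketch of the naive approach: one lock file per pair of remote user and host. It is Python, it is not how fetchmail locks, and the file naming and function names are invented for the illustration.

```python
import os


def lock_path(lock_dir, remote_user, host):
    """One lock file per (remote user, host) pair; the naming is hypothetical."""
    name = f"{remote_user}@{host}".replace(os.sep, "_")
    return os.path.join(lock_dir, name + ".lock")


def try_acquire(lock_dir, remote_user, host):
    """Return the lock path on success, or None if another poller holds it."""
    path = lock_path(lock_dir, remote_user, host)
    try:
        # O_EXCL makes creation atomic: exactly one process can win.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return None
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return path


def release(path):
    """Give the lock up at the end of a query."""
    os.remove(path)
```

Note what is missing: if the holder dies before calling release(), the file stays behind and every later poll of that pair is refused. Detecting and clearing such stale locks reliably is exactly the part that gets complicated and fragile.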
Multidrop and alias handling
I decided to add the multidrop support partly because some users were
clamoring for it, but mostly because I thought it would shake bugs out of
the single-drop code by forcing me to deal with addressing in full
generality. And so it proved.
There are three important aspects of the features for handling multiple-drop
aliases and mailing lists which future hackers should be careful to
preserve.
1. The logic path for single-recipient mailboxes doesn't involve header
parsing or DNS lookups at all. This is important -- it means the code
for the most common case can be much simpler and more robust.
2. The multidrop handling does not rely on doing the equivalent of passing
the message to sendmail -t. Instead, it explicitly mines members of a
specified set of local usernames out of the header.
3. We do not attempt delivery to multidrop mailboxes in the presence of DNS
errors. Before each multidrop poll we probe DNS to see if we have a
nameserver handy. If not, the poll is skipped. If DNS crashes during a
poll, the error return from the next nameserver lookup aborts message
delivery and ends the poll. The daemon mode will then quietly spin until
DNS comes up again, at which point it will resume delivering mail.
When I designed this support, I was terrified of doing anything that could
conceivably cause a mail loop (you should be too). That's why the code as
written can only append local names (never @-addresses) to the recipients
list.
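The mining step and the local-names-only rule can be sketched like this. It is a Python illustration under simplifying assumptions: the function name is made up, and a fixed set of mailserver names stands in for the DNS checks the real code performs.

```python
from email.utils import getaddresses


def multidrop_recipients(header_values, local_users, mailserver_names):
    """Pick known local usernames out of the given address header values."""
    recipients = []
    for _display, address in getaddresses(header_values):
        local_part, _at, domain = address.partition("@")
        if domain and domain.lower() not in mailserver_names:
            continue  # addressed to some other site; never a recipient here
        if local_part in local_users and local_part not in recipients:
            # Only the bare local name is appended, never an @-address,
            # so delivery cannot bounce back out to a remote host.
            recipients.append(local_part)
    return recipients
```

Because the result can only ever contain names from local_users, a hostile or malformed header cannot steer the message off the local machine.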
The code in mxget.c is nasty, no two ways about it. But it's utterly
necessary; there are a lot of MX pointers out there. It really ought to be a
(documented!) entry point in the bind library.
DNS error handling
Fetchmail's behavior on DNS errors is to suppress forwarding and deletion of
the individual message that each occurs in, leaving it queued on the server
for retrieval on a subsequent poll. The assumption is that DNS errors are
transient, due to temporary server outages.
Unfortunately this means that if a DNS error is permanent a message can be
perpetually stuck in the server mailbox. We've had a couple bug reports of
this kind due to subtle RFC822 parsing errors in the fetchmail code that
resulted in impossible things getting passed to the DNS lookup routines.
Alternative ways to handle the problem: ignore DNS errors (treating them as
a non-match on the mailserver domain), or forward messages with errors to
fetchmail's invoking user in addition to any other recipients. These would
fit an assumption that DNS lookup errors are likely to be permanent problems
associated with an address.
IPv6 and IPSEC
The IPv6 support patches are really more protocol-family independence
patches. Because of this, in most places, "ports" (numbers) have been
replaced with "services" (strings, that may be digits). This allows us to
run with certain protocols that use strings as "service names" where we in
the IP world think of port numbers. Someday we'll plumb strings all over and
then, if inet6 is not enabled, do a getservbyname() down in SocketOpen. The
IPv6 support patches use getaddrinfo(), which is a POSIX p1003.1g mandated
function. So, in the not too distant future, we'll zap the ifdefs and just
let autoconf check for getaddrinfo. IPv6 support comes pretty much
automatically once you have protocol family independence.
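The "services as strings" idea is easy to see through getaddrinfo() itself. The sketch below uses Python's wrapper around the same call; the candidates() helper is a name invented for the illustration, and nothing here is taken from fetchmail's SocketOpen.

```python
import socket


def candidates(host, service, flags=0):
    """List (family, sockaddr) pairs for host/service, IPv4 or IPv6 alike.

    service is a string: a name such as "pop3" or digits such as "110".
    """
    infos = socket.getaddrinfo(
        host, service, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, flags
    )
    return [(family, sockaddr) for family, _type, _proto, _name, sockaddr in infos]
```

The caller never names an address family or converts a port number; it simply tries each returned candidate in order, which is why IPv6 comes along almost for free.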
Internationalization
Internationalization is handled using GNU gettext (see the file ABOUT_NLS in
the source distribution). This places some minor constraints on the code.
Strings that must be subject to translation should be wrapped with GT_() or
N_() -- the former in function arguments, the latter in static initializers
and other non-function-argument contexts.
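The division of labor between the two markers can be sketched in Python. Fetchmail's real GT_() and N_() are C macros; the Python stand-ins and the ERROR_TEXT table below are hypothetical. N_ only marks a string so the extraction tool can find it, and the lookup happens later, when the string is actually used:

```python
import gettext

# A null catalog returns every message unchanged; a real program would
# load a translated catalog with gettext.translation() instead.
_catalog = gettext.NullTranslations()


def GT_(message):
    """Translate at the point of use (the function-argument case)."""
    return _catalog.gettext(message)


def N_(message):
    """Mark a string for extraction without translating it yet."""
    return message


# Static initializer: translation cannot happen here, so only mark.
ERROR_TEXT = (N_("socket error"), N_("authorization failure"))


def describe(code):
    # The marked string is translated when it is finally used.
    return GT_(ERROR_TEXT[code])
```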
Checklist for Adding Options
Adding a control option is not complicated in principle, but there are a lot
of fiddly details in the process. You'll need to do the following minimum
steps.
* Add a field to represent the control in struct run, struct query, or
struct hostdata.
* Go to rcfile_y.y. Add the token to the grammar. Don't forget the %token
declaration.
* Pick an actual string to declare the option in the .fetchmailrc file.
Add the token to rcfile_l.l.
* Pick a long-form option name, and a one-letter short option if any are
left. Go to options.c. Pick a new LA_ value. Hack the longoptions table
to set up the association. Hack the big switch statement to set the
option. Hack the `?' message to describe it.
* If the default is nonzero, set it in def_opts near the top of
load_params in fetchmail.c.
* Add code to dump the option value in fetchmail.c:dump_params.
* For a per-site or per-user option, add proper FLAG_MERGE actions in
fetchmail.c's optmerge() function. For a global option, add an override
at the end of load_params; this will involve copying a "cmd_run." field
to a corresponding "run." field, see the existing code for models.
* Document the option in fetchmail.man. This will require at least two
changes; one to the collected table of options, and one full text
description of the option.
* Hack fetchmailconf to configure it. Bump the fetchmailconf version.
* Hack conf.c to dump the option so we won't have a version-skew problem.
* Add an entry to NEWS.
* If the option implements a new feature, add a note to the feature list.
There may be other things you have to do in the way of logic, of course.
Before you implement an option, though, think hard. Is there any way to make
fetchmail automatically detect the circumstances under which it should
change its behavior? If so, don't write an option. Just do the check!
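Of the checklist steps, the merge step is the least obvious, so here is a sketch of the general idea: defaults, then the rc file, then the command line, each level overriding only what it actually sets. This is a Python illustration, not a transcription of optmerge() or the FLAG_MERGE macro; the Query fields and the use of None for "unset" are assumptions.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass
class Query:
    """Two hypothetical per-user options; None means 'not set at this level'."""
    keep: Optional[bool] = None
    limit: Optional[int] = None


def optmerge(base, override):
    """Return a copy of base with every field that override sets replaced."""
    merged = replace(base)
    for field in fields(Query):
        value = getattr(override, field.name)
        if value is not None:
            setattr(merged, field.name, value)
    return merged


# Nonzero or otherwise non-trivial defaults are set once, up front.
DEF_OPTS = Query(keep=False, limit=0)
```

A new option that is left out of the merge silently ignores one of the levels, which is why the checklist calls the step out explicitly.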
Lessons learned
1. Server-side state is essential
The person(s) responsible for removing LAST from POP3 deserve to suffer.
Without it, a client has no way to know which messages in a box have been
read by other means, such as an MUA running on the server.
The POP3 UID feature described in RFC1725 to replace LAST is insufficient.
The only problem it solves is tracking which messages have been read by this
client -- and even that requires tricky, fragile implementation.
The underlying lesson is that maintaining accessible server-side `seen'
state bits associated with Status headers is indispensable in a Unix/RFC822
mail server protocol. IMAP gets this right.
2. Readable text protocol transactions are a Good Thing
A nice thing about the general class of text-based protocols that SMTP,
POP2, POP3, and IMAP belong to is that client/server transactions are easy
to watch and transaction code correspondingly easy to debug. Given a decent
layer of socket utility functions (which Carl provided) it's easy to write
protocol engines and not hard to show that they're working correctly.
This is an advantage not to be despised! Because of it, this project has
been interesting and fun -- no serious or persistent bugs, no long hours
spent looking for subtle pathologies.
3. IMAP is a Good Thing
Now that there is a standard IMAP equivalent of the POP3 APOP validation in
CRAM-MD5, POP3 is completely obsolete.
4. SMTP is the Right Thing
In retrospect it seems clear that this program (and others like it) should
have been designed to forward via SMTP from the beginning. This lesson may
be applicable to other Unix programs that now call the local MDA/MTA as a
program.
5. Syntactic noise can be your friend
The optional `noise' keywords in the rc file syntax started out as a
late-night experiment. The English-like syntax they allow is considerably
more readable than the traditional terse keyword-value pairs you get when
you strip them all out. I think there may be a wider lesson here.
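What the parser has to do to support this is small. The sketch below is Python, not the real lexer, and the noise-word list is an assumption based on the fetchmail manual page rather than on these notes:

```python
# Assumed noise words; check the manual page for the authoritative list.
NOISE = frozenset({"and", "with", "has", "wants", "options"})


def strip_noise(line):
    """Drop noise words, leaving the terse keyword-value form."""
    return [word for word in line.split() if word.lower() not in NOISE]
```

A line such as "poll pop.example.com with protocol pop3 and username esr" reduces to the same six tokens as its terse equivalent, so the grammar itself never has to know about the decoration.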
Motivation and validation
It is truly written: the best hacks start out as personal solutions to the
author's everyday problems, and spread because the problem turns out to be
typical for a large class of users. So it was with Carl Harris and the
ancestral popclient, and so with me and fetchmail.
It's gratifying that fetchmail has become so popular. Until just before 1.9
I was designing strictly to my own taste. The multi-drop mailbox support and
the new --limit option were the first features to go in that I didn't need
myself.
By 1.9, four months after I started hacking on popclient and a month after
the first fetchmail release, there were literally a hundred people on the
fetchmail-friends contact list. That's pretty powerful motivation. And they
were a good crowd, too, sending fixes and intelligent bug reports in volume.
A user population like that is a gift from the gods, and this is my
expression of gratitude.
The beta testers didn't know it at the time, but they were also the subjects
of a sociological experiment. The results are described in my paper, The
Cathedral And The Bazaar.
Credits
Special thanks go to Carl Harris, who built a good solid code base and then
tolerated me hacking it out of recognition. And to Harry Hochheiser, who
gave me the idea of the SMTP-forwarding delivery mode.
Other significant contributors to the code have included Dave Bodenstab
(error.c code and --syslog), George Sipe (--monitor and --interface), Gordon
Matzigkeit (netrc.c), Al Longyear (UIDL support), Chris Hanson (Kerberos V4
support), and Craig Metz (OPIE, IPv6, IPSEC).
Conclusion
At this point, the fetchmail code appears to be pretty stable. It will
probably undergo substantial change only if and when support for a new
retrieval protocol or authentication method is added.
Relevant RFCs
Not all of these describe standards explicitly used in fetchmail, but they
all shaped the design in one way or another.
RFC821
SMTP protocol
RFC822
Mail header format
RFC937
Post Office Protocol - Version 2
RFC974
MX routing
RFC976
UUCP mail format
RFC1081
Post Office Protocol - Version 3
RFC1123
Host requirements (modifies 821, 822, and 974)
RFC1176
Interactive Mail Access Protocol - Version 2
RFC1203
Interactive Mail Access Protocol - Version 3
RFC1225
Post Office Protocol - Version 3
RFC1344
Implications of MIME for Internet Mail Gateways
RFC1413
Identification server
RFC1428
Transition of Internet Mail from Just-Send-8 to 8-bit SMTP/MIME
RFC1460
Post Office Protocol - Version 3
RFC1508
Generic Security Service Application Program Interface
RFC1521
MIME: Multipurpose Internet Mail Extensions
RFC1869
SMTP Service Extensions (ESMTP spec)
RFC1652
SMTP Service Extension for 8bit-MIMEtransport
RFC1725
Post Office Protocol - Version 3
RFC1730
Interactive Mail Access Protocol - Version 4
RFC1731
IMAP4 Authentication Mechanisms
RFC1732
IMAP4 Compatibility With IMAP2 And IMAP2bis
RFC1734
POP3 AUTHentication command
RFC1870
SMTP Service Extension for Message Size Declaration
RFC1891
SMTP Service Extension for Delivery Status Notifications
RFC1892
The Multipart/Report Content Type for the Reporting of Mail System
Administrative Messages
RFC1894
An Extensible Message Format for Delivery Status Notifications
RFC1893
Enhanced Mail System Status Codes
RFC1938
A One-Time Password System
RFC1939
Post Office Protocol - Version 3
RFC1957
Some Observations on Implementations of the Post Office Protocol
(POP3)
RFC1985
SMTP Service Extension for Remote Message Queue Starting
RFC2033
Local Mail Transfer Protocol
RFC2060
Internet Message Access Protocol - Version 4rev1
RFC2061
IMAP4 Compatibility With IMAP2bis
RFC2062
Internet Message Access Protocol - Obsolete Syntax
RFC2195
IMAP/POP AUTHorize Extension for Simple Challenge/Response
RFC2177
IMAP IDLE command
RFC2449
POP3 Extension Mechanism
RFC2554
SMTP Service Extension for Authentication
RFC2595
Using TLS with IMAP, POP3 and ACAP
RFC2645
On-Demand Mail Relay: SMTP with Dynamic IP Addresses
RFC2683
IMAP4 Implementation Recommendations
RFC2821
Simple Mail Transfer Protocol
RFC2822
Internet Message Format
_________________________________________________________________
Eric S. Raymond <esr@snark.thyrsus.com>