$Id: README 44 2003-02-01 22:56:20Z aqua $
This is sugarplum, a spam-bot database poisoner. The purpose of a spam
poisoner is to feed a spammer's email-harvesting spider bad data -- ideally
lowering the database's usefulness so much that the database must be
reverted, discarded or manually edited.
Installation instructions may be found in the INSTALL file.
A web spider (many terms have been used, including 'bot,' 'web scanner,' 'web
bot,' etc.; I'll use spider or web spider herein) is a program whose job is to
wander through web pages, in search of either some specific data or a general
set of data. Spiders are how search engine indexes are built, and they also
underlie autonomous information-gathering agents known by various names.
They're good tools, and useful for many things. Well-behaved spiders are
supposed to obey the Robots Exclusion Standard (RES), which tells the spider
where it may not go, and what it may not do (index, follow links) with a URL.
Most any technology has the potential for abuse, however, and many email
spammers now employ specialized spiders to harvest email addresses for their
various unpleasant purposes. Such spiders work more or
less like any other spider, crawling around looking for email addresses to add
to their database. Spam spiders, needless to say, ignore the RES spec, and
many attempt to appear as innocuous as possible via unusual search patterns,
randomly changing User-Agent: settings, etc.
While working for a regional ISP in 1998-99, I came across a homemade poisoner
called "dauber," hacked up by the head sysadmin, Scott Doty <email@example.com>.
At the time, dauber simply printed out a page of random words, containing a
few email addresses in which the remote spider's IP address was encoded. The
addresses were invalid, but spammers who sent mail to those addresses left log
entries which could then be decoded to identify which spider had generated the
traffic, and in some cases to identify a netblock whose packets would then be
routed into oblivion, world without end. A neat trick.
Sugarplum is an amalgam of several ideas for opposing spam by interfering
with the use of collection spiders. I can lay claim to only a few of them,
all notions that have no doubt occurred to many others also:
1. Poison spam bots by providing them with a randomly
generated tree of bad data, usually in great quantity.
[credit to Scott Doty as per above, and many others]
2. Encode the harvester's IP address in teergrube (tarpit) addresses,
to identify where they came from.
[credit to the teergrube FAQ, slashdot.org discussion, dauber, etc.]
3. Amongst the various bad addresses, include an assortment
of addresses belonging to known spammers, so that they
may spam each other.
[credit to soc.subculture.bondage-bdsm, circa 1997]
4. Adjust one's webserver configuration such that no matter
what page a spam bot requests, it transparently receives
5. If a spambot wanders into the poison, identify it
as a spambot by noting whether its User-Agent: header
value changes in an un-humanlike fashion.
[ n.b.: this functionality has been obviated in v0.9.8 ]
6. Avoid counterdetection (letting the spambot know it's
being poisoned) by rendering output in a fashion as close
to normal human output as automatically feasible (even repeatable
output, if deterministic mode is used). This
involves variable HTML syntax and content, extensive
randomization, vague attempts at grammar, etc. The
guiding assumption is that the author of the spambot
is at least as smart as you are, and will notice any
tricks obvious enough that you yourself could pick
them up.
7. Upon positive identification of a spambot, launch
out-of-band protective measures against it, such as
adding its IP to a firewall deny-rule, or making
point-target denial of service attacks against it.
[ n.b.: this has been removed in v0.9.8 ]
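The IP-encoding idea in the list above can be sketched in a few lines of
Python. This is only an illustration, not sugarplum's actual encoding: the
function names, the "u" prefix and the tarpit.example.org domain are all
hypothetical, and it assumes IPv4 harvesters only.

```python
import ipaddress

def encode_ip(ip: str) -> str:
    """Render an IPv4 address as 8 lowercase hex digits."""
    return format(int(ipaddress.IPv4Address(ip)), "08x")

def decode_ip(token: str) -> str:
    """Reverse encode_ip(): 8 hex digits back to dotted-quad form."""
    return str(ipaddress.IPv4Address(int(token, 16)))

def tagged_address(ip: str, domain: str = "tarpit.example.org",
                   prefix: str = "u") -> str:
    """Build a bait address whose local part hides the harvester's IP.

    The domain and prefix are placeholders chosen for this sketch.
    """
    return f"{prefix}{encode_ip(ip)}@{domain}"

def harvester_ip(address: str, prefix: str = "u") -> str:
    """Recover the harvester's IP from a bait address seen in a mail log."""
    local_part = address.split("@", 1)[0]
    return decode_ip(local_part[len(prefix):])
```

When mail later arrives for such an address, passing the recipient to
harvester_ip() gives back the address of the spider that originally fetched
the page, which is the log-decoding step described for dauber.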
Sugarplum was written in 1999, according to the observed habits of spammers at
the time. These have changed some since. Address harvesting, while still
very common, is no longer the most prevalent method of obtaining addresses for
bulk mailing (at present dictionary attacks against large hosts seem to be the
Early releases of Sugarplum gained some slight notoriety for including hooks
with which it could be configured to launch denial of service attacks against
harvesters. While this was arguably workable at the time, as of this writing
all major OS vendors have spent enough energy hardening their TCP/IP stacks
to make "ping of death"-style attacks largely unviable. As of v0.9.8, this
facility has been removed from sugarplum. It may reappear in some later
version if a new class of viable counterattack against a single harvester
For some time it was possible to identify some spambot harvesters by their
proclivity for randomizing their user-agent headers, sending a different
agent on each HTTP request. This was an obvious error on the part of the
spambot authors and has not been seen in some time. Sugarplum's facility for
recognizing this behavior, which in any case was useful only for confidently
launching a counterattack, has been removed.
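For the curious, the kind of check described above might look roughly like
the following Python sketch. It is not the code that was removed from
sugarplum; the AgentTracker class and its threshold of three distinct agents
are assumptions made up for the example.

```python
from collections import defaultdict

class AgentTracker:
    """Track the distinct User-Agent values seen from each client IP."""

    def __init__(self, threshold: int = 3):
        # threshold is an arbitrary choice for this sketch: a real browser
        # rarely presents several different agent strings from one address.
        self.threshold = threshold
        self._agents = defaultdict(set)

    def observe(self, ip: str, user_agent: str) -> bool:
        """Record one request.

        Returns True once `ip` has presented at least `threshold`
        distinct User-Agent strings.
        """
        seen = self._agents[ip]
        seen.add(user_agent)
        return len(seen) >= self.threshold
```

Note that several people behind one proxy or NAT address would trip a check
like this too, which is one reason it was only ever a hint.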
How it works:
The mechanisms that make up sugarplum are:
A pair of dictionaries, one with words in the local language, the
other with the addresses of known spammers,
A set of Apache mod_rewrite rules which match spambots that identify
themselves as such, and map their requests back into the poison,
A CGI to perform the actual poisoning.
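To make the poisoning step concrete, here is a minimal Python sketch of what
such a generator might do. It is not sugarplum's own code: the tiny word
list, the placeholder addresses, and the names make_rng and poison_page are
all invented for illustration. Seeding the generator from the requested path
is one assumed way to get the repeatable output that deterministic mode
refers to.

```python
import hashlib
import random

# Stand-ins for the two dictionaries; a real install would load files.
WORDS = ["amber", "bridge", "candle", "drift", "ember", "fable",
         "garden", "harbor"]
ADDRESSES = ["bulk1@spammer.example", "bulk2@spammer.example"]

def make_rng(path: str, deterministic: bool = True) -> random.Random:
    """Return a generator seeded from the URL path, or an unseeded one."""
    if not deterministic:
        return random.Random()
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))

def poison_page(path: str, deterministic: bool = True,
                paragraphs: int = 3, links: int = 4) -> str:
    """Build one page of filler text, bait addresses and onward links."""
    rng = make_rng(path, deterministic)
    base = path.rstrip("/")
    parts = ["<html><body>"]
    for _ in range(paragraphs):
        words = [rng.choice(WORDS) for _ in range(rng.randint(8, 20))]
        text = " ".join(words).capitalize()
        addr = rng.choice(ADDRESSES)
        parts.append(f'<p>{text}. <a href="mailto:{addr}">{addr}</a></p>')
    for _ in range(links):
        # Links lead deeper into the same generated tree.
        depth = rng.randint(1, 3)
        target = "/".join(rng.choice(WORDS) for _ in range(depth))
        label = rng.choice(WORDS)
        parts.append(f'<a href="{base}/{target}/">{label}</a>')
    parts.append("</body></html>")
    return "\n".join(parts)
```

With deterministic set, a spider that fetches the same URL twice sees the
same page both times, which makes the tree look more like static content.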
Etiquette and ethical considerations:
The ethical/moral/legal implications of spam are relatively straightforward,
but should nonetheless be considered all the way through before making use of
sugarplum. I won't go into the various arguments here -- make up your own
mind, and see the net-abuse newsgroups and related resources if you need more
data. There are legitimate reasons for using address harvesters, though their
utility has (indirectly) been destroyed by widespread use of harvesters for
Sugarplum is capable of producing entirely random addresses, some percentage
of which will coincide with legitimate addresses, or with legitimate domains
having universal "blanket" delivery. Since the addresses are random, the odds
of intersection with an address that cannot simply be deactivated without cost
are very low, but the possibility still concerns some people. While I don't
agree that it's a significant problem, as of v0.9.8 this form of randomization
is disabled by default, to try to provide the safest possible default
Assorted random gibberish:
The name "sugarplum" is an appealing little irony; most poisons (substances
poisonous to humans, anyway) taste bitter. According to the dictionary, a
sugarplum is any of a number of small candies or rolled sweetmeats. A
sugarplum that involves actual plums is made by caramelizing the fruit syrup
and sugar in a pan. The whole point of feeding poison to the unwary is either
to make it invisible and indistinguishable from normal food/drink, or else to
make it deliberately tantalizing, with its poisonous nature well concealed
until after copious consumption.
Sugarplum may be freely distributed, modified, etc. under terms of the GNU
General Public License (GPL) v2, or at your option, any later version.
Devin Carraway <firstname.lastname@example.org>
$Date: 2002/09/27 11:07:06 $