The Introducing AUB Document
1. What is aub?
More and more people are posting binary files to usenet these days.
Some of these binaries are executables and audio data; a majority seem to
be pictures of various things, typically landscapes, movie stars and naked
people. Because of limitations in the type of data that usenet can accommodate,
binaries must be encoded into text, and because binary files are commonly very
large relative to the text files usenet was designed to handle, they frequently
must be broken up into pieces. Programs have been developed which take a
given binary, encode it, and automatically post it in pieces with descriptive
subject lines.
When this data arrives at a remote site, users see subject lines
that look something like this:
12011 roadkill03.gif, part 1/4
12012 roadkill03.gif, part 3/4
12013 More pictures of tatooed children, please...
12014 Re: roadkill02.gif -- I love the way the eyes bulge out
12015 roadkill03.gif, part 4/4
12016 roseanne_nude.jpg, part 02 of 02
12017 Only BINARIES should be posted here, GOD DAMMIT
12018 roadkill03.gif, part 2/4
12019 HI, I'M BIFF!!!! THESE PIX ARE WAY COOL!!!!
12020 roseanne_nude.jpg, part 01 of 02
While the process of encoding and splitting up binaries for posting
to usenet is relatively straightforward, the process of retrieving, sorting,
and decoding the pieces (which do not necessarily arrive in order) at
receiving sites is tedious, time consuming, and very prone to human error.
aub, which stands for "assemble usenet binaries", automates this
reassembly process for you. aub is intended for use in newsgroups to which
binaries are posted exclusively. When run, it accesses news articles via
either a disk-based news spool directory, or via an NNTP news server,
determines whether or not any new binaries have appeared in selected
newsgroups since the last time it was run, and if so, retrieves, organizes
and decodes them, depositing them in a configurable location. This process
requires no human intervention once aub has been configured. aub also keeps
track of binaries for which it has seen some, but not all, of the pieces. It
remembers how to find these old pieces, so that when new, previously missing
pieces arrive at your site, it will build the entire binary the next time it
is run. It also remembers which binaries it has already seen all of the
pieces of, so that it does not waste time rebuilding the same binaries
over and over again.
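The bookkeeping described above can be sketched in a few lines of Python.
This is an illustration only, not aub's actual code (aub is a perl script).
The names collect_parts, is_complete and assembly_order are hypothetical, the
pattern only fits subject lines in the style of the sample listing, and
"landscape.jpg" is a made-up second binary with a missing piece.

```python
import re

# Fits subjects such as "roadkill03.gif, part 1/4" or "x.jpg, part 02 of 02".
PART_RE = re.compile(
    r"^(?P<name>\S+), part (?P<part>\d+)\s*(?:/|of)\s*(?P<total>\d+)$")

def collect_parts(articles):
    """Group (article_number, subject) pairs by binary name."""
    binaries = {}
    for number, subject in articles:
        m = PART_RE.match(subject)
        if m is None:
            continue                      # discussion, not a piece of a binary
        entry = binaries.setdefault(
            m["name"], {"total": int(m["total"]), "parts": {}})
        entry["parts"][int(m["part"])] = number
    return binaries

def is_complete(entry):
    """True once every piece from 1 to total has been seen."""
    return set(entry["parts"]) == set(range(1, entry["total"] + 1))

def assembly_order(entry):
    """Article numbers sorted by piece number, not by arrival order."""
    return [entry["parts"][i] for i in sorted(entry["parts"])]

ARTICLES = [
    (12011, "roadkill03.gif, part 1/4"),
    (12012, "roadkill03.gif, part 3/4"),
    (12014, "Re: roadkill02.gif -- I love the way the eyes bulge out"),
    (12015, "roadkill03.gif, part 4/4"),
    (12016, "landscape.jpg, part 02 of 02"),   # hypothetical; part 01 missing
    (12018, "roadkill03.gif, part 2/4"),
]
BINARIES = collect_parts(ARTICLES)
```

Here roadkill03.gif is complete and would be assembled from its articles in
piece order, while landscape.jpg would be remembered until its first piece
shows up.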
aub was created as a time saver; too many people at too many sites
were spending way too much time manually unpacking binary files. Its ability
to identify and assemble binary images depends on people posting images with
subject lines that observe (loosely) established conventions. aub's
recognition capabilities have been significantly improved since the earliest
release.
2. How does aub work?
aub looks for subject lines containing strings like:
N of N
N / N
N N
N | N
where N is any number composed of one or more digits, and white
space is optional. Once it sees such a line, it tries to figure out a
name for the binary by looking at the rest of the subject line. These names
are relevant only to aub's internal functioning; when unpacked, binaries are
named according to the information they were encoded with. However, it's
important that, whatever internal name aub decides on for the binary, that
name be recognizable in the subject lines of all pieces.
aub ignores all news articles with null subject lines and subject
lines that begin with "Re:" regardless of other content.
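A rough Python sketch of these recognition rules follows. It is not aub's
real recognition code, which is written in perl and does considerably more
(such as working out the internal name). The function name part_info is
hypothetical, and the sketch covers only the "of", "/" and "|" separators.

```python
import re

# "N of N", "N / N" and "N | N", with optional white space around the separator.
COUNT_RE = re.compile(r"(\d+)\s*(?:of|/|\|)\s*(\d+)")

def part_info(subject):
    """Return (part, total) if the subject looks like a piece of a binary."""
    subject = subject.strip()
    if not subject:
        return None                       # null subject line: ignored
    if subject.startswith("Re:"):
        return None                       # follow-ups are ignored outright
    m = COUNT_RE.search(subject)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```

For example, part_info("roadkill03.gif, part 1/4") yields (1, 4), while a
subject beginning with "Re:" yields None even if it mentions a file name.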
aub uses two files which are maintained in each user's home directory.
One is $HOME/.aubconf, which is a configuration file that allows you to
customize aub's behavior. See section 5 for a detailed explanation of the
structure of configuration files. The other file is $HOME/.aubrc. You
should never need to modify this file; aub creates it and maintains it. It's
used to keep track of what articles in which groups aub has resolved
already, and what articles aub believes to be pieces of binaries that it
hasn't seen all of the pieces of yet.
3. What do I need on my system to run aub?
You will need Larry Wall's perl interpreter. Older versions of aub
also required David Mack's uumerge program; this functionality has since been
folded into aub for the sake of speed. perl is available via anonymous FTP
from uunet.uu.net, tut.cis.ohio-state.edu, and jpl-decvax.jpl.nasa.gov.
Your machine must also have access to news, either via the NNTP
protocol, or by being able to open raw news files on a disk somewhere.
Previous versions of aub required that your news access be NNTP-based; this
restriction has since been lifted.
4. How do I install aub?
There's really only one thing that you might need to configure.
aub is a perl script. The first line of the program looks like this:
#!/usr/local/bin/perl
This tells your system where to find the perl interpreter.
If the path of perl on your system is something else, you'll need to change
this line, or create a link called /usr/local/bin/perl which points to where
your perl executable actually resides.
If you need to change this, you'll probably see a message like:
'aub: Bad address.' when you try to run aub.
5. How do I configure aub?
Older versions of aub made use of a configuration file which was
normally called $HOME/.aubinit. But few interesting customizations could
be accomplished with .aubinit files, because the configuration language
was so primitive. The configuration language has been redesigned to allow
much greater flexibility. Old .aubinit files will no longer work, or be
recognized by aub (except inasmuch as aub will notice them and point out
to you that you need to create a new configuration file if you don't already
have one.) The new configuration file for aub should be called $HOME/.aubconf.
Configuration files are line-oriented; each line is processed
separately. If any line contains the '#' character, aub concludes that
the character begins a comment, and discards the comment character and
everything on the line that follows it. If for some reason you need to
put a '#' character in your configuration file and do not want it to be
interpreted as beginning a comment, you'll have to escape it by preceding it
with a backslash character, e.g. '\#'.
Each non-blank line in a configuration file must begin with a
keyword recognized by aub. The case of keywords is not significant.
As far as aub is concerned, "keyword", "KEYWORD", "Keyword" and "KeYWorD"
all mean the same thing. Some keywords require arguments; some require that no
arguments appear, and some permit variable numbers of arguments. If aub
sees keywords it doesn't understand in your .aubconf file, it will complain
to you about them.
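The line-level rules above (comments, the '\#' escape, case-insensitive
keywords) can be sketched like this in Python. This is a simplified
illustration under stated assumptions, not aub's parser: the name parse_line
is hypothetical, and the sketch does not try to handle an escaped backslash
in front of a '#'.

```python
import re

def parse_line(line):
    """Return (KEYWORD, [args]) for one configuration line, or None if blank."""
    # Cut the line at the first '#' that is not preceded by a backslash.
    m = re.search(r"(?<!\\)#", line)
    if m:
        line = line[:m.start()]
    # Turn the escaped form back into a literal '#'.
    line = line.replace("\\#", "#")
    fields = line.split()
    if not fields:
        return None                       # blank or comment-only line
    # Keywords are case-insensitive, so normalise to upper case.
    return fields[0].upper(), fields[1:]
```

With this sketch, "gRoUp a.b c.d" and "GROUP a.b c.d" parse identically, and
a line consisting only of a comment is skipped.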
One of the keywords aub understands is the GROUP keyword. It's
used to tell aub that you want to decode binaries from the newsgroup(s)
which appear as argument(s) to the keyword. For example:
GROUP alt.binaries.pictures.misc
GROUP alt.binaries.pictures.misc alt.binaries.pictures.fractals
Every configuration file must contain at least one GROUP keyword to
be correct.
In general, aub understands two types of keywords. One type is
called 'position insensitive', which means that the keyword will have the
same effect no matter where in the configuration file it appears. The
other type is called 'position sensitive', which means that the keyword
means something different when it appears before any GROUP keywords than
it does when it appears after any given GROUP keyword.
One such position sensitive keyword is the DIRectory keyword.
This keyword is used to tell aub what directory to put binaries it decodes
in. ("DIRectory" is spelled the way it is because only the 'DIR' part needs
to appear in a configuration file for aub to recognize it. In fact, aub will
interpret any keyword beginning with the letters 'DIR' as being an instance
of the DIRectory keyword.)
When a position sensitive keyword appears _before_ any GROUP keyword,
the keyword is interpreted as being the default for all groups that appear
later.
When a position sensitive keyword appears _after_ any GROUP keyword,
it is interpreted as applying *only* to that group, overriding any previous
default which may have been established via use of the same keyword, or
by the value of environment variables (see section 8.)
Position sensitive keywords appearing after a GROUP keyword which
lists multiple groups are applied only to the last group listed, not to
all groups appearing on the group line.
For example, the following three configuration files are equivalent:
# Sample .aubconf file no. 1 -- basic example
#
dir /tmp/aub # Default directory
group alt.binaries.pictures.misc # Process these
group alt.binaries.pictures.fractals # two groups
# Sample .aubconf file no. 2 -- multiple group usage, mixed case
#
DiR /tmp/aub # Default directory
gRoUp alt.binaries.pictures.misc alt.binaries.pictures.fractals
# Sample .aubconf file no. 3 -- does not use defaults
#
group alt.binaries.pictures.misc
directory /tmp/aub
group alt.binaries.pictures.fractals
direct-to /tmp/aub # 'dir' is all you need
The following three configuration files are also equivalent, though
not equivalent to the previous three:
# Sample .aubconf file no. 4 -- explicit placement of binaries
#
group alt.binaries.pictures.misc
dir /tmp/aub/misc
group alt.binaries.pictures.fractals
dir /tmp/aub/fractals
# Sample .aubconf file no. 5 -- explicit and default placement
#
dir /tmp/aub/misc # Default directory
group alt.binaries.pictures.misc # Use default directory
group alt.binaries.pictures.fractals
dir /tmp/aub/fractals # Override default
# Sample .aubconf file no. 6 -- explicit and default placement revisited
#
dir /tmp/aub/fractals # Default directory
group alt.binaries.pictures.misc
dir /tmp/aub/misc # Override default
group alt.binaries.pictures.fractals # Use default directory
The configuration file:
# Sample .aubconf file no. 7 -- invalid
#
group alt.binaries.pictures.misc
dir /tmp/aub
group alt.binaries.pictures.fractals # No good
is invalid, because no directory for aub to place binaries decoded
from the newsgroup alt.binaries.pictures.fractals is specified. The
DIRectory keyword is unique in this regard; there must be some use of the
keyword that enables aub to figure out where to put binaries for every
group specified, or it will refuse to run. The easiest way to deal with
this is to always establish a default directory by using the DIRectory
keyword somewhere before any groups appear.
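The placement rules for the DIRectory keyword can be modelled with a short
Python sketch. It is only a model of the behaviour described above, not
aub's implementation; the function names are hypothetical, the input is
assumed to be (keyword, arguments) pairs in file order, and keeping the first
of several repeated settings is an assumption. Environment variables are left
out here.

```python
def resolve_directories(lines):
    """Map each group to a directory from (keyword, args) pairs in file order."""
    default = None        # DIR seen before any GROUP
    overrides = {}        # group -> DIR seen after that group
    groups = []           # groups in order of appearance
    last_group = None
    for keyword, args in lines:
        kw = keyword.upper()
        if kw == "GROUP":
            groups.extend(g for g in args if g not in groups)
            last_group = args[-1]         # overrides bind to the last group listed
        elif kw.startswith("DIR"):        # dir, directory, direct-to, ...
            if last_group is None:
                if default is None:
                    default = args[0]
            else:
                overrides.setdefault(last_group, args[0])
    resolved = {}
    for group in groups:
        directory = overrides.get(group, default)
        if directory is None:
            raise ValueError("no directory for group " + group)
        resolved[group] = directory
    return resolved

def is_valid(lines):
    """True if every group ends up with a directory."""
    try:
        resolve_directories(lines)
        return True
    except ValueError:
        return False

MISC = "alt.binaries.pictures.misc"
FRACTALS = "alt.binaries.pictures.fractals"

# Sample file no. 5: a default plus one override.
SAMPLE_5 = [("dir", ["/tmp/aub/misc"]), ("group", [MISC]),
            ("group", [FRACTALS]), ("dir", ["/tmp/aub/fractals"])]
# Sample file no. 7: fractals never gets a directory.
SAMPLE_7 = [("group", [MISC]), ("dir", ["/tmp/aub"]), ("group", [FRACTALS])]
```

In this model SAMPLE_5 sends each group to its own directory, and SAMPLE_7 is
rejected because the second group has neither a default nor an override.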
Other position sensitive keywords are available.
DESCription <file>
This keyword causes aub to extract text from what it thinks is the
text portion of posted articles, and append it to the file you specify. This
is useful if you're interested in reading the text that describes what all
the binaries aub is unpacking are about. A maximum of 60 lines per binary
extracted will be put into the file you indicate. Each description is
prepended with the name of the decoded binary it refers to, and the group
that binary was decoded from.
HOOK <program>
This keyword enables you to select which binaries aub decodes
using your own software. If the HOOK keyword is specified, aub will
invoke the argument program and supply it, via standard input, with the
subject line of the first piece of a binary that it can potentially decode. If the
program returns true (zero), aub will decode the binary. If the program
returns false (non-zero), aub will skip decoding the binary, and continue
processing.
It is not (yet) possible to specify arguments to the user program.
For example, the following sample program returns true if standard
input contains the string ".gif" (case insignificant), and false otherwise.
#!/usr/local/bin/perl
#
# /tmp/sample_aub_hook: a simple, sample hook program
#
$sl = <STDIN>; # Get standard input
exit(0) if ($sl =~ m/\.gif/i); # Contains ".gif"
exit(1); # Didn't see ".gif"
Suppose this program were attached to aub via the configuration line:
hook /tmp/sample_aub_hook
Then aub would only decode binaries whose subject lines contain the string '.gif'.
You can write hook programs in any language you choose.
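As an illustration of a hook in another language, here is a sketch of the
same '.gif' filter in Python. It relies only on the contract described
above: the subject line arrives on standard input and the exit status tells
aub what to do. The function names are hypothetical, and the interpreter
path on your system is for you to supply.

```python
import sys

def hook_status(subject):
    """Exit status for aub: 0 (true) means decode, 1 (false) means skip."""
    return 0 if ".gif" in subject.lower() else 1

def main(stream=sys.stdin):
    """Read the subject line from standard input and return the exit status."""
    return hook_status(stream.readline())

# Installed as a hook program, the script would end with the line:
#     sys.exit(main())
```

The file would also need a first line naming your Python interpreter and
execute permission, just like the perl example.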
POSTprocess <postprocessor> <extn> ...
This keyword enables you to postprocess binaries whose names end
in the string <extn> (you can list any number of these suffixes on a single
line in the configuration file.) Case is not significant in <extn>. Before
a POSTprocess keyword can appear, <postprocessor> must first be defined
using the DEFine keyword, which is position insensitive. The format of
the DEFine keyword is
DEFine <postprocessor> <unix cmd>
<postprocessor> may be any string. It's recommended that you
stick to alphanumerics.
<unix cmd> is any UNIX command, with arguments. Simple substitutions
are performed on <unix cmd> before it's executed in conjunction with the
existence of a POSTprocess keyword and the appearance of a binary whose
filename ends in one of the <extn> suffixes listed as arguments to the
POSTprocess keyword. This all makes perfect sense but is a little difficult
to explain. The following example should make things much clearer.
Consider the following configuration file:
# Sample aub configuration file demonstrating use of a postprocessor
#
dir /tmp/aubdir
define jpg2gif djpeg -G $f > $h_.gif
postprocess jpg2gif .jpg .jpeg
group alt.binaries.pictures.misc
The first line tells aub that it should decode binaries into the
directory /tmp/aubdir. The second line defines a postprocessor for aub.
The name of the postprocessor is specified as "jpg2gif". The third line
says that the postprocessor will be invoked whenever a binary with a name
ending in '.jpg' or '.jpeg' is decoded. The fourth line specifies the
group that binaries are to be decoded from.
Suppose the binary full_moon.jpeg is decoded from
alt.binaries.pictures.misc. The binary name "full_moon.jpeg" can be
thought of as consisting of three parts; the head part -- everything before
the last '.' character -- the '.' character itself, and the tail part --
everything after the last '.' character. aub uses the abbreviations
'$h', '$t', and '$f' to refer to the head part, tail part, and entire
filename, respectively. (If no '.' character appears in the name of a
decoded binary, $h equals $f, the entire name of the binary, and $t is
empty.)
Because the binary name "full_moon.jpeg" ends in ".jpeg", one of the
arguments specified on line three of the sample configuration file, aub
invokes the postprocessor "jpg2gif". aub substitutes the appropriate
values for '$f' and '$h', in this case, "full_moon.jpeg" and "full_moon"
into the postprocessor definition, and executes the resulting UNIX command,
which in this case is 'djpeg -G full_moon.jpeg > full_moon_.gif'. Assuming
that you have the djpeg program on your machine (this software is available
via anonymous FTP from ftp.uu.net under the graphics/jpeg directory), this
command will cause the .jpeg file to be automatically converted into a
similarly named .gif file when it is decoded.
A few more examples, again based on the configuration file above:
Filename of decoded binary    $h               $t     $f
------------------------------------------------------------------------------
crescent_moon.jpg             crescent_moon    jpg    crescent_moon.jpg
big.dog.gif                   big.dog          gif    big.dog.gif
Filename of decoded binary    Postprocessed    Reason
------------------------------------------------------------------------------
crescent_moon.jpg             yes              $f ends in '.jpg'
big.dog.gif                   no               $f doesn't end in '.jpg' or in
                                               '.jpeg'
Filename of decoded binary    UNIX command executed
------------------------------------------------------------------------------
crescent_moon.jpg             djpeg -G crescent_moon.jpg > crescent_moon_.gif
big.dog.gif                   (none executed)
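The substitution rules illustrated in these tables can be sketched in
Python as follows. This is a model of the documented behaviour, not aub's
own code; the function name expand_command is hypothetical.

```python
import re

def expand_command(template, filename, extensions):
    """Return the command to run for `filename`, or None if no suffix matches."""
    # Suffix matching ignores case, as described for <extn>.
    lowered = filename.lower()
    if not any(lowered.endswith(ext.lower()) for ext in extensions):
        return None
    # Split at the last '.': head before it, tail after it.
    head, dot, tail = filename.rpartition(".")
    if not dot:                           # no '.' at all: $h is $f, $t is empty
        head, tail = filename, ""
    values = {"f": filename, "h": head, "t": tail}
    # Substitute $f, $h and $t in a single pass over the template.
    return re.sub(r"\$([fht])", lambda m: values[m.group(1)], template)
```

Calling expand_command("djpeg -G $f > $h_.gif", "crescent_moon.jpg",
[".jpg", ".jpeg"]) gives the command shown in the last table, and the same
call with "big.dog.gif" gives None.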
We could easily have written:
define jpg2gif djpeg -G $f > $h_.gif ; rm -f $f
to cause aub to remove the old .jpeg version of the binary after
converting it to .gif format.
I've added the extra underscore character in this example to
decrease the chance that djpeg, when it runs, will clobber another
binary which aub already unpacked with the name "full_moon.gif" or
"crescent_moon.gif".
Postprocessor definitions that can't be executed for some reason
may cause you (and aub) some problems at run time.
The following keywords are, like DEFine, position insensitive:
NNTP <server>
This tells aub that your news access is NNTP-based, and that it
should use the specified host as an NNTP server.
SPOOL <directory>
This tells aub that your news access is based on access to raw news
files, and that <directory> is the root of the news spool tree.
A single configuration file may not contain both the NNTP and SPOOL
keywords.
If neither the NNTP keyword nor the SPOOL keyword appear in your
configuration file, aub will assume your news access is via NNTP and use
your NNTPSERVER environment variable, if it is defined, to decide what
server to connect to. If your NNTPSERVER environment variable is not
defined, aub will try to figure out where you normally read news from.
If it can't do that, it will ask you to supply the information.
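The order in which these sources are consulted can be summarised in a small
Python sketch. It models the rules as stated here and nothing more; the
function name news_access and the return values are hypothetical, and the
final step (guessing where you normally read news, then asking) is reduced
to a placeholder.

```python
def news_access(keywords, environ):
    """Decide how to reach news.

    keywords: mapping of upper-case configuration keyword -> its argument
    environ:  mapping of environment variables
    """
    if "NNTP" in keywords and "SPOOL" in keywords:
        raise ValueError("NNTP and SPOOL may not both appear")
    if "SPOOL" in keywords:
        return ("spool", keywords["SPOOL"])
    if "NNTP" in keywords:
        return ("nntp", keywords["NNTP"])     # $NNTPSERVER is ignored here
    if environ.get("NNTPSERVER"):
        return ("nntp", environ["NNTPSERVER"])
    return ("ask", None)                      # guess, or ask the user
```

For instance, with no NNTP or SPOOL keyword and NNTPSERVER set to a
placeholder host such as news.example.com, the sketch selects NNTP access to
that host.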
If you ever change the mechanism by which you access news, or the
server you read news on, you'll need to remove the .aubrc file that aub
maintains to keep track of what groups you have and have not read. Otherwise,
because articles are numbered differently on different servers, aub will get
hopelessly confused. (It's possible, though not recommended, to switch
seamlessly back and forth between NNTP and SPOOL access to news on the
same host.) This is probably the only time you'll ever want to tamper with
a .aubrc file.
DEBUG <n>
Sets the default debugging level aub runs at to <n>. <n> must be a
non-negative integer. Debugging level 0 is the default; when run at
debugging level zero, aub produces no output unless it runs into serious
problems. Setting the debugging level to 1 will tell you about what aub is
doing. Setting the debugging level to 2 will tell you even more about what
aub is doing. Setting the debugging level to 3 or higher will show you
more than you ever wanted to know.
RECognize <extn> ...
The recognition code (the part of aub that identifies binaries)
maintains a list of common suffixes that it uses to recognize binaries
while it scans subject lines. For example, many binaries have names ending
in ".gif", so ".gif" is on aub's internal list of hints. The RECognize
keyword allows you to add suffixes to this internal list of hints.
Use this capability sparingly. You can really give aub a coronary
by saying something like 'rec a b c d e f g ...'. Doing something foolish
like that will cause your aub to lose the ability to assemble things that it
would otherwise have been able to.
The current list of common suffixes aub maintains is:
".gif", ".jpg", ".jpeg", ".gl", ".zip", ".au", ".zoo", ".exe", ".dl",
".snd", ".mpg", ".mpeg", ".tiff", ".lzh", ".wav"
NOXHDR
This keyword is meaningful only if your news access is NNTP-based.
It will cause aub to not use the XHDR command to access the subject lines
of news articles, even if the NNTP server you're using has XHDR capability.
If the same keyword appears multiple times, and the second
appearance is not a position sensitive override of some established default,
then aub ignores the second instance of the keyword.
7. How do I use aub?
After you've built your configuration file, just run 'aub'.
If this is the first time you've run aub since v1.1, you may
want to undefine any AUB-related environment variables you had set. These
variables are interpreted differently now. See section 8. You will not
need to remove your .aubrc file, but your .aubinit file is no longer useful
and you'll probably want to get rid of it once you've created .aubconf.
If this is the first time you've run any version of aub, ever, you
may want to use the '-c' command line option. Or you may not...see section 9.
8. Environment variables used by aub.
$AUBDIR         Sets the default directory binaries are unpacked into.
                Equivalent to specifying a DIRectory keyword before
                any GROUP keywords. Will override any DIRectory
                keyword appearing before any GROUP keyword, but not
                those appearing after a GROUP keyword.
$AUBDESC        Analogous to $AUBDIR
$AUBHOOK        Analogous to $AUBDIR
$NNTPSERVER     Specifies an NNTP server to use for news access if
                no NNTP keyword appears in the configuration file.
                If an NNTP keyword does appear, $NNTPSERVER is
                ignored.
Note that $AUBGROUPS is no longer used as of version 2.0.3.
If aub doesn't seem to be doing what you'd expect it to do based
on your .aubconf file, it could be because your environment variables
are causing defaults you've established there to be ignored.
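The precedence between $AUBDIR and the DIRectory keyword can be written
down as a tiny Python sketch. It simply restates the rule in the table
above; the function name effective_directory is hypothetical, and the
directory names in the usage note are placeholders.

```python
def effective_directory(group_dir, default_dir, environ):
    """Pick the directory for one group.

    group_dir:   DIRectory given after the group's GROUP line, or None
    default_dir: DIRectory given before any GROUP line, or None
    environ:     mapping of environment variables
    """
    if group_dir is not None:
        return group_dir                  # a per-group setting always wins
    # $AUBDIR, when set, replaces the default from the configuration file.
    return environ.get("AUBDIR") or default_dir
```

So with AUBDIR set to /home/me/pix, a group without its own DIRectory line
lands in /home/me/pix even if the configuration file's default says
/tmp/aub, which is the surprise the paragraph above warns about.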
9. Command line options supported by aub:
-c      'Catch-up' mode; aub will bring its internal
        pointers (and your .aubrc file) up to date, but will
        not actually generate any binaries. This is useful
        when you run aub for the first time; it keeps it
        from generating megabytes and megabytes, as it scans
        old news articles.
-n      'No-checkpoint' mode; prohibits aub from updating
        its internal pointers (your .aubrc file). This option
        is primarily useful only during debugging.
-dn     'Debug' mode; sets the debugging level to N. This
        overrides the debugging level set in the configuration
        file, except that 'aub -d0' does not work...this is a
        bug.
-M      Causes aub to print the long form of the documentation
        (this document.)
-m      Causes aub to print a summary of the documentation.
-C      Lists significant changes since the last major
        release of aub.
10. What do I do if I have problems installing or configuring aub?
See if you can figure out what the problem is. I've only set aub
up on my local system, so it's possible you could have problems I haven't
foreseen. If you really can't get it to work, try talking to a friend who
knows systems programming and administration type stuff. Offer your friend
food -- systems people especially like dim sum and Heineken.
You could also send me mail. Whether or not I answer your mail will
depend a lot on how busy I am. Sorry, but I have an obligation to get work
done promptly for my client, who's paying me for my time. I can't really deal
with supporting aub on the side for the entire net. Also, if your problem
has to do with peculiarities of your local site, there may not be a lot I
can do about it.
11. What else do I need to know?
In order to guarantee proper administration of the .aubrc file,
you can only run one instance of aub at a time. In this respect aub is
similar to most newsreaders.
The first time you run aub over a given group, if you choose not to
use the -c option, it may take a long time to run. This is because it's
looking at all of the articles in the group, and building lots of binaries.
After you run it for the first time, it only needs to look at new stuff in
the group. Things will go much faster after that.
If aub assembles two binaries with the same name, and wants to store
them in the same place, it will compare them to see whether or not they're
identical. If they are identical, it will discard the newer copy. If
they're not identical, it will append '+' characters as necessary to the
name of the second binary until the name is unique.
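The duplicate-name rule can be sketched in Python as below. This only
illustrates the behaviour described in the paragraph above and is not taken
from aub itself; the function names and the file names used in the demo are
hypothetical.

```python
import filecmp
import os
import tempfile

def target_path(directory, name, new_file):
    """Return where `new_file` should be stored, or None if it is a duplicate."""
    candidate = os.path.join(directory, name)
    # An identical copy is already in place: discard the newer one.
    if os.path.exists(candidate) and filecmp.cmp(candidate, new_file, shallow=False):
        return None
    # Otherwise append '+' characters until the name is unique.
    while os.path.exists(candidate):
        candidate += "+"
    return candidate

def demo():
    """Store one file, then offer an identical and a different newcomer."""
    with tempfile.TemporaryDirectory() as d:
        stored = os.path.join(d, "moon.gif")
        same = os.path.join(d, "incoming_same")
        other = os.path.join(d, "incoming_other")
        for path, data in ((stored, b"abc"), (same, b"abc"), (other, b"xyz")):
            with open(path, "wb") as fh:
                fh.write(data)
        renamed = target_path(d, "moon.gif", other)
        return target_path(d, "moon.gif", same), os.path.basename(renamed)
```

In the demo, the identical newcomer is dropped and the different one would
be stored under a name with a trailing '+'.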
aub checkpoints its progress in the .aubrc file after processing
each group. This keeps it from having to start all over again if it dies
of a signal, expired CPU time limit, etc...
aub takes liberties with changing around the names of binaries
that it doesn't particularly like. It may rename binaries to be called
"Mangled" if people post things that are supposed to be unpacked to "." or
"..", or something equally obnoxious, for instance. It will drop the
leading "." off of binaries called ".something", and relativize pathnames
so that your binaries always wind up in the directories you want them in.
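A Python sketch of this kind of name clean-up is shown below. The exact
rules aub applies are not spelled out here, so the details are assumptions:
"relativize" is taken to mean keeping only the last path component, and all
leading '.' characters are dropped. The function name safe_name is
hypothetical.

```python
import posixpath

def safe_name(name):
    """Return a file name that stays inside the chosen directory."""
    # Assumption: keep only the final path component of the encoded name.
    base = posixpath.basename(name.rstrip("/"))
    if base in ("", ".", ".."):
        return "Mangled"
    # Drop leading dots so that ".something" becomes "something".
    stripped = base.lstrip(".")
    return stripped or "Mangled"
```

Under these assumptions, "../../x.gif" and "/tmp/x.gif" both come out as
"x.gif", and ".." comes out as "Mangled".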
It's unfriendly to run aub so often that you occupy too much of your
news server's time.
It's pronounced "oww-buh", as in "S(au)di", not "awe-buh", as in
"sl(aw)".
This software is offered as-is, with no guarantees or promises made
by me whatsoever. I disclaim all responsibility for loss or damage caused
by the program.
Mark Stantz
stantz@sierra.stanford.edu
stantz@sgi.com
8/92