			The Introducing AUB Document


	1.	What is aub?

	More and more people are posting binary files to usenet these days.
Some of these binaries are executables and audio data; a majority seem to
be pictures of various things, typically landscapes, movie stars and naked
people.  Because of limitations in the type of data that usenet can accommodate, 
binaries must be encoded into text, and because binary files are commonly very 
large relative to the text files usenet was designed to handle, they frequently 
must be broken up into pieces.  Programs have been developed which take a 
given binary, encode it, and automatically post it in pieces with descriptive 
subject lines.

	When this data arrives at a remote site, users see subject lines
that look something like this:

		12011 roadkill03.gif, part 1/4
		12012 roadkill03.gif, part 3/4
		12013 More pictures of tatooed children, please...
		12014 Re: roadkill02.gif -- I love the way the eyes bulge out
		12015 roadkill03.gif, part 4/4
		12016 roseanne_nude.jpg, part 02 of 02
		12017 Only BINARIES should be posted here, GOD DAMMIT
		12018 roadkill03.gif, part 2/4
		12019 HI, I'M BIFF!!!!  THESE PIX ARE WAY COOL!!!!
		12020 roseanne_nude.jpg, part 01 of 02

	While the process of encoding and splitting up binaries for posting 
to usenet is relatively straightforward, the process of retrieving, sorting,
and decoding the pieces (which do not necessarily arrive in order) at 
receiving sites is tedious, time-consuming, and very prone to human error.  

	aub, which stands for "assemble usenet binaries", automates this 
reassembly process for you.  aub is intended for use in newsgroups to which 
binaries are posted exclusively.  When run, it accesses news articles via
either a disk-based news spool directory, or via an NNTP news server, 
determines whether or not any new binaries have appeared in selected 
newsgroups since the last time it was run, and if so, retrieves, organizes 
and decodes them, depositing them in a configurable location.  This process 
requires no human intervention once aub has been configured.  aub also keeps 
track of binaries for which it has seen some, but not all, of the pieces.  It 
remembers how to find these old pieces, so that when new, previously missing 
pieces arrive at your site, it will build the entire binary the next time it 
is run.  It also remembers the binaries for which it has already seen all of 
the pieces, so that it does not waste time rebuilding the same binaries over 
and over again.

	aub was created as a time saver; too many people at too many sites 
were spending way too much time manually unpacking binary files.  Its ability 
to identify and assemble binary images depends on people posting images with
subject lines that observe (loosely) established conventions.  aub's 
recognition capabilities have been significantly improved since the earliest 
release.


	2.	How does aub work?

	aub looks for subject lines containing strings like:

		N of N
		N / N
		N  N
		N | N

	where N is any number composed of one or more digits, and white
space is optional.  Once it sees such a line, it tries to figure out a
name for the binary by looking at the rest of the subject line.  These names 
are relevant only to aub's internal functioning; when unpacked, binaries are 
named according to the information they were encoded with.  However, it's 
important that, whatever internal name aub decides on for the binary, that 
name be recognizable in the subject lines of all pieces.

	aub ignores all news articles with null subject lines and subject
lines that begin with "Re:" regardless of other content.
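
	As a rough illustration of the matching described above, here is a
short Python sketch.  It is not aub's recognition code (aub itself is a perl
script); the name part_marker and the exact pattern are assumptions made for
this example, and only the "of", "/" and "|" separators are handled.

```python
import re

# Hypothetical pattern: "N of N", "N / N" or "N | N", white space optional.
PART_RE = re.compile(r"(\d+)\s*(?:of|/|\|)\s*(\d+)", re.IGNORECASE)


def part_marker(subject):
    """Return (part, total) for a subject line, or None if it is skipped."""
    subject = subject.strip()
    # Null subjects and follow-ups ("Re:") are ignored regardless of content.
    if not subject or subject.startswith("Re:"):
        return None
    match = PART_RE.search(subject)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

	A real implementation would also have to derive an internal name
for the binary from the rest of the subject line; that step is left out here.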

	aub uses two files which are maintained in each user's home directory.
One is $HOME/.aubconf, which is a configuration file that allows you to 
customize aub's behavior.  See section 5 for a detailed explanation of the
structure of configuration files.  The other file is $HOME/.aubrc.  You
should never need to modify this file; aub creates it and maintains it.  It's
used to keep track of which articles in which groups aub has already 
resolved, and which articles aub believes to be pieces of binaries whose 
remaining pieces it hasn't seen yet.  


	3.	What do I need on my system to run aub?

	You will need Larry Wall's perl interpreter.  Older versions of aub
also required David Mack's uumerge program; this functionality has since been
folded into aub for the sake of speed.  perl is available via anonymous FTP 
from uunet.uu.net, tut.cis.ohio-state.edu, and jpl-decvax.jpl.nasa.gov.  

	Your machine must also have access to news, either via the NNTP
protocol, or by being able to open raw news files on a disk somewhere.  
Previous versions of aub required that your news access be NNTP-based; this 
restriction has since been lifted.


	4.	How do I install aub?

	There's really only one thing that you might need to configure.
aub is a perl script.  The first line of the program looks like this:

		#!/usr/local/bin/perl

	This line tells your system where to find the perl interpreter.  
If the path of perl on your system is something else, you'll need to change 
this line, or create a link called /usr/local/bin/perl which points to where
your perl executable actually resides.

	If you need to change this, you'll probably see a message like:
'aub: Bad address.' when you try to run aub.


	5.	How do I configure aub?

	Older versions of aub made use of a configuration file which was
normally called $HOME/.aubinit.  But few interesting customizations could 
be accomplished with .aubinit files, because the configuration language
was so primitive.  The configuration language has been redesigned to allow
much greater flexibility.  Old .aubinit files will no longer work, or be
recognized by aub (except inasmuch as aub will notice them and point out
to you that you need to create a new configuration file if you don't already
have one.)  The new configuration file for aub should be called $HOME/.aubconf.

	Configuration files are line-oriented; each line is processed 
separately.  If any line contains the '#' character, aub concludes that 
the character begins a comment, and discards the comment character and 
everything on the line that follows it.  If for some reason you need to
put a '#' character in your configuration file and do not want it to be 
interpreted as beginning a comment, you'll have to escape it by preceding it 
with a backslash character, e.g. '\#'.
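
	The comment rule can be sketched in a few lines of Python.  This is
only an illustration of the behavior described above, not code taken from
aub; the function name strip_comment is an assumption, and how aub treats a
doubled backslash before '#' is not covered.

```python
import re

# An unescaped '#' starts a comment; '\#' stands for a literal '#'.
_COMMENT_RE = re.compile(r"(?<!\\)#.*")


def strip_comment(line):
    """Drop a trailing comment, then turn each escaped '\\#' into '#'."""
    without_comment = _COMMENT_RE.sub("", line)
    return without_comment.replace("\\#", "#").rstrip()
```

	For instance, strip_comment applied to a line reading
'dir /tmp/aub  # Default directory' leaves just 'dir /tmp/aub'.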

	Each non-blank line in a configuration file must begin with a 
keyword recognized by aub.  The case of keywords is not significant.
As far as aub is concerned, "keyword", "KEYWORD", "Keyword" and "KeYWorD"
all mean the same thing.  Some keywords require arguments; some require that 
no arguments appear, and some permit variable numbers of arguments.  If aub 
sees keywords it doesn't understand in your .aubconf file, it will complain 
to you about them.

	One of the keywords aub understands is the GROUP keyword.  It's
used to tell aub that you want to decode binaries from the newsgroup(s)
which appear as argument(s) to the keyword.  For example:

		GROUP alt.binaries.pictures.misc
		GROUP alt.binaries.pictures.misc alt.binaries.pictures.fractals

	Every configuration file must contain at least one GROUP keyword to
be correct.  

	In general, aub understands two types of keywords.  One type is 
called 'position insensitive', which means that the keyword will have the
same effect no matter where in the configuration file it appears.  The
other type is called 'position sensitive', which means that the keyword 
means something different when it appears before any GROUP keywords than
it does when it appears after any given GROUP keyword.

	One such position sensitive keyword is the DIRectory keyword.
This keyword is used to tell aub what directory to put binaries it decodes
in.  ("DIRectory" is spelled the way it is because only the 'DIR' part needs 
to appear in a configuration file for aub to recognize it.  In fact, aub will 
interpret any keyword beginning with the letters 'DIR' as being an instance
of the DIRectory keyword.)

	When a position sensitive keyword appears _before_ any GROUP keyword,
the keyword is interpreted as being the default for all groups that appear
later.

	When a position sensitive keyword appears _after_ any GROUP keyword,
it is interpreted as applying *only* to that group, overriding any previous
default which may have been established via use of the same keyword, or
by the value of environment variables (see section 8.)

	Position sensitive keywords appearing after a GROUP keyword which
lists multiple groups are applied only to the last group listed, not to 
all groups appearing on the group line.

	For example, the following three configuration files are equivalent:

	# Sample .aubconf file no. 1 -- basic example
	# 
	dir /tmp/aub					# Default directory
	group alt.binaries.pictures.misc		# Process these
	group alt.binaries.pictures.fractals		#  two groups

        # Sample .aubconf file no. 2 -- multiple group usage, mixed case
        #
        DiR /tmp/aub                                    # Default directory
        gRoUp alt.binaries.pictures.misc alt.binaries.pictures.fractals

        # Sample .aubconf file no. 3 -- does not use defaults
        #
        group alt.binaries.pictures.misc
        directory /tmp/aub                            
        group alt.binaries.pictures.fractals
        direct-to /tmp/aub                           	# 'dir' is all you need

	The following three configuration files are also equivalent, though
not equivalent to the previous three:

        # Sample .aubconf file no. 4 -- explicit placement of binaries
        #
        group alt.binaries.pictures.misc
        dir /tmp/aub/misc
        group alt.binaries.pictures.fractals
	dir /tmp/aub/fractals

        # Sample .aubconf file no. 5 -- explicit and default placement 
        #
        dir /tmp/aub/misc   				# Default directory
        group alt.binaries.pictures.misc		# Use default directory
        group alt.binaries.pictures.fractals
	dir /tmp/aub/fractals				# Override default

        # Sample .aubconf file no. 6 -- explicit and default placement revisited
        #
        dir /tmp/aub/fractals 				# Default directory
        group alt.binaries.pictures.misc
	dir /tmp/aub/misc				# Override default
        group alt.binaries.pictures.fractals		# Use default directory

	The configuration file:

	# Sample .aubconf file no. 7 -- invalid
	#
	group alt.binaries.pictures.misc
	dir /tmp/aub
	group alt.binaries.pictures.fractals		# No good

	is invalid, because no directory for aub to place binaries decoded
from the newsgroup alt.binaries.pictures.fractals is specified.  The 
DIRectory keyword is unique in this regard; there must be some use of the
keyword that enables aub to figure out where to put binaries for every 
group specified, or it will refuse to run.  The easiest way to deal with 
this is to always establish a default directory by using the DIRectory
keyword somewhere before any groups appear.  
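
	The DIRectory rules above can be summarized with a small Python
sketch.  It is a simplified model written for this document, not aub's own
parser: the function name resolve_directories is an assumption, comment
handling is naive (no '\#' escapes), and environment variables and repeated
keywords are not modelled.

```python
def resolve_directories(lines):
    """Map each group to its directory, following the DIRectory rules."""
    default_dir = None
    current = None          # last group named on the most recent GROUP line
    dirs = {}               # group name -> directory, or None if unset
    for raw in lines:
        words = raw.split("#", 1)[0].split()   # naive comment stripping
        if not words:
            continue
        keyword, args = words[0].lower(), words[1:]
        if not args:
            continue        # keywords without arguments are not modelled
        if keyword == "group":
            for group in args:
                dirs[group] = default_dir
            current = args[-1]
        elif keyword.startswith("dir"):        # 'DIR' is all that is needed
            if current is None:
                default_dir = args[0]          # default for later groups
            else:
                dirs[current] = args[0]        # override for one group only
    missing = [group for group, d in dirs.items() if d is None]
    if missing:
        raise ValueError("no directory for: " + ", ".join(missing))
    return dirs
```

	Fed the lines of sample files 1, 2 or 3, this sketch yields the same
mapping each time; fed sample file 7, it raises an error for
alt.binaries.pictures.fractals.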


	Other position sensitive keywords are available.  


		DESCription <file>

	This keyword causes aub to extract text from what it thinks is the 
text portion of posted articles, and append it to the file you specify.  This
is useful if you're interested in reading the text that describes what all
the binaries aub is unpacking are about.  A maximum of 60 lines per binary
extracted will be put into the file you indicate.  Each description is
prepended with the name of the decoded binary it refers to, and the group
that binary was decoded from.


		HOOK <program>

	This keyword enables you to select which binaries aub decodes
using your own software.  If the HOOK keyword is specified, aub will 
invoke the argument program and supply it, via standard input, with the 
subject line of the first piece of a binary that it can potentially decode.  
If the program returns true (zero), aub will decode the binary.  If the 
program returns false (non-zero), aub will skip decoding the binary, and 
continue processing.

	It is not (yet) possible to specify arguments to the user program.

	For example, the following sample program returns true if standard
input contains the string ".gif" (case-insensitive), and false otherwise.

	#!/usr/local/bin/perl
	#
	# /tmp/sample_aub_hook: a simple, sample hook program
	#

	$sl = <STDIN>;                  # Get standard input
	exit(0) if ($sl =~ m/\.gif/i);  # Contains ".gif" (the dot is escaped)
	exit(1);			# Didn't see ".gif"

	Suppose this program were attached to aub via the configuration line:

		hook /tmp/sample_aub_hook

	Then aub would only decode binaries whose first piece has a subject
line containing the string '.gif'.

	You can write hook programs in any language you choose.  
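
	To illustrate that last point, here is roughly the same hook
sketched in Python.  The function names are made up for this example; the
only things taken from the description above are that the subject line
arrives on standard input and that an exit status of zero means "decode".

```python
import sys


def wants_binary(subject):
    """True if the subject line mentions '.gif', ignoring case."""
    return ".gif" in subject.lower()


def main(stream=sys.stdin):
    """Read one subject line and return the exit status aub expects."""
    subject = stream.readline()
    return 0 if wants_binary(subject) else 1


# To use this as a real hook program, end the script with:
#     sys.exit(main())
```

	The final call is left as a comment so that the functions can be
imported and tried out on their own.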


		POSTprocess <postprocessor> <extn> ...

	This keyword enables you to postprocess binaries whose names end
in the string <extn> (you can list any number of these suffixes on a single
line in the configuration file.)  Case is not significant in <extn>.  Before
a POSTprocess keyword can appear, <postprocessor> must first be defined 
using the DEFine keyword, which is position insensitive.  The format of
the DEFine keyword is

		DEFine	<postprocessor> <unix cmd>

	<postprocessor> may be any string.  It's recommended that you
stick to alphanumerics.

	<unix cmd> is any UNIX command, with arguments.  Simple substitutions
are performed on <unix cmd> before it is executed.  The command runs when a
POSTprocess keyword names the postprocessor and a binary is decoded whose
filename ends in one of the <extn> suffixes listed as arguments to that
POSTprocess keyword.  The following example should make things much clearer.

	Consider the following configuration file:

	# Sample aub configuration file demonstrating use of a postprocessor
	#
	dir /tmp/aubdir
	define jpg2gif djpeg -G $f > $h_.gif
	postprocess jpg2gif .jpg .jpeg
	group alt.binaries.pictures.misc

	The first line tells aub that it should decode binaries into the
directory /tmp/aubdir.  The second line defines a postprocessor for aub.  
The name of the postprocessor is specified as "jpg2gif".  The third line 
says that the postprocessor will be invoked whenever a binary with a name 
ending in '.jpg' or '.jpeg' is decoded.  The fourth line specifies the 
group that binaries are to be decoded from.

	Suppose the binary full_moon.jpeg is decoded from 
alt.binaries.pictures.misc.  The binary name "full_moon.jpeg" can be 
thought of as consisting of three parts; the head part -- everything before
the last '.' character --  the '.' character itself, and the tail part --
everything after the last '.' character.  aub uses the abbreviations 
'$h', '$t', and '$f' to refer to the head part, tail part, and entire
filename, respectively.  (If no '.' character appears in the name of a 
decoded binary, $h equals $f, the entire name of the binary, and $t is 
empty.) 

	Because the binary name "full_moon.jpeg" ends in ".jpeg", one of the
arguments specified on line three of the sample configuration file, aub 
invokes the postprocessor "jpg2gif".  aub substitutes the appropriate 
values for '$f' and '$h', in this case "full_moon.jpeg" and "full_moon",
into the postprocessor definition, and executes the resulting UNIX command,
which in this case is 'djpeg -G full_moon.jpeg > full_moon_.gif'.  Assuming 
that you have the djpeg program on your machine (this software is available 
via anonymous FTP from ftp.uu.net under the graphics/jpeg directory), this 
command will cause the .jpeg file to be automatically converted into a 
similarly named .gif file when it is decoded.

	A few more examples, again based on the configuration file above:

   Filename of decoded binary        $h		$t		$f
------------------------------------------------------------------------------
	crescent_moon.jpg	crescent_moon	jpg	crescent_moon.jpg
	big.dog.gif		big.dog		gif	big.dog.gif

   Filename of decoded binary	Postprocessed         Reason
------------------------------------------------------------------------------
	crescent_moon.jpg	   yes       $f ends in '.jpg'
	big.dog.gif		   no	     $f doesn't end in '.jpg' or in
					      '.jpeg'

    Filename of decoded binary	UNIX command executed
------------------------------------------------------------------------------
	crescent_moon.jpg	djpeg -G crescent_moon.jpg > crescent_moon_.gif
	big.dog.gif		(none executed)
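
	The substitution rules shown in these tables can be sketched in
Python as follows.  This is an illustration only, not aub's perl code; the
function names are assumptions, and no shell quoting is applied to the
filename.

```python
import re


def split_name(filename):
    """Return (head, tail): the text before and after the last '.'."""
    head, dot, tail = filename.rpartition(".")
    if not dot:                 # no '.' at all: $h is the whole name
        return filename, ""
    return head, tail


def expand(template, filename):
    """Substitute $h, $t and $f in a postprocessor command template."""
    head, tail = split_name(filename)
    values = {"h": head, "t": tail, "f": filename}
    return re.sub(r"\$([htf])", lambda m: values[m.group(1)], template)


def is_selected(filename, suffixes):
    """True if the name ends in one of the POSTprocess suffixes (any case)."""
    lowered = filename.lower()
    return any(lowered.endswith(s.lower()) for s in suffixes)
```

	Substituting in a single pass means that a filename which happens
to contain the text '$h' is not expanded a second time.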


	We could just as easily have written:

		define jpg2gif djpeg -G $f > $h_.gif ; rm -f $f 

	to cause aub to remove the old .jpeg version of the binary after
converting it to .gif format.

	I've added the extra underscore character in this example to 
decrease the chance that djpeg, when it runs, will clobber another 
binary which aub already unpacked with the name "full_moon.gif" or
"crescent_moon.gif". 

	Postprocessor definitions that can't be executed for some reason
may cause you (and aub) some problems at run time.  


	The following keywords are, like DEFine, position insensitive:


		NNTP <server>

	This tells aub that your news access is NNTP-based, and that it
should use the specified host as an NNTP server. 


		SPOOL <directory>

	This tells aub that your news access is based on access to raw news
files, and that <directory> is the root of the news spool tree. 

	A single configuration file may not contain both the NNTP and SPOOL
keywords.

	If neither the NNTP keyword nor the SPOOL keyword appears in your
configuration file, aub will assume your news access is via NNTP and use
your NNTPSERVER environment variable, if it is defined, to decide what 
server to connect to.  If your NNTPSERVER environment variable is not
defined, aub will try to figure out where you normally read news from.
If it can't do that, it will ask you to supply the information.
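
	The order of preference described above might be modelled like this
in Python.  The function choose_access and its arguments are hypothetical:
keywords is assumed to be a dictionary holding the lower-cased NNTP and
SPOOL keywords found in the configuration file, and environ a dictionary of
environment variables.

```python
def choose_access(keywords, environ):
    """Return ("nntp", server) or ("spool", directory).

    A server of None means nothing was configured, so the caller would
    have to guess the usual news server or ask the user.
    """
    if "nntp" in keywords and "spool" in keywords:
        raise ValueError("NNTP and SPOOL may not both appear")
    if "spool" in keywords:
        return ("spool", keywords["spool"])
    if "nntp" in keywords:
        return ("nntp", keywords["nntp"])      # $NNTPSERVER is ignored
    # Neither keyword: assume NNTP and fall back to $NNTPSERVER, if set.
    return ("nntp", environ.get("NNTPSERVER"))
```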

	If you ever change the mechanism by which you access news, or the
server you read news on, you'll need to remove the .aubrc file that aub
maintains to keep track of what groups you have and have not read.  Otherwise,
because articles are numbered differently on different servers, aub will get
hopelessly confused.  (It's possible, though not recommended, to switch
seamlessly back and forth between NNTP and SPOOL access to news on the 
same host.)  This is probably the only time you'll ever want to tamper with
a .aubrc file.


		DEBUG <n>

	Sets the default debugging level aub runs at to N.  N must be a 
non-negative integer.  Debugging level 0 is the default; when run at 
debugging level zero, aub produces no output unless it runs into serious
problems.  Setting the debugging level to 1 will tell you about what aub is
doing.  Setting the debugging level to 2 will tell you even more about what
aub is doing.  Setting the debugging level to 3 or higher will show you 
more than you ever wanted to know.


		RECognize <extn> ...

	The recognition code (the part of aub that identifies binaries) 
maintains a list of common suffixes that it uses to recognize binaries 
while it scans subject lines.  For example, many binaries have names ending 
in ".gif", so ".gif" is on aub's internal list of hints.  The RECognize
keyword allows you to add suffixes to this internal list of hints.

	Use this capability sparingly.  You can really give aub a coronary 
by saying something like 'rec a b c d e f g ...'.  Doing something foolish 
like that will cause your aub to lose the ability to assemble things that it 
would otherwise have been able to.  

	The current list of common suffixes aub maintains is:

	".gif", ".jpg", ".jpeg", ".gl", ".zip", ".au", ".zoo", ".exe", ".dl", 
	".snd", ".mpg", ".mpeg", ".tiff", ".lzh", ".wav"


		NOXHDR

	This keyword is meaningful only if your news access is NNTP-based.
It will cause aub to not use the XHDR command to access the subject lines
of news articles, even if the NNTP server you're using has XHDR capability.  


	If the same keyword appears multiple times, and the second 
appearance is not a position sensitive override of some established default,
then aub ignores the second instance of the keyword.


	7.	How do I use aub?

	After you've built your configuration file, just run 'aub'.  

	If this is the first time you've run aub since v1.1, you may 
want to undefine any AUB-related environment variables you had set.  These
variables are interpreted differently now.  See section 8.  You will not
need to remove your .aubrc file, but your .aubinit file is no longer useful
and you'll probably want to get rid of it once you've created .aubconf.

	If this is the first time you've run any version of aub, ever, you 
may want to use the '-c' command line option.  Or you may not...see section 9.


	8.	Environment variables used by aub.

	$AUBDIR		Sets the default directory binaries are unpacked into.
			Equivalent to specifying a DIRectory keyword before 
			any GROUP keywords.  Will override any DIRectory 
			keyword appearing before any GROUP keyword, but not 
			those appearing after a GROUP keyword.

	$AUBDESC	Analogous to $AUBDIR

	$AUBHOOK	Analogous to $AUBDIR

	$NNTPSERVER	Specifies an NNTP server to use for news access if
			no NNTP keyword appears in the configuration file.
			If an NNTP keyword does appear, $NNTPSERVER is 
			ignored.

	Note that $AUBGROUPS is no longer used as of version 2.0.3.

	If aub doesn't seem to be doing what you'd expect it to do based
on your .aubconf file, it could be because your environment variables
are causing defaults you've established there to be ignored.


	9.	Command line options supported by aub:

	-c		'Catch-up' mode; aub will bring its internal 
			pointers (and your .aubrc file) up to date, but will 
			not actually generate any binaries.  This is useful 
			when you run aub for the first time; it keeps it 
			from generating megabytes and megabytes, as it scans 
			old news articles.

	-n		'No-checkpoint' mode; prohibits aub from updating
			its internal pointers (your .aubrc file).  This option
			is primarily useful only during debugging.

	-dn		'Debug' mode; sets the debugging level to N.  This
			overrides the debugging level set in the configuration
			file, except that 'aub -d0' does not work...this is a 
			bug.

	-M		Causes aub to print the long form of the documentation
			(this document.) 

	-m		Causes aub to print a summary of the documentation.

	-C		Lists significant changes since the last major 
			release of aub.


	10.	What do I do if I have problems installing or configuring aub?

	See if you can figure out what the problem is.  I've only set aub
up on my local system, so it's possible you could have problems I haven't
foreseen.  If you really can't get it to work, try talking to a friend who
knows systems programming and administration type stuff.  Offer your friend
food -- systems people especially like dim sum and Heineken.

	You could also send me mail.  Whether or not I answer your mail will 
depend a lot on how busy I am.  Sorry, but I have an obligation to get work 
done promptly for my client, who's paying me for my time.  I can't really deal 
with supporting aub on the side for the entire net.  Also, if your problem
has to do with peculiarities of your local site, there may not be a lot I 
can do about it.


	11.	What else do I need to know?

	In order to guarantee proper administration of the .aubrc file,
you can only run one instance of aub at a time.  In this respect aub is
similar to most newsreaders.

	The first time you run aub over a given group, if you choose not to
use the -c option, it may take a long time to run.  This is because it's 
looking at all of the articles in the group, and building lots of binaries.  
After you run it for the first time, it only needs to look at new stuff in 
the group.  Things will go much faster after that.  

	If aub assembles two binaries with the same name, and wants to store
them in the same place, it will compare them to see whether or not they're 
identical.  If they are identical, it will discard the newer copy.  If 
they're not identical, it will append '+' characters as necessary to the 
name of the second binary until the name is unique.
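
	A minimal Python sketch of this naming rule follows.  It keeps the
"files" in a dictionary rather than on disk, the name store_binary is made
up, and comparing against the '+' variants as well as the original name is
an assumption; the text above does not say how aub handles that case.

```python
def store_binary(existing, name, data):
    """Store data under name in existing (a dict); return the name used.

    Returns None when an identical copy is already stored.
    """
    candidate = name
    while candidate in existing:
        if existing[candidate] == data:
            return None              # identical: discard the newer copy
        candidate += "+"             # different: try a longer name
    existing[candidate] = data
    return candidate
```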

	aub checkpoints its progress in the .aubrc file after processing
each group.  This keeps it from having to start all over again if it dies
of a signal, expired CPU time limit, etc...

	aub takes liberties with changing around the names of binaries 
that it doesn't particularly like.  It may rename binaries to be called
"Mangled" if people post things that are supposed to be unpacked to "." or 
"..", or something equally obnoxious, for instance.  It will drop the 
leading "." off of binaries called ".something", and relativize pathnames
so that your binaries always wind up in the directories you want them in.
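
	The kind of renaming described above could look something like the
following Python sketch.  The exact rules aub applies are not spelled out
here, so this is a guess at the spirit of them: the name safe_name is
hypothetical, and "relativize" is taken to mean dropping any directory
components from the encoded name.

```python
import posixpath


def safe_name(encoded_name):
    """Return a filename that stays inside the target directory."""
    base = posixpath.basename(encoded_name.rstrip("/"))
    if base in ("", ".", ".."):
        return "Mangled"             # nothing usable is left
    if base.startswith("."):
        base = base.lstrip(".")      # ".something" becomes "something"
    return base or "Mangled"
```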

	It's unfriendly to run aub so often that you occupy too much of your
news server's time.

	It's pronounced "oww-buh", as in "S(au)di", not "awe-buh", as in 
"sl(aw)".

	This software is offered as-is, with no guarantees or promises made 
by me whatsoever.  I disclaim all responsibility for loss or damage caused
by the program.


						Mark Stantz
						stantz@sierra.stanford.edu
						stantz@sgi.com
						8/92