                              CookieTool V2.5
                              ===============

       A team of programs to help you maintain your cookie database:

             "CookieTool" itself eliminates duplicate entries,
           sorts cookies alphabetically or by size if you wish.
       "CdbSplit" extracts parts of the database to a separate file,
    by keyword, by size, by number, or as groups of 'similar' cookies.
                "CdbDiff" compares two cookie databases and
      builds a list of those cookies only present in B, but not in A.



0. Who needs it?
----------------

These tools are intended for users of "Cookie", "IntuiCookie" (both
available on Aminet, util/misc/), or generally for any plain text cookie
database with entries separated by "%%" lines.  They are nice for crunching
your cookie collection by a few KByte of duplicate stuff, but also for
splitting it into separate files, for example extracting quotations or
dumping those cookies too big to be fun to read.

Note that "CookieTool" and its companions know how to handle the database
itself, but not how to update the corresponding index file (also called
'hash file').  That means you still need "cookhash" (or whatever the
utility is called that came with your cookie display program).

With the introduction of the -f option (selectable file format at run
time), the cookie-tools have also become useful for managing arbitrary word
lists.


1. CookieTool command summary
-----------------------------

    cookietool [options] <cookiefile> [logfile]

The crunched cookie database will be WRITTEN BACK to the input file (quite
different from cookietool V1.x behaviour).  The deleted cookies will be
written to <logfile>, if one is specified.  (Thus one could restore the
original database by appending the logfile to the cookiefile again.)

options:      meaning:
    -c        case-sensitive comparisons (for both deleting and sorting)
    -d[0-3]   how fussy about word delimiters?
              -d3: strict, compare character by character
              -d2: ignore number and kind of spaces between words (DEFAULT)
              -d1: treat punctuation signs as spaces, too
              -d0: completely ignore punctuation signs and spaces
    -b        delete cookies that are "abbreviations" of another, too
    -p        passive, don't delete anything
    -s        sort output
    -sl        " , looking at the last line only   \  intended to
    -sw        " , looking at the last word only    }- sort quotations
    -s<sep>    " , starting at the last occurrence /  by source
              of <sep>, e.g. '-s--' or '-s...'
    -ss       sort by size
    -o        overwrite the input file directly (no tempfile), risky!
              Use this *only* if your disk is so full that cookietool
              couldn't create its tempfile.
    -a        append, if <logfile> exists (instead of failing)
    -f[0-3]   input file format:
              -f3: cookies are separated by "%%" lines (DEFAULT)
              -f2: cookies are separated by "%" lines
              -f1: each line is a cookie
              -f0: each word is a cookie
    -F[0-3]   force output in a different file format

Please don't play with the -f/-F options "just to see what they do". They
can easily damage your database beyond repair.


2. CdbSplit command summary
---------------------------

    cdbsplit [options] <cookiefile> <hitfile>

By default, the input file will always be OVERWRITTEN by a reduced version
of the database, so that cookies are moved (not copied) to the hit file.
An existing hit file will never be overwritten, but may be appended to.

options:       meaning:
    -c         case-sensitive comparisons (for both keywords and groups)
    -d[0-3]    how fussy about word delimiters? (see details above)
    -k<keywd>  optional keyword  \_combine these to form simple
    -K<keywd>  mandatory keyword / boolean expressions
    -l<l_min>  accept only cookies with <l_min> lines or more
    -L<l_max>   "      "    "       "   <l_max> lines or less
    -w<w_min>  accept only cookies <w_min> chars wide or more
    -W<w_max>   "      "    "      <w_max> chars wide or less
    -n<n_min>  start at cookie number <n_min>
    -N<n_max>  stop after  "    "     <n_max>
    -m<n>      find groups of cookies starting with <n> matching characters
               (database must have been sorted for this to make sense!)
    -x         extract only, don't modify <cookiefile>
    -a         append, if <hitfile> exists (instead of failing)
    -f[0-3]    cookie file format (see details above)

Note that CdbSplit cannot *change* the file format, only CookieTool
itself has a -F option.


3. CdbDiff command summary
--------------------------

    cdbdiff [options] <dbase0> <dbase1> <resultfile>

The result file will receive those cookies that are only present in
<dbase1> but not in <dbase0>.  Neither of the input files will be modified.

options:       meaning:
    -c         case-sensitive comparisons
    -d[0-3]    how fussy about word delimiters? (see details above)
    -f[0-3]    cookie file format (see details above)
    -a         append, if <resultfile> exists (instead of failing)

From reading the above description, you may have noticed that CdbDiff only
does "half a diff".  To really find all differences between two files you
might type something like "cdbdiff file0 file1 diff01" and "cdbdiff file1
file0 diff10".

Another hint:  CdbDiff can also tell you which cookies two files have in
common.  To do so, execute "cdbdiff file0 file1 diff", then "cdbdiff file1
file0 diff -a" (which creates a list of all cookies which are only present
in one of the input files, but not in both), and finally "cdbdiff diff
file0 common" (or "cdbdiff diff file1 common", which would give the same
result).
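
The set arithmetic behind this hint can be checked with a small Python
sketch.  It is only an illustration: the function name only_in_second is
made up, cookies are plain strings, and comparison is by exact match
(the real tool applies its -c/-d rules).

```python
def only_in_second(dbase0, dbase1):
    """Return the cookies of dbase1 that do not occur in dbase0."""
    seen = set(dbase0)
    return [cookie for cookie in dbase1 if cookie not in seen]

file0 = ["a", "b", "c"]
file1 = ["b", "c", "d"]

# "cdbdiff file0 file1 diff", then "cdbdiff file1 file0 diff -a":
# the second result is appended to the first.
diff = only_in_second(file0, file1) + only_in_second(file1, file0)

# "cdbdiff diff file0 common": whatever in file0 is not in diff.
common = only_in_second(diff, file0)
```

Here "diff" holds the cookies found in only one of the two lists, and
"common" holds those found in both.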


4. Examples
-----------

These examples assume that your cookie database is in a single file called
"cookies" and that your favourite text editor is called "Ed".  And of
course I'd strongly suggest that you backup your files before trying any of
this.


4.1. Do what "onecookie" used to do
-----------------------------------

The classic "onecookie" could only delete verbatim copies of a cookie,
where even two spaces instead of one would make a difference.  CookieTool
can be told to behave like this, too:

  cookietool cookies -c -d3

The default settings are a bit more generous:

  cookietool cookies

might delete a few cookies more.  Upper- and lowercase letters are now
considered the same, and it doesn't matter if two words are separated by
one or several spaces, by a tab sign, by a line break, etc.  So two copies
of the same text, but formatted in different ways, will still be recognized
as identical.

The question is:  Do you really want such copies deleted automatically, or
would you rather decide yourself which one of such *almost* identical
cookies should be deleted?  This question arises even more with the really
liberal settings like

  cookietool cookies -d0

which for example recognizes "Kill ugly radio.  -- Frank Zappa" and "Kill
ugly radio...  Frank Zappa" as identical.  (Both of these two styles of
supplying sources to quotations are frequently used.) More on that question
later.
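
To make the -c and -d levels more concrete, here is a rough Python sketch
of the kind of normalisation they describe.  It is not taken from the
CookieTool sources: the function name normalize is hypothetical, and
"punctuation" is assumed to mean the ASCII characters in Python's
string.punctuation.

```python
import string

_PUNCT = set(string.punctuation)   # assumption: ASCII punctuation only

def normalize(cookie, d=2, case_sensitive=False):
    """Reduce a cookie to the form that would be compared at level -d<d>."""
    text = cookie if case_sensitive else cookie.lower()
    if d >= 3:
        return text                       # strict: character by character
    if d <= 1:
        # -d1 turns punctuation into spaces, -d0 drops it altogether
        repl = " " if d == 1 else ""
        text = "".join(repl if ch in _PUNCT else ch for ch in text)
    if d == 0:
        return "".join(text.split())      # drop all whitespace as well
    return " ".join(text.split())         # any run of whitespace -> one space

a = "Kill ugly radio.  -- Frank Zappa"
b = "Kill ugly radio...  Frank Zappa"
same_at_d0 = normalize(a, d=0) == normalize(b, d=0)
same_at_d2 = normalize(a, d=2) == normalize(b, d=2)
```

With these assumptions the two Zappa cookies compare equal at -d0 but
not at the default -d2.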


4.2. Deleting abbreviations
---------------------------

It occurs rather frequently that one cookie seems to be an "abbreviation"
of another.  Sayings may consist of more than one sentence, but the first
sentence is sometimes quoted by itself.  And quotations are sometimes
written down with, sometimes without their author.  In both cases the
shorter cookie may be deleted, and cookietool can do that, too (-b).

However, one should not ignore punctuation signs with this option (don't use
-d1 or -d0), because that would consider "A penny saved is a penny." as an
abbreviation of "A penny saved is a penny earned.", which is not
desirable.  It might be a good idea to create a log file of the deleted
cookies and look at least at the shortest ones among them.

  cookietool -b cookies log ; extract to "log" rather than just deleting
  cookietool log -ss        ; sort the extracted cookies by size
  Ed log       ; check if some are worth keeping and delete the rest
  cdbsplit log cookies -a   ; put the survivors back

Using 'cdbsplit -a' without any search options is a nice way of moving
cookies back into your main database.  (Note that "Type log >>cookies",
"Delete log" would essentially do the same, but is risky:  If you
accidentally type '>' instead of '>>', that would overwrite your main
database instead of appending to it!  Such a thing can't happen with
cdbsplit -a.)
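
Why punctuation matters for -b can be shown with a short Python sketch.
It assumes that an "abbreviation" is simply a cookie that forms the
beginning of a longer one after normalisation; the README does not spell
out the exact rule, and the helper names squash and is_abbreviation are
made up for this illustration.

```python
import string

def squash(cookie, ignore_punctuation):
    """Lowercase and normalise spacing; optionally drop punctuation too."""
    text = cookie.lower()
    if ignore_punctuation:            # roughly what -d0 would do
        text = "".join(ch for ch in text if ch not in string.punctuation)
        return "".join(text.split())
    return " ".join(text.split())     # roughly the default -d2

def is_abbreviation(short, long, ignore_punctuation=False):
    """True if 'short' is a proper prefix of 'long' after normalisation."""
    a = squash(short, ignore_punctuation)
    b = squash(long, ignore_punctuation)
    return len(a) < len(b) and b.startswith(a)

short = "A penny saved is a penny."
long = "A penny saved is a penny earned."
kept_apart = not is_abbreviation(short, long)                   # default
wrongly_merged = is_abbreviation(short, long, ignore_punctuation=True)
```

With punctuation kept, the full stop after "penny" stops the match; once
punctuation is ignored, the shorter saying looks like a prefix of the
longer one.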


4.3. Move cookies between files
-------------------------------

Let's say you want to keep cookies which are quotations in a separate
file.  That's easy, they should be recognized by the "--" which precedes
the source of the saying:

  cdbsplit cookies quotes -k--

Another example:  You might want to move all Bart Simpson quotes to a
separate "simpsons" file.  That's a little trickier, as "Bart" is a
rather short keyword, which might appear as part of other words as well.
Try three passes, cautious at first, then more generous to make sure you
get them all:

  cdbsplit cookies simpsons -KBart -KSimpson

This collects cookies with both "Bart" and "Simpson" in them (note the
capital -K!).  I can't imagine anything going wrong here.

  cdbsplit cookies simps2 "-kBart " -d1 -c

Note how the -d1 in this second command will make "Bart!" but not
"Barton" be identified as "Bart ".  But as this keyword fails if "Bart"
appears at the very end of a cookie, you still have to collect the rest:

  cdbsplit cookies simps3 -kBart

Now look at the "simps2" and "simps3" files and check if anything went
wrong with them.  In my case, I found a quotation by a guy named "Barth".
It's easy to put it back:

  cdbsplit simps3 cookies -kBarth -a

Finally, put the three hit files together:

  cdbsplit simps2 simpsons -a
  cdbsplit simps3 simpsons -a
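
The behaviour of the "-kBart " keyword in the second pass can be imitated
in a few lines of Python.  This is a sketch under assumptions, not the
tool's code: contains_keyword is a hypothetical name, punctuation is
taken to be ASCII only, and the match is case-sensitive as with -c.

```python
import re
import string

# -d1: treat every punctuation sign as a space
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})

def contains_keyword(cookie, keyword):
    """Case-sensitive keyword search after -d1 style normalisation."""
    text = cookie.translate(_PUNCT_TO_SPACE)
    text = re.sub(r"\s+", " ", text)      # runs of whitespace -> one space
    return keyword in text

hit = contains_keyword("Bart! Stop that.", "Bart ")
miss_longer_word = contains_keyword("Barton Fink", "Bart ")
miss_at_end = contains_keyword("Eat my shorts!  -- Bart", "Bart ")
```

The first cookie matches because "!" counts as a space; the other two do
not, which is why the third pass with a plain -kBart is still needed.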


4.4. Support for editing manually
---------------------------------

CdbSplit can help you collect all cookies that need reformatting (because
they are too wide) in an extra file, and put them back later:

  cdbsplit -w76 cookies wide
  Ed wide           ; add some line breaks
  cdbsplit wide cookies -a

Now this was easy.  But cdbsplit can even help you to find groups of
"similar" cookies!  That's helpful to eliminate cookies that differ only by
some typing error (e.g.  'seperate'/'separate'), something that cookietool
will *never* handle automatically.  To do this, you must sort your database
first, then tell cdbsplit how many agreeing characters make "similar"
cookies (I think 10 - 20 characters is usually a good choice):

  cookietool cookies -s -d0 -p
  cdbsplit cookies temp -d0 -m20
  Ed temp            ; delete some manually
  cdbsplit temp cookies -a

When editing the "temp" file, you should find groups of two or more cookies
with identical beginnings.  If you think they are really the same, you can
delete all but one (!) of each group.  Of course, this is tedious work, but
still far easier than just sorting the database and looking for similar
cookies with your eyes only!
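
The idea behind sorting first and then using -m can be sketched in
Python as follows.  The names loose_key and similar_groups are invented
for this example, and "-d0" is approximated by keeping only letters and
digits.

```python
from itertools import groupby

def loose_key(cookie):
    """Approximate -d0: lowercase, drop spaces and punctuation."""
    return "".join(ch for ch in cookie.lower() if ch.isalnum())

def similar_groups(cookies, n):
    """Groups of two or more cookies whose first n key characters agree."""
    ordered = sorted(cookies, key=loose_key)     # like "cookietool -s -d0"
    groups = []
    for _, members in groupby(ordered, key=lambda c: loose_key(c)[:n]):
        members = list(members)
        if len(members) > 1:                     # singletons are not "similar"
            groups.append(members)
    return groups

cookies = [
    "Something else entirely.",
    "Keep them seperate at all times.",
    "Keep them separate at all times.",
]
groups = similar_groups(cookies, 10)
```

Sorting is what makes this work: only after sorting do cookies with the
same beginning sit next to each other, so one pass over neighbours finds
every group.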

Here's a more sophisticated procedure that will extract groups of cookies
starting and ending with the same word (well, almost):

  cookietool cookies -s -d1 -p     ; regular sorting first
  cookietool cookies -sw -d1 -p    ; *then* sort by last word
  cdbsplit cookies temp -d1 -m3    ; yes, 3 matching characters will do!
  Ed temp                          ; delete all but one from each group
  cdbsplit temp cookies -a         ; put the others back

Applying -s-- instead of -sw in the second pass could help you find
similar sayings that are attributed to the same person.


4.5. Joining "good" and "bad" cookie files
------------------------------------------

Suppose you have a well-maintained cookie database, without duplicate
entries, all the cookies are formatted the way you want them, and all the
authors of quotations are written down in your preferred style.  Now you
find an archive with new cookies somewhere and you want to add them to
your database.  You could simply join the files and let CookieTool remove
the duplicates, but I'd suggest a slightly more sophisticated procedure
using CdbDiff:

  cdbdiff cookies visitors diff -d0

Now you can look at the "diff" file (which will usually be much smaller
than "visitors") to see what new cookies you've got, edit and reformat
where needed, and then finally append the result to your main database.

  Ed diff
  cdbsplit diff cookies -a


4.6. Extract poems
------------------

Would you agree that a poem is something that has at least four lines,
but doesn't use the full line width?  So let's try this:

  cdbsplit -l4 -W60 cookies poems

You should check the contents of "poems" manually now, and maybe you will
want to move some of the wider cookies back.  Not a problem:

  cdbsplit poems cookies -w51 -a
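
For illustration, the size tests used here could look like this in
Python.  The sketch assumes that the "width" of a cookie is the length
of its longest line; the function names shape and accept are made up,
and the parameters mirror the -l/-L/-w/-W options.

```python
def shape(cookie):
    """Return (number of lines, length of the longest line)."""
    lines = cookie.splitlines()
    return len(lines), max((len(line) for line in lines), default=0)

def accept(cookie, l_min=0, l_max=None, w_min=0, w_max=None):
    """True if the cookie passes all the given line and width limits."""
    n_lines, width = shape(cookie)
    if n_lines < l_min or width < w_min:
        return False
    if l_max is not None and n_lines > l_max:
        return False
    if w_max is not None and width > w_max:
        return False
    return True

poem = "Roses are red\nViolets are blue\nSugar is sweet\nAnd so are you"
is_poem = accept(poem, l_min=4, w_max=60)     # like -l4 -W60
moved_back = accept(poem, w_min=51)           # like -w51
```

The sample poem passes the first test and is not caught by the second,
so it would stay in "poems".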


4.7. File format options
------------------------

First of all, some things to avoid:
- Don't specify a -f option that doesn't match the actual format of your
  database.  This will usually do some serious damage, rather than just
  fail.  For example, running "cookietool -f1" on a database of "%%"-
  separated cookies will delete all but one of the "%%" lines!
- Think twice before changing file format with the -F option: Converting
  between format 2 and 3 is always safe, but converting to a "lower"
  format usually isn't.  For example, "cookietool -f3 -F1" more or less
  only removes the "%%" lines, it *does not* check for multi-line
  cookies, unwrapping them to a single line or whatever would seem
  reasonable.
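
Both warnings can be reproduced with a toy reader in Python.  This is a
simplified model, not CookieTool's parser: read_cookies is a hypothetical
helper, and a separator line is assumed to contain nothing but the
separator.

```python
def read_cookies(text, fmt=3):
    """Split text into cookies according to an -f style format number."""
    if fmt == 0:
        return text.split()             # each word is a cookie
    if fmt == 1:
        return text.splitlines()        # each line is a cookie
    separator = "%%" if fmt == 3 else "%"
    cookies, current = [], []
    for line in text.splitlines():
        if line == separator:
            if current:
                cookies.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:
        cookies.append("\n".join(current))
    return cookies

database = "first\n%%\nsecond, line 1\nsecond, line 2\n%%\nthird\n%%\n"

# Wrong -f: every "%%" line is read as a one-line cookie, and all but the
# first copy are then removed as duplicates.
as_lines = read_cookies(database, fmt=1)
deduplicated = list(dict.fromkeys(as_lines))

# -f3 -F1: the separators vanish, but the two-line cookie is not
# unwrapped, so the result read with -f1 has four "cookies", not three.
cookies = read_cookies(database, fmt=3)
exported = "\n".join(cookies) + "\n"
```

After the first experiment only one "%%" line survives, so the cookie
boundaries are lost; after the second, nothing marks which lines once
belonged together.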

Now let's assume you have a cookie database in "%%" format and want to
export all one-liners into a plain text file:

  cdbsplit -x -L1 cookies short
  cookietool short -f3 -F1

Or, for a weird reading experience, try the following, which removes all
duplicate words from this doc file:

  copy README temp
  cookietool temp -f0 -d0
  more temp
  delete temp

More seriously, CookieTool can be used to create a list of words from an
arbitrary text file. You only need to change the last example slightly:

  copy README words
  cookietool words -f0 -F1 -d0 -s


5. Background information
-------------------------

Just like "onecookie", "cookietool" has to load the complete database into
memory first, so it may fail on very large databases.  But unlike
"onecookie", it does not compare each cookie against all the others (an
O(n*n) operation); it sorts them first and then compares each cookie with
its neighbours only (an O(n*log n) operation).  For a database of 1000
cookies, that's about 100 times faster!
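
A minimal Python sketch of the sort-then-compare-neighbours idea, with
the "about 100 times" estimate worked out at the end.  The function name
dedup_sorted is hypothetical, the comparison key is plain lowercasing,
and the survivors come back in sorted order; how the real program keeps
the original order when -s is not given is not described here.

```python
import math

def dedup_sorted(cookies, key=str.lower):
    """Return (kept, dropped) using one sort and one pass over neighbours."""
    ordered = sorted(cookies, key=key)        # O(n log n)
    kept, dropped = [], []
    for cookie in ordered:                    # O(n): look at the neighbour only
        if kept and key(kept[-1]) == key(cookie):
            dropped.append(cookie)            # would go to the logfile
        else:
            kept.append(cookie)
    return kept, dropped

kept, dropped = dedup_sorted(["b", "A", "a", "B", "c"])

# Rough comparison counts for n = 1000 cookies
n = 1000
ratio = (n * n) / (n * math.log2(n))          # about 100
```

For n = 1000, n*n is 1,000,000 while n*log2(n) is just under 10,000,
which is where the factor of roughly 100 comes from.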

Overwriting input files is done by creating a tempfile and renaming it when
all else is done.  So breaking (or crashing) the programs won't lead to
data loss.  Unless, of course, you use cookietool in '-o' mode, which is
why that option is deprecated.
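
The tempfile-and-rename technique looks roughly like this in Python.
This is a generic sketch of the pattern, not a transcription of the C
code; write_back is a made-up name, and the temporary file is assumed to
live in the same directory as the target so that the rename stays on one
filesystem.

```python
import os
import tempfile

def write_back(path, text):
    """Replace the file at 'path' with 'text' via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)                 # the original is still untouched
        os.replace(tmp_path, path)          # the switch happens only here
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)             # clean up, original file intact
        raise
```

If the program is interrupted before the rename, the old database is
still complete; the price is that the disk must briefly hold both
copies, which is the situation -o exists for.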

Note that breaking "cdbsplit" while it is appending to another file is not
a good idea.  All cookies that were already copied are then present in both
files, and most likely the output file even ends with an incomplete
cookie!  The same can happen through no fault of yours, if cdbsplit
encounters a "Disk Full" error.
In both cases, don't append any further data to this output file, or the
first of the new cookies will be merged with that incomplete cookie, due
to the missing %% separator!  You might run "cookietool" once on the
output file, to ensure a valid file format again.


6. History
----------

V1.0 -
V1.3  forget them, they were all crap, too hard to use

V2.0  no more reformatting of cookies, sorry for those who miss it :'(

V2.1  fixed a bug that would unnecessarily lose data after "Disk Full"
    errors

V2.2  added search for combinations of keywords and the -x option for
    CdbSplit, CookieTool can now sort by size

V2.3  changed licensing to GPL, minor bugfix in CdbSplit (the -w option
    was off by one from its designed behaviour), cookie separator can be
    redefined as something other than "%%" at compile time, added short
    manpages for cookietool and cdbsplit

V2.4  fixed some bugs (the -b option in CookieTool and the -d option in
    CdbSplit were broken, and some kinds of buffer overflow weren't
    properly intercepted), the cookie file format can now be set at run
    time, CookieTool can now append to its dumpfile.

V2.5  added the CdbDiff tool


7. Credits
----------

CookieTool, CdbSplit and CdbDiff are distributed under GNU General Public
License version 2 or later.  The author is:

    Wilhelm Nöker  <wnoeker@t-online.de>
    Hertastr. 8, D-44388 Dortmund

Drop me an e-mail if you like these programs, or if you have some
suggestions.

The man pages, the Linux makefile and the "GPL paperwork" for V2.3 were
done by Miroslaw 'Jubal' Baran <baran@knm.org.pl>.

Greetings to Christian Kemp (author of IntuiCookie and of the Amiga port
of SmartAss and, last but not least, maintainer of the great Amiga Network
News web pages at www.ann.lu).

CookieTool, CdbSplit and CdbDiff were written using CygnusEd V4.2 and the
GNU C compiler (with libnix).