README for makeztxt version 1.60
by John Gruenenfelder (johng@as.arizona.edu)
Tuesday, August 12, 2003
The Weasel Reader website is here:
http://gutenpalm.sf.net
Contents
--------
1. What is makeztxt?
2. Features
3. Using makeztxt (creating)
4. Using makeztxt (deconstructing)
5. List of command line options
5.1. Options used when creating a zTXT database
5.2. Options used when deconstructing a zTXT database
6. Compiling makeztxt
7. Miscellaneous Notes
1. What is makeztxt?
------------------------------------------------------------
makeztxt is a simple commandline program that takes a plain ASCII text file
and compresses it into a zTXT database. makeztxt will remove newline
characters at the end of lines that contain text so that the paragraphs flow
better on the Palm screen. makeztxt supports the use of regular expressions
to automatically generate a list of bookmarks for you. Lastly, makeztxt can
also break an existing zTXT file into its components (text, bookmarks,
annotations) and store them in separate files for you.
Please note that as a commandline program, makeztxt is intended for more
advanced users. There are several very good conversion programs available
that have easy-to-use graphical interfaces. If you are not experienced with the
DOS/UNIX commandline environment, you may wish to use one of those instead.
You can find links to all the conversion programs at the Weasel website:
http://gutenpalm.sf.net
2. Features
------------------------------------------------------------
* Creates zTXT databases
* Deconstructs zTXT databases into component pieces
* Uses regular expressions to automatically generate bookmarks
* Includes libztxt, a small C library, so you can easily add zTXT creation
or dissection to your program
* Create zTXTs that allow on-demand decompression, or get 10% - 15% more
compression with the original style zTXT.
* Read regular expressions from a config file (.makeztxtrc)
3. Using makeztxt (creating)
------------------------------------------------------------
Running 'makeztxt --help' will print out the list of command line options and
what their functions are.
The best feature of makeztxt is its ability to use regular expressions to
search the input text for bookmark spots. This is done with the command line
options -l and -r.
-l will list all the bookmarks that are generated.
-r takes a regex as an argument to generate one or more bookmarks.
You can have as many -r options as you want.
A full listing of all of the options to makeztxt can be found in Section 5.
You can also put a list of regular expressions, one per line, in a file called
".makeztxtrc". This file goes in your home directory, or in the current
directory (if you have no home directory). A sample .makeztxtrc is included
with the distribution. You can also explicitly specify which file to read
regex from by using the -R option.
makeztxt can add a list of pre-generated bookmarks given in a file with the -m
option. Care should be taken to make sure that the bookmark offsets you
specify are valid in the converted text since makeztxt will, by default,
reformat the input text to better flow on a Palm screen (removing many line
feeds).
For annotations, makeztxt can also add pre-generated annotations given in a
file with the -A option. See Section 5 for information on how this file must
be formatted.
In addition, you can use a two-part regular expression, like (regexp1)(regexp2).
It will match on the entire line, but the bookmark title will contain only
the regexp2 part.
e.g.
makeztxt -l -r "(Subject:)(.*)" file.txt
If file.txt contains a number of emails or news articles, this will generate
bookmarks with the subject of each article, but without the word "Subject:".
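The two-part form can be pictured with a short Python sketch. This is not
makeztxt's own code, and Python's regex syntax differs in places from the
GNU/POSIX regex that makeztxt uses. The function name find_bookmarks and the
way offsets are counted (character position in the unmodified input) are
assumptions made for illustration only.

```python
import re

def find_bookmarks(text, pattern):
    """Return (offset, title) pairs for every line that matches `pattern`.

    Hypothetical helper, not part of makeztxt. If the pattern has two
    groups, the title is the text matched by the second group; otherwise
    the title is the whole match.
    """
    regex = re.compile(pattern)
    bookmarks = []
    offset = 0  # character position of the current line in `text`
    for line in text.splitlines(keepends=True):
        match = regex.search(line)
        if match:
            if regex.groups >= 2 and match.group(2) is not None:
                title = match.group(2)
            else:
                title = match.group(0)
            bookmarks.append((offset + match.start(), title.strip()))
        offset += len(line)
    return bookmarks

sample = "From: a@example.org\nSubject: Hello\n\nBody\nSubject: Re: Hello\n"
for offset, title in find_bookmarks(sample, r"(Subject:)(.*)"):
    print(offset, title)
```

The same helper with a plain pattern such as "ACT [A-Z]+" keeps the whole
match as the title, which mirrors the one-part examples below.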
The following examples show the name of the work, the command line used, and
the first eight bookmarks generated by the command line:
Shakespeare's "King Henry V"
------------------------------------------------------------
>makeztxt -l -t "King Henry V" -r "DRAMATIS PERSONAE" -r "ACT [A-Z]+" \
-r "SCENE [A-Z]+" 2ws2310.txt
Generated bookmarks
Offset Title
----------- --------------------
12097 DRAMATIS PERSONAE
14841 ACT FIRST
14853 SCENE I
19241 SCENE II
33233 ACT II
35118 SCENE I
40805 SCENE II
49553 SCENE III
RL Stevenson's "Treasure Island"
------------------------------------------------------------
>makeztxt -l -t "Treasure Island" -r "PART [A-Z]+" -r " [0-9]+" \
treas10.txt
Generated bookmarks
Offset Title
----------- --------------------
12005 PART ONE
12422 PART TWO
12836 PART THREE
13087 PART FOUR
13685 PART FIVE
14102 PART SIX
14656 PART ONE
14723 1
Charles Darwin's "On the Origin of Species"
------------------------------------------------------------
>makeztxt -l -t "On the Origin of Species" -r "Introduction\." \
-r "Chapter [IVX]+" otoos10.txt
Generated bookmarks
Offset Title
----------- --------------------
19482 Introduction.
29724 Chapter I
99693 Chapter II
129257 Chapter III
165118 Chapter IV
259640 Chapter V
332498 Chapter VI
399182 Chapter VII
4. Using makeztxt (deconstructing)
------------------------------------------------------------
Running 'makeztxt -d --help' will print out commandline usage for dissecting
zTXT files. This mode is much simpler than that of creating a zTXT database, so
it should be much easier to use. Simply give makeztxt a zTXT PDB file
(filename.pdb) and it will output the uncompressed text data into another file
(filename.txt). The exact output filename can be specified with the -o
option.
makeztxt can also extract the bookmark list and the annotations from the zTXT
file and output them. To output a bookmark list, give an output filename with
the -m option. Similarly, to output a file with the zTXT's annotations, give
an output filename with the -A option.
That's all there is to it.
5. List of command line options
------------------------------------------------------------
5.1. Options used when creating a zTXT database:
-----------------------------------------------
-A/--annofile filename -- Give makeztxt a file containing annotations that
will be added into the generated zTXT database. This file must follow a
particular format to be understood by makeztxt. Each annotation is of the
format:
1) An annotation begins with a title line:
Title: My Annotation
where the text after the colon is the annotation's title with a
maximum of 20 characters.
2) The next line is the location in the text of the annotation anchor:
Offset: 12345
where the offset value is an absolute character position in the
*reformatted* text file.
3) The actual annotation text:
Annotation: This is the text of my annotation.
The annotation text will continue after a *single* "Annotation:" line
until one of the following conditions is met: a) the file ends, b)
another annotation is started with a "Title:" line, or c) the
annotation reaches the maximum size of 4096 characters.
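As an illustration of the format above, here is a small Python sketch that
writes annotation entries. The helper format_annotation is hypothetical and
not part of makeztxt. It assumes the annotation text starts on the same line
as "Annotation:" (as in the example above) and it trims the title and text to
the stated limits rather than rejecting them.

```python
def format_annotation(title, offset, text):
    """Render one annotation entry in the three-part layout described above.

    Hypothetical helper, not part of makeztxt. Titles are cut at 20
    characters and annotation text at 4096 characters.
    """
    if offset < 0:
        raise ValueError("offset must be a non-negative character position")
    return (
        f"Title: {title[:20]}\n"
        f"Offset: {offset}\n"
        f"Annotation: {text[:4096]}\n"
    )

entries = [
    ("Opening line", 12345, "First note.\nIt may span several lines."),
    ("Second note", 23955, "Another annotation."),
]
annotation_file = "".join(format_annotation(*entry) for entry in entries)
print(annotation_file, end="")
```

Remember that the offsets must refer to positions in the reformatted text,
so they are only placeholders here.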
-a/--adjust int -- Control the method of text formatting. Valid types are
0, 1, or 2. Method 0 will compute the average line length through the
entire file and strip newline characters from any line longer than the
average. Method 1 will strip the newline from any line with text in
it. Method 2 will leave the text unchanged. The default is 0.
-b/--length int -- If adjust method 0 is used, the value given with this
option is the length a line must be to have its newline stripped. Using
this option will override the value calculated by makeztxt.
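The following Python sketch approximates adjust method 0 as described above.
It is not makeztxt's implementation. The function name reflow is made up, the
stripped newline is replaced with a space (an assumption), and a line counts
as "long" when its length is at least the threshold (also an assumption). The
min_length argument plays the role of the -b value.

```python
def reflow(text, min_length=None):
    """Join long lines into paragraphs, roughly like adjust method 0.

    Hypothetical sketch. When `min_length` is omitted, the average length
    of all lines in the text is used as the threshold.
    """
    lines = text.splitlines()
    if not lines:
        return ""
    if min_length is None:
        min_length = sum(len(line) for line in lines) / len(lines)
    ends_with_newline = text.endswith("\n")
    pieces = []
    for index, line in enumerate(lines):
        is_last = index == len(lines) - 1
        if is_last:
            # Keep the end of the text as it was.
            joiner = "\n" if ends_with_newline else ""
        elif line and len(line) >= min_length:
            # Long line: assume the paragraph continues on the next line.
            joiner = " "
        else:
            # Short or empty line: assume a paragraph break and keep it.
            joiner = "\n"
        pieces.append(line + joiner)
    return "".join(pieces)

sample = (
    "This is a long line of text that wraps\n"
    "onto a second long line of the paragraph\n"
    "and ends.\n"
    "\n"
    "Next paragraph.\n"
)
print(reflow(sample), end="")
```

Because joining lines changes character positions, bookmark and annotation
offsets have to be computed against the output of this step, not the input.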
-h/--help -- Display command line options and usage information.
-l/--list -- Display a list of all bookmarks generated by makeztxt or
specified by the user. This is useful if you want to make sure your
regular expressions are generating correct bookmarks.
-L/--launchable -- Sets the "launchable" attribute in the generated zTXT
database. The Launcher apps on a Palm device can use this attribute and
will display all zTXT documents in the main program listing allowing you
to launch Weasel and open a specific document by tapping on the document
directly. Default is OFF.
-m/--markfile filename -- Give makeztxt a file containing a pre-generated
list of bookmarks to add to the generated zTXT database. The bookmark
file has a very simple format. Each line begins with an integer offset
for the bookmark anchor. Following that are one or more spaces/tabs.
Finally is the bookmark title which occupies the remainder of the line up
to a maximum of 20 characters. A line might look like:
23955 Chapter VII
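A bookmark file in this format is easy to read or check from a script. The
Python sketch below is a hypothetical parser, not code taken from makeztxt.
The name parse_bookmark_line is invented, and cutting the title at 20
characters is one possible reading of the limit stated above.

```python
def parse_bookmark_line(line):
    """Split one bookmark-file line into (offset, title).

    Hypothetical parser for the layout described above: an integer offset,
    one or more spaces or tabs, then the title (cut at 20 characters here).
    Raises ValueError if the offset or the title is missing or malformed.
    """
    offset_text, title = line.strip().split(None, 1)
    return int(offset_text), title[:20]

print(parse_bookmark_line("23955 Chapter VII\n"))
```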
-n/--nobackup -- Instructs makeztxt to not set the backup attribute in the
generated zTXT database. This attribute, if set, will cause the database
to be backed up during the next HotSync operation. Default is to set this
attribute.
-o/--output filename -- Explicitly give the output filename which makeztxt
should use. If this filename is not given, makeztxt will generate an
output filename by removing the extension of the input file and replacing
it with "pdb". If makeztxt is reading input from standard input this
option is mandatory.
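The default naming rule can be sketched in a few lines of Python. The helper
default_output_name is hypothetical. How makeztxt treats input names with no
extension is not stated here, so that case is simply given the new extension
in this sketch.

```python
import os

def default_output_name(input_name, new_ext="pdb"):
    """Replace the extension of `input_name` with `new_ext`.

    Hypothetical helper illustrating the rule described above. Names
    without an extension just get `new_ext` appended (an assumption).
    """
    root, _old_ext = os.path.splitext(input_name)
    return f"{root}.{new_ext}"

print(default_output_name("treas10.txt"))
```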
-R/--regexfile filename -- makeztxt will attempt to read a default set of
regular expressions from the file .makeztxtrc in the user's home directory
or from /etc/makeztxt.conf if that fails. This option can be used to tell
makeztxt which file to read the list of regex from. Useful for users on
systems with no home directories.
-r/--regex string -- Supply makeztxt with a regular expression for bookmark
generation. string is a valid regex. This option can be given multiple
times on the command line, each one adding a new regex.
-t/--title string -- Specify the title of the generated zTXT database. The
database title is stored within the database and is the name which will
appear under Palm OS. The title is limited to 32 characters. If makeztxt
is reading input from standard input, this option is mandatory.
-V/--version -- Cause makeztxt to print out version information and exit.
-z/--compression int -- Set the method of compression to be used. makeztxt
supports two methods of compression. Method 1 allows for random access
within a zTXT document and is the standard method. Method 2 gives 10-15%
higher compression but requires that the entire document be decompressed
before it can be read by the user. Default is method 1.
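The trade-off between the two methods can be seen with Python's zlib module.
This is only a sketch of the general idea and makes assumptions about details
not given in this README: it models random access by fully flushing the
compressor after each block, and the 8192-byte block size is chosen for
illustration. The function names are invented.

```python
import zlib

def compress_whole(data, level=9):
    """Compress everything as one stream (no restart points)."""
    return zlib.compress(data, level)

def compress_blocked(data, block_size=8192, level=9):
    """Compress with a full flush after every block.

    A full flush resets the compressor's history, so decompression can
    restart at each block boundary, at the cost of a larger result.
    """
    comp = zlib.compressobj(level)
    parts = []
    for start in range(0, len(data), block_size):
        parts.append(comp.compress(data[start:start + block_size]))
        parts.append(comp.flush(zlib.Z_FULL_FLUSH))
    parts.append(comp.flush(zlib.Z_FINISH))
    return b"".join(parts)

text = b"It was the best of times, it was the worst of times. " * 2000
whole = compress_whole(text)
blocked = compress_blocked(text)
print(len(text), len(whole), len(blocked))
```

Both results decompress to the same text. The size gap on real books will
differ from this artificial sample.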
5.2. Options used when deconstructing a zTXT database:
-----------------------------------------------------
-d/--deconstruct -- This option tells makeztxt that you wish to deconstruct
a zTXT database. It is required for this mode of operation.
-A/--annofile filename -- Specify the filename into which makeztxt will
store any annotations extracted from the input zTXT file. If this
option is not given, annotations will not be extracted.
-h/--help -- Display command line options and usage information.
-m/--markfile filename -- Specify the filename into which makeztxt will
store any bookmarks extracted from the input zTXT file. If this option is
not given, bookmarks will not be extracted.
-o/--output filename -- Specify the output file in which makeztxt will store
the extracted text data. If this option is not given, makeztxt will generate
a default filename by removing the extension from the input file name and
replacing it with "txt". If makeztxt is reading input from standard input
this option is mandatory.
-V/--version -- Cause makeztxt to print out version information and exit.
6. Compiling makeztxt (for great profit!)
------------------------------------------------------------
makeztxt uses zLib v1.1.3 (http://www.info-zip.org/pub/infozip/zlib).
You will need to have zLib compiled for your HOST machine. All Linux
distributions as well as most other Unices come with zLib, though it is
possible you may be lacking the zLib header files.
You should look in the Makefile to make sure the program names and paths are
okay.
If you are running on Sun hardware, uncomment the PACK line in the Makefile.
makeztxt will not work without this. If you are getting mysterious crashes,
you might want to try this switch as well. However, if you are on an x86
system, you should not enable that flag.
If your system does not have GNU regex (Solaris, Cygwin, others) then
uncomment the USEPOSIX line to cause makeztxt to use POSIX regex.
If you are compiling on a Windows system, or any system which makes a
distinction between text and binary files, you'll need to uncomment the
HAVEBINARYFLAG line in order to get valid output from makeztxt.
Lastly, if you are compiling with the Cygwin toolset, you need an extra
library for regex to function. Uncomment the CYGWINLIBS line to enable this.
Now run:
"make"
You should now have makeztxt.
If you're messing with the source, then maybe you want to help. If you have
any problems, feel free to email me at johng@as.arizona.edu . Please use, if
possible, the latest code from the CVS repository. It can be found at:
http://sourceforge.net/projects/gutenpalm
If you would like to submit a bug report or a feature request, please make use
of the facilities on Weasel's SourceForge project page. This allows for much
easier management of bug and feature request tracking. It also ensures that
your report is not forgotten about. The project page is at:
http://sf.net/projects/gutenpalm
7. Miscellaneous Notes
------------------------------------------------------------
** The standard "it runs fine for me" disclaimer applies. I've tested it a
lot myself, but you can never predict everything. Still, there's no
oddball hacking involved so I think the chance of catastrophic Palm
explosion should be small indeed. This is not to say that it won't ever
crash/hang, but if it does...