File: good_sample.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>txt2html/HTML::TextToHTML Sample Conversion</TITLE>
<META NAME="generator" CONTENT="HTML::TextToHTML v2.44">
</HEAD>
<BODY>
<P><A HREF="http://txt2html.sourceforge.net/">txt2html</A>/<A HREF="http://www.katspace.com/tools/text_to_html/">HTML::TextToHTML</A> Sample Conversion

<P>This sample is based heavily on the original sample.txt produced
by <A HREF="http://www.aigeek.com/">Seth Golub</A> for txt2html.

<P>I used the following options to convert this document:

<PRE>
     -titlefirst -mailmode -make_tables
     --custom_heading_regexp '^ *--[\w\s]+-- *$'
     --system_link_dict txt2html.dict
     --append_body sample.foot --infile sample.txt --outfile sample.html
</PRE>
<P>The conversion can be run at the command line with:

<P>        perl -MHTML::TextToHTML -e run_txt2html -- <EM>options</EM>

<P>or using the script

<P>        txt2html <EM>options</EM>

<P>or from a (test) Perl script with:

<PRE>
        use HTML::TextToHTML;
        my $conv = HTML::TextToHTML-&gt;new();
        $conv-&gt;txt2html([<EM>options</EM>]);
</PRE>
<HR>

<!-- New Message -->
<P class='mail_header'><A NAME="0">From</A> <A HREF="mailto:bozo@clown.wustl.edu">bozo@clown.wustl.edu</A><BR>
Return-Path: &lt;<A HREF="mailto:bozo@clown.wustl.edu">bozo@clown.wustl.edu</A>&gt;<BR>
Message-Id: &lt;<A HREF="mailto:9405102200.AA04736@clown.wustl.edu">9405102200.AA04736@clown.wustl.edu</A>&gt;<BR>
Content-Length: 1070<BR>
From: <A HREF="mailto:bozo@clown.wustl.edu">bozo@clown.wustl.edu</A> (Bozo the Clown)<BR>
To: <A HREF="mailto:kitty@example.com">kitty@example.com</A> (<A HREF="http://www.katspace.com/">Kathryn Andersen</A>)<BR>
Subject: Re: HTML::TextToHTML<BR>
Date: Sun, 12 May 2002 10:01:10 -0500

<P>Bozo wrote:
<P class='quote_mail'>BtC&gt; Can you post an example text file with its html'ed output?<BR>
BtC&gt; That would provide a much better first glance at what it does<BR>
BtC&gt; without having to look through and see what the perl code does.

<P>Good idea.  I'll write something up.

<HR>

<P>The header lines were kept separate because they looked like mail
headers and I have mailmode on.  The same thing applies to Bozo's
quoted text.  Mailmode doesn't screw things up very often, but since
most people are usually converting non-mail, it's off by default.

<P>Paragraphs are handled ok.  In fact, this one is here just to
demonstrate that.

<P><STRONG>THIS LINE IS VERY IMPORTANT!</STRONG><BR>
(Ok, it wasn't <EM>that</EM> important)

<H1><A NAME="section_1">EXAMPLE HEADER</A></H1>

<P>Since this is the first header noticed (all caps, underlined with an
"="), it will be a level 1 header.  It gets an anchor named
"section_1".

<H2><A NAME="section_1_1">Another example</A></H2>
<P>This is the second type of header (not all caps, underlined with "=").
It gets an anchor named "section_1_1".

<H2><A NAME="section_1_2">Yet another example</A></H2>

<P>This header was in the same style, so it was assigned the same header
tag.  Note the anchor names in the HTML. (You probably can't see them
in your current document view.)  Its anchor is named "section_1_2". 
Get the picture?

<H3><A NAME="section_1_2_1">-- This is a custom header --</A></H3>

<P>You can define your own custom header patterns if you know what your
documents look like.
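
<P>As a rough illustration, the pattern given to --custom_heading_regexp
in the options above can be tried on its own in plain Perl.  This sketch
only tests the regular expression; the sample lines are made up for the
purpose, and it does not show how the module itself applies the pattern.

<PRE>
        use strict;
        use warnings;

        # The pattern passed to --custom_heading_regexp in the options above.
        my $heading_re = qr/^ *--[\w\s]+-- *$/;

        # Hypothetical sample lines, chosen only for illustration.
        my @lines = (
            '-- This is a custom header --',
            'An ordinary line of text',
            '--no-trailing-marker',
        );

        # Report which lines the pattern would treat as headings.
        for my $line (@lines) {
            my $verdict = $line =~ $heading_re ? 'heading' : 'not a heading';
            print "$verdict: $line\n";
        }
</PRE>
<P>Only the first sample line matches, since the pattern requires a
double dash at both ends with nothing but word characters and whitespace
in between.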

<H2><A NAME="section_1_3">Features of HTML::TextToHTML</A></H2>

<UL>
  <LI>Handles different kinds of lists
  <OL>
    <LI>Bulleted
    <LI>Numbered
    <UL>
      <LI>You can nest them as far as you want.
      <LI>It's pretty decent about figuring out which level of list it
        is supposed to be on.
      <UL>
        <LI>You don't need to change bullet markers to start a new list.
      </UL>
    </UL>
    <LI>Lettered
    <OL>
      <LI>Finally handles lettered lists
      <LI>Upper and lower case both work
      <OL>
        <LI>Here's an example
        <LI>I've been meaning to add this for some time.
      </OL>
      <LI>HTML without CSS can't specify how ordered lists should be
        indicated, so it will be a numbered list in most browsers.
    </OL>
    <LI>Definition lists (see below)
  </OL>
  <LI>Doesn't screw up mail-ish things
  <LI>Spots preformatted text
</UL>
<PRE>
                 It just needs to have enough whitespace in the line.
        Surrounding blank lines aren't necessary.  If it sees enough
        whitespace in a line, it preformats it.  How much is enough?
        Set it yourself at the command line if you want.
</PRE>
<UL>
  <LI>You can append a file automatically to all converted files.  This
   is handy for adding signatures to your documents.
  <LI>Deals with paragraphs decently.
<P>   Looks for short lines in the middle of paragraphs and keeps them
   short with the use of breaks (&lt;BR&gt;).  How short the lines need to
   be is configurable.
<P>   Unhyphenates split words that are in the middle of paragraphs.  
   Let me know if trailing punctuation isn't handled "properly".  
   It should be.
<P>   One can also have multi-paragraph list items, like this one.
  <LI>Puts anchors at all headers and, if you're using the mail header
   features, at the beginning of each mail message.  The anchor names
   for headings are based on guessed section numbers.  
  <UL>
    <LI>You can turn off this option too, if you don't like it.
  </UL>
  <LI>Groks Mosaic-style "formatted text" headers (like the one below)
  <LI>Can hyperlink things according to a dictionary file.
   The sample dictionary handles URLs like <A HREF="http://www.aigeek.com/">http://www.aigeek.com/</A> and
   &lt;<A HREF="http://www.katspace.com/">http://www.katspace.com/</A>&gt; and also shows how to do simpler
   things such as linking the word txt2html the first time it appears.
  <LI>One can also use the link-dictionary to define custom tags, for
   example using the star character to indicate <EM>italics</EM>.
  <LI>Recognises and parses tables of different types:
  <UL>
    <LI>DELIM: A table determined by delimiters.
    <LI>ALIGN: No need for fancy delimiters; this figures out
     a table by looking at the layout, that is, the spacing of the cells.
    <LI>BORDER: has a nice border around the table
    <LI>PGSQL: the same format as PostgreSQL query results.
  </UL>
  <LI>Also with XHTML!  Turn on the --xhtml option and it will ensure that
   all paragraphs and list items have end-tags, all tags are in
   lower-case, and the doctype is for XHTML.
</UL>
<H4><A NAME="section_1_3_1_1">Example of short lines</A></H4>

<P>We're the knights of the round table<BR>
We dance whene'er we're able<BR>
We do routines and chorus scenes<BR>
With footwork impeccable.<BR>
We dine well here in Camelot<BR>
We eat ham and jam and spam a lot.
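
<P>The idea behind short-line handling can be sketched in a few lines of
Perl.  This is a simplified illustration, not the module's own code: the
threshold of 40 characters and the variable names are assumptions made
for the example.

<PRE>
        use strict;
        use warnings;

        # Hypothetical threshold; the real limit is configurable.
        my $short_line_length = 40;

        my @para = (
            "We're the knights of the round table",
            "We dance whene'er we're able",
            "We do routines and chorus scenes",
        );

        # Add a break after every short line except the last in the paragraph.
        for my $i (0 .. $#para) {
            my $line = $para[$i];
            $line .= '&lt;BR&gt;' if $i &lt; $#para &amp;&amp; length($line) &lt; $short_line_length;
            print "$line\n";
        }
</PRE>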

<H4><A NAME="section_1_3_1_2">Example of varied formatting</A></H4>

<P>If I want to <EM>emphasize</EM> something, then I'd use stars to wrap
around the words, <EM>even if there were more than one</EM>, <EM>that's</EM>
what I'd do.  But I could also <U>underline</U> words, so long as
the darn thing was not a_variable_name, in which case I wouldn't
want to lose the underscores in something which thought it was
underlining.  Though we might want to <U>underline more than one word</U>
in a sentence.  Especially if it is <U>The Title Of A Book</U>.
For another kind of emphasis, let's go and <STRONG>put something in bold</STRONG>.
<P>   But it doesn't even need to be that simple. Something which is <EM>really
exciting</EM> is coping with italics and similar things <EM>spread across
multiple lines</EM>.

<H4><A NAME="section_1_3_1_3">Example of Long Preformatting</A></H4>

<P>(extract from Let It Rain by Kristen Hall)

<PRE>
        I have given, I have given and got none
        Still I'm driven by something I can't explain
        It's not a cross, it is a choice
        I cannot help but hear his voice
        I only wish that I could listen without shame

        Let it rain, let it rain, on me
        Let it rain, oh let it rain,
        Let it rain, on me

        I have been a witness to the perfect crime
        Wipe the grin off of my face to hide the pain
        It isn't worth the tears you cry
        To have a perfect alibi
        Now I'm beaten at the hands of my own game

        Let it rain, let it rain, on me
        Let it rain, oh let it rain,
        Let it rain, on me
</PRE>
<H4><A NAME="section_1_3_1_4">Definition Lists</A></H4>

<P>A definition list comprises the following:

<DL>
  <DT>Term</DT>
<DD>  The term part of a DL item is a word on a line by itself, ending
with a colon.
  <DT>Definition</DT>
<DD>The definition part of a DL item is at least one paragraph following
the term.
<P>  If one has more than one paragraph in the definition, the first line of
the next paragraph needs to be indented two spaces from where the term
starts, otherwise we don't know that it belongs to the definition.
</DL>
<H4><A NAME="section_1_3_1_5">Examples of Tables</A></H4>

<H5><A NAME="section_1_3_1_5_1">ALIGN</A></H5>

<P>Here is a simple ALIGN table:

<TABLE>
<TR><TD>-e</TD><TD>File exists.
</TD></TR><TR><TD>-z</TD><TD>File has zero size.
</TD></TR><TR><TD>-s</TD><TD>File has nonzero size (returns size).
</TD></TR>
</TABLE>
<P>Here are some of the conditions of ALIGN tables:

<TABLE>
<TR><TD ALIGN="RIGHT"><STRONG>Context:</STRONG></TD><TD>A table needs to be surrounded by blank lines.
</TD></TR><TR><TD ALIGN="RIGHT"><STRONG>Length:</STRONG></TD><TD>A table must contain at least two rows.
</TD></TR><TR><TD ALIGN="RIGHT"><STRONG>Width:</STRONG></TD><TD>A table must contain at least two columns.
</TD></TR><TR><TD ALIGN="RIGHT"><STRONG>Spacing:</STRONG></TD><TD>There needs to be at least two spaces between the columns,
</TD></TR><TR><TD ALIGN="RIGHT"></TD><TD>otherwise there might be some random paragraph which
</TD></TR><TR><TD ALIGN="RIGHT"></TD><TD>could have inter-word spacing that lined up by accident.
</TD></TR><TR><TD ALIGN="RIGHT"><STRONG>Cell Size:</STRONG></TD><TD>If you have more than one line (as just above) then
</TD></TR><TR><TD ALIGN="RIGHT"></TD><TD>you will simply get empty cells where the other column is empty.
</TD></TR><TR><TD ALIGN="RIGHT"><STRONG>Alignment:</STRONG></TD><TD>The converter attempts to preserve the alignment of cells.
</TD></TR>
</TABLE>
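<P>The length, width and spacing conditions above can be approximated
with a short check.  This is only a sketch under simplifying assumptions
(it ignores column alignment and the blank-line context), and the helper
names split_cells and looks_like_table are invented for the example; it
is not how the module detects tables.

<PRE>
        use strict;
        use warnings;

        # Split a line into cells wherever two or more spaces occur.
        sub split_cells {
            my ($line) = @_;
            $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
            return split /\s{2,}/, $line;
        }

        # True if there are at least two rows and every row has
        # at least two cells.
        sub looks_like_table {
            my (@rows) = @_;
            return 0 if @rows &lt; 2;
            for my $row (@rows) {
                my @cells = split_cells($row);
                return 0 if @cells &lt; 2;
            }
            return 1;
        }

        my @block = (
            '-e  File exists.',
            '-z  File has zero size.',
            '-s  File has nonzero size (returns size).',
        );
        print looks_like_table(@block) ? "table\n" : "not a table\n";
</PRE>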
<H5><A NAME="section_1_3_1_5_2">BORDER</A></H5>

<P>This is a table with a border.

<TABLE border="1">
<THEAD><TR><TH>Food</TH><TH>Qty</TH></TR></THEAD>
<TBODY>
<TR><TD>Bread</TD><TD>1</TD></TR>
<TR><TD>Milk</TD><TD>1</TD></TR>
<TR><TD>Oranges</TD><TD>3</TD></TR>
<TR><TD>Apples</TD><TD>6</TD></TR>
</TBODY>
</TABLE>

<H5><A NAME="section_1_3_1_5_3">PGSQL</A></H5>

<P>This is the same table as PostgreSQL would format it.

<TABLE border="1">
<THEAD><TR><TH>Food</TH><TH>Qty</TH></TR></THEAD>
<TBODY>
<TR><TD>Bread</TD><TD>1</TD></TR>
<TR><TD>Milk</TD><TD>1</TD></TR>
<TR><TD>Oranges</TD><TD>3</TD></TR>
<TR><TD>Apples</TD><TD>6</TD></TR>
</TBODY>
</TABLE>
<P>        (4 rows)


<H5><A NAME="section_1_3_1_5_4">DELIM</A></H5>

<P>A delimited table needs to have its delimiters at the start and end,
just to be sure that this is a table.

<TABLE border="1">
<TR><TD>Fred</TD><TD>Nurk</TD><TD>58</TD></TR>
<TR><TD>George</TD><TD>Washington</TD><TD>62</TD></TR>
<TR><TD>Mary</TD><TD>Quant</TD><TD>35</TD></TR>
</TABLE>

<P>And one can have almost any delimiter one wishes.

<TABLE border="1">
<TR><TD>Darcy, Fitzwilliam</TD><TD>hero</TD></TR>
<TR><TD>Bennet, Elizabeth</TD><TD>heroine</TD></TR>
<TR><TD>Wickham, George</TD><TD>villain</TD></TR>
</TABLE>

<H1><A NAME="section_2">THINGS TO DO</A></H1>

<P>There are some things which this module doesn't handle yet which
I would like to implement.

<OL>
  <LI>I would like to be able to preserve lettered lists, that is:
  <OL>
    <LI>recognise that they are letters and not numbers (which it already
      does)
    <LI>display the correct OL properties with CSS so as to preserve
      that information.
  </OL>
</OL>
<HR>

<P>The footer is everything from the end of this sentence to the
&lt;/BODY&gt; tag.

<HR>
<ADDRESS>
<A href="http://www.katspace.com/">KatSpace</A>
</ADDRESS>
</BODY>
</HTML>