File: index.txt

Table of contents

       1.0 Document id
           1.1 General
           1.2 T2html program features
              1.2.1 Overview of features:
              1.2.2 HTML conversion
              1.2.3 HTML 4.01
              1.2.4 Link check for the text file
              1.2.5 Splitting the text file to pieces
           1.3 How to convert text files into HTML?
           1.4 Writing a text document
           1.5 Ripping program documentation
              1.5.1 Documentation tools in programming languages
              1.5.2 Other programming languages

       2.0 Other converters
           2.1 Postscript
           2.2 Texinfo
           2.3 Other text to HTML tools
           2.4 Other Utilities
           2.5 General Document Maintenance tools

1.0 Document id

    1.1 General

        #T2HTML-TITLE Conversion for text files
        #T2HTML-OPTION --css-file=index.css
        #T2HTML-OPTION --css-code-bg
        #T2HTML-OPTION --css-code-note=Note:
        #T2HTML-OPTION --simple
        #T2HTML-COMMENT t2html --auto-detect --out <file>

        Copyright (C) 1996-2016 Jari Aalto

        License: This material may be distributed only subject to
        the terms and conditions set forth in GNU General Public
        License v2 or later; or, at your option, distributed under the
        terms of GNU Free Documentation License version 1.2 or later
        (GNU FDL).

    1.2 T2html program features

        Writing text documents is different from writing messages to
        Usenet or to your fellow workers. There already exist several
        tools to convert email messages into HTML, like *MHonArc*,
        Email hyper archiver, but for regular text documents, like
        memos, FAQs, help pages and other papers, there wasn't
        any suitable HTML converter a couple of years back. The author
        wanted a simple HTML tool which would read _pure_ _plain_
        _text_ documents, like guides, tips pages, documentation,
        bookmark pages etc. and convert them into HTML. Here you will
        find the specification of how to format your text documents for
        the *t2html.pl* Perl script, a text to HTML converter.

        A few arguments for why plain text is the best source document format:

        o   It is readable by all, without any extra software
        o   Deliverable by email, as is.
        o   Most easily kept in version control
        o   Most easily patched ( when someone sends a diff -u ...)
        o   Most easily handed to someone else when the author no longer
            maintains it. (If you have specialized tools, people
            need to learn them in order to maintain your FAQ.)

       1.2.1 Overview of features:

        o   Requires Perl 5.004 or newer
        o   A 500K text document takes 70 seconds to convert to HTML.
        o   TF to Perl POD conversion may be in a future plan.
        o   Better linking of multiple files planned
        o   Configuration file for individual file options planned.

       1.2.2 HTML conversion

        o   minimal mark up: rendering is based on indentation level.
            Written text document looks like a "Natural Document", and is
            suitable for reading as such.
        o   Text layout with indentation rules is called Technical Format
            (TF) and document must be formatted according to it before it
            is suitable for HTML generation.
        o   Rules are simple: place heading to the left and text at column 8.
        o   Program generates *META* tags for search engines.
        o   Colored html page: <EM> <STRONG> <PRE> ...
        o   Hyperlinks and email addresses are automatically detected.
            No mark up is needed.
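
        The two indentation rules above can be sketched in a few lines
        of Python. This is only an illustration, not the code that
        t2html.pl uses: the function name `classify_line' is
        hypothetical, and the sketch assumes that indentation is made
        of spaces, not TABs. Other indentation levels are defined in
        the TF specification and are reported here simply as "other".

```python
def classify_line(line: str) -> str:
    """Classify one line by its indentation, following the two rules above."""
    if not line.strip():
        return "blank"
    indent = len(line) - len(line.lstrip(" "))
    if indent == 0:
        return "heading"  # heading placed at the left margin
    if indent == 8:
        return "text"     # body text starts at column 8
    return "other"        # remaining levels are left to the TF specification


sample = [
    "1.0 Document id",
    "",
    " " * 8 + "Body text at column 8.",
    " " * 12 + "t2html.pl file.txt",
]
kinds = [classify_line(line) for line in sample]
```

        The point of the sketch is that no mark up is inspected at
        all: the indentation alone decides how a line is treated.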

       1.2.3 HTML 4.01

        o   Make a single html (1 file) or *Framed* version (3 files)
        o   Sample CSS2 (Cascading Style Sheet) included in HTML code for
            document rendering. User can import his own CSS2.

       1.2.4 Link check for the text file

        o   You need the LWP module in order to use this feature. (Comes with
            latest Perl)
        o   Program has switches to run Link check on your text file
            to find out any broken or moved links. Currently you
            have to manually fix the links, but an Emacs mode to do this
            automatically is planned. The output from Link check is standard
            grep style:  *FILE:NBR:Error-Description*
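
        The grep style output can be illustrated with a small Python
        sketch. It is not the implementation in t2html.pl: the names
        `link_report' and `fake_check', the URL pattern and the exact
        wording of the error description are all assumptions made for
        the example. A real checker would make a network request
        where `fake_check' is called.

```python
import re
from typing import Callable, Iterable, List, Optional

# Deliberately simple URL pattern; an assumption for this sketch only.
URL_RE = re.compile(r"(?:https?|ftp)://[^\s<>\"']+")


def link_report(
    filename: str,
    lines: Iterable[str],
    check: Callable[[str], Optional[str]],
) -> List[str]:
    """Return one 'FILE:NBR:Error-Description' line per failing link."""
    report = []
    for number, line in enumerate(lines, start=1):
        for match in URL_RE.finditer(line):
            url = match.group(0).rstrip(".,;:)")  # drop trailing punctuation
            error = check(url)
            if error is not None:
                report.append(f"{filename}:{number}:{error} {url}")
    return report


def fake_check(url: str) -> Optional[str]:
    # Stand-in for a real HTTP request; only this made-up URL "fails".
    return "404 Not Found" if url.endswith("/gone.html") else None


text = [
    "See http://example.com/ok.html for details.",
    "Old page: http://example.com/gone.html.",
]
report = link_report("index.txt", text, fake_check)
```

        Because every report line starts with the file name and the
        line number, editors that understand grep output can jump
        straight to the reported link.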

       1.2.5 Splitting the text file to pieces

        o   You can split a very large document into pieces, e.g. according
            to top level headings and convert each piece to HTML. This is
            also handy if you're planning to print Slides for a class:
            Split on Headings to individual files: raise the font point
            and print each file separately.
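
        A rough Python sketch of splitting on top level headings
        follows. It assumes that a top level heading is any non-empty
        line that starts at the left margin; the function name
        `split_on_top_headings' is hypothetical and the sketch does
        not reproduce the split options of t2html.pl.

```python
from typing import List


def split_on_top_headings(text: str) -> List[str]:
    """Split text into pieces, each starting at a line that begins in column 0."""
    pieces: List[List[str]] = []
    for line in text.splitlines():
        starts_heading = bool(line) and not line[0].isspace()
        if starts_heading or not pieces:
            pieces.append([])  # open a new piece at each top level heading
        pieces[-1].append(line)
    # Trim trailing blank lines and end every piece with one newline.
    return ["\n".join(piece).rstrip() + "\n" for piece in pieces]


document = (
    "1.0 First\n"
    "\n"
    "        Text of the first part.\n"
    "\n"
    "2.0 Second\n"
    "\n"
    "        Text of the second part.\n"
)
parts = split_on_top_headings(document)
```

        Each element of `parts' could then be written to its own file
        and converted to HTML separately.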

    1.3 How to convert text files into HTML?

        The TF specification can be found in the #URL<../manual><Manual>.
        The command used to generate this page was:

            t2html.pl                                                     \
            --author           "Jari Aalto"                               \
            --title            "Conversion for text files"                \
            --html-body         LANG=en                                   \
            --Out                                                         \
            index.txt

    1.4 Writing a text document

        You need nothing else but a text editor that displays the
        current column number, or one that can be configured to
        advance your TAB by 4 spaces. That's it.
        An Emacs minor mode (See package
        #URL<http://freecode.com/projects/emacs-tiny-tools><tinytf.el>) can
        make writing documents easy. The mode will help with formatting
        paragraphs, filling bullets, numbering headings and keeping the
        TOC up to date.

    1.5 Ripping program documentation

       1.5.1 Documentation tools in programming languages

        *Perl* is an exception among programming languages, because it
        includes an internal documentation syntax called _POD_ (Plain
        Old Documentation), with which you can embed documentation
        right into the program source. Deriving the documentation from
        Perl programs is a straightforward job. Another well known
        language (invented long after Perl) is Java, which calls the
        embedded documentation *javadoc*. For all others, there is a
        need to write separate documentation.

       1.5.2 Other programming languages

        But it is possible to embed documentation inside any
        programming language: directly into the code. A small Perl
        utility can be used to extract the documentation provided it
        was written in TF format. Documentation is put at the
        beginning of the file and updated there. Program `ripdoc.pl'
        extracts the documentation which follows TF guidelines. The
        idea is that you can generate HTML documents from the embedded
        'TF pod'. The conversion goes like this:

            ripdoc.pl code.sh | t2html.pl > code.html
            ripdoc.pl code.el | t2html.pl > code.html
            ripdoc.pl code.cc | t2html.pl > code.html

        Suitable for awk, shell, sh, ksh, C++, Java, Lisp, python,
        Tcl etc. programming languages. The only criterion is that the
        language supports *one-comment-starter* and that the
        documentation has been written by using it. Languages that have
        *comment-start* and *comment-end*, like C that has /* and */,
        are not suitable for ripdoc.pl.
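
        The idea of ripping documentation written with a single
        comment starter can be sketched in Python as below. This is
        not the algorithm of ripdoc.pl, only an illustration under
        simple assumptions: the documentation is the first run of
        comment lines in the file, a "#!" interpreter line is skipped,
        and one space after the comment starter is removed. The
        function name `rip_leading_comments' is hypothetical.

```python
from typing import List


def rip_leading_comments(source: str, starter: str = "#") -> str:
    """Return the text of the first comment block, with the starter removed."""
    ripped: List[str] = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#!") and not ripped:
            continue  # skip an interpreter line such as #!/bin/sh
        if stripped.startswith(starter):
            body = stripped[len(starter):]
            # Remove one separating space, keep the rest of the indentation.
            ripped.append(body[1:] if body.startswith(" ") else body)
        elif ripped or stripped:
            break  # the block has ended, or code came before any comment
    return "\n".join(ripped) + ("\n" if ripped else "")


code = (
    "#!/bin/sh\n"
    "# 1.0 Description\n"
    "#\n"
    "#" + " " * 9 + "Prints a greeting.\n"
    "\n"
    "echo hello\n"
)
doc = rip_leading_comments(code)
```

        The indentation after the comment starter is preserved, so
        the ripped text keeps the heading at the left and the body
        text at column 8. A language with paired comment delimiters
        does not fit this line by line approach.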

2.0 Other converters

    2.1 Postscript

        o   *html2ps* converter by Jan Karrman <jan@tdb.uu.se> at
            http://www.tdb.uu.se/~jan/html2ps.html
        o   html to ps converter
            http://www.tdb.uu.se/~jan/html2ps.html
        o   html to ps converter by Charlie's Perl at
            http://www.antipope.org/charlie/webbook/essays/toolkit.html

    2.2 Texinfo

        o   See page http://www.fido.de/kama/texinfo/texinfo-en.html
            where you can find the C program *html2texinfo*
        o   Perl program *html2texi.pl*
            http://www.cs.washington.edu/homes/mernst/software/#html2texi
            html2texi converts HTML documentation trees into Texinfo
            format.  Texinfo format can be easily converted to Info format
            (for browsing in Emacs or the stand alone Info browser), to a
            printed manual, or to HTML. Thus, html2texi.pl permits
            conversion of HTML files to Info format, and secondarily
            enables producing printed versions of Web page
            hierarchies. Unlike HTML, Info format is searchable. Since Info
            is integrated into Emacs, one can read documentation without
            starting a separate Web browser. Additionally, Info browsers
            (including Emacs) contain convenient features missing from Web
            browsers, such as easy index lookup and mouse-free browsing.

    2.3 Other text to HTML tools

        o   *asciidoc* Python program to convert text files.
            http://sourceforge.net/projects/asciidoc
        o   *t2php* Implementation in PHP language of the
            technical format. Visit
            http://rule-project.org/text/en/sw/t2php.txt
        o   *Wiki*, a simple text rule mark up.
            http://c2.com/cgi/wiki?TextFormattingRules
        o   *Zope* A Structured text, which seems to rely on indentation
            level as well. The tool has been written in Python language.
            See http://www.zope.org/Documentation/Articles/STX and
            http://www.zope.org/Members/millejoh/structuredText
        o   *htmlpp* by iMATIX is at http://www.imatix.com/. This
            is like a C preprocessor where you have complex
            and powerful text-markup commands. The base file
            for HTML generation is not easily readable as text.

            See also http://www.imatix.com/html/gslgen/index.htm GSLgen is
            a general-purpose file generator. It generates source code,
            data, or other files from an XML file and a schema file. The
            XML file defines a particular set of data. The schema file
            tells GSLgen what to do with that data.

        o   *No-TagsMarkup* by Scott S. Lawton. Another interesting
            plain-text style, similar to TF, is at
            http://www.prefab.com/ssl/notagsmarkup.html . Compared to TF,
            this style needs more markup and lacks some of the advanced
            features like Frame/colour/CSS2 support.
        o   *setext* by Ian Feldman, a simple text markup, is available at
            <setext@tidbits.com>
        o   *text2html.pl*, a Perl script by Seth Golub, is at
            http://www.cs.wustl.edu/~seth/txt2html/. This is a very good tool
            if you want to convert mail messages into HTML quickly. Use it for
            ad hoc things.
        o   *faq2text*, A C-code (Unix) based text to HTML converter at
            http://www.fadden.com/dl-misc/#faq2html
        o   *faq2html* ftp://ftp.eyrie.org/pub/software/web/faq2html

    2.4 Other Utilities

        o    #URL<http://www.oreilly.com/catalog/docbook><DocBook - SGML online book>
        o    #URL<http://www.mathematik.uni-kl.de/~obachman/Texi2html><Texi2html>
             Perl script.
        o    #URL<http://www.w3.org/People/Raggett/tidy/><HTML tidy>
             remove extra markup.
        o    #URL<http://www.physics.purdue.edu/~hinson/ftl><FTL>
             Latex like Perl formatting
        o    #URL<http://www.cs.ust.hk/~otfried/Hyperlatex/><Hyperlatex>
             "Hyperlatex is a package that allows you to prepare documents
             in HTML, and, at the same time, to produce a neatly printed
             document from your input. Unlike some other systems that you
             may have seen, Hyperlatex is not a general LaTeX-to-HTML
             converter. In my eyes, conversion is not a solution to HTML
             authoring. A well written HTML document must differ from a
             printed copy in a number of rather subtle ways. I doubt that
             these differences can be recognized mechanically, and I
             believe that converted LaTeX can never be as readable as a
             document written in HTML.  The basic idea of Hyperlatex is to
             make it possible to write a document that will look like a
             flawless LaTeX document when printed and like a handwritten
             HTML document when viewed with an HTML browser."

        o    #URL<http://www.cs.washington.edu/homes/mernst/software/#html2texi><html2texi>
             "html2texi converts HTML documentation trees into Texinfo format.
             Texinfo format can be easily converted to Info format (for browsing
             in Emacs or the stand alone Info browser), to a printed manual, or
             to HTML. Thus, html2texi.pl permits conversion of HTML files to
             Info format, and secondarily enables producing printed versions of
             Web page hierarchies. Unlike HTML, Info format is searchable. Since
             Info is integrated into Emacs, one can read documentation without
             starting a separate Web browser. Additionally, Info browsers
             (including Emacs) contain convenient features missing from Web
             browsers, such as easy index lookup and mouse-free browsing."
        o    #URL<http://www.kfa-juelich.de/isr/1/texconv/textopc.html><RTF in PC>
        o    #URL<http://packages.debian.org/unstable/text/catdoc.html><catdoc>
             Viewing MS WORD files.
             Catdoc is simple, one C source file, compiles in any system (DOS;
             Unix). Feed MS word file to it and it gives 7bit text out of it.
        o    #URL<ftp://ftp.dante.de:/pub/tex/tools/word2x/><word2x>
             Viewing MS WORD files.
        o    #URL<http://www.csn.ul.ie/~caolan/docs/MSWordView.html><MSWordView>
             "MSWordView is a program that can understand the microsofts word
             8 binary file format (office97), it currently converts word into
             html, which can then be read with a browser."
        o    #URL<http://wwwwbs.cs.tu-berlin.de/~schwartz/pmh/><Laola>
             Viewing MS WORD files.
             "Laola(perl) does a respectable job of taking MSWord files to text
             ...LAOLA is giving access to the raw document streams of any program
             using "structured storage" technology to save its documents.
             ELSER is dealing especially with these streams as they are present
             in Word 6 and Word 7 documents."

    2.5 General Document Maintenance tools

        o   The FAQ maintainer toolset page is at
            http://www.qucis.queensu.ca/FAQs/FAQaid/ It contains all the
            known tools to make your FAQ maintenance/posting/updating easier
            on any platform.

End