File: README

debiandoc-sgml 1.2.27

This is DebianDoc-SGML, an SGML-based documentation formatting package used
for the Debian manuals.

To install it on a non-Debian system, edit the Makefile and then run
`make' followed by `make install'.

The changelog is in the debian subdirectory.

Ardo van Rangelrooij <ardo@debian.org>
Ian Jackson <ijackson@gnu.ai.mit.edu>

-----------------------------------------------------------------------------

Message to the future maintainer(s): (Osamu Aoki)

Since 2005 I have refactored and extended the DebianDoc-SGML package, adding
some UTF-8 support, DebianDoc-SGML pretty-print support, XHTML support,
DocBook-XML output support, Wiki support, etc.  I have to say this has been a
steep learning experience for me, as I had no formal SGML education before.

To help future maintainers get started quickly, I summarize some helpful
information here at the end of the README file, which is only seen in the
source tree.

* package structure

(Please refer to the user documentation for an explanation based on the
installed file locations.)

This package is made up of the following files:

|-- COPYING                            (GPL2)
|-- Makefile
|-- README                             (This file)
|-- debian                             (Debian package meta-data)
|   |-- README.Debian                  (User documentation)
|   |-- TODO
|   |-- changelog
|   |-- compat
|   |-- control
|   |-- copyright
|   |-- debiandoc-sgml.install
|   |-- debiandoc-sgml.postinst
|   |-- debiandoc-sgml.postrm
|   |-- debiandoc-sgml.prerm
|   |-- debiandoc-sgml.sgmlcatalogs
|   `-- rules
|-- sgml                               (DTD definition)
|   |-- dtd
|   |   |-- catalog
|   |   |-- debiandoc.dcl
|   |   `-- debiandoc.dtd
|   `-- entities
|       |-- catalog
|       |-- debiandoc-lat1
|       `-- debiandoc-lat2
`-- tools
    |-- bin                            (source for executables)
    |   |-- fixlatex
    |   |-- mkconversions
    |   |-- saspconvert
    |   `-- template
    |-- lib                          
    |   |-- Format                     (output formatting engine)
    |   |   |-- Alias.pm               (alias (.pm) definition of format)
    |   |   |-- Driver.pm
    |   |   |-- Format.pm
    |   |   |-- HTML.pm                (format driver for HTML)
    |   |   |-- LaTeX.pm               (format driver for LaTeX)
    |   |   |-- Texinfo.pm
    |   |   |-- Text.pm
    |   |   |-- TextOV.pm
    |   |   `-- XML.pm
    |   |-- Locale                     (locale and format specific data)
    |   |   |-- Alias.pm               (alias definition of locale values)
    |   |   |-- SGML                   (locale independent data for SGML)
    |   |   |-- XML                    (locale independent data for XML)
    |   |   |-- convert-encoding       (conversion script for the locale data)
    |   |   |-- ca_ES.ISO8859-1        (data for the ca_ES.ISO8859-1 locale)
    |   |   |   |-- HTML               (locale specific data for HTML)
    |   |   |   |-- LaTeX              (locale specific data for LaTeX)
    |   |   |   |-- Texinfo
    |   |   |   |-- Text
    |   |   |   `-- TextOV
    ......... (directory for all locales) 
    |   `-- Map                        (Mapping for non ASCII characters)
    |       |-- Alias.pm
    |       |-- HTML.pm
    |       |-- LaTeX.pm
    |       |-- Texinfo.pm
    |       |-- Text.pm
    |       |-- TextOV.pm
    |       `-- XML.pm
    `-- man                            (manual page)
        `-- debiandoc-sgml.1

* How to add a new locale
 1. Create a directory named after the locale by copying en_US.ISO8859-1.
 2. Translate the phrases and make any other needed changes.
 3. Create data for alternative encodings, such as UTF-8, using the
    convert-encoding script.
 4. Adjust the UTF-8 data for Unicode:
     utf-8 for HTML
     utf8  for LaTeX
 5. Add the new locales to Locale/Alias.pm.
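
What step 3 amounts to can be sketched in a few lines of Python.  This is only
an illustration of re-encoding locale data from ISO8859-1 to UTF-8; it is not
the convert-encoding script itself, and the function name "reencode" and the
sample phrase are made up for this example.

```python
def reencode(data: bytes, source: str = "iso8859-1", target: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding and encode them in the target one."""
    return data.decode(source).encode(target)


# A sample phrase in ISO8859-1: the accented letter is the single byte 0xE0.
latin1_phrase = b"catal\xe0"

# In UTF-8 the same letter takes two bytes, while plain ASCII is unchanged.
utf8_phrase = reencode(latin1_phrase)
print(utf8_phrase)
```

The translated phrases keep their meaning; only the byte representation of
the non-ASCII characters changes.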

* main conversion scripts debiandoc2*

To make all the debiandoc2* commands consistent, I have merged all of them
completely into one template file, 'tools/bin/template', and added support
for a few new formats using the existing scripts as my guide.

All the debiandoc2* commands are generated by the script
'tools/bin/mkconversions', which parses this unified script source,
'tools/bin/template'.

(This shell/sed infrastructure was already there when I started, so
please do not ask me why I did not use CPP for this.)

For debugging purposes, I provide 'make diff', which runs 'diff -u' on all
the debiandoc2* commands against the currently installed versions.  This
functionality was added to help developers understand the implications of
changes made to the 'tools/bin/template' file and to avoid unintended changes
to the existing scripts when adding features.

Basically, these generated scripts use an SGML parser to produce output text
files such as plain text, HTML, LaTeX source, etc.  For PostScript and PDF
output, the LaTeX source is processed further to produce the desired results.

Since the Chinese Big5 encoding is not compatible with TeX (and thus not with
LaTeX either), the internal fixlatex script is run on the generated LaTeX
source before it is handed to LaTeX.  This is because the 2nd byte of a 16-bit
Big5 character can fall in the ASCII range, which makes some 16-bit characters
collide with meta characters such as \ { } used in the LaTeX context.  (The
same problem should occur with the Japanese Shift-JIS encoding, but we do not
support that encoding now, so it causes no problem.)
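
The byte collision described above can be seen with a short Python sketch.
It only detects the problem; it does not reproduce what fixlatex does.  The
function name "tex_collisions" and the sample byte pairs are assumptions
chosen for illustration.

```python
# Bytes that TeX treats specially and that also lie inside the Big5
# second-byte range 0x40-0x7E.
TEX_META = b"\\{}"


def tex_collisions(text: str, encoding: str = "big5") -> list:
    """Return (character, colliding byte) pairs for two-byte characters
    whose second byte is one of the TeX meta characters."""
    hits = []
    for char in text:
        raw = char.encode(encoding)
        if len(raw) == 2 and raw[1] in TEX_META:
            hits.append((char, chr(raw[1])))
    return hits


# Build the sample from raw Big5 byte pairs: lead byte 0xA5 followed by each
# meta byte, plus one pair (0xA4 0x40) whose second byte is harmless.
sample = b"\xa5\x5c\xa5\x7b\xa5\x7d\xa4\x40".decode("big5")
for char, meta in tex_collisions(sample):
    print(f"U+{ord(char):04X} ends with the byte for {meta!r}")
```

Three of the four sample characters are reported, one for each of \ { and }.
Plain ASCII text is never reported, because the check only looks at the
second byte of two-byte characters.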

The new -X option enables the use of user-provided Locale-dependent data.
Running "make test" executes a test build sequence using the package source
version of the Locale-dependent data.  The -X option is most useful when
fixing a Locale-dependent problem or testing new Locale data.

The -s option with an updated fixlatex script can be used to add Japanese
Shift-JIS encoding support.  But the -X option is the better choice for
debugging in most cases.

To adjust language-specific data such as the LaTeX starting code:
 * study Format/LaTeX.pm,
 * play with the -X option as described in README.Debian and the manpage to
   find the right /usr/share/perl5/DebianDoc_SGML/Locale/* data alternative,
 * adjust the tools/lib/Locale/Alias.pm and
   tools/lib/Locale/xx_YY.encoding/LaTeX.pm files in the source code.

* The meaning of %locale

It has the following contents for LaTeX.  The Format/LaTeX.pm file uses the
values defined here.

%locale = (
	   'babel' => '',
	   'inputenc' => '',
	   'abstract' => '',
	   'copyright notice' => '',
	   'before begin document' => '',
	   'after begin document' => '',
	   'before end document' => '',
	   'pdfhyperref' => ''
	   );

 * The first 2 are used to define the language scheme based on the babel
   macro.  For CJK, these can be left undefined.

 * The next 2 are the words used for "abstract" and "copyright notice" in
   the pertinent language.

 * The next 3 are recent additions which provide very flexible ways to create
   proper LaTeX source.  CJK uses these.  (They can be omitted for European
   languages.)

 * The last one defines how hyperrefs for PDF are generated with the hyperref
   package.  (We may need this to be defined otherwise for UTF-8, but I do
   not know.)  "hypertex" is the default value if none is given.  For UTF-8
   locales, I use "unicode" as the value at this moment.

 * For the LaTeX language-dependent parameter, I use the babel name of the
   "*.sty" file from /usr/share/texmf-texlive/tex/generic/babel if available.
   Exceptions:
   * vietnam
   * lithuanian
 * Read "The Not So Short Introduction to LaTeX 2ε" by Tobias Oetiker
   to get an idea of LaTeX.
 * Read "The CJK package for LaTeX 2ε — Multilingual support beyond babel" 
   by Werner Lemberg to get an idea of CJK.  It looks like the current CJK
   environment (as of 2007/08) is not good enough for UTF-8.
 * Read the CTAN archive for the unicode package.  (So should I.)
   * http://tug.ctan.org/cgi-bin/ctanPackageInformation.py?id=unicode
   * http://tug.ctan.org/tex-archive/macros/latex/contrib/unicode/

A similar thing can be done for HTML with %locale.  The Format/HTML.pm file
uses its "charset" value when generating HTML.

Package requirements:

As for the required packages (especially for LaTeX processing for the PS and
PDF formats), see the cjk-latex-* packages.

Please note that a ghostscript interpreter such as gs-gpl or gs-esp should
(not must) be installed too, for PDF thumbnail generation.

Conversion functions back to normalized SGML and XML formats are
available.  The generated XML requires some manual work.

Osamu Aoki <osamu@debian.org>        Sat, 04 Aug 2007 21:46:45 +0900