<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML
><HEAD
><TITLE
>Automatic Indexing with the DocBook DSSSL Stylesheets</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.79"></HEAD
><BODY
CLASS="ARTICLE"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="ARTICLE"
><DIV
CLASS="TITLEPAGE"
><H1
CLASS="TITLE"
><A
NAME="AEN2"
>Automatic Indexing with the DocBook DSSSL Stylesheets</A
></H1
><H3
CLASS="AUTHOR"
><A
NAME="AEN4"
>Norman Walsh</A
></H3
><P
CLASS="PUBDATE"
>17 Nov 1998<BR></P
><DIV
><DIV
CLASS="ABSTRACT"
><P
></P
><A
NAME="AEN8"
></A
><P
>Automatic indexing is an often requested feature. This article describes
how it is implemented in the DocBook DSSSL Stylesheets.</P
><P
></P
></DIV
></DIV
><HR></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="AEN10"
>Authoring for Indexing</A
></H1
><P
>There are two parts to building an index automatically: creating the
index terms and incorporating the generated index into your document.</P
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="AEN13"
>Creating Index Terms</A
></H2
><P
>The generated index is constructed from <CODE
CLASS="SGMLTAG"
>IndexTerm</CODE
>s
in your document. DocBook <CODE
CLASS="SGMLTAG"
>IndexTerm</CODE
>s are not part of the
text flow: they mark the place being indexed but produce no visible text there.</P
><PRE
CLASS="SCREEN"
>&lt;para&#62;
This paragraph contains an interesting thing&lt;indexterm id="thing"&#62;
&lt;primary&#62;thing&lt;/primary&#62;&lt;secondary&#62;interesting&lt;/secondary&#62;&lt;/indexterm&#62; that
will appear in the index.
&lt;/para&#62;</PRE
><P
>It is not absolutely necessary to provide an ID for each index term,
but the performance of the print backends may degrade significantly if you
have a large number of index terms that do not have IDs.</P
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="AEN20"
>Incorporating the Index</A
></H2
><P
>The index will be generated as a separate file. You must arrange to have
this file incorporated into your document. The easiest way to do this is with
a file entity reference. At the top of your document, add an internal subset
that defines the index file entity:</P
><PRE
CLASS="SCREEN"
>&lt;!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
&lt;!ENTITY genindex.sgm SYSTEM "genindex.sgm"&#62;
]&#62;
&lt;book&#62;
...
&amp;genindex.sgm; &lt;!-- Put this after the end tag of the last chapter or appendix, or     --&#62;
               &lt;!-- wherever you want the index to appear. It must be a valid location --&#62;
               &lt;!-- for an index. --&#62;
&lt;/book&#62;</PRE
><P
>Before you can process this document, you must make sure that <TT
CLASS="FILENAME"
>genindex.sgm</TT
> exists. This is a chicken-and-egg problem, but it
can be solved with the <B
CLASS="COMMAND"
>collateindex.pl</B
> command:</P
><PRE
CLASS="SCREEN"
>perl collateindex.pl -N -o genindex.sgm</PRE
><P
>The <CODE
CLASS="OPTION"
>-N</CODE
> option creates a new index; <CODE
CLASS="OPTION"
>-o</CODE
>
identifies the name of the output file. This name must be the same as the
name you specified in the internal subset.</P
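><P
>Running the bootstrap command again after a real index has been generated
would presumably replace it with an empty one. A minimal sketch of a guard
follows; the names <CODE
CLASS="OPTION"
>ensure_index</CODE
>, <CODE
CLASS="OPTION"
>INDEX</CODE
>, and <CODE
CLASS="OPTION"
>DRY_RUN</CODE
> are invented for this illustration and are not part of
<B
CLASS="COMMAND"
>collateindex.pl</B
>:</P
>
```sh
#!/bin/sh
# Sketch: create the placeholder index only when it is missing, so that an
# index generated earlier is left alone.
# INDEX, DRY_RUN and ensure_index are illustrative names (assumptions).
INDEX=${INDEX:-genindex.sgm}

ensure_index() {
    if [ -f "$INDEX" ]; then
        echo "keeping existing $INDEX"
    elif [ "${DRY_RUN:-1}" = 1 ]; then
        # Default: only show the command that would be run.
        echo "perl collateindex.pl -N -o $INDEX"
    else
        perl collateindex.pl -N -o "$INDEX"
    fi
}

ensure_index
```
<P
>By default the sketch only prints the command; setting <CODE
CLASS="OPTION"
>DRY_RUN=0</CODE
> would make it run <B
CLASS="COMMAND"
>collateindex.pl</B
> for real.</P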
></DIV
></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="AEN31"
>Creating an Index</A
></H1
><P
>Creating an index is a multi-step, two-pass process:</P
><DIV
CLASS="PROCEDURE"
><OL
TYPE="1"
><LI
CLASS="STEP"
><P
>In order to create an index, you must first generate the raw index
data. This is done with the HTML Stylesheet (<SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>even if you want print
output</I
></SPAN
>).</P
><P
>Process your document with <B
CLASS="COMMAND"
>jade</B
> using the HTML Stylesheet
with the <CODE
CLASS="OPTION"
>-V html-index</CODE
> option:</P
><PRE
CLASS="SCREEN"
>jade -t sgml -d <TT
CLASS="REPLACEABLE"
><I
>html/docbook.dsl</I
></TT
> -V html-index <TT
CLASS="REPLACEABLE"
><I
>yourdocument.sgm</I
></TT
></PRE
><P
>This will produce a file called <TT
CLASS="FILENAME"
>HTML.index</TT
> that
contains raw index data.</P
><P
>If you're planning to generate your final document as a single HTML
file using the <CODE
CLASS="OPTION"
>nochunks</CODE
> option, make sure you generate
the <TT
CLASS="FILENAME"
>HTML.index</TT
> file with that option as well:</P
><PRE
CLASS="SCREEN"
>jade -t sgml -d <TT
CLASS="REPLACEABLE"
><I
>html/docbook.dsl</I
></TT
> -V html-index -V nochunks <TT
CLASS="REPLACEABLE"
><I
>yourdocument.sgm</I
></TT
></PRE
></LI
><LI
CLASS="STEP"
><P
>Generate an index document with <B
CLASS="COMMAND"
>collateindex.pl</B
>:</P
><PRE
CLASS="SCREEN"
>perl collateindex.pl -o genindex.sgm HTML.index</PRE
><P
><B
CLASS="COMMAND"
>collateindex.pl</B
> accepts many other options;
see <A
HREF="collateindex.html"
TARGET="_top"
>the reference page</A
>
for more information.</P
></LI
><LI
CLASS="STEP"
><P
>Process your original document again, using whichever stylesheet
is appropriate. This time the output will contain the generated index.</P
></LI
></OL
></DIV
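><P
>The three steps above can be collected into one script. This is only a
sketch: the variable names, the <CODE
CLASS="OPTION"
>DRY_RUN</CODE
> switch, and the choice of the HTML stylesheet for the final pass are
assumptions made for illustration; only the <B
CLASS="COMMAND"
>jade</B
> and <B
CLASS="COMMAND"
>collateindex.pl</B
> command lines are taken from the procedure.</P
>
```sh
#!/bin/sh
# Sketch of the index build as one script. The variable names, the
# DRY_RUN switch, and the defaults are illustrative assumptions; only the
# jade and collateindex.pl command lines come from the steps above.
DOC=${DOC:-yourdocument.sgm}
HTML_DSL=${HTML_DSL:-html/docbook.dsl}
FINAL_DSL=${FINAL_DSL:-$HTML_DSL}
FINAL_BACKEND=${FINAL_BACKEND:-sgml}
INDEX=${INDEX:-genindex.sgm}

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_index() {
    # Step 1: the HTML stylesheet writes raw index data to HTML.index.
    run jade -t sgml -d "$HTML_DSL" -V html-index "$DOC" || return 1
    # Step 2: collate the raw data into the index document.
    run perl collateindex.pl -o "$INDEX" HTML.index || return 1
    # Step 3: format the document again with the stylesheet you want.
    run jade -t "$FINAL_BACKEND" -d "$FINAL_DSL" "$DOC"
}

build_index
```
<P
>Each step stops the build if it fails, since a stale <TT
CLASS="FILENAME"
>HTML.index</TT
> would otherwise be collated silently. Extra options such as <CODE
CLASS="OPTION"
>-V nochunks</CODE
> would have to be added to both <B
CLASS="COMMAND"
>jade</B
> lines.</P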
></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="AEN61"
>Drawbacks</A
></H1
><P
>Any generated index is perhaps better than none, but there are still
a few things that <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>cannot</I
></SPAN
> be accomplished:</P
><P
></P
><OL
TYPE="1"
><LI
><P
>Duplicate page numbers are not suppressed in the index. If
the document contains three indexing hits on page 4, the generated index will
contain &ldquo;4, 4, 4&rdquo;.</P
></LI
><LI
><P
>Ranges are not automatically constructed. If the document
contains indexing hits on pages 4, 5, 6, and 7, the generated index will contain
&ldquo;4, 5, 6, 7&rdquo; instead of &ldquo;4&ndash;7&rdquo;.</P
></LI
></OL
><P
>It is possible that the TeX backend could be made smart enough to do
these things automatically. (Sebastian will probably kill me for suggesting
that.) For the RTF backend, at least in MS Word, it's probably possible to
write a WordBasic macro that would automatically fix the index. (If someone
does, please pass it along.)</P
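><P
>Whatever tool ends up doing it, the cleanup itself is small. The sketch
below shows the logic on a plain comma-separated page list; it assumes the
page numbers are numeric and already in ascending order, and <CODE
CLASS="OPTION"
>collapse_pages</CODE
> is a hypothetical helper, not something the stylesheets or either backend
provide:</P
>
```sh
#!/bin/sh
# Sketch: drop duplicate page numbers and fold consecutive pages into
# ranges. Assumes ascending numeric page numbers separated by commas.
# collapse_pages is a hypothetical helper written for this illustration.
collapse_pages() {
    printf '%s\n' "$1" | awk -F'[ ,]+' '
    # Append the current run (first..last) to the output string.
    function emit(    piece) {
        piece = (first == last) ? first : first "-" last
        out = (out == "") ? piece : out ", " piece
    }
    {
        out = ""; have = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "") continue               # skip empty fields
            p = $i + 0
            if (!have) { first = last = p; have = 1 }
            else if (p == last) continue         # duplicate page
            else if (p == last + 1) last = p     # run continues
            else { emit(); first = last = p }    # gap: start a new run
        }
        if (have) emit()
        print out
    }'
}

collapse_pages "4, 4, 4"      # prints: 4
collapse_pages "4, 5, 6, 7"   # prints: 4-7
```
<P
>The sketch writes ranges with a plain hyphen; a formatter would normally
substitute an en dash.</P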
></DIV
></DIV
></BODY
></HTML
>