File: doclifter.1

doclifter 2.7-1
'\" t
.\"     Title: doclifter
.\"    Author: [see the "Author" section]
.\" Generator: DocBook XSL Stylesheets v1.75.2 <http://docbook.sf.net/>
.\"      Date: 11/25/2010
.\"    Manual: Documentation Tools
.\"    Source: doclifter
.\"  Language: English
.\"
.TH "DOCLIFTER" "1" "11/25/2010" "doclifter" "Documentation Tools"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
doclifter \- translate troff requests into DocBook
.SH "SYNOPSIS"
.HP \w'\fBdoclifter\fR\ 'u
\fBdoclifter\fR [\-e\ \fIencoding\fR] [\-h\ \fIhintfile\fR] [\-q] [\-x] [\-v] [\-w] [\-D\ \fItoken=type\fR] [\-I\ \fIpath\fR] \fIfile\fR...
.SH "DESCRIPTION"
.PP
\fBdoclifter\fR
translates documents written in troff macros to DocBook\&. Structural subsets of the requests in
\fBman\fR(7),
\fBmdoc\fR(7),
\fBms\fR(7),
\fBme\fR(7),
\fBmm\fR(7), and
\fBtroff\fR(1)
are supported\&.
.PP
The translation brings over all the structure of the original document at section, subsection, and paragraph level\&. Command and C function synopses are translated into DocBook markup, not just a verbatim display\&. Tables (TBL markup) are translated into DocBook table markup\&. PIC diagrams are translated into SVG\&. Troff\-level information that might have structural implications is preserved in XML comments\&.
.PP
Where possible, font\-change macros are translated into structural markup\&.
\fBdoclifter\fR
recognizes stereotyped patterns of markup and content (such as the use of italics in a FILES section to mark filenames) and lifts them\&. A means to edit, add, and save semantic hints about highlighting is supported\&.
.PP
Some cliches are recognized and lifted to structural markup even without highlighting\&. Patterns recognized include such things as URLs, email addresses, man page references, and C program listings\&.
.PP
The
\fB\&.in\fR
and
\fB\&.ti\fR
requests are passed through with complaints\&. They indicate presentation\-level markup that
\fBdoclifter\fR
cannot translate into structure; the output will require hand\-fixing\&.
.PP
The
\fB\&.ta\fR
request is passed through with a complaint unless it is followed by text lines containing tabs, in which case the following span of lines containing tabs is lifted to a table\&.
.PP
Under some circumstances,
\fBdoclifter\fR
can even lift formatted manual pages and the text output produced by
\fBlynx\fR(1)
from HTML\&. If it finds no macros in the input, but does find a NAME section header, it tries to interpret the plain text as a manual page (skipping boilerplate headers and footers generated by
\fBlynx\fR(1))\&. Translations produced in this way will be prone to miss structural features, but this fallback is good enough for simple man pages\&.
.PP
\fBdoclifter\fR
does not do a perfect job, merely an extremely good one\&. Final polish should be applied by a human being capable of recognizing patterns too subtle for a computer\&. But
\fBdoclifter\fR
will almost always produce translations that are good enough to be usable before hand\-hacking\&.
.PP
See the
Troubleshooting
section for discussion of how to solve document conversion problems\&.
.SH "OPTIONS"
.PP
If called without arguments,
\fBdoclifter\fR
acts as a filter, translating troff source input on standard input to DocBook markup on standard output\&. If called with arguments, each argument file is translated separately (but hints are retained, see below); the suffix
\&.xml
is given to the translated output\&.
.PP
\-h
.RS 4
Name a file to which information on semantic hints gathered during analysis should be written\&.
.RE
.PP
\-D
.RS 4
The
\fB\-D\fR
option allows you to post a hint\&. This may be useful, for example, if
\fBdoclifter\fR
is mis\-parsing a synopsis because it doesn\*(Aqt recognize a token as a command\&. This hint is merged after hints in the input source have been read\&.
.RE
.PP
\-I
.RS 4
The
\fB\-I\fR
option adds its argument to the include path used when doclifter searches for inclusions\&. The include path is initially just the current directory\&.
.RE
.PP
\-e
.RS 4
The
\fB\-e\fR
option allows you to set the encoding field to be emitted in the output XML\&. It defaults to ISO\-8859\-1 (Latin\-1)\&.
.RE
.PP
\-q
.RS 4
Normally, requests that
\fBdoclifter\fR
could not interpret (usually because they\*(Aqre presentation\-level) are passed through to XML comments in the output\&. The \-q option suppresses this\&. It also suppresses listing of macros\&. Messages about requests that are unrecognized or cannot be translated go to standard error whatever the state of this option\&. This option is intended to reduce clutter when you believe you have a clean lift of a document and want to lose the troff legacy\&.
.RE
.PP
\-x
.RS 4
The \-x option requests that
\fBdoclifter\fR
generate DocBook version 5 compatible XML content, rather than its default DocBook version 4\&.4 output\&. Inclusions and entities may not be handled correctly with this switch enabled\&.
.RE
.PP
\-v
.RS 4
The \-v option makes
\fBdoclifter\fR
noisier about what it\*(Aqs doing\&. This is mainly useful for debugging\&.
.RE
.PP
\-w
.RS 4
Enable strict portability checking\&. Multiple instances of \-w increase the strictness\&. See
the section called \(lqPORTABILITY CHECKING\(rq\&.
.RE
.SH "TRANSLATION RULES"
.PP
Overall, you can expect that font changes will be turned into
Emphasis
tags with a
Remap
attribute taken from the troff font name\&. The basic font names are R, I, B, U, CW, and SM\&.
.PP
Troff and macro\-package special character escapes are mapped into ISO character entities\&.
.PP
When
\fBdoclifter\fR
encounters a
\fB\&.so\fR
directive, it searches for the file\&. If it can get read access to the file, and open it, and the file consists entirely of command lines and comments, then it is included\&. If any of these conditions fails, an entity reference for it is generated\&.
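.PP
As an illustration only, the inclusion test can be sketched in Python as below\&. The function name
can_inline
is hypothetical, and the rule that a command line or comment is any line starting with a period or an apostrophe is an assumption, not a statement about the actual implementation\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import os


def can_inline(path):
    """Return True if path is readable and holds only command lines.

    Assumption: a command line (or comment) is any line whose first
    character is a period or an apostrophe.
    """
    if not os.access(path, os.R_OK):
        return False
    try:
        with open(path, encoding="latin-1") as source:
            lines = source.read().splitlines()
    except OSError:
        return False
    # chr(39) is the apostrophe, the alternate control character.
    return all(line[:1] in (".", chr(39)) for line in lines)
```
.fi
.if n \{\
.RE
.\}
.PP
A file that fails this test would be left as an entity reference rather than being expanded in place\&.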
.PP
\fBdoclifter\fR
performs special parsing when it recognizes a display such as is generated by
\fB\&.DS/\&.DE\fR\&. It repeatedly tries to parse first a function synopsis, and then plain text off what remains in the display\&. Thus, most inline C function prototypes will be lifted to structured markup\&.
.PP
Some notes on specific translations:
.SS "Man Translation"
.PP
\fBdoclifter\fR
does a good job on most man pages\&. It knows about the extended
\fBUR\fR/\fBUE\fR/\fBUN\fR
requests supported under Linux\&. If any
\fB\&.UR\fR
request is present, it will translate these but not wrap URLs outside them with
Ulink
tags\&. It also knows about the extended
\fB\&.L\fR
(literal) font markup from Bell Labs Version 8, and its friends\&.
.PP
The
\fB\&.TH\fR
macro is used to generate a
RefMeta
section\&. If present, the date/source/manual arguments (see
\fBman\fR(7)) are wrapped in
RefMiscInfo
tag pairs with those class attributes\&. Note that
\fBdoclifter\fR
does not change the date\&.
.PP
\fBdoclifter\fR
performs special parsing when it recognizes a synopsis section\&. It repeatedly tries to parse first a function synopsis, then a command synopsis, and then plain text off what remains in the section\&.
.PP
The following man macros are translated into emphasis tags with a remap attribute:
\fB\&.B\fR,
\fB\&.I\fR,
\fB\&.L\fR,
\fB\&.BI\fR,
\fB\&.BR\fR,
\fB\&.BL\fR,
\fB\&.IB\fR,
\fB\&.IR\fR,
\fB\&.IL\fR,
\fB\&.RB\fR,
\fB\&.RI\fR,
\fB\&.RL\fR,
\fB\&.LB\fR,
\fB\&.LI\fR,
\fB\&.LR\fR,
\fB\&.SB\fR,
\fB\&.SM\fR\&. Some stereotyped patterns involving these macros are recognized and turned into semantic markup\&.
.PP
The following macros are translated into paragraph breaks:
\fB\&.LP\fR,
\fB\&.PP\fR,
\fB\&.P\fR,
\fB\&.HP\fR, and the single\-argument form of
\fB\&.IP\fR\&.
.PP
The two\-argument form of
\fB\&.IP\fR
is translated either as a
VariableList
(usually) or
ItemizedList
(if the tag is the troff bullet or square character)\&.
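.PP
The choice between the two list types can be sketched in Python as follows\&. This is a hedged illustration: the function name
ip_list_kind
is hypothetical, and it assumes the bullet and square are written as the classic two\-character special\-character escapes (bu and sq)\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
BACKSLASH = chr(92)  # spelled this way so the listing survives troff

# Assumption: the bullet and square glyphs are written as the classic
# two-character special-character escapes (bu and sq).
LIST_GLYPHS = {BACKSLASH + "(bu", BACKSLASH + "(sq"}


def ip_list_kind(tag):
    """Pick the DocBook list element for a tagged IP request."""
    return "ItemizedList" if tag in LIST_GLYPHS else "VariableList"
```
.fi
.if n \{\
.RE
.\}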
.PP
The following macros are translated semantically:
\fB\&.SH\fR,
\fB\&.SS\fR,
\fB\&.TP\fR,
\fB\&.UR\fR,
\fB\&.UE\fR,
\fB\&.UN\fR,
\fB\&.IX\fR\&. A
\fB\&.UN\fR
call just before
\fB\&.SH\fR
or
\fB\&.SS\fR
sets the ID for the new section\&.
.PP
The
\fB\e*R\fR,
\fB\e*(Tm\fR,
\fB\e*(lq\fR, and
\fB\e*(rq\fR
symbols are translated\&.
.PP
The following (purely presentation\-level) macros are ignored:
\fB\&.PD\fR,
\fB\&.DT\fR\&.
.PP
The
\fB\&.RS\fR/\fB\&.RE\fR
macros are translated differently depending on whether or not they precede list markup\&. When
\fB\&.RS\fR
occurs just before
\fB\&.TP\fR
or
\fB\&.IP\fR
the result is nested lists\&. Otherwise, the
\fB\&.RS\fR/\fB\&.RE\fR
pair is translated into a Blockquote tag\-pair\&.
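.PP
A minimal Python sketch of this decision follows\&. The function name
rs_translation
is hypothetical, and the sketch assumes that "just before" means the very next request in the source\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
def rs_translation(requests, index):
    """Decide how the RS request at requests[index] would be lifted.

    requests is a list of request names without the leading period,
    for example ["RS", "TP", "RE"]. Assumption: "just before" means
    the very next request in the list.
    """
    following = requests[index + 1] if index + 1 < len(requests) else None
    return "nested list" if following in ("TP", "IP") else "Blockquote"
```
.fi
.if n \{\
.RE
.\}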
.PP
\fB\&.DS\fR/\fB\&.DE\fR
is not part of the documented man macro set, but is recognized because it shows up with some frequency on legacy man pages from older Unixes\&.
.PP
Certain extension macros originally defined under Ultrix, including some that occasionally show up on the manual pages of Linux and other open\-source Unixes, are translated structurally:
\fB\&.EX\fR/\fB\&.EE\fR
(and the synonyms
\fB\&.Ex\fR/\fB\&.Ee\fR),
\fB\&.Ds\fR/\fB\&.De\fR,
\fB\&.NT\fR/\fB\&.NE\fR,
\fB\&.PN\fR, and
\fB\&.MS\fR\&.
.PP
The following extension macros used by the X distribution are also recognized and translated structurally:
\fB\&.FD\fR,
\fB\&.FN\fR,
\fB\&.IN\fR,
\fB\&.ZN\fR,
\fB\&.hN\fR, and
\fB\&.C{\fR/\fB\&.C}\fR\&.
The
\fB\&.TA\fR
and
\fBIN\fR
requests are ignored\&.
.PP
When the man macros are active, any
\fB\&.Pp\fR
macro definition containing the request
\fB\&.PP\fR
will be ignored, and all instances of
\fB\&.Pp\fR
replaced with
\fB\&.PP\fR\&. Similarly,
\fB\&.Tp\fR
will be replaced with
\fB\&.TP\fR\&. This is the least painful way to deal with some frequently\-encountered stereotyped wrapper definitions that would otherwise cause serious interpretation problems\&.
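.PP
The call rewriting can be sketched in Python as below\&. This is a simplification under stated assumptions: the function name
normalize_wrappers
is hypothetical, and the sketch rewrites every call unconditionally instead of first checking what the wrapper definition contains\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
def normalize_wrappers(lines):
    """Replace calls to the Pp and Tp wrappers with PP and TP.

    Simplification: the rule in the text depends on the wrapper's
    definition containing the standard request; this sketch rewrites
    every call and does not look at definitions.
    """
    replacements = {"Pp": "PP", "Tp": "TP"}
    result = []
    for line in lines:
        name = line[1:3] if line[:1] == "." else ""
        # Only rewrite when the name ends there (end of line or a space).
        if name in replacements and line[3:4] in ("", " "):
            line = "." + replacements[name] + line[3:]
        result.append(line)
    return result
```
.fi
.if n \{\
.RE
.\}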
.PP
Known problem areas with man translation:
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Weird uses of
\fB\&.TP\fR\&. These will sometimes generate invalid XML and sometimes result in a FIXME comment in the generated XML (a warning message will also go to standard error)\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
It is debatable how the man macros
\fB\&.HP\fR
and
\fB\&.IP\fR
without tag should be translated\&. We treat them as an ordinary paragraph break\&. We could visually simulate a hanging paragraph with list markup, but this would not be a structural translation\&.
.RE
.SS "Pod2man Translation"
.PP
\fBdoclifter\fR
recognizes the extension macros produced by
\fBpod2man\fR
(\fB\&.Sh\fR,
\fB\&.Sp\fR,
\fB\&.Ip\fR,
\fB\&.Vb\fR,
\fB\&.Ve\fR) and translates them structurally\&.
.PP
The results of lifting pages produced by
\fBpod2man\fR
should be checked carefully by eyeball, especially the rendering of command and function synopses\&.
\fBPod2man\fR
generates rather perverse markup;
\fBdoclifter\fR\*(Aqs struggle to untangle it is sometimes in vain\&.
.PP
If possible, generate your DocBook from the POD sources\&. There is a
pod2docbook
module on CPAN that does this\&.
.SS "Tkman Translation"
.PP
\fBdoclifter\fR
recognizes the extension macros used by the Tcl/Tk documentation system:
\fB\&.AP\fR,
\fB\&.AS\fR,
\fB\&.BS\fR,
\fB\&.BE\fR,
\fB\&.CS\fR,
\fB\&.CE\fR,
\fB\&.DS\fR,
\fB\&.DE\fR,
\fB\&.SO\fR,
\fB\&.SE\fR,
\fB\&.UL\fR,
\fB\&.VS\fR,
\fB\&.VE\fR\&. The
\fB\&.AP\fR,
\fB\&.CS\fR,
\fB\&.CE\fR,
\fB\&.SO\fR,
\fB\&.SE\fR, and
\fB\&.UL\fR
macros are translated structurally\&.
.SS "Mandoc Translation"
.PP
\fBdoclifter\fR
should be able to do an excellent job on most
\fBmdoc\fR(7)
pages, because this macro package expresses a lot of semantic structure\&.
.PP
Known problems with mandoc translation: All
\fB\&.Bd\fR/\fB\&.Ed\fR
display blocks are translated as
LiteralLayout
tag pairs\&.
.SS "Ms Translation"
.PP
\fBdoclifter\fR
does a good job on most ms pages\&. One weak spot to watch out for is the generation of Author and Affiliation tags\&. The heuristics used to mine this information out of the
\fB\&.AU\fR
section work for authors who format their names in the way usual for English (e\&.g\&. "M\&. E\&. Lesk", "Eric S\&. Raymond") but are quite brittle\&.
.PP
For a document to be recognized as containing ms markup, it must have the extension
\&.ms\&. This avoids problems with false positives\&.
.PP
The
\fB\&.TL\fR,
\fB\&.AU\fR,
\fB\&.AI\fR, and
\fB\&.AE\fR
macros turn into article metainformation in the expected way\&. The
\fB\&.PP\fR,
\fB\&.LP\fR,
\fB\&.SH\fR, and
\fB\&.NH\fR
macros turn into paragraph and section structure\&. The tagged form of
\fB\&.IP\fR
is translated either as a
VariableList
(usually) or
ItemizedList
(if the tag is the troff bullet or square character); the untagged version is treated as an ordinary paragraph break\&.
.PP
The
\fB\&.DS\fR/\fB\&.DE\fR
pair is translated to a
LiteralLayout
tag pair\&. The
\fB\&.FS\fR/\fB\&.FE\fR
pair is translated to a
Footnote
tag pair\&. The
\fB\&.QP\fR/\fB\&.QS\fR/\fB\&.QE\fR
requests define
BlockQuotes\&.
.PP
The
\fB\&.UL\fR
font change is mapped to U\&.
\fB\&.SM\fR
and
\fB\&.LG\fR
become numeric plus or minus size steps suffixed to the
Remap
attribute\&.
.PP
The
\fB\&.B1\fR
and
\fB\&.B2\fR
box macros are translated to a
Sidebar
tag pair\&.
.PP
All macros relating to page footers, multicolumn mode, and keeps are ignored (\fB\&.ND\fR,
\fB\&.DA\fR,
\fB\&.1C\fR,
\fB\&.2C\fR,
\fB\&.MC\fR,
\fB\&.BX\fR,
\fB\&.KS\fR,
\fB\&.KE\fR,
\fB\&.KF\fR)\&. The
\fB\&.R\fR,
\fB\&.RS\fR, and
\fB\&.RE\fR
macros are ignored as well\&.
.SS "Me Translation"
.PP
Translation of me documents tends to produce crude results that need a lot of hand\-hacking\&. The format has little usable structure, and documents written in it tend to use a lot of low\-level troff macros; both these properties tend to confuse
\fBdoclifter\fR\&.
.PP
For a document to be recognized as containing me markup, it must have the extension
\&.me\&. This avoids problems with false positives\&.
.PP
The following macros are translated into paragraph breaks:
\fB\&.lp\fR,
\fB\&.pp\fR\&. The
\fB\&.ip\fR
macro is translated into a
VariableList\&. The
\fB\&.bp\fR
macro is translated into an
ItemizedList\&. The
\fB\&.np\fR
macro is translated into an
OrderedList\&.
.PP
The b, i, and r fonts are mapped to emphasis tags with B, I, and R
Remap
attributes\&. The
\fB\&.rb\fR
("real bold") font is treated the same as
\fB\&.b\fR\&.
.PP
\fB\&.q(\fR/\fB\&.q)\fR
is translated structurally\&.
.PP
Most other requests are ignored\&.
.SS "Mm Translation"
.PP
Memorandum Macros documents translate well, as these macros carry a lot of structural information\&. The translation rules are tuned for Memorandum or Released Paper styles; information associated with external\-letter style will be preserved in comments\&.
.PP
For a document to be recognized as containing mm markup, it must have the extension
\&.mm\&. This avoids problems with false positives\&.
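.PP
The extension rule stated here and in the ms and me sections can be sketched in Python as follows\&. The names
EXTENSION_MACROS
and
macro_set_from_name
are hypothetical; the sketch covers only the three extensions this page names and does not attempt the markup\-based detection used for other macro sets\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import os

# Extensions named in this manual page; other macro sets are detected
# from the markup itself, which this sketch does not attempt.
EXTENSION_MACROS = {".ms": "ms", ".me": "me", ".mm": "mm"}


def macro_set_from_name(filename):
    """Return the macro set implied by the file extension, or None."""
    extension = os.path.splitext(filename)[1]
    return EXTENSION_MACROS.get(extension)
```
.fi
.if n \{\
.RE
.\}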
.PP
The following highlight macros are translated into Emphasis tags:
\fB\&.B\fR,
\fB\&.I\fR,
\fB\&.R\fR,
\fB\&.BI\fR,
\fB\&.BR\fR,
\fB\&.IB\fR,
\fB\&.IR\fR,
\fB\&.RB\fR,
\fB\&.RI\fR\&.
.PP
The following macros are structurally translated:
\fB\&.AE\fR,
\fB\&.AF\fR,
\fB\&.AL\fR,
\fB\&.RL\fR,
\fB\&.APP\fR,
\fB\&.APPSK\fR,
\fB\&.AS\fR,
\fB\&.AT\fR,
\fB\&.AU\fR,
\fB\&.B1\fR,
\fB\&.B2\fR,
\fB\&.BE\fR,
\fB\&.BL\fR,
\fB\&.ML\fR,
\fB\&.BS\fR,
\fB\&.BVL\fR,
\fB\&.VL\fR,
\fB\&.DE\fR,
\fB\&.DL\fR,
\fB\&.DS\fR,
\fB\&.FE\fR,
\fB\&.FS\fR,
\fB\&.H\fR,
\fB\&.HU\fR,
\fB\&.IA\fR,
\fB\&.IE\fR,
\fB\&.IND\fR,
\fB\&.LB\fR,
\fB\&.LC\fR,
\fB\&.LE\fR,
\fB\&.LI\fR,
\fB\&.P\fR,
\fB\&.RF\fR,
\fB\&.SM\fR,
\fB\&.TL\fR,
\fB\&.VERBOFF\fR,
\fB\&.VERBON\fR,
\fB\&.WA\fR,
\fB\&.WE\fR\&.
.PP
The following macros are ignored:
.PP
\ \&\fB\&.)E\fR,
\fB\&.1C\fR,
\fB\&.2C\fR,
\fB\&.AST\fR,
\fB\&.AV\fR,
\fB\&.AVL\fR,
\fB\&.COVER\fR,
\fB\&.COVEND\fR,
\fB\&.EF\fR,
\fB\&.EH\fR,
\fB\&.EDP\fR,
\fB\&.EPIC\fR,
\fB\&.FC\fR,
\fB\&.FD\fR,
\fB\&.HC\fR,
\fB\&.HM\fR,
\fB\&.GETR\fR,
\fB\&.GETST\fR,
\fB\&.INITI\fR,
\fB\&.INITR\fR,
\fB\&.INDP\fR,
\fB\&.ISODATE\fR,
\fB\&.MT\fR,
\fB\&.NS\fR,
\fB\&.ND\fR,
\fB\&.OF\fR,
\fB\&.OH\fR,
\fB\&.OP\fR,
\fB\&.PGFORM\fR,
\fB\&.PGNH\fR,
\fB\&.PE\fR,
\fB\&.PF\fR,
\fB\&.PH\fR,
\fB\&.RP\fR,
\fB\&.S\fR,
\fB\&.SA\fR,
\fB\&.SP\fR,
\fB\&.SG\fR,
\fB\&.SK\fR,
\fB\&.TAB\fR,
\fB\&.TB\fR,
\fB\&.TC\fR,
\fB\&.VM\fR,
\fB\&.WC\fR\&.
.PP
The following macros generate warnings:
\fB\&.EC\fR,
\fB\&.EX\fR,
\fB\&.FG\fR,
\fB\&.GETHN\fR,
\fB\&.GETPN\fR,
\fB\&.GETR\fR,
\fB\&.GETST\fR,
\fB\&.LT\fR,
\fB\&.LD\fR,
\fB\&.LO\fR,
\fB\&.MOVE\fR,
\fB\&.MULB\fR,
\fB\&.MULN\fR,
\fB\&.MULE\fR,
\fB\&.NCOL\fR,
\fB\&.nP\fR,
\fB\&.PIC\fR,
\fB\&.RD\fR,
\fB\&.RS\fR,
\fB\&.RE\fR,
\fB\&.SETR\fR\&.
.PP
\ \&\fB\&.BS\fR/\fB\&.BE\fR
and
\fB\&.IA\fR/\fB\&.IE\fR
pairs are passed through\&. The text inside them may need to be deleted or moved\&.
.PP
The mark argument of
\fB\&.ML\fR
is ignored; the following list is formatted as a normal
ItemizedList\&.
.PP
The contents of
\fB\&.DS\fR/\fB\&.DE\fR
or
\fB\&.DF\fR/\fB\&.DE\fR
are turned into a
Screen
display\&. Arguments controlling presentation\-level formatting are ignored\&.
.SS "Mwww Translation"
.PP
The mwww macros are an extension to the man macros supported by
\fBgroff\fR(1)
for producing web pages\&.
.PP
The
\fBURL\fR,
\fBFTP\fR,
\fBMAILTO\fR,
\fBIMAGE\fR, and
\fBTAG\fR
tags are translated structurally\&. The
\fBHTMLINDEX\fR,
\fBBODYCOLOR\fR,
\fBBACKGROUND\fR,
\fBHTML\fR, and
\fBLINE\fR
tags are ignored\&.
.SS "TBL Translation"
.PP
All structural features of TBL tables are translated, including both horizontal and vertical spanning with \(oqs\(cq and \(oq^\(cq\&. The \(oql\(cq, \(oqr\(cq, and \(oqc\(cq formats are supported; the \(oqn\(cq column format is rendered as \(oqr\(cq\&. Line continuations with
T{
and
T}
are handled correctly\&. So is
\fB\&.TH\fR\&.
.PP
The
\fBexpand\fR,
\fBbox\fR,
\fBdoublebox\fR,
\fBallbox\fR,
\fBcenter\fR,
\fBleft\fR, and
\fBright\fR
options are supported\&. The GNU synonyms
\fBframe\fR
and
\fBdoubleframe\fR
are also recognized\&. But the distinction between single and double rules and boxes is lost\&.
.PP
Table continuations (\&.T&) are not supported\&.
.PP
If the first nonempty line of text immediately before a table is boldfaced, it is interpreted as a title for the table and the table is generated using a
table
and
title\&. Otherwise the table is translated with
informaltable\&.
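.PP
The title rule can be sketched in Python as below\&. This is an illustration under assumptions: the function name
table_element
is hypothetical, and a boldfaced line is taken to be either a B request or text that starts with the inline bold font escape\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
BOLD_ESCAPE = chr(92) + "fB"  # the inline bold font escape


def table_element(lines_before_table):
    """Choose between a titled table and an informaltable.

    Assumption: a boldfaced line is either a B request or text that
    starts with the inline bold font escape.
    """
    for line in reversed(lines_before_table):
        if line.strip():
            # Only the nearest nonempty line is examined.
            bold = line.startswith(".B ") or line.startswith(BOLD_ESCAPE)
            return "table" if bold else "informaltable"
    return "informaltable"
```
.fi
.if n \{\
.RE
.\}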
.PP
Most other presentation\-level TBL commands are ignored\&. The \(oqb\(cq format qualifier is processed, but point size and width qualifiers are not\&.
.SS "Pic Translation"
.PP
PIC sections are translated to SVG\&.
doclifter
calls out to
\fBpic2plot\fR(1)
to accomplish this; you must have that utility installed for PIC translation to work\&.
.SS "Eqn Translation"
.PP
EQN sections are filtered into embedded MathML with
\fBeqn \-TMathML\fR
if possible, otherwise passed through enclosed in
LiteralLayout
tags\&. After a delim statement has been seen, inline eqn delimiters are translated into an XML processing instruction\&. Exception: inline eqn equations consisting of a single character are translated to an
Emphasis
with a Role attribute of eqn\&.
.SS "Troff Translation"
.PP
The troff translation is meant only to support interpretation of the macro sets\&. It is not useful standalone\&.
.PP
The
\fB\&.nf\fR
and
\fB\&.fi\fR
macros are interpreted as literal\-layout boundaries\&. Calls to the
\fB\&.so\fR
macro either cause inclusion or are translated into XML entity inclusions (see above)\&. Calls to the
\fB\&.ul\fR
and
\fB\&.cu\fR
macros cause following lines to be wrapped in an
Emphasis
tag with a
Remap
attribute of "U"\&. Calls to
\fB\&.ft\fR
generate corresponding start or end emphasis tags\&. Calls to
\fB\&.tr\fR
cause character translation on output\&. Calls to
\fB\&.bp\fR
generate a
BeginPage
tag (in paragraphed text only)\&. Calls to
\fB\&.sp\fR
generate a paragraph break (in paragraphed text only)\&. These are the only troff requests we translate to DocBook\&. The rest of the troff emulation exists because macro packages use it internally to expand macros into elements that might be structural\&.
.PP
Requests relating to macro definitions and strings (\fB\&.ds\fR,
\fB\&.as\fR,
\fB\&.de\fR,
\fB\&.am\fR,
\fB\&.rm\fR,
\fB\&.rn\fR,
\fB\&.em\fR) are processed and expanded\&. The
\fB\&.ig\fR
macro is also processed\&.
.PP
Conditional macros (\fB\&.if\fR,
\fB\&.ie\fR,
\fB\&.el\fR) are handled\&. The built\-in conditions o, n, t, e, and c are evaluated as if for
nroff
on page one of a document\&. String comparisons are evaluated by straight textual comparison\&. All numeric expressions evaluate to true\&.
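.PP
These evaluation rules can be sketched in Python as follows\&. The names
BUILTIN_CONDITIONS
and
evaluate_condition
are hypothetical\&. The sketch assumes that string comparisons use the apostrophe as delimiter, and it leaves out the c condition, whose result this page does not spell out\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
# Truth values for the single-letter conditions when formatting with
# nroff on page one: page one is odd, and the formatter is nroff.
# The c condition is left out of this sketch.
BUILTIN_CONDITIONS = {"o": True, "n": True, "t": False, "e": False}

QUOTE = chr(39)  # the apostrophe, assumed to delimit string comparisons


def evaluate_condition(condition):
    """Evaluate a condition the way the paragraph above describes.

    Assumption: anything that is neither a built-in letter nor a
    string comparison counts as a numeric expression.
    """
    if condition in BUILTIN_CONDITIONS:
        return BUILTIN_CONDITIONS[condition]
    is_comparison = (
        condition[:1] == QUOTE
        and condition.endswith(QUOTE)
        and condition.count(QUOTE) == 3
    )
    if is_comparison:
        left, right = condition[1:-1].split(QUOTE)
        return left == right  # straight textual comparison
    return True  # every numeric expression is taken as true
```
.fi
.if n \{\
.RE
.\}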
.PP
The extended
groff
requests
\fBcc\fR,
\fBc2\fR,
\fBab\fR,
\fBals\fR,
\fBdo\fR,
\fBnop\fR,
\fBreturn\fR, and
\fBshift\fR
are interpreted\&. Its
\fB\&.PSPIC\fR
extension is translated into a
MediaObject\&.
.PP
The
\fB\&.tm\fR
macro writes its arguments to standard error (with
\fB\-t\fR)\&. The
\fB\&.pm\fR
macro reports on defined macros and strings\&. These facilities may aid in debugging your translation\&.
.PP
Some troff escape sequences are lifted:
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP "  1." 4.2
.\}
The \ee escape becomes a bare backslash, \e\&. a period, and \e\- a bare dash\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP "  2." 4.2
.\}
The troff escapes \e^, \e`, \e\*(Aq, \e&, \e0, and \e| are lifted to equivalent ISO special spacing characters\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP "  3." 4.2
.\}
A \e followed by space is translated to an ISO non\-breaking space entity\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP "  4." 4.2
.\}
A \e~ is also translated to an ISO non\-breaking space entity; properly this should be a space that can\*(Aqt be used for a linebreak but stretches like ordinary whitespace during line adjustment, but there is no ISO or Unicode entity for that\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 5.\h'+01'\c
.\}
.el \{\
.sp -1
.IP "  5." 4.2
.\}
The \eu and \ed half\-line vertical motion escapes, when paired, become
\fBSuperscript\fR
or
\fBSubscript\fR
tags\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 6.\h'+01'\c
.\}
.el \{\
.sp -1
.IP "  6." 4.2
.\}
The \ec escape is handled as a line continuation in circumstances where that matters (e\&.g\&. for token\-pasting)\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 7.\h'+01'\c
.\}
.el \{\
.sp -1
.IP "  7." 4.2
.\}
The \ef escape for font changes is translated in various context\-dependent ways\&. First,
\fBdoclifter\fR
looks for cliches involving font changes that have semantic meaning, and lifts to a structural tag\&. If it can\*(Aqt do that, it generates an
Emphasis
tag\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 8.\h'+01'\c
.\}
.el \{\
.sp -1
.IP "  8." 4.2
.\}
The \em[] extension is translated into a
phrase
span with a remap attribute carrying the color\&. Note: Stylesheets typically won\*(Aqt render this!
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 9.\h'+01'\c
.\}
.el \{\
.sp -1
.IP "  9." 4.2
.\}
Some uses of the \eo request are translated: pairs with a letter followed by one of the characters ` \*(Aq : ^ o ~ are translated to combining forms with diacriticals acute, grave, umlaut, circumflex, ring, and tilde respectively if the corresponding Latin\-1 or Latin\-2 character exists as an ISO literal\&.
.RE
.PP
Escapes other than these will yield warnings or errors\&.
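.PP
The simplest of these rewrites (items 1, 3, and 4 in the list above) can be sketched in Python as below\&. This is an illustration, not the actual implementation: the names
SIMPLE_ESCAPES
and
lift_simple_escapes
are hypothetical, and the backslash is spelled chr(92) so that the listing itself contains no troff escapes\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
BS = chr(92)  # a backslash, spelled this way so troff leaves it alone

# Each pair is (escape as it appears in the source, replacement).
SIMPLE_ESCAPES = [
    (BS + "e", BS),
    (BS + ".", "."),
    (BS + "-", "-"),
    (BS + " ", "&nbsp;"),
    (BS + "~", "&nbsp;"),
]


def lift_simple_escapes(text):
    """Rewrite the listed escapes in one left-to-right scan."""
    result = []
    position = 0
    while position < len(text):
        for escape, replacement in SIMPLE_ESCAPES:
            if text.startswith(escape, position):
                result.append(replacement)
                position += len(escape)
                break
        else:
            # No escape starts here, so copy one character through.
            result.append(text[position])
            position += 1
    return "".join(result)
```
.fi
.if n \{\
.RE
.\}
.PP
A single scan is used so that the backslash produced by one rewrite is never mistaken for the start of another escape\&.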
.PP
All other troff requests are ignored but passed through into XML comments\&. A few (such as
\fB\&.ce\fR) also trigger a warning message\&.
.SH "PORTABILITY CHECKING"
.PP
When portability checking is enabled,
\fBdoclifter\fR
emits portability warnings about markup which it can handle but which will break various other viewers and interpreters\&.
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP "  1." 4.2
.\}
At level 1, it will warn about constructions that would break
\fBman2html\fR(1) (the C program distributed with Linux
\fBman\fR(1), not the older and much less capable Perl script)\&. A close derivative of this code is used in GNOME
yelp\&. This should be the minimum level of portability you aim for, and corresponds to what is recommended on the
\fBgroff_man\fR(7)
manual page\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP "  2." 4.2
.\}
At level 2, it will warn about constructions that will break portability back to the Unix classic tools (including long macro names and glyph references with \e[])\&.
.RE
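.PP
The two level 2 checks named above can be sketched in Python as follows\&. The function name
level2_warnings
and the message texts are invented for illustration\&. The sketch assumes that a long macro name is a request name of more than two characters and that a glyph reference is any backslash followed by an opening square bracket\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import re

BS = chr(92)  # a backslash, spelled this way so troff leaves it alone


def level2_warnings(line):
    """List the level 2 portability problems found in one source line.

    Assumptions: a long macro name is a request name of more than two
    characters, and a glyph reference is any backslash followed by an
    opening square bracket. The message texts are invented.
    """
    warnings = []
    match = re.match("[.]([A-Za-z0-9]+)", line)
    if match and len(match.group(1)) > 2:
        warnings.append("long macro name: " + match.group(1))
    if BS + "[" in line:
        warnings.append("glyph reference with square brackets")
    return warnings
```
.fi
.if n \{\
.RE
.\}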
.SH "SEMANTIC ANALYSIS"
.PP
\fBdoclifter\fR
keeps two lists of semantic hints that it picks up from analyzing source documents (especially from parsing command and function synopses)\&. The local list includes:
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of function formal arguments
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of command options
.RE
.PP
Local hints are used to mark up the individual page from which they are gathered\&. The global list includes:
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of functions
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of commands
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of function return types
.RE
.PP
If
\fBdoclifter\fR
is applied to multiple files, the global list is retained in memory\&. You can dump a report of global hints at the end of the run with the
\fB\-h\fR
option\&. The format of the hints is as follows:
.sp
.if n \{\
.RS 4
.\}
.nf
\ \&\&.\e" | mark <phrase> as <markup>
.fi
.if n \{\
.RE
.\}
.PP
where
\fB<phrase>\fR
is an item of text and
\fB<markup>\fR
is the DocBook markup text it should be wrapped with whenever it appears either highlighted or as a word surrounded by whitespace in the source text\&.
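.PP
A hint line in this format can be taken apart with a few lines of Python, sketched below\&. The names
HINT_PREFIX
and
parse_hint
are hypothetical, and the sketch assumes the markup name never contains the separator word, so the last occurrence of it marks where the markup starts\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
# Period, backslash, double quote: the troff comment opener that starts
# every hint line, followed by the fixed words of the hint format.
HINT_PREFIX = "." + chr(92) + chr(34) + " | mark "


def parse_hint(line):
    """Split one hint line into (phrase, markup), or return None.

    Assumption: the markup name never contains " as ", so the last
    occurrence of that separator marks where the markup starts.
    """
    if not line.startswith(HINT_PREFIX):
        return None
    body = line[len(HINT_PREFIX):]
    phrase, separator, markup = body.rpartition(" as ")
    if not separator or not phrase.strip() or not markup.strip():
        return None
    return (phrase.strip(), markup.strip())
```
.fi
.if n \{\
.RE
.\}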
.PP
Hints derived from earlier files are also applied to later ones\&. This behavior may be useful when lifting collections of documents that apply to a function or command library\&. What should be more useful is the fact that a hints file dumped with
\fB\-h\fR
can be one of the file arguments to
\fBdoclifter\fR; the code detects this special case and does not write XML output for such a file\&. Thus, a good procedure for lifting a large library is to generate a hints file with a first run, inspect it to delete false positives, and use it as the first input to a second run\&.
.PP
It is also possible to include a hints file directly in a troff source file\&. This may be useful if you want to enrich the file by stages before converting to XML\&.
.SH "TROUBLESHOOTING"
.PP
\fBdoclifter\fR
tries to warn about problems that it can diagnose but not fix by itself\&. When it says
"look for FIXME", do that in the generated XML; the markup around that token may be wrong\&.
.PP
Occasionally (less than 2% of the time)
\fBdoclifter\fR
will produce invalid DocBook markup even from correct troff markup\&. Usually this results from strange constructions in the source page, or macro calls that are beyond the ability of
\fBdoclifter\fR\*(Aqs macro processor to get right\&. Here are some things to watch for, and how to fix them:
.PP
\fIMalformed command synopses\&.\fR
If you get a message that says
"command synopsis parse failed", look at the XML output\&. It will contain a comment telling you what the command synopsis looked like after preprocessing, and indicate on which token the parse failed (both with a token number and a caret sign inserted in the dump of the synopsis tokens)\&. Try rewriting the synopsis in your manual page source\&. The most common cause of failure is unbalanced [] groupings, a bug that can be very difficult to notice by eyeball\&. To assist with this, the error token dump tries to insert \(oq$\(cq at the point of the last nesting\-depth increase, but the code that does this is failure\-prone\&.
.PP
\fIConfusing macro calls\&.\fR
Some manual page authors replace standard requests (like
\fB\&.PP\fR,
\fB\&.SH\fR
and
\fB\&.TP\fR) with versions that do different things in
\fBnroff\fR
and
\fBtroff\fR
environments\&. While
\fBdoclifter\fR
tries to cope and usually does a good job, the quirks of [nt]roff are legion, and confusing macro calls sometimes lead to bad XML being generated\&. A common symptom of such problems is unclosed
Emphasis
tags\&.
.PP
The message
"possible section nesting error"
means that the program has seen two adjacent subsection headers\&. In man pages, subsections don\*(Aqt have a depth argument, so
\fBdoclifter\fR
cannot be certain how subsections should be nested\&. Any subsection heading between the indicated line and the beginning of the next top\-level section might be wrong and require correcting by hand\&.
.PP
If you\*(Aqre translating a page that uses user\-defined macros and you get bad output, the first thing to do is simplify or eliminate the user\-defined macros\&. Replace them with stock requests where possible\&.
.SH "RETURN VALUES"
.PP
On successful completion, the program returns status 0\&. It returns 1 if some file or standard input could not be translated\&. It returns 2 if one of the input sources was a
\fB\&.so\fR
inclusion\&. It returns 3 if there is an error in reading or writing files\&. It returns 4 to indicate an internal error\&. It returns 5 when aborted by a keyboard interrupt\&.
.PP
Note that a zero return does not guarantee that the output is valid DocBook\&. It will almost always (as in, more than 96% of cases) be syntactically valid XML, but in some rare cases fixups by hand may be necessary to meet the semantics of the DocBook DTD\&. Validation problems are most likely to occur with complicated list markup\&.
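.PP
A calling script can branch on these status values\&. The shell sketch below maps each documented status to a short message; the function name describe_status is hypothetical and the wording of the messages is only a paraphrase of this section\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```shell
# Map a doclifter exit status to the meaning given in RETURN VALUES.
# describe_status is an illustrative helper, not part of doclifter.
describe_status() {
    case "$1" in
        0) echo "success (output may still need validation)" ;;
        1) echo "a file or standard input could not be translated" ;;
        2) echo "an input source was a .so inclusion" ;;
        3) echo "error reading or writing files" ;;
        4) echo "internal error" ;;
        5) echo "aborted by keyboard interrupt" ;;
        *) echo "undocumented status $1" ;;
    esac
}
```
.fi
.if n \{\
.RE
.\}
.PP
After running
\fBdoclifter\fR
on a page, calling describe_status with the value of $? reports what happened\&.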
.SH "BUGS AND WARNINGS"
.PP
About 4% of man pages will either make this program exit with status 1 or generate invalid XML\&. In almost all such cases the misbehavior is triggered by markup bugs in the source that are too severe to be coped with\&.
.PP
Equation number arguments of EQN calls are ignored\&.
.PP
The function\-synopsis parser is crude (it\*(Aqs not a compiler) and prone to errors\&. Function\-synopsis markup should be checked carefully by a human\&.
.PP
If a man page has both paragraphed text in a Synopsis section and a body section before the Synopsis section, bad things will happen\&.
.PP
Running text (e\&.g\&., explanatory notes) at the end of a Synopsis section cannot reliably be distinguished from synopsis\-syntax markup\&. (This problem is AI\-complete\&.)
.PP
Some firewalls put in to cope with common malformations in troff code mean that the tail end of a span between two
\fB\ef{B,I,U,(CW}\fR
or
\fB\&.ft\fR
highlight changes may not be completely covered by corresponding
Emphasis
tags if (for example) the span crosses a boundary between filled and unfilled (\fB\&.nf\fR/\fB\&.fi\fR) text\&.
.PP
The treatment of conditionals relies on the assumption that conditional macros never generate structural or font\-highlight markup that differs between the if and else branches\&. This appears to be true of all the standard macro packages, but if you roll any of your own macros you\*(Aqre on your own\&.
.PP
Macro definitions in a manual page NAME section are not interpreted\&.
.PP
In Berkeley mdoc interpretation, handling of
\fB\&.Xo\fR/\fB\&.Xc\fR
enclosures is failure\-prone\&.
.PP
Uses of \ec for line continuation are sometimes not translated, leaving the \ec in the output XML\&. The program will print a warning when this occurs\&.
.PP
It is not possible to unambiguously detect candidates for wrapping in a DocBook option tag in running text\&. You\*(Aqll have to check for these by hand\&.
.PP
The line numbers in
\fBdoclifter\fR
error messages are unreliable in the presence of
\fB\&.EQ/\&.EN\fR,
\fB\&.PS/\&.PE\fR, and quantum fluctuations\&.
.SH "OLD MACRO SETS"
.PP
There is a conflict between Berkeley ms\*(Aqs documented
\fB\&.P1\fR
print\-header\-on\-page request and an undocumented Bell Labs use for displayed program and equation listings\&. The
\fBms\fR
translator uses the Bell Labs interpretation when
\fB\&.P2\fR
is present in the document, and otherwise ignores the request\&.
.SH "REQUIREMENTS"
.PP
The
\fBpic2plot\fR(1)
utility must be installed in order to translate PIC diagrams to SVG\&.
.SH "SEE ALSO"
.PP
\fBman\fR(7),
\fBmdoc\fR(7),
\fBms\fR(7),
\fBme\fR(7),
\fBmm\fR(7),
\fBmwww\fR(7),
\fBtroff\fR(1)\&.
.SH "AUTHOR"
.PP
Eric S\&. Raymond
esr@thyrsus\&.com
.PP
There is a project web page at
\m[blue]\fBhttp://www\&.catb\&.org/~esr/doclifter/\fR\m[]\&.