'\" t
.\" Title: doclifter
.\" Author: [see the "Author" section]
.\" Generator: DocBook XSL Stylesheets v1.75.2 <http://docbook.sf.net/>
.\" Date: 11/25/2010
.\" Manual: Documentation Tools
.\" Source: doclifter
.\" Language: English
.\"
.TH "DOCLIFTER" "1" "11/25/2010" "doclifter" "Documentation Tools"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
doclifter \- translate troff requests into DocBook
.SH "SYNOPSIS"
.HP \w'\fBdoclifter\fR\ 'u
\fBdoclifter\fR [\-e\ \fIencoding\fR] [\-h\ \fIhintfile\fR] [\-q] [\-x] [\-v] [\-w] [\-D\ \fItoken=type\fR] [\-I\ \fIpath\fR] \fIfile\fR...
.SH "DESCRIPTION"
.PP
\fBdoclifter\fR
translates documents written in troff macros to DocBook\&. Structural subsets of the requests in
\fBman\fR(7),
\fBmdoc\fR(7),
\fBms\fR(7),
\fBme\fR(7),
\fBmm\fR(7), and
\fBtroff\fR(1)
are supported\&.
.PP
The translation brings over all the structure of the original document at section, subsection, and paragraph level\&. Command and C function synopses are translated into DocBook markup, not just a verbatim display\&. Tables (TBL markup) are translated into DocBook table markup\&. PIC diagrams are translated into SVG\&. Troff\-level information that might have structural implications is preserved in XML comments\&.
.PP
Where possible, font\-change macros are translated into structural markup\&.
\fBdoclifter\fR
recognizes stereotyped patterns of markup and content (such as the use of italics in a FILES section to mark filenames) and lifts them\&. A means to edit, add, and save semantic hints about highlighting is supported\&.
.PP
Some cliches are recognized and lifted to structural markup even without highlighting\&. Patterns recognized include such things as URLs, email addresses, man page references, and C program listings\&.
.PP
The
\fB\&.in\fR
and
\fB\&.ti\fR
requests are passed through with complaints\&. They indicate presentation\-level markup that
\fBdoclifter\fR
cannot translate into structure; the output will require hand\-fixing\&.
.PP
The
\fB\&.ta\fR
request is passed through with a complaint unless followed by text lines containing tabs, in which case the following span of lines containing tabs is lifted to a table\&.
.PP
Under some circumstances,
\fBdoclifter\fR
can even lift formatted manual pages and the text output produced by
\fBlynx\fR(1)
from HTML\&. If it finds no macros in the input, but does find a NAME section header, it tries to interpret the plain text as a manual page (skipping boilerplate headers and footers generated by
\fBlynx\fR(1))\&. Translations produced in this way will be prone to miss structural features, but this fallback is good enough for simple man pages\&.
.PP
\fBdoclifter\fR
does not do a perfect job, merely an extremely good one\&. Final polish should be applied by a human being capable of recognizing patterns too subtle for a computer\&. But
\fBdoclifter\fR
will almost always produce translations that are good enough to be usable before hand\-hacking\&.
.PP
See the section called \(lqTROUBLESHOOTING\(rq for a discussion of how to solve document conversion problems\&.
.SH "OPTIONS"
.PP
If called without arguments,
\fBdoclifter\fR
acts as a filter, translating troff source input on standard input to DocBook markup on standard output\&. If called with arguments, each argument file is translated separately (but hints are retained, see below); the suffix
\&.xml
is given to the translated output\&.
.PP
\-h
.RS 4
Name a file to which information on semantic hints gathered during analysis should be written\&.
.RE
.PP
\-D
.RS 4
The
\fB\-D\fR
option allows you to post a hint of the form \fItoken\fR=\fItype\fR\&. This may be useful, for example, if
\fBdoclifter\fR
is mis\-parsing a synopsis because it doesn\*(Aqt recognize a token as a command\&. This hint is merged after hints in the input source have been read\&.
.RE
.PP
\-I
.RS 4
The
\fB\-I\fR
option adds its argument to the include path used when doclifter searches for inclusions\&. The include path initially contains just the current directory\&.
.RE
.PP
\-e
.RS 4
The
\fB\-e\fR
option allows you to set the encoding field to be emitted in the output XML\&. It defaults to ISO\-8859\-1 (Latin\-1)\&.
.RE
.PP
\-q
.RS 4
Normally, requests that
\fBdoclifter\fR
could not interpret (usually because they\*(Aqre presentation\-level) are passed through to XML comments in the output\&. The \-q option suppresses this\&. It also suppresses listing of macros\&. Messages about requests that are unrecognized or cannot be translated go to standard error regardless of this option\&. This option is intended to reduce clutter when you believe you have a clean lift of a document and want to discard the troff legacy\&.
.RE
.PP
\-x
.RS 4
The \-x option requests that
\fBdoclifter\fR
generate DocBook 5 compatible XML content rather than its default DocBook 4\&.4 output\&. Inclusions and entities may not be handled correctly with this switch enabled\&.
.RE
.PP
\-v
.RS 4
The \-v option makes
\fBdoclifter\fR
noisier about what it\*(Aqs doing\&. This is mainly useful for debugging\&.
.RE
.PP
\-w
.RS 4
Enable strict portability checking\&. Multiple instances of \-w increase the strictness\&. See
the section called \(lqPORTABILITY CHECKING\(rq\&.
.RE
.SH "TRANSLATION RULES"
.PP
Overall, you can expect that font changes will be turned into
Emphasis
tags with a
Remap
attribute taken from the troff font name\&. The basic font names are R, I, B, U, CW, and SM\&.
.PP
Troff and macro\-package special character escapes are mapped into ISO character entities\&.
.PP
When
\fBdoclifter\fR
encounters a
\fB\&.so\fR
directive, it searches for the file\&. If it can get read access to the file, and open it, and the file consists entirely of command lines and comments, then it is included\&. If any of these conditions fails, an entity reference for it is generated\&.
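.PP
For example, suppose a page contains the two inclusions below, where the file names are hypothetical\&. If
\fImacros\&.tmac\fR
holds nothing but macro definitions and comments, it should be read in and expanded in place; if
\fIlicense\&.txt\fR
holds ordinary running text, the second line should instead become an entity reference in the output\&.
.sp
.if n \{\
.RS 4
.\}
.nf
\&.so macros.tmac
\&.so license.txt
.fi
.if n \{\
.RE
.\}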
.PP
\fBdoclifter\fR
performs special parsing when it recognizes a display such as one delimited by
\fB\&.DS/\&.DE\fR\&. It repeatedly tries to parse first a function synopsis, and then plain text off what remains in the display\&. Thus, most inline C function prototypes will be lifted to structured markup\&.
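.PP
As an illustration (the function and parameter names are hypothetical), a display such as the following is the kind of input that should be lifted to function synopsis markup rather than left as verbatim text:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.DS
int frob_open(const char *path, int flags);
\&.DE
.fi
.if n \{\
.RE
.\}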
.PP
Some notes on specific translations:
.SS "Man Translation"
.PP
\fBdoclifter\fR
does a good job on most man pages\&. It knows about the extended
\fBUR\fR/\fBUE\fR/\fBUN\fR
requests supported under Linux\&. If any
\fB\&.UR\fR
request is present, it will translate these but not wrap URLs outside them with
Ulink
tags\&. It also knows about the extended
\fB\&.L\fR
(literal) font markup from Bell Labs Version 8, and its friends\&.
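.PP
A minimal use of the extended URL requests looks like this; the address is a placeholder\&. The text between
\fB\&.UR\fR
and
\fB\&.UE\fR
is the link text\&.
.sp
.if n \{\
.RS 4
.\}
.nf
See the
\&.UR https://www.example.com/frob/
project page
\&.UE
for downloads.
.fi
.if n \{\
.RE
.\}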
.PP
The
\fB\&.TH\fR
macro is used to generate a
RefMeta
section\&. If present, the date/source/manual arguments (see
\fBman\fR(7)) are wrapped in
RefMiscInfo
tag pairs with those class attributes\&. Note that
\fBdoclifter\fR
does not change the date\&.
.PP
\fBdoclifter\fR
performs special parsing when it recognizes a synopsis section\&. It repeatedly tries to parse first a function synopsis, then a command synopsis, and then plain text off what remains in the section\&.
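.PP
For instance, a function synopsis written in the conventional style below (the header, function, and parameter names are hypothetical) is the kind of input this parser is designed to recognize:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.SH SYNOPSIS
\&.nf
\&.B #include <frob.h>
\&.sp
\&.BI "int frob_open(const char *" path ", int " flags );
\&.fi
.fi
.if n \{\
.RE
.\}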
.PP
The following man macros are translated into emphasis tags with a remap attribute:
\fB\&.B\fR,
\fB\&.I\fR,
\fB\&.L\fR,
\fB\&.BI\fR,
\fB\&.BR\fR,
\fB\&.BL\fR,
\fB\&.IB\fR,
\fB\&.IR\fR,
\fB\&.IL\fR,
\fB\&.RB\fR,
\fB\&.RI\fR,
\fB\&.RL\fR,
\fB\&.LB\fR,
\fB\&.LI\fR,
\fB\&.LR\fR,
\fB\&.SB\fR,
\fB\&.SM\fR\&. Some stereotyped patterns involving these macros are recognized and turned into semantic markup\&.
.PP
The following macros are translated into paragraph breaks:
\fB\&.LP\fR,
\fB\&.PP\fR,
\fB\&.P\fR,
\fB\&.HP\fR, and the single\-argument form of
\fB\&.IP\fR\&.
.PP
The two\-argument form of
\fB\&.IP\fR
is translated either as a
VariableList
(usually) or
ItemizedList
(if the tag is the troff bullet or square character)\&.
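.PP
For example, by these rules the first item below should produce an
ItemizedList
entry, because its tag is the troff bullet, while the second, whose tag is ordinary text (a hypothetical option), should produce a
VariableList
entry:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.IP \e(bu 4
A bulleted item.
\&.IP "\e\-v" 4
Turn on verbose output.
.fi
.if n \{\
.RE
.\}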
.PP
The following macros are translated semantically:
\fB\&.SH\fR,
\fB\&.SS\fR,
\fB\&.TP\fR,
\fB\&.UR\fR,
\fB\&.UE\fR,
\fB\&.UN\fR,
\fB\&.IX\fR\&. A
\fB\&.UN\fR
call just before
\fB\&.SH\fR
or
\fB\&.SS\fR
sets the ID for the new section\&.
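.PP
For example, the following fragment should give the OPTIONS section the ID
\fIoptions\fR; the ID value itself is an arbitrary choice:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.UN options
\&.SH OPTIONS
.fi
.if n \{\
.RE
.\}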
.PP
The
\fB\e*R\fR,
\fB\e*(Tm\fR,
\fB\e*(lq\fR, and
\fB\e*(rq\fR
symbols are translated\&.
.PP
The following (purely presentation\-level) macros are ignored:
\fB\&.PD\fR and
\fB\&.DT\fR\&.
.PP
The
\fB\&.RS\fR/\fB\&.RE\fR
macros are translated differently depending on whether or not they precede list markup\&. When
\fB\&.RS\fR
occurs just before
\fB\&.TP\fR
or
\fB\&.IP\fR
the result is nested lists\&. Otherwise, the
\fB\&.RS\fR/\fB\&.RE\fR
pair is translated into a Blockquote tag\-pair\&.
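.PP
In the sketch below (the option names are hypothetical), the first
\fB\&.RS\fR/\fB\&.RE\fR
pair comes just before a
\fB\&.TP\fR
and should therefore yield a list nested inside the outer one, while the second wraps ordinary text and should yield a Blockquote:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.TP
\efB\e\-o\efR \efIfile\efR
Write output to \efIfile\efR.
\&.RS
\&.TP
\efB\e\-a\efR
Append instead of overwriting.
\&.RE
\&.PP
\&.RS
An indented note that is not a list.
\&.RE
.fi
.if n \{\
.RE
.\}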
.PP
\fB\&.DS\fR/\fB\&.DE\fR
is not part of the documented man macro set, but is recognized because it shows up with some frequency on legacy man pages from older Unixes\&.
.PP
Some extension macros originally defined under Ultrix still turn up occasionally on the manual pages of Linux and other open\-source Unixes\&. Of these,
\fB\&.EX\fR/\fB\&.EE\fR
(and the synonyms
\fB\&.Ex\fR/\fB\&.Ee\fR),
\fB\&.Ds\fR/\fB\&.De\fR,
\fB\&.NT\fR/\fB\&.NE\fR,
\fB\&.PN\fR, and
\fB\&.MS\fR
are translated structurally\&.
.PP
The following extension macros used by the X distribution are also recognized and translated structurally:
\fB\&.FD\fR,
\fB\&.FN\fR,
\fB\&.IN\fR,
\fB\&.ZN\fR,
\fB\&.hN\fR, and
\fB\&.C{\fR/\fB\&.C}\fR\&.
The
\fB\&.TA\fR
and
\fBIN\fR
requests are ignored\&.
.PP
When the man macros are active, any
\fB\&.Pp\fR
macro definition containing the request
\fB\&.PP\fR
will be ignored, and all instances of
\fB\&.Pp\fR
replaced with
\fB\&.PP\fR\&. Similarly,
\fB\&.Tp\fR
will be replaced with
\fB\&.TP\fR\&. This is the least painful way to deal with some frequently\-encountered stereotyped wrapper definitions that would otherwise cause serious interpretation problems\&.
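.PP
A wrapper of the kind meant here typically looks like the following sketch; with the man macros active,
\fBdoclifter\fR
should discard this definition and treat each later
\fB\&.Pp\fR
call as
\fB\&.PP\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.de Pp
\&.PP
\&..
.fi
.if n \{\
.RE
.\}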
.PP
Known problem areas with man translation:
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Weird uses of
\fB\&.TP\fR\&. These will sometimes generate invalid XML and sometimes result in a FIXME comment in the generated XML (a warning message will also go to standard error)\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
It is debatable how the man macros
\fB\&.HP\fR
and
\fB\&.IP\fR
without tag should be translated\&. We treat them as an ordinary paragraph break\&. We could visually simulate a hanging paragraph with list markup, but this would not be a structural translation\&.
.RE
.SS "Pod2man Translation"
.PP
\fBdoclifter\fR
recognizes the extension macros produced by
\fBpod2man\fR
(\fB\&.Sh\fR,
\fB\&.Sp\fR,
\fB\&.Ip\fR,
\fB\&.Vb\fR,
\fB\&.Ve\fR) and translates them structurally\&.
.PP
The results of lifting pages produced by
\fBpod2man\fR
should be checked carefully by eyeball, especially the rendering of command and function synopses\&.
\fBPod2man\fR
generates rather perverse markup;
\fBdoclifter\fR\*(Aqs struggle to untangle it is sometimes in vain\&.
.PP
If possible, generate your DocBook from the POD sources\&. There is a
pod2docbook
module on CPAN that does this\&.
.SS "Tkman Translation"
.PP
\fBdoclifter\fR
recognizes the extension macros used by the Tcl/Tk documentation system:
\fB\&.AP\fR,
\fB\&.AS\fR,
\fB\&.BS\fR,
\fB\&.BE\fR,
\fB\&.CS\fR,
\fB\&.CE\fR,
\fB\&.DS\fR,
\fB\&.DE\fR,
\fB\&.SO\fR,
\fB\&.SE\fR,
\fB\&.UL\fR,
\fB\&.VS\fR,
\fB\&.VE\fR\&. The
\fB\&.AP\fR,
\fB\&.CS\fR,
\fB\&.CE\fR,
\fB\&.SO\fR,
\fB\&.SE\fR, and
\fB\&.UL\fR
macros are translated structurally\&.
.SS "Mandoc Translation"
.PP
\fBdoclifter\fR
should be able to do an excellent job on most
\fBmdoc\fR(7)
pages, because this macro package expresses a lot of semantic structure\&.
.PP
Known problems with mandoc translation: All
\fB\&.Bd\fR/\fB\&.Ed\fR
display blocks are translated as
LiteralLayout
tag pairs\&.
.SS "Ms Translation"
.PP
\fBdoclifter\fR
does a good job on most ms pages\&. One weak spot to watch out for is the generation of Author and Affiliation tags\&. The heuristics used to mine this information out of the
\fB\&.AU\fR
section work for authors who format their names in the way usual for English (e\&.g\&. "M\&. E\&. Lesk", "Eric S\&. Raymond") but are quite brittle\&.
.PP
For a document to be recognized as containing ms markup, it must have the extension
\&.ms\&. This avoids problems with false positives\&.
.PP
The
\fB\&.TL\fR,
\fB\&.AU\fR,
\fB\&.AI\fR, and
\fB\&.AE\fR
macros turn into article metainformation in the expected way\&. The
\fB\&.PP\fR,
\fB\&.LP\fR,
\fB\&.SH\fR, and
\fB\&.NH\fR
macros turn into paragraph and section structure\&. The tagged form of
\fB\&.IP\fR
is translated either as a
VariableList
(usually) or
ItemizedList
(if the tag is the troff bullet or square character); the untagged version is treated as an ordinary paragraph break\&.
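.PP
A minimal ms document opening of the kind these rules expect might look like this; the title, author, and institution are placeholders, and the author name follows the initials\-and\-surname form that the heuristics handle best:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.TL
Notes on Frobnication
\&.AU
J. Q. Hacker
\&.AI
Example University
\&.NH
Introduction
\&.PP
Body text begins here.
.fi
.if n \{\
.RE
.\}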
.PP
The
\fB\&.DS\fR/\fB\&.DE\fR
pair is translated to a
LiteralLayout
tag pair\&. The
\fB\&.FS\fR/\fB\&.FE\fR
pair is translated to a
Footnote
tag pair\&. The
\fB\&.QP\fR/\fB\&.QS\fR/\fB\&.QE\fR
requests define
BlockQuotes\&.
.PP
The
\fB\&.UL\fR
font change is mapped to U\&.
\fB\&.SM\fR
and
\fB\&.LG\fR
become numeric plus or minus size steps suffixed to the
Remap
attribute\&.
.PP
The
\fB\&.B1\fR
and
\fB\&.B2\fR
box macros are translated to a
Sidebar
tag pair\&.
.PP
All macros relating to page footers, multicolumn mode, and keeps are ignored (\fB\&.ND\fR,
\fB\&.DA\fR,
\fB\&.1C\fR,
\fB\&.2C\fR,
\fB\&.MC\fR,
\fB\&.BX\fR,
\fB\&.KS\fR,
\fB\&.KE\fR,
\fB\&.KF\fR)\&. The
\fB\&.R\fR,
\fB\&.RS\fR, and
\fB\&.RE\fR
macros are ignored as well\&.
.SS "Me Translation"
.PP
Translation of me documents tends to produce crude results that need a lot of hand\-hacking\&. The format has little usable structure, and documents written in it tend to use a lot of low\-level troff macros; both these properties tend to confuse
\fBdoclifter\fR\&.
.PP
For a document to be recognized as containing me markup, it must have the extension
\&.me\&. This avoids problems with false positives\&.
.PP
The following macros are translated into paragraph breaks:
\fB\&.lp\fR,
\fB\&.pp\fR\&. The
\fB\&.ip\fR
macro is translated into a
VariableList\&. The
\fB\&.bu\fR
macro is translated into an
ItemizedList\&. The
\fB\&.np\fR
macro is translated into an
OrderedList\&.
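.PP
For instance, in the me fragment below (the option name is hypothetical), the two
\fB\&.np\fR
paragraphs should become items of an
OrderedList
and the
\fB\&.ip\fR
paragraph an entry in a
VariableList:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.pp
An ordinary paragraph.
\&.np
First numbered step.
\&.np
Second numbered step.
\&.ip \e\-v
Turn on verbose output.
.fi
.if n \{\
.RE
.\}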
.PP
The b, i, and r fonts are mapped to emphasis tags with B, I, and R
Remap
attributes\&. The
\fB\&.rb\fR
("real bold") font is treated the same as
\fB\&.b\fR\&.
.PP
\fB\&.(q\fR/\fB\&.)q\fR
is translated structurally\&.
.PP
Most other requests are ignored\&.
.SS "Mm Translation"
.PP
Memorandum Macros documents translate well, as these macros carry a lot of structural information\&. The translation rules are tuned for Memorandum or Released Paper styles; information associated with external\-letter style will be preserved in comments\&.
.PP
For a document to be recognized as containing mm markup, it must have the extension
\&.mm\&. This avoids problems with false positives\&.
.PP
The following highlight macros are translated into Emphasis tags:
\fB\&.B\fR,
\fB\&.I\fR,
\fB\&.R\fR,
\fB\&.BI\fR,
\fB\&.BR\fR,
\fB\&.IB\fR,
\fB\&.IR\fR,
\fB\&.RB\fR,
\fB\&.RI\fR\&.
.PP
The following macros are structurally translated:
\fB\&.AE\fR,
\fB\&.AF\fR,
\fB\&.AL\fR,
\fB\&.RL\fR,
\fB\&.APP\fR,
\fB\&.APPSK\fR,
\fB\&.AS\fR,
\fB\&.AT\fR,
\fB\&.AU\fR,
\fB\&.B1\fR,
\fB\&.B2\fR,
\fB\&.BE\fR,
\fB\&.BL\fR,
\fB\&.ML\fR,
\fB\&.BS\fR,
\fB\&.BVL\fR,
\fB\&.VL\fR,
\fB\&.DE\fR,
\fB\&.DL\fR,
\fB\&.DS\fR,
\fB\&.FE\fR,
\fB\&.FS\fR,
\fB\&.H\fR,
\fB\&.HU\fR,
\fB\&.IA\fR,
\fB\&.IE\fR,
\fB\&.IND\fR,
\fB\&.LB\fR,
\fB\&.LC\fR,
\fB\&.LE\fR,
\fB\&.LI\fR,
\fB\&.P\fR,
\fB\&.RF\fR,
\fB\&.SM\fR,
\fB\&.TL\fR,
\fB\&.VERBOFF\fR,
\fB\&.VERBON\fR,
\fB\&.WA\fR,
\fB\&.WE\fR\&.
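.PP
A short mm fragment using several of these macros is sketched below (the heading and item text are placeholders); the numbered heading, the paragraph, and the automatically numbered list should all be lifted to structural markup:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.H 1 "Overview"
\&.P
Introductory text.
\&.AL
\&.LI
First item.
\&.LI
Second item.
\&.LE
.fi
.if n \{\
.RE
.\}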
.PP
The following macros are ignored:
\fB\&.)E\fR,
\fB\&.1C\fR,
\fB\&.2C\fR,
\fB\&.AST\fR,
\fB\&.AV\fR,
\fB\&.AVL\fR,
\fB\&.COVER\fR,
\fB\&.COVEND\fR,
\fB\&.EF\fR,
\fB\&.EH\fR,
\fB\&.EDP\fR,
\fB\&.EPIC\fR,
\fB\&.FC\fR,
\fB\&.FD\fR,
\fB\&.HC\fR,
\fB\&.HM\fR,
\fB\&.GETR\fR,
\fB\&.GETST\fR,
\fB\&.INITI\fR,
\fB\&.INITR\fR,
\fB\&.INDP\fR,
\fB\&.ISODATE\fR,
\fB\&.MT\fR,
\fB\&.NS\fR,
\fB\&.ND\fR,
\fB\&.OF\fR,
\fB\&.OH\fR,
\fB\&.OP\fR,
\fB\&.PGFORM\fR,
\fB\&.PGNH\fR,
\fB\&.PE\fR,
\fB\&.PF\fR,
\fB\&.PH\fR,
\fB\&.RP\fR,
\fB\&.S\fR,
\fB\&.SA\fR,
\fB\&.SP\fR,
\fB\&.SG\fR,
\fB\&.SK\fR,
\fB\&.TAB\fR,
\fB\&.TB\fR,
\fB\&.TC\fR,
\fB\&.VM\fR,
\fB\&.WC\fR\&.
.PP
The following macros generate warnings:
\fB\&.EC\fR,
\fB\&.EX\fR,
\fB\&.FG\fR,
\fB\&.GETHN\fR,
\fB\&.GETPN\fR,
\fB\&.GETR\fR,
\fB\&.GETST\fR,
\fB\&.LT\fR,
\fB\&.LD\fR,
\fB\&.LO\fR,
\fB\&.MOVE\fR,
\fB\&.MULB\fR,
\fB\&.MULN\fR,
\fB\&.MULE\fR,
\fB\&.NCOL\fR,
\fB\&.nP\fR,
\fB\&.PIC\fR,
\fB\&.RD\fR,
\fB\&.RS\fR,
\fB\&.RE\fR,
\fB\&.SETR\fR\&.
.PP
The
\fB\&.BS\fR/\fB\&.BE\fR
and
\fB\&.IA\fR/\fB\&.IE\fR
pairs are passed through\&. The text inside them may need to be deleted or moved\&.
.PP
The mark argument of
\fB\&.ML\fR
is ignored; the following list is formatted as a normal
ItemizedList\&.
.PP
The contents of
\fB\&.DS\fR/\fB\&.DE\fR
or
\fB\&.DF\fR/\fB\&.DE\fR
are turned into a
Screen
display\&. Arguments controlling presentation\-level formatting are ignored\&.
.SS "Mwww Translation"
.PP
The mwww macros are an extension to the man macros supported by
\fBgroff\fR(1)
for producing web pages\&.
.PP
The
\fBURL\fR,
\fBFTP\fR,
\fBMAILTO\fR,
\fBIMAGE\fR,
\fBTAG\fR
tags are translated structurally\&. The
\fBHTMLINDEX\fR,
\fBBODYCOLOR\fR,
\fBBACKGROUND\fR,
\fBHTML\fR, and
\fBLINE\fR
tags are ignored\&.
.SS "TBL Translation"
.PP
All structural features of TBL tables are translated, including both horizontal and vertical spanning with \(oqs\(cq and \(oq^\(cq\&. The \(oql\(cq, \(oqr\(cq, and \(oqc\(cq formats are supported; the \(oqn\(cq column format is rendered as \(oqr\(cq\&. Line continuations with
T{
and
T}
are handled correctly\&. So is
\fB\&.TH\fR\&.
.PP
The
\fBexpand\fR,
\fBbox\fR,
\fBdoublebox\fR,
\fBallbox\fR,
\fBcenter\fR,
\fBleft\fR, and
\fBright\fR
options are supported\&. The GNU synonyms
\fBframe\fR
and
\fBdoubleframe\fR
are also recognized\&. But the distinction between single and double rules and boxes is lost\&.
.PP
Table continuations (\&.T&) are not supported\&.
.PP
If the first nonempty line of text immediately before a table is boldfaced, it is interpreted as a title for the table and the table is generated using a
table
and
title\&. Otherwise the table is translated with
informaltable\&.
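.PP
For example, in the following sketch the boldfaced line immediately before the table should become the table title, so the result should be a
table
with a
title
rather than an
informaltable; the contents are illustrative:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.PP
\efBExit status codes\efR
\&.TS
tab(:) box;
l l.
0:Success
1:Failure
\&.TE
.fi
.if n \{\
.RE
.\}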
.PP
Most other presentation\-level TBL commands are ignored\&. The \(oqb\(cq format qualifier is processed, but point size and width qualifiers are not\&.
.SS "Pic Translation"
.PP
PIC sections are translated to SVG\&.
\fBdoclifter\fR
calls out to
\fBpic2plot\fR(1)
to accomplish this; you must have that utility installed for PIC translation to work\&.
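.PP
A small diagram such as the following should be handed to
\fBpic2plot\fR(1)
and come back as SVG; the labels are arbitrary:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.PS
box "troff"; arrow; box "DocBook"
\&.PE
.fi
.if n \{\
.RE
.\}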
.SS "Eqn Translation"
.PP
EQN sections are filtered into embedded MathML with
\fBeqn \-TMathML\fR
if possible, otherwise passed through enclosed in
LiteralLayout
tags\&. After a delim statement has been seen, inline eqn delimiters are translated into an XML processing instruction\&. Exception: inline eqn equations consisting of a single character are translated to an
Emphasis
with a Role attribute of eqn\&.
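.PP
In the sketch below, the delim statement enables inline equations\&. The equation for the area should then become an XML processing instruction, while
\fI$r$\fR, which consists of a single character, should become an
Emphasis
with a Role attribute of eqn:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.EQ
delim $$
\&.EN
The area of a circle is $pi r sup 2$, where $r$ is the radius.
.fi
.if n \{\
.RE
.\}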
.SS "Troff Translation"
.PP
The troff translation is meant only to support interpretation of the macro sets\&. It is not useful standalone\&.
.PP
The
\fB\&.nf\fR
and
\fB\&.fi\fR
macros are interpreted as literal\-layout boundaries\&. Calls to the
\fB\&.so\fR
macro either cause inclusion or are translated into XML entity inclusions (see above)\&. Calls to the
\fB\&.ul\fR
and
\fB\&.cu\fR
macros cause following lines to be wrapped in an
Emphasis
tag with a
Remap
attribute of "U"\&. Calls to
\fB\&.ft\fR
generate corresponding start or end emphasis tags\&. Calls to
\fB\&.tr\fR
cause character translation on output\&. Calls to
\fB\&.bp\fR
generate a
BeginPage
tag (in paragraphed text only)\&. Calls to
\fB\&.sp\fR
generate a paragraph break (in paragraphed text only)\&. These are the only troff requests we translate to DocBook\&. The rest of the troff emulation exists because macro packages use it internally to expand macros into elements that might be structural\&.
.PP
Requests relating to macro definitions and strings (\fB\&.ds\fR,
\fB\&.as\fR,
\fB\&.de\fR,
\fB\&.am\fR,
\fB\&.rm\fR,
\fB\&.rn\fR,
\fB\&.em\fR) are processed and expanded\&. The
\fB\&.ig\fR
macro is also processed\&.
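.PP
For example, in the fragment below (the string name and its value are hypothetical), the
\fB\&.ds\fR
request defines a string that the last line interpolates, and the
\fB\&.ig\fR
block is skipped entirely:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.ds Pg frob
\&.ig
This text is ignored.
\&..
\&.B \e*(Pg
.fi
.if n \{\
.RE
.\}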
.PP
Conditional macros (\fB\&.if\fR,
\fB\&.ie\fR,
\fB\&.el\fR) are handled\&. The built\-in conditions o, n, t, e, and c are evaluated as if for
nroff
on page one of a document\&. String comparisons are evaluated by straight textual comparison\&. All numeric expressions evaluate to true\&.
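.PP
So in the fragment below, only the first branch should be seen, because the n condition is true for nroff; for the same reason the final
\fB\&.if t\fR
line should have no effect:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.ie n \e{\e
Text used when formatting for a terminal.
\&.\e}
\&.el \e{\e
Text used only when typesetting.
\&.\e}
\&.if t .sp 0.5v
.fi
.if n \{\
.RE
.\}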
.PP
The extended
groff
requests
\fBcc\fR,
\fBc2\fR,
\fBab\fR,
\fBals\fR,
\fBdo\fR,
\fBnop\fR,
\fBreturn\fR, and
\fBshift\fR
are interpreted\&. Its
\fB\&.PSPIC\fR
extension is translated into a
MediaObject\&.
.PP
The
\fB\&.tm\fR
macro writes its arguments to standard error (with
\fB\-t\fR)\&. The
\fB\&.pm\fR
macro reports on defined macros and strings\&. These facilities may aid in debugging your translation\&.
.PP
Some troff escape sequences are lifted:
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
The \ee escape becomes a bare backslash, \e\&. a period, and \e\- a bare dash\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
The troff escapes \e^, \e`, \e\*(Aq, \e&, \e0, and \e| are lifted to equivalent ISO special spacing characters\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 3." 4.2
.\}
A \e followed by space is translated to an ISO non\-breaking space entity\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 4." 4.2
.\}
A \e~ is also translated to an ISO non\-breaking space entity; properly this should be a space that can\*(Aqt be used for a linebreak but stretches like ordinary whitespace during line adjustment, but there is no ISO or Unicode entity for that\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 5.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 5." 4.2
.\}
The \eu and \ed half\-line vertical motion escapes, when paired, become
\fBSuperscript\fR
or
\fBSubscript\fR
tags\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 6.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 6." 4.2
.\}
The \ec escape is handled as a line continuation in circumstances where that matters (e\&.g\&. for token\-pasting)\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 7.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 7." 4.2
.\}
The \ef escape for font changes is translated in various context\-dependent ways\&. First,
\fBdoclifter\fR
looks for cliches involving font changes that have semantic meaning, and lifts to a structural tag\&. If it can\*(Aqt do that, it generates an
Emphasis
tag\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 8.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 8." 4.2
.\}
The \em[] extension is translated into a
phrase
span with a remap attribute carrying the color\&. Note: Stylesheets typically won\*(Aqt render this!
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 9.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 9." 4.2
.\}
Some uses of the \eo escape are translated: pairs with a letter followed by one of the characters ` \*(Aq : ^ o ~ are translated to combining forms with diacriticals grave, acute, umlaut, circumflex, ring, and tilde respectively if the corresponding Latin\-1 or Latin\-2 character exists as an ISO literal\&.
.RE
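.PP
In the sketch below, the paired \eu and \ed escapes on the first two lines should become
\fBSuperscript\fR
and
\fBSubscript\fR
tags respectively, and the \e~ on the third line should become a non\-breaking space:
.sp
.if n \{\
.RS 4
.\}
.nf
E = mc\eu2\ed
H\ed2\euO
Tom\e~Jones
.fi
.if n \{\
.RE
.\}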
.PP
Escapes other than these will yield warnings or errors\&.
.PP
All other troff requests are ignored but passed through into XML comments\&. A few (such as
\fB\&.ce\fR) also trigger a warning message\&.
.SH "PORTABILITY CHECKING"
.PP
When portability checking is enabled,
\fBdoclifter\fR
emits portability warnings about markup which it can handle but which will break various other viewers and interpreters\&.
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
At level 1, it will warn about constructions that would break
\fBman2html\fR(1) (the C program distributed with Linux
\fBman\fR(1), not the older and much less capable Perl script)\&. A close derivative of this code is used in GNOME
yelp\&. This should be the minimum level of portability you aim for, and corresponds to what is recommended on the
\fBgroff_man\fR(7)
manual page\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
At level 2, it will warn about constructions that will break portability back to the Unix classic tools (including long macro names and glyph references with \e[])\&.
.RE
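.PP
For instance, the two lines below both ask for an em dash; the first uses the two\-character glyph name that classic troff understands, while the second uses the bracketed \e[] form that level 2 checking warns about:
.sp
.if n \{\
.RS 4
.\}
.nf
foo\e(embar
foo\e[em]bar
.fi
.if n \{\
.RE
.\}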
.SH "SEMANTIC ANALYSIS"
.PP
\fBdoclifter\fR
keeps two lists of semantic hints that it picks up from analyzing source documents (especially from parsing command and function synopses)\&. The local list includes:
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of function formal arguments
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of command options
.RE
.PP
Local hints are used to mark up the individual page from which they are gathered\&. The global list includes:
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of functions
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of commands
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of function return types
.RE
.PP
If
\fBdoclifter\fR
is applied to multiple files, the global list is retained in memory\&. You can dump a report of global hints at the end of the run with the
\fB\-h\fR
option\&. The format of the hints is as follows:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.\e" | mark <phrase> as <markup>
.fi
.if n \{\
.RE
.\}
.PP
where
\fB<phrase>\fR
is an item of text and
\fB<markup>\fR
is the DocBook markup text it should be wrapped with whenever it appears either highlighted or as a word surrounded by whitespace in the source text\&.
.PP
Hints derived from earlier files are also applied to later ones\&. This behavior may be useful when lifting collections of documents that apply to a function or command library\&. What should be more useful is the fact that a hints file dumped with
\fB\-h\fR
can be one of the file arguments to
\fBdoclifter\fR; the code detects this special case and does not write XML output for such a file\&. Thus, a good procedure for lifting a large library is to generate a hints file with a first run, inspect it to delete false positives, and use it as the first input to a second run\&.
.PP
It is also possible to include a hints file directly in a troff source file\&. This may be useful if you want to enrich the file by stages before converting to XML\&.
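.PP
A page carrying its own hints might begin as in the sketch below\&. The phrases are hypothetical, and the markup names after
\fBas\fR
are an assumption; copy the exact spelling from a hints file dumped with
\fB\-h\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.\e" | mark frobctl as command
\&.\e" | mark frob_open as function
\&.TH FROBCTL 1
\&.SH NAME
frobctl \e\- control the frobnicator
.fi
.if n \{\
.RE
.\}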
.SH "TROUBLESHOOTING"
.PP
\fBdoclifter\fR
tries to warn about problems that it can diagnose but not fix by itself\&. When it says
"look for FIXME", do that in the generated XML; the markup around that token may be wrong\&.
.PP
Occasionally (less than 2% of the time)
\fBdoclifter\fR
will produce invalid DocBook markup even from correct troff markup\&. Usually this results from strange constructions in the source page, or macro calls that are beyond the ability of
\fBdoclifter\fR\*(Aqs macro processor to get right\&. Here are some things to watch for, and how to fix them:
.PP
\fIMalformed command synopses\&.\fR
If you get a message that says
"command synopsis parse failed", look at the XML output\&. It will contain a comment showing what the command synopsis looked like after preprocessing and indicating the token on which the parse failed (both with a token number and with a caret inserted in the dump of the synopsis tokens)\&. Try rewriting the synopsis in your manual page source\&. The most common cause of failure is unbalanced [] groupings, an error that can be very hard to spot by eye\&. To assist with this, the error token dump tries to insert \(oq$\(cq at the point of the last nesting\-depth increase, but the code that does this is failure\-prone\&.
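.PP
Because unbalanced [] groupings are hard to see, a small script can locate them before you rerun
\fBdoclifter\fR\&. This Python sketch is a hypothetical helper, not part of
\fBdoclifter\fR: it scans a synopsis string and returns the index of a \(oq]\(cq with no matching \(oq[\(cq, or of the most recently opened \(oq[\(cq that is never closed, and
\fINone\fR
when the groupings balance\&. It treats every bracket as a grouping character, so brackets that appear literally in the synopsis will confuse it\&.
.sp
.if n \{\
.RS 4
.\}
.nf
def find_unbalanced_bracket(synopsis):
    """Return the index of an unmatched [ or ], or None if balanced."""
    opens = []  # positions of [ not yet closed
    for i, ch in enumerate(synopsis):
        if ch == "[":
            opens.append(i)
        elif ch == "]":
            if not opens:
                return i  # a ] with no matching [
            opens.pop()
    # Report the most recently opened [ that was never closed, if any.
    return opens[-1] if opens else None
.fi
.if n \{\
.RE
.\}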
.PP
\fIConfusing macro calls\&.\fR
Some manual page authors replace standard requests (like
\fB\&.PP\fR,
\fB\&.SH\fR
and
\fB\&.TP\fR) with versions that do different things in
\fBnroff\fR
and
\fBtroff\fR
environments\&. While
\fBdoclifter\fR
tries to cope and usually does a good job, the quirks of [nt]roff are legion and confusing macro calls sometimes lead to bad XML being generated\&. A common symptom of such problems is unclosed
Emphasis
tags\&.
.PP
The message
"possible section nesting error"
means that the program has seen two adjacent subsection headers\&. In man pages, subsections don\*(Aqt have a depth argument, so
\fBdoclifter\fR
cannot be certain how subsections should be nested\&. Any subsection heading between the indicated line and the beginning of the next top\-level section might be wrong and require correcting by hand\&.
.PP
If you\*(Aqre translating a page that uses user\-defined macros and you get bad output, the first thing to do is simplify or eliminate the user\-defined macros\&. Replace them with stock requests where possible\&.
.SH "RETURN VALUES"
.PP
On successful completion, the program returns status 0\&. It returns 1 if some file or standard input could not be translated\&. It returns 2 if one of the input sources was a
\fB\&.so\fR
inclusion\&. It returns 3 if there is an error in reading or writing files\&. It returns 4 to indicate an internal error\&. It returns 5 when aborted by a keyboard interrupt\&.
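.PP
Scripts that drive
\fBdoclifter\fR
over many pages can branch on these statuses\&. The Python sketch below assumes
\fBdoclifter\fR
is on the search path;
\fIdescribe_status\fR
and
\fIlift\fR
are hypothetical names, and the table simply restates the statuses listed above\&.
.sp
.if n \{\
.RS 4
.\}
.nf
import subprocess

# Exit statuses as documented under RETURN VALUES.
STATUS_MEANINGS = {
    0: "success",
    1: "some file or standard input could not be translated",
    2: "one of the input sources was a .so inclusion",
    3: "error reading or writing files",
    4: "internal error",
    5: "aborted by a keyboard interrupt",
}

def describe_status(code):
    """Map a doclifter exit status to its documented meaning."""
    return STATUS_MEANINGS.get(code, "undocumented status %d" % code)

def lift(paths):
    """Run doclifter on the given paths; return (status, meaning)."""
    result = subprocess.run(["doclifter", *paths])
    return result.returncode, describe_status(result.returncode)
.fi
.if n \{\
.RE
.\}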
.PP
Note that a zero return does not guarantee that the output is valid DocBook\&. It will almost always (in more than 96% of cases) be syntactically valid XML, but in rare cases hand fixups may be necessary to meet the semantics of the DocBook DTD\&. Validation problems are most likely to occur with complicated list markup\&.
.SH "BUGS AND WARNINGS"
.PP
About 4% of man pages will either make this program throw error status 1 or generate invalid XML\&. In almost all such cases the misbehavior is triggered by markup bugs in the source that are too severe to be coped with\&.
.PP
Equation number arguments of EQN calls are ignored\&.
.PP
The function\-synopsis parser is crude (it\*(Aqs not a compiler) and prone to errors\&. Function\-synopsis markup should be checked carefully by a human\&.
.PP
If a man page has both paragraphed text in a Synopsis section and a body section before the Synopsis section, bad things will happen\&.
.PP
Running text (e\&.g\&., explanatory notes) at the end of a Synopsis section cannot reliably be distinguished from synopsis\-syntax markup\&. (This problem is AI\-complete\&.)
.PP
Some firewalls put in place to cope with common malformations in troff code mean that the tail end of a span between two
\fB\ef{B,I,U,(CW}\fR
or
\fB\&.ft\fR
highlight changes may not be completely covered by corresponding
Emphasis
tags if (for example) the span crosses a boundary between filled and unfilled (\fB\&.nf\fR/\fB\&.fi\fR) text\&.
.PP
The treatment of conditionals relies on the assumption that conditional macros never generate structural or font\-highlight markup that differs between the if and else branches\&. This appears to be true of all the standard macro packages, but if you roll any of your own macros you\*(Aqre on your own\&.
.PP
Macro definitions in a manual page NAME section are not interpreted\&.
.PP
In Berkeley mdoc interpretation, handling of
\fB\&.Xo\fR/\fB\&.Xc\fR
enclosures is failure\-prone\&.
.PP
Uses of \ec for line continuation are sometimes not translated, leaving the \ec in the output XML\&. The program prints a warning when this occurs\&.
.PP
It is not possible to unambiguously detect candidates for wrapping in a DocBook option tag in running text\&. You\*(Aqll have to check for these by hand\&.
.PP
The line numbers in
\fBdoclifter\fR
error messages are unreliable in the presence of
\fB\&.EQ/\&.EN\fR,
\fB\&.PS/\&.PE\fR, and quantum fluctuations\&.
.SH "OLD MACRO SETS"
.PP
There is a conflict between Berkeley ms\*(Aqs documented
\fB\&.P1\fR
print\-header\-on\-page request and an undocumented Bell Labs use for displayed program and equation listings\&. The
\fBms\fR
translator uses the Bell Labs interpretation when
\fB\&.P2\fR
is present in the document, and otherwise ignores the request\&.
.SH "REQUIREMENTS"
.PP
The
\fBpic2plot\fR(1)
utility must be installed in order to translate PIC diagrams to SVG\&.
.SH "SEE ALSO"
.PP
\fBman\fR(7),
\fBmdoc\fR(7),
\fBms\fR(7),
\fBme\fR(7),
\fBmm\fR(7),
\fBmwww\fR(7),
\fBtroff\fR(1)\&.
.SH "AUTHOR"
.PP
Eric S\&. Raymond
esr@thyrsus\&.com
.PP
There is a project web page at
\m[blue]\fBhttp://www\&.catb\&.org/~esr/doclifter/\fR\m[]\&.