.\" this document requires the tmac.wrprc macros
.\"
.\" $(TROFF) $(MSMACROS) tmac.wrprc thisfile
.\"
.\" revision date - change whenever this file is edited
.ds RD 9 March 1997
.\"
.EH 'tc2html Notes'- % -''
.OH ''- % -'tc2html Notes'
.OF 'Revision date:\0\0\*(RD''Printed:\0\0\n(dy \*(MO 19\n(yr'
.EF 'Revision date:\0\0\*(RD''Printed:\0\0\n(dy \*(MO 19\n(yr'
.\"
.de St \" troffcvt special text
\\&\\$3\fB@\\$1\fR\\$2
..
.de Cl \" troffcvt or RTF control
\\&\\$3\fB\e\\$1\fR\\$2
..
.de Rq \" troff request
\\&\\$3\fB\.\\$1\fR\\$2
..
.de Es \" troff escape
\\&\\$3\fB\e\\$1\fR\\$2
..
.TL
.ps +2
tc2html Notes
.ps
.AU
Paul DuBois
.H*ahref mailto:dubois@primate.wisc.edu
dubois@primate.wisc.edu
.H*aend
.AI
.H*ahref http://www.primate.wisc.edu/
Wisconsin Regional Primate Research Center
.H*aend
Revision date:\0\0\*(RD
.\"
.H*toc*title "Table of Contents"
.\"
.Ah Introduction
.\"
.LP
.I tc2html
is a postprocessor for converting
.I troffcvt
output to HTML.
It's used by the
.I troff2html
front end.
This document describes how
.I tc2html
works and some of the design issues involved in writing it.
.LP
In general, the goal of
.I tc2html
is that you should get reasonable HTML output
with no need for special treatment of the
.I troff
input file.
The most important thing is that you use a standard macro package.
However, there are some additional principles you can follow that will
improve the quality of the HTML that
.I tc2html
generates.
For example, it's possible to embed hypertext links in your
.I troff
source with a little prior planning.
Techniques for such things are discussed in the section
.H*ahref #better-html
``Generating Better HTML.''
.H*aend
If you're not interested in implementation details, you can skip directly
to that section.
.\"
.Ah "Output Format"
.\"
.LP
.I tc2html
reads output from
.I troffcvt
and produces an HTML document that has the following general form:
.Ps
<HTML>
<HEAD>
<TITLE>\fItitle text\fP</TITLE>
</HEAD>
<BODY>
<H1>\fItitle text\fP</H1>
\fIbody text\fP
</BODY>
</HTML>
.Pe
The document HEAD part may be missing if
.I tc2html
detects no title in the input.
In this case the initial heading at the beginning of the document BODY
part also will be missing.
The entire document BODY may be missing or empty
if the input document is empty.
.\"
.Ah "Determining Input Document Structure"
.\"
.LP
HTML documents typically are highly structured, being written in terms of
elements such as headers, paragraphs, lists, and displays (preformatted
text).
But
.I troffcvt
output normally contains very little structural
information beyond markers like those for
inter-paragraph spacing and line breaks (in the form of
.Cl space
and
.Cl break
control lines).
The result when
.I tc2html
reads such
.I troffcvt
output is that it produces HTML that is relatively unstructured
\*- just a lot of text broken by occasional <P> or <BR> markers.
.LP
However, if your document is marked up using macros from
a macro package such as
.B \-ms
or
.B \-man ,
it's possible to get output from
.I troffcvt
that's much more suitable for
.I tc2html .
The trick is to map
.I troff
requests to HTML
structure markers, rather than trying to guess the structure from the low-level
.I troffcvt
output that normally results from those requests.
This is accomplished using the following strategy:
.Ls B
.Li
Extend the
.I troffcvt
output language by defining an
.Cl html
control that provides information to
.I tc2html
about structural elements within the
.I troffcvt
output.
For example,
.Cl html
.B para
indicates the beginning of a paragraph.
.Li
Provide (in a
.I troffcvt
action file) a set of
HTML-specific macros that generate the appropriate
.Cl html
controls for the various structural elements.
For example,
.Rq H*para
generates
.Cl html
.B para .
.Li
For the important structure-related macros in your
macro package, redefine them (in a
.I troffcvt
action file)
so they're expressed in terms of the HTML-specific macros.
(It's possible, of course, to redefine the macros from the macro package
so they generate the
.Cl html
controls themselves.
But having the
.Cl html
controls available through a set of macros allows the macros
to be
invoked directly in your document.
This is important for some HTML constructs that have no
.I troff
analog, such as hyperlinks.)
.Le
Note that ``extending'' the
.I troffcvt
output language to include the
.Cl html
control is done using request definitions in an action file.
Source-level changes to
.I troffcvt
itself are not needed.
.LP
The effect of the strategy outlined above
is to remap the macros in your macro package
from their usual actions onto actions that produce document structure
information that
.I tc2html
can recognize.
For this to work well, all the important structure-related
macros in a macro package must be redefined, so
the redefinition files used for
.I tc2html
tend to be more extensive than those used for other postprocessors.
This is really the source of most of the work involved in getting
.I tc2html
to function well.
Once a set of redefinitions is written for a given macro package,
translation from
.I troff
to HTML is a straightforward process that usually generates fairly reasonable
HTML.
.LP
Here's an example of how the strategy described above works in practice.
The
.Rq LP
macro in the
.B \-ms
macro package means ``begin paragraph.''
But
.Rq LP
typically is implemented by executing several other requests
(restore font, margins, adjustment, spacing, point size, etc.), and the
.I troffcvt
output you'd get by processing those requests
really contains nothing that specifically indicates a paragraph.
To work around this, we use the fact that
.I tc2html
interprets
.Cl html
.B para
as indicating a paragraph beginning, and define
a macro to generate that control:
.Ps
req H*para eol output-control "html para"
.Pe
Then we can redefine the
.Rq LP
macro in terms of the
.Rq H*para
macro:
.Ps
req LP eol \e
break center 0 fill adjust b font R \e
push-string ".H*para\en"
.Pe
The
.B break ,
.B center ,
.B fill ,
.B adjust ,
and
.B font
actions cause
.I troffcvt
to adjust its internal state to match the effect that the
.Rq LP
macro normally has.
The call to
.Rq H*para
results in
.Cl html
.B para
in the output, so that
.I tc2html
can recognize the paragraph beginning.
.LP
The
.Cl html
markers that
.I tc2html
recognizes are shown below:
.Ps
.ta 2.75i
\ehtml title \fRBegin document title\fP
\ehtml header \f(CIN\fP \fRBegin level \f(CIN\fP header\f(CW
\ehtml header-end \fREnd header (any level)\fP
\ehtml para \fRBegin paragraph\fP
\ehtml blockquote \fRBegin block quote\fP
\ehtml blockquote-end \fREnd block quote\fP
\ehtml list \fRBegin list\fP
\ehtml list-end \fREnd list\fP
\ehtml list-item \fRBegin list item\fP
\ehtml display \fRBegin display (preformatted text)\fP
\ehtml display-end \fREnd display\fP
\ehtml display-indent \f(CIN\fP \fRSet display indent to \f(CIN\fP spaces\f(CW
\ehtml definition-term \fRBegin definition list term\fP
\ehtml definition-desc \fRBegin definition list description\fP
\ehtml shift-right \fRShift left margin right\fP
\ehtml shift-left \fRShift left margin left\fP
\ehtml anchor-href \f(CIURL\fP \fRBegin HREF anchor for link to \f(CIURL\f(CW
\ehtml anchor-name \f(CILABEL\fP \fRBegin NAME anchor with label \f(CILABEL\f(CW
\ehtml anchor-toc \f(CIN\fP \fRBegin NAME anchor for level \f(CIN\fP TOC entry\f(CW
\ehtml anchor-end \fREnd anchor (any kind)\fP
.Pe
The
.I troff -level
macros used to generate the
.Cl html
controls are shown below.
These macros are defined in the action file
.I actions-html :
.Ps
.ta 2.75i
\&.H*title \fRBegin document title\fP
\&.H*header \f(CIN\fP \fRBegin level \f(CIN\fP header\f(CW
\&.H*header-end \fREnd header (any level)\fP
\&.H*para \fRBegin paragraph\fP
\&.H*bq \fRBegin block quote\fP
\&.H*bq-end \fREnd block quote\fP
\&.H*list \fRBegin list\fP
\&.H*list-end \fREnd list\fP
\&.H*list-item \fRBegin list item\fP
\&.H*disp \fRBegin display (preformatted text)\fP
\&.H*disp-end \fREnd display\fP
\&.H*disp-indent \f(CIN\fP \fRSet display indent to \f(CIN\fP spaces\f(CW
\&.H*dterm \fRBegin definition list term\fP
\&.H*ddesc \fRBegin definition list description\fP
\&.H*shift-right \fRShift left margin right\fP
\&.H*shift-left \fRShift left margin left\fP
\&.H*ahref \f(CIURL\fP \fRBegin HREF anchor for link to \f(CIURL\f(CW
\&.H*aname \f(CILABEL\fP \fRBegin NAME anchor with label \f(CILABEL\f(CW
\&.H*atoc \f(CIN\fP \fRBegin NAME anchor for level \f(CIN\fP TOC entry\f(CW
\&.H*aend \fREnd anchor (any kind)\fP
.Pe
Note that since these names are longer than two characters, they cannot
be used in compatibility mode.
.\"
.Ah "Invoking tc2html"
.\"
.LP
The
.Cl html
controls are defined in a file
.I actions-html
that you can access on the
.I troffcvt
command line using
.B \-a
.B actions-html .
If you use a macro package
.B \-m \f[BI]xx\fP,
you specify it on the command line, along with the general
and HTML-specific
.I troffcvt
redefinitions for that macro package; these are in the action files
.I tc.mxx
and
.I tc.mxx-html .
Thus, to translate a file that you'd normally process using
.B \-ms ,
the command would look like this:
.Ps
% \f(CBtroffcvt -a actions-html -ms -a tc.ms -a tc.ms-html\fP \f(CImyfile.ms\fP \f(CB\e
| tc2html >\fP \f(CImyfile.html\fP
.Pe
That's pretty ugly, of course; it's better to use a wrapper script like
.I troff2html
that supplies the necessary options for you:
.Ps
% \f(CBtroff2html -ms\fP \f(CImyfile.ms\fP \f(CB>\fP \f(CImyfile.html\fP
.Pe
.\"
.Ah "Implementation of Various HTML Constructs"
.\"
.LP
This section provides some specifics on how several
.I troff
concepts are turned into HTML elements.
It should be considered illustrative rather than exhaustive.
.\"
.H*aname title-collection
.H*aend
.Bh "Document Titles"
.\"
.LP
Title macros are implemented in terms of
.Rq H*title ,
which generates an
.Cl html
.B title
control.
When
.I tc2html
sees this control, it goes into document HEAD collection mode.
If the document contains a title, the
.Cl html
.B title
line must be the first
.Cl html
control that
.I tc2html
sees.
Should any other
.Cl html
control or document text occur first,
.I tc2html
assumes no title is present.
Any leading document whitespace
.Cl space "" (
or
.Cl break
lines) occurring prior to the title is skipped.
.LP
The title is terminated by the next
.Cl html
line with a structural marker, such as
.Cl html
.B para .
The title text is used to produce the TITLE in the document HEAD part
and the initial header in the document BODY part.
.Cl space
and
.Cl break
lines within the title do
.I not
terminate title text collection; instead, they are turned into spaces in
the title and into <P> and <BR> in the initial header.
Consider the following
.I troff
input (using
.B \-ms
macros):
.Ps
\&.TL
My
\&.sp
Title
\&.LP
This is a line.
.Pe
This is converted by
.I troffcvt
into the following:
.Ps
\ehtml title
My
\espace
Title
\ebreak
\ehtml para
This is a line.
.Pe
The output from
.I troffcvt
is converted in turn by
.I tc2html
into this HTML:
.Ps
<HEAD>
<TITLE>
My Title
</TITLE>
</HEAD>
<BODY>
<H2>
My
<P>
Title
</H2>
<P>
This is a line.
.Pe
.B \-T
.I title
may be specified on the
.I tc2html
or
.I troff2html
command line to specify a title explicitly.
It overrides the title in the document if there is one.
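.LP
For example, assuming
.B \-T
can be given alongside the macro package option in the
.I troff2html
command shown earlier (the option order, the title text, and the file names
here are placeholders rather than tested usage),
an invocation might look like this:
.Ps
% \f(CBtroff2html -ms -T "My Title"\fP \f(CImyfile.ms\fP \f(CB>\fP \f(CImyfile.html\fP
.Pe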
.\"
.Bh "Standard Paragraphs"
.\"
.LP
The ``standard'' paragraph is a paragraph with the first line flush left.
There is no mechanism for writing paragraphs with an indented first line;
they're treated simply as standard paragraphs.
.LP
The standard paragraph is implemented in terms of
.Rq H*para ,
which generates an
.Cl html
.B para
control.
This is turned by
.I tc2html
into <P>.
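.LP
Because paragraphs with an indented first line are treated as standard
paragraphs, a redefinition for the
.B \-ms
.Rq PP
macro could plausibly mirror the
.Rq LP
redefinition shown earlier.
The following is only a sketch under that assumption; the actual
.I tc.ms-html
action file may define the macro differently:
.Ps
req PP eol \e
break center 0 fill adjust b font R \e
push-string ".H*para\en"
.Pe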
.LP
In the document BODY part,
.Cl space
is also interpreted as a paragraph marker, but
during document title collection,
.Cl space
is treated as described above under ``\c
.H*ahref #title-collection
Document Titles
.H*aend
\&.''
.\"
.Bh "Indented Paragraphs"
.\"
.LP
Indented paragraphs (with or without a hanging tag)
are implemented using definition lists (<DL>...</DL>).
The tag is written as a definition term (<DT>...</DT>)
and the paragraph body is written as a definition description (<DD>...</DD>).
If there is no tag, the term part is empty.
.LP
Indented paragraph macros are implemented in terms of
.Rq H*dterm
and
.Rq H*ddesc ,
which generate
.Cl html
.B definition-term
and
.Cl html
.B definition-desc
controls.
.LP
One problem with mapping indented paragraphs onto definition lists
is that it's not always clear from the
.I troff
input where the list ends.
In HTML, the definition list is a container for which you must write
both a beginning and ending tag, but in
.I troff
only the beginnings of paragraphs are specified.
This problem is handled (perhaps poorly) by closing the list when other HTML
structural elements like a standard paragraph or a header are seen.
Suppose you write something like this:
.Ps
\&.IP (i)
Para 1
\&.IP (ii)
Para 2
\&.LP
Para 3
.Pe
This is converted by
.I troffcvt
into the following:
.Ps
\ehtml definition-term
(i)
\ehtml definition-desc
Para 1
\ebreak
\ehtml definition-term
(ii)
\ehtml definition-desc
Para 2
\ebreak
\ehtml para
Para 3
\ebreak
.Pe
When
.I tc2html
sees the first
.Cl html
.B definition-term ,
it begins a definition list.
The second
.Cl html
.B definition-term
continues the same list.
The
.Cl html
.B para
(resulting from the
.Rq LP )
is part of a different structural element, so
.I tc2html
closes the list and begins a standard paragraph.
The resulting HTML looks like this:
.Ps
<DL>
<DT>
(i)
</DT>
<DD>
Para 1<BR>
</DD>
<DT>
(ii)
</DT>
<DD>
Para 2<BR>
</DD>
</DL>
<P>
Para 3<BR>
.Pe
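.LP
As a further illustration, an indented paragraph written without a tag
would be expected to produce an empty term part.
The input below is hypothetical:
.Ps
\&.IP
Untagged para
\&.LP
Next para
.Pe
The HTML would then look roughly like this.
This is a sketch that follows the layout of the previous example,
not output captured from
.I tc2html :
.Ps
<DL>
<DT>
</DT>
<DD>
Untagged para<BR>
</DD>
</DL>
<P>
Next para<BR>
.Pe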
.\"
.Bh "Right and Left Shifts"
.\"
.LP
In
.I troff ,
the left margin can be shifted right and left, e.g., as is done with the
.B \-ms
and
.B \-man
packages using
.Rq RS
and
.Rq RE .
HTML has no good way of shifting the margin, so shifts are performed
using <UL> and </UL>.
This is admittedly a hack, but it works reasonably well.
Shift macros are redefined to be implemented in terms of
.Rq H*shift*right
and
.Rq H*shift*left ,
which generate
.Cl html
.B shift-right
and
.Cl html
.B shift-left
controls.
These in turn are converted by
.I tc2html
to <UL> and </UL>.
.\"
.Bh "Displays"
.\"
.LP
Displays are implemented as preformatted text (<PRE>...</PRE>).
Tabstops are respected within displays, although they must be approximated
since character widths are unknown.
.I tc2html
assumes 10 characters/inch for determining the width of tabstops.
.LP
Display macros are implemented in terms of
.Rq H*disp
and
.Rq H*disp*end .
Preformatted text in HTML has no additional indent relative to the left
margin, but
.I troff
displays often are indented a bit.
To handle this,
.Rq H*disp*indent
.I N
can be used to set the display indent to
.I N
spaces.
.LP
.Rq H*disp ,
.Rq H*disp*end ,
and
.Rq H*disp*indent
generate
.Cl html
.B display ,
.Cl html
.B display-end ,
and
.Cl html
.B display-indent
controls.
The first two of these are converted by
.I tc2html
into <PRE> and </PRE>.
.Cl html
.B display-indent
generates no output itself, but causes
.I tc2html
to add spaces to the beginning of each line of a display.
.LP
Centered and right-justified displays are not implemented.
They're treated as regular displays.
.\"
.Bh "Tables"
.\"
.LP
If your input document has tables written in the
.I tbl
language, preprocess the document with
.I tblcvt
rather than with
.I tbl .
Your output will look better that way.
.LP
Table cell borders are hard to do well.
In
.I tbl
you can put a border on any cell boundary, but in HTML a table has either
no borders or borders around every cell.
Currently,
.I tc2html
puts borders around every cell.
.\"
.Bh "Font Handling"
.\"
.LP
Fonts are handled in
.I tc2html
by means of a table that associates four tags with each font name.
The first two tags are used to turn the font on and off in normal text.
The second two tags are used to turn the font on and off in displays.
This table is read at runtime from the
.I html-fonts
file.
Here's an example of what the file might look like:
.Ps
.ta .5i +1.25i +1.5i +1.25i
R "" "" "" ""
I <I> </I> <I> </I>
B <B> </B> <B> </B>
BI <B><I> </I></B> <B><I> </I></B>
C <TT> </TT> "" ""
CW <TT> </TT> "" ""
CI <TT><I> </I></TT> <I> </I>
CB <TT><B> </B></TT> <B> </B>
CBI <TT><B><I> </I></B></TT> <B><I> </I></B>
.Pe
The difference between the tags for regular text and display text
is that, since browsers implicitly switch the font to monospaced
font in displays, the only thing that can be done for font changes there
is to change the style attributes.
.LP
The initial font when
.I tc2html
begins is
.Cw R
(roman).
When a font change occurs, the new font's begin tag is written out
after terminating the previous font by writing its end tag.
Using the font table just shown, this input:
.Ps
\efont R
abc
\efont I
def
\efont CW
ghi
\efont R
jkl
.Pe
becomes this output:
.Ps
abc<I>def</I><TT>ghi</TT>jkl
.Pe
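.LP
Within a display, the last two columns of the table apply instead.
As a sketch derived from the example table (and ignoring the line breaks that
a display would preserve), the same sequence of font changes would be
expected to tag only the italic text, because the display tags for
.Cw R
and
.Cw CW
are empty:
.Ps
abc<I>def</I>ghijkl
.Pe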
.\"
.Bh "Tabs"
.\"
.LP
Tabs are ignored except in displays.
Adding extra space to tab over
has no effect in regular paragraphs anyway, because browsers typically
collapse runs of spaces.
.LP
Right-justified and centered tabs are
treated as left-justified tabs.
That is, they're completely botched.
.\"
.H*aname better-html
.H*aend
.Ah "Generating Better HTML"
.\"
.LP
This section describes how you can embed hypertext links in your
.I troff
source and how to produce a table of contents containing clickable
links to the main sections of your document.
.\"
.Bh "Generating Hypertext Links"
.\"
.LP
The
.Cl html
controls used to generate hypertext links are:
.Ps
\ehtml anchor-href \f(CIURL\fP
\ehtml anchor-name \f(CILABEL\fP
\ehtml anchor-end
.Pe
The first two controls generate opening
\f(CW<A HREF=\f(CIURL\f(CW>\fR
and
\f(CW<A NAME=\f(CILABEL\f(CW>\fR
tags; the third generates a closing
.Cw "</A>"
tag.
.LP
To embed hypertext links in your
.I troff
source, you can use the macros
.Rq H*ahref
and
.Rq H*aend ,
or
.Rq H*aname
and
.Rq H*aend .
To write an HREF link, the
.I troff
source looks like this:
.Ps
\&.H*ahref http://www.some.host/some/path
hypertext link
\&.H*aend
.Pe
The resulting HTML looks like this:
.Ps
<A HREF="http://www.some.host/some/path">
hypertext link</A>
.Pe
To write a NAME link, the
.I troff
source looks like this:
.Ps
\&.H*aname my-name
name link
\&.H*aend
.Pe
The resulting HTML looks like this:
.Ps
<A NAME="my-name">
name link</A>
.Pe
Section-header macros are usually redefined to generate a NAME
anchor for the table of contents, so don't surround a section header
with anchor-generating macros.
You'll end up with nested anchors, which
.I tc2html
disallows.
You can generate a NAME link for a section (e.g., so that you can refer
to it using a specific name) as long as you don't write the link like this:
.Ps
\&.H*aname better-html
\&.SH "Generating Better HTML"
\&.H*aend
.Pe
Instead, write it like this:
.Ps
\&.H*aname better-html
\&.H*aend
\&.SH "Generating Better HTML"
.Pe
Unfortunately, some browsers don't seem able to jump to
.Cw NAME
anchors unless there is some text between the
.Cw "<A NAME>"
and
.Cw </A>
tags.
.LP
You can't make a section header a hypertext link.
You'd have to put the header (which generates a NAME link for the TOC)
between the
.Rq H*ahref
and
.Rq H*aend
macros, which would result in nested anchors.
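.LP
The two kinds of anchor can be combined to link from one part of a document
to another, which is how these notes handle their own cross-references.
In the sketch below, the label
.Cw install-notes ,
the section title, and the paragraph text are made up for illustration:
.Ps
\&.H*aname install-notes
\&.H*aend
\&.SH "Installation Notes"
\&.LP
Some text.
\&.LP
See the section
\&.H*ahref #install-notes
``Installation Notes.''
\&.H*aend
.Pe
Following the HREF example above, the reference would be expected to come
out as an HREF anchor whose URL is
.Cw #install-notes .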
.\"
.Bh "Generating a Table of Contents"
.\"
.LP
Putting a table of contents (TOC) into an HTML document requires some
postprocessing of the
.I tc2html
output.
The TOC entries can't be written to the beginning of the document
because they're not all known until the input has been read entirely.
The approach adopted with
.I tc2html
is as follows:
.Ls B
.Li
Write a marker to the document indicating the desired TOC position.
You do this using a special macro, described below.
.Li
Collect TOC entries in memory as the input is processed.
.Li
Write the TOC contents as a list near the end of the document.
.Li
Run
.I tc2html-toc ,
a script that examines the HTML document and moves the TOC contents
to the location indicated by the TOC position marker.
.Le
If you run
.I tc2html
directly, you must also run
.I tc2html-toc
directly.
If you use
.I troff2html ,
.I tc2html-toc
is run for you automatically.
.LP
The
.Cl html
controls used to generate TOC entries are:
.Ps
\ehtml anchor-toc \f(CIN\fP
\ehtml anchor-end
.Pe
Text occurring between
.Cl html
.B anchor-toc
and
.Cl html
.B anchor-end
pairs is written to the output, but it's also
collected and remembered.
When
.I tc2html
encounters end of file on its input,
it writes the TOC entries to the output between two HTML comments:
.Ps
<!-- TOC BEGIN -->
\fITOC entries\fP
<!-- TOC END -->
.Pe
If you want to generate a TOC entry explicitly in your
.I troff
source, use
.Rq H*atoc
and
.Rq H*aend .
For example:
.Ps
\&.H*atoc 1
My TOC Entry
\&.H*aend
.Pe
The argument to
.Rq H*atoc
is the TOC entry level (1, 2, 3, ...).
.LP
It's unnecessary to invoke TOC macros directly if
the section-header macros in your macro package are redefined
to invoke the TOC macros for you.
For example, the
.Rq SH
macro for the
.B \-ms
package is redefined like this in the
.I tc.ms-html
action file:
.Ps
req SH parse-macro-args eol \e
break fill adjust b \e
push-string ".H*atoc 1\en" \e
push-string ".H*header 2\en" \e
push-string "$1\en" \e
push-string ".H*header*end\en" \e
push-string ".H*aend\en"
.Pe
To specify the TOC title and generate the TOC position marker, use the
.Rq H*toc*title
macro.
Invoke it as shown below, passing the title of your TOC as the first argument:
.Ps
\&.H*toc*title "Table of Contents"
.Pe
.Rq H*toc*title
writes the TOC title to the output followed by a special HTML comment:
.Ps
Table of Contents
<!-- INSERT TOC HERE -->
.Pe
The INSERT TOC HERE comment
is used by
.I tc2html-toc ,
along with the TOC BEGIN and TOC END comments,
to find the TOC entries and move them to the desired location.
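.LP
Putting the pieces together, a document that wants its TOC after the
title block might begin as sketched below.
The layout mirrors the source of these notes; the title, author,
section names, and text are placeholders:
.Ps
\&.TL
My Document
\&.AU
A. N. Author
\&.H*toc*title "Table of Contents"
\&.SH "First Section"
\&.LP
Some text.
\&.SH "Second Section"
\&.LP
More text.
.Pe
Before
.I tc2html-toc
runs, the output of
.I tc2html
would then have roughly this shape, with the entries still near the end
(a schematic assembled from the descriptions above, not captured output):
.Ps
\fItitle and author\fP
Table of Contents
<!-- INSERT TOC HERE -->
\fIbody text\fP
<!-- TOC BEGIN -->
\fITOC entries\fP
<!-- TOC END -->
.Pe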
.LP
Action files that provide macro package redefinitions for
.I tc2html
can try to place an advisory TOC location marker in the document.
This is used if you don't specify a location marker explicitly with
.Rq H*toc*title :
.Ps
<!-- INSERT TOC HERE, MAYBE -->
.Pe
For instance, the
.B \-man
redefinitions put out this marker when the
.Rq TH
macro has been seen.
The marker causes a TOC to be placed
after the title line and the first man page section, unless one is specified
explicitly.
No TOC title is written with the advisory marker, however, so the TOC
will be ``title-less.''