1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552
|
.\" $Id: mandoc_html.3,v 1.23 2020/04/24 13:13:06 schwarze Exp $
.\"
.\" Copyright (c) 2014, 2017, 2018 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: April 24 2020 $
.Dt MANDOC_HTML 3
.Os
.Sh NAME
.Nm mandoc_html
.Nd internals of the mandoc HTML formatter
.Sh SYNOPSIS
.In sys/types.h
.Fd #include """mandoc.h"""
.Fd #include """roff.h"""
.Fd #include """out.h"""
.Fd #include """html.h"""
.Ft void
.Fn print_gen_decls "struct html *h"
.Ft void
.Fn print_gen_comment "struct html *h" "struct roff_node *n"
.Ft void
.Fn print_gen_head "struct html *h"
.Ft struct tag *
.Fo print_otag
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *fmt"
.Fa ...
.Fc
.Ft void
.Fo print_tagq
.Fa "struct html *h"
.Fa "const struct tag *until"
.Fc
.Ft void
.Fo print_stagq
.Fa "struct html *h"
.Fa "const struct tag *suntil"
.Fc
.Ft void
.Fn html_close_paragraph "struct html *h"
.Ft enum roff_tok
.Fo html_fillmode
.Fa "struct html *h"
.Fa "enum roff_tok tok"
.Fc
.Ft int
.Fo html_setfont
.Fa "struct html *h"
.Fa "enum mandoc_esc font"
.Fc
.Ft void
.Fo print_text
.Fa "struct html *h"
.Fa "const char *word"
.Fc
.Ft void
.Fo print_tagged_text
.Fa "struct html *h"
.Fa "const char *word"
.Fa "struct roff_node *n"
.Fc
.Ft char *
.Fo html_make_id
.Fa "const struct roff_node *n"
.Fa "int unique"
.Fc
.Ft struct tag *
.Fo print_otag_id
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *cattr"
.Fa "struct roff_node *n"
.Fc
.Ft void
.Fn print_endline "struct html *h"
.Sh DESCRIPTION
The mandoc HTML formatter is not a formal library.
However, as it is compiled into more than one program, in particular
.Xr mandoc 1
and
.Xr man.cgi 8 ,
and because it may be security-critical in some contexts,
some documentation is useful to help to use it correctly and
to prevent XSS vulnerabilities.
.Pp
The formatter produces HTML output on the standard output.
Since proper escaping is usually required and best taken care of
at one central place, the language-specific formatters
.Po
.Pa *_html.c ,
see
.Sx FILES
.Pc
are not supposed to print directly to
.Dv stdout
using functions like
.Xr printf 3 ,
.Xr putc 3 ,
.Xr puts 3 ,
or
.Xr write 2 .
Instead, they are expected to use the output functions declared in
.Pa html.h
and implemented as part of the main HTML formatting engine in
.Pa html.c .
.Ss Data structures
These structures are declared in
.Pa html.h .
.Bl -tag -width Ds
.It Vt struct html
Internal state of the HTML formatter.
.It Vt struct tag
One entry for the LIFO stack of HTML elements.
Members include
.Fa "enum htmltag tag"
and
.Fa "struct tag *next" .
.El
.Ss Private interface functions
The function
.Fn print_gen_decls
prints the opening
.Aq Pf \&! Ic DOCTYPE
declaration.
.Pp
The function
.Fn print_gen_comment
prints the leading comments, usually containing a Copyright notice
and license, as an HTML comment.
It is intended to be called right after opening the
.Aq Ic HTML
element.
Pass the first
.Dv ROFFT_COMMENT
node in
.Fa n .
.Pp
The function
.Fn print_gen_head
prints the opening
.Aq Ic META
and
.Aq Ic LINK
elements for the document
.Aq Ic HEAD ,
using the
.Fa style
member of
.Fa h
unless that is
.Dv NULL .
It uses
.Fn print_otag
which takes care of properly encoding attributes,
which is relevant for the
.Fa style
link in particular.
.Pp
The function
.Fn print_otag
prints the start tag of an HTML element with the name
.Fa tag ,
optionally including the attributes specified by
.Fa fmt .
If
.Fa fmt
is the empty string, no attributes are written.
Each letter of
.Fa fmt
specifies one attribute to write.
Most attributes require one
.Va char *
argument which becomes the value of the attribute.
The arguments have to be given in the same order as the attribute letters.
If an argument is
.Dv NULL ,
the respective attribute is not written.
.Bl -tag -width 1n -offset indent
.It Cm c
Print a
.Cm class
attribute.
.It Cm h
Print a
.Cm href
attribute.
This attribute letter can optionally be followed by a modifier letter.
If followed by
.Cm R ,
it formats the link as a local one by prefixing a
.Sq #
character.
If followed by
.Cm I ,
it interpretes the argument as a header file name
and generates a link using the
.Xr mandoc 1
.Fl O Cm includes
option.
If followed by
.Cm M ,
it takes two arguments instead of one, a manual page name and
section, and formats them as a link to a manual page using the
.Xr mandoc 1
.Fl O Cm man
option.
.It Cm i
Print an
.Cm id
attribute.
.It Cm \&?
Print an arbitrary attribute.
This format letter requires two
.Vt char *
arguments, the attribute name and the value.
The name must not be
.Dv NULL .
.It Cm s
Print a
.Cm style
attribute.
If present, it must be the last format letter.
It requires two
.Va char *
arguments.
The first is the name of the style property, the second its value.
The name must not be
.Dv NULL .
The
.Cm s
.Ar fmt
letter can be repeated, each repetition requiring an additional pair of
.Va char *
arguments.
.El
.Pp
.Fn print_otag
uses the private function
.Fn print_encode
to take care of HTML encoding.
If required by the element type, it remembers in
.Fa h
that the element is open.
The function
.Fn print_tagq
is used to close out all open elements up to and including
.Fa until ;
.Fn print_stagq
is a variant to close out all open elements up to but excluding
.Fa suntil .
The function
.Fn html_close_paragraph
closes all open elements that establish phrasing context,
thus returning to the innermost flow context.
.Pp
The function
.Fn html_fillmode
switches to fill mode if
.Fa want
is
.Dv ROFF_fi
or to no-fill mode if
.Fa want
is
.Dv ROFF_nf .
Switching from fill mode to no-fill mode closes the current paragraph
and opens a
.Aq Ic PRE
element.
Switching in the opposite direction closes the
.Aq Ic PRE
element, but does not open a new paragraph.
If
.Fa want
matches the mode that is already active, no elements are closed nor opened.
If
.Fa want
is
.Dv TOKEN_NONE ,
the mode remains as it is.
.Pp
The function
.Fn html_setfont
selects the
.Fa font ,
which can be
.Dv ESCAPE_FONTROMAN ,
.Dv ESCAPE_FONTBOLD ,
.Dv ESCAPE_FONTITALIC ,
.Dv ESCAPE_FONTBI ,
or
.Dv ESCAPE_FONTCW ,
for future text output and internally remembers
the font that was active before the change.
If the
.Fa font
argument is
.Dv ESCAPE_FONTPREV ,
the current and the previous font are exchanged.
This function only changes the internal state of the
.Fa h
object; no HTML elements are written yet.
Subsequent text output will write font elements when needed.
.Pp
The function
.Fn print_text
prints HTML element content.
It uses the private function
.Fn print_encode
to take care of HTML encoding.
If the document has requested a non-standard font, for example using a
.Xr roff 7
.Ic \ef
font escape sequence,
.Fn print_text
wraps
.Fa word
in an HTML font selection element using the
.Fn print_otag
and
.Fn print_tagq
functions.
.Pp
The function
.Fn print_tagged_text
is a variant of
.Fn print_text
that wraps
.Fa word
in an
.Aq Ic A
element of class
.Qq permalink
if
.Fa n
is not
.Dv NULL
and yields a segment identifier when passed to
.Fn html_make_id .
.Pp
The function
.Fn html_make_id
allocates a string to be used for the
.Cm id
attribute of an HTML element and/or as a segment identifier for a URI in an
.Aq Ic A
element.
If
.Fa n
contains a
.Fa tag
attribute, it is used; otherwise, child nodes are used.
If
.Fa n
is an
.Ic \&Sh ,
.Ic \&Ss ,
.Ic \&Sx ,
.Ic SH ,
or
.Ic SS
node, the resulting string is the concatenation of the child strings;
for other node types, only the first child is used.
Bytes not permitted in URI-fragment strings are replaced by underscores.
If any of the children to be used is not a text node,
no string is generated and
.Dv NULL
is returned instead.
If the
.Fa unique
argument is non-zero, deduplication is performed by appending an
underscore and a decimal integer, if necessary.
If the
.Fa unique
argument is 1, this is assumed to be the first call for this tag
at this location, typically for use by
.Dv NODE_ID ,
so the integer is incremented before use.
If the
.Fa unique
argument is 2, this is ssumed to be the second call for this tag
at this location, typically for use by
.Dv NODE_HREF ,
so the existing integer, if any, is used without incrementing it.
.Pp
The function
.Fn print_otag_id
opens a
.Fa tag
element of class
.Fa cattr
for the node
.Fa n .
If the flag
.Dv NODE_ID
is set in
.Fa n ,
it attempts to generate an
.Cm id
attribute with
.Fn html_make_id .
If the flag
.Dv NODE_HREF
is set in
.Fa n ,
an
.Aq Ic A
element of class
.Qq permalink
is added:
outside if
.Fa n
generates an element that can only occur in phrasing context,
or inside otherwise.
This function is a wrapper around
.Fn html_make_id
and
.Fn print_otag ,
automatically chosing the
.Fa unique
argument appropriately and setting the
.Fa fmt
arguments to
.Qq chR
and
.Qq ci ,
respectively.
.Pp
The function
.Fn print_endline
makes sure subsequent output starts on a new HTML output line.
If nothing was printed on the current output line yet, it has no effect.
Otherwise, it appends any buffered text to the current output line,
ends the line, and updates the internal state of the
.Fa h
object.
.Pp
The functions
.Fn print_eqn ,
.Fn print_tbl ,
and
.Fn print_tblclose
are not yet documented.
.Sh RETURN VALUES
The functions
.Fn print_otag
and
.Fn print_otag_id
return a pointer to a new element on the stack of HTML elements.
When
.Fn print_otag_id
opens two elements, a pointer to the outer one is returned.
The memory pointed to is owned by the library and is automatically
.Xr free 3 Ns d
when
.Fn print_tagq
is called on it or when
.Fn print_stagq
is called on a parent element.
.Pp
The function
.Fn html_fillmode
returns
.Dv ROFF_fi
if fill mode was active before the call or
.Dv ROFF_nf
otherwise.
.Pp
The function
.Fn html_make_id
returns a newly allocated string or
.Dv NULL
if
.Fa n
lacks text data to create the attribute from.
The caller is responsible for
.Xr free 3 Ns ing
the returned string after using it.
.Pp
In case of
.Xr malloc 3
failure, these functions do not return but call
.Xr err 3 .
.Sh FILES
.Bl -tag -width mandoc_aux.c -compact
.It Pa main.h
declarations of public functions for use by the main program,
not yet documented
.It Pa html.h
declarations of data types and private functions
for use by language-specific HTML formatters
.It Pa html.c
main HTML formatting engine and utility functions
.It Pa mdoc_html.c
.Xr mdoc 7
HTML formatter
.It Pa man_html.c
.Xr man 7
HTML formatter
.It Pa tbl_html.c
.Xr tbl 7
HTML formatter
.It Pa eqn_html.c
.Xr eqn 7
HTML formatter
.It Pa roff_html.c
.Xr roff 7
HTML formatter, handling requests like
.Ic br ,
.Ic ce ,
.Ic fi ,
.Ic ft ,
.Ic nf ,
.Ic rj ,
and
.Ic sp .
.It Pa out.h
declarations of data types and private functions
for shared use by all mandoc formatters,
not yet documented
.It Pa out.c
private functions for shared use by all mandoc formatters
.It Pa mandoc_aux.h
declarations of common mandoc utility functions, see
.Xr mandoc 3
.It Pa mandoc_aux.c
implementation of common mandoc utility functions
.El
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr mandoc 3 ,
.Xr man.cgi 8
.Sh AUTHORS
.An -nosplit
The mandoc HTML formatter was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv .
It is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org ,
who also wrote this manual.
|