1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657
|
NAME
Pod::2::DocBook - Convert Pod data to DocBook SGML
SYNOPSIS
use Pod::2::DocBook;
my $parser = Pod::2::DocBook->new(
title => 'My Article',
doctype => 'article',
base_id => 'article42'
fix_double_quotes => 1,
spaces => 3,
id_version => 2,
);
$parser->parse_from_file ('my_article.pod', 'my_article.sgml');
DESCRIPTION
Pod::2::DocBook is a module for translating Pod-formatted documents to
DocBook 4.2 SGML (see <http://www.docbook.org/>). It is primarily a back
end for pod2docbook, but, as a Pod::Parser subclass, it can be used on
its own. The only public extensions to the Pod::Parser interface are
options available to "new()":
doctype
This option sets the output document's doctype. The currently
supported types are article, chapter, refentry and section. Special
processing is performed when the doctype is set to refentry (see
"Document Types"). You *must* set this option in order to get valid
DocBook output.
fix_double_quotes
If this option is set to a true value, pairs of double quote
characters ('"') in ordinary paragraphs will be replaced with
<quote> and </quote>. See "Ordinary Paragraphs" for details.
header
If this option is set to a true value, Pod::2::DocBook will emit a
DOCTYPE as the first line of output.
spaces
Pod::2::DocBook produces pretty-printed output. This option sets the
number of spaces per level of indentation in the output.
title
This option sets the output document's title.
The rest of this document only describes issues specific to
Pod::2::DocBook; for details on invoking the parser, specifically the
"new()", "parse_from_file()" and "parse_from_filehandle()" methods, see
Pod::Parser.
METHODS
use base 'Pod::Parser';
initialize()
Initialize parser.
begin_pod()
Output docbook header stuff.
end_pod()
Output docbook footer. Will print also errors if any in a comment block.
commans($command, $paragraph, $line_num)
Process POD commands.
textblock ($paragraph, $line_num)
Process text block.
verbatim($paragraph, $line_num)
Process verbatim text block.
interior_sequence($command, $argument, $seq)
Process formatting commands.
error_msg
Returns parser error message(s) if any occured.
make_id($text)
default id format -
Function will construct an element id string. Id string is composed of
"join (':', $parser->{base_id}, $text)", where $text in most cases is
the pod heading text.
version 2 id format -
having ':' in id was not a best choice. (Xerces complains - Attribute
value "lib.Moose.Manual.pod:NAME" of type ID must be an NCName when
namespaces are enabled.) To not break backwards compatibity switch with
<id_version = 2>> in constructor for using '-' instead.
The xml id string has strict format. Checkout "cleanup_id" function for
specification.
make_uniq_id($text)
Calls "$parser->make_id($text)" and checks if such id was already
generated. If so, generates new one by adding _i1 (or _i2, i3, ...) to
the id string. Return value is new uniq id string.
cleanup_id($id_string)
This function is used internally to remove/change any illegal characters
from the elements id string. (see
http://www.w3.org/TR/2000/REC-xml-20001006#NT-Name for the id string
specification)
$id_string =~ s/<!\[CDATA\[(.+?)\]\]>/$1/g; # keep just inside of CDATA
$id_string =~ s/<.+?>//g; # remove tags
$id_string =~ s/^\s*//; # ltrim spaces
$id_string =~ s/\s*$//; # rtrim spaces
$id_string =~ tr{/ }{._}; # replace / with . and spaces with _
$id_string =~ s/[^\-_a-zA-Z0-9\.: ]//g; # closed set of characters allowed in id string
In the worst case when the $id_string after clean up will not conform
with the specification, warning will be printed out and random number
with leading colon will be used.
POD TO DOCBOOK TRANSLATION
Pod is a deceptively simple format; it is easy to learn and very
straightforward to use, but it is suprisingly expressive. Nevertheless,
it is not nearly as expressive or complex as DocBook. In most cases,
given some Pod, the analogous DocBook markup is obvious, but not always.
This section describes how Pod::2::DocBook treats Pod input so that Pod
authors may make informed choices. In every case, Pod::2::DocBook
strives to make easy things easy and hard things possible.
The primary motivation behind Pod::2::DocBook is to facilitate
single-source publishing. That is, you should be able to generate man
pages, web pages, PDF and PostScript documents, or any other format your
SGML and/or Pod tools can produce, from the same Pod source, without the
need for hand-editing any intermediate files. This may not always be
possible, or you may simply choose to render Pod to DocBook and use that
as your single source. To satisfy the first requirement, Pod::2::DocBook
always processes the entire Pod source and tries very hard to produce
valid DocBook markup, even in the presence of malformed Pod (see
"DIAGNOSTICS"). To satisfy the second requirement (and to be a little
nifty), Pod::2::DocBook pretty-prints its output. If you're curious
about what specific output to expect, read on.
Document Types
DocBook's structure is very modular; many of its document types can be
embedded directly into other documents. Accordingly, Pod::2::DocBook
will generate four different document types: article, chapter, refentry,
and section. This makes it easy, for instance, to write all the chapters
of a book in separate Pod documents, translate them into DocBook markup
and later glue them together before processing the entire book. You
could do the same with each section in an article, or you could write
the entire article in a single Pod document. Other document types, such
as book and set, do not map easily from Pod, because they require
structure for which there is no Pod equivalent. But given sections and
chapters, making larger documents becomes much simpler.
The refentry document type is a little different from the others.
Sections, articles, and chapters are essentially composed of nested
sections. But a refentry has specialized elements for the *NAME* and
*SYNOPSIS* sections. To accommodate this, Pod::2::DocBook performs extra
processing on the Pod source when the doctype is set to refentry. You
probably don't have to do anything to your document to assist the
processing; typical man page conventions cover the requirements. Just
make sure that the *NAME* and *SYNOPSIS* headers are both =head1s, that
"NAME" and "SYNOPSIS" are both uppercase, and that =head1 NAME is the
first line of Pod source.
Ordinary Paragraphs
Ordinary paragraphs in a Pod document translate naturally to DocBook
paragraphs. Specifically, after any formatting codes are processed, the
characters "<", ">" and "&" are translated to their respective SGML
character entities, and the paragraph is wrapped in <para> and </para>.
For example, given this Pod paragraph:
Here is some text with I<italics> & an ampersand.
Pod::2::DocBook would produce DocBook markup similar to this:
<para>
Here is some text with <emphasis role="italic">italics</emphasis>
& an ampersand.
</para>
Depending on your final output format, you may sometimes want double
quotes in ordinary paragraphs to show up ultimately as "smart quotes"
(little 66s and 99s). Pod::2::DocBook offers a convenient mechanism for
handling double quotes in ordinary paragraphs and letting your SGML
toolchain manage their presentation: the fix_double_quotes option to
"new()". If this option is set to a true value, Pod::2::DocBook will
replace pairs of double quotes in ordinary paragraphs (and *only* in
ordinary paragraphs) with <quote> and </quote>.
For example, given this Pod paragraph:
Here is some text with I<italics> & an "ampersand".
Pod::2::DocBook, with fix_double_quotes set, would produce DocBook
markup similar to this:
<para>
Here is some text with <emphasis role="italic">italics</emphasis>
& an <quote>ampersand</quote>.
</para>
If you have a paragraph with an odd number of double quotes, the last
one will be left untouched, which may or may not be what you want. If
you have such a document, replace the unpaired double quote character
with E<quot>, and Pod::2::DocBook should be able to give you the output
you expect. Also, if you have any =begin docbook ... =end docbook
regions (see "Embedded DocBook Markup") in your Pod, you are responsible
for managing your own quotes in those regions.
Verbatim Paragraphs
Verbatim paragraphs translate even more naturally; perlpodspec mandates
that absolutely no processing should be performed on them. So
Pod::2::DocBook simply marks them as CDATA and wraps them in <screen>
and </screen>. They are not indented the way ordinary paragraphs are,
because they treat whitespace as significant.
For example, given this verbatim paragraph (imagine there's leading
whitespace in the source):
my $i = 10;
while (<> && $i--) {
print "$i: $_";
}
Pod::2::DocBook would produce DocBook markup similar to this:
<screen><![CDATA[my $i = 10;
while (<> && $i--) {
print "$i: $_";
}]] ></screen>
Multiple contiguous verbatim paragraphs are treated as a single *screen*
element, with blank lines separating the paragraphs, as dictated by
perlpodspec.
Command Paragraphs
"=head1 Heading Text"
"=head2 Heading Text"
"=head3 Heading Text"
"=head4 Heading Text"
All of the Pod heading commands produce DocBook *section* elements,
with the heading text as titles. Pod::2::DocBook (perlpod) only
allows for 4 heading levels, but DocBook allows arbitrary nesting;
see "Embedded DocBook Markup" if you need more than 4 levels.
Pod::2::DocBook only looks at relative heading levels to determine
nesting. For example, this bit of Pod:
=head1 1
Contents of section 1
=head2 1.1
Contents of section 1.1
and this bit of Pod:
=head1 1
Contents of section 1
=head3 1.1
Contents of section 1.1
both produce the same DocBook markup, which will look something like
this:
<section id="article-My-Article-1"><title>1</title>
<para>
Contents of section 1
</para>
<section id="article-My-Article-1-1"><title>1.1</title>
<para>
Contents of section 1.1
</para>
</section>
</section>
Note that Pod::2::DocBook automatically generates section
identifiers from your doctype, document title and section title. It
does the same when you make internal links (see "Formatting Codes",
ensuring that if you supply the same link text as you did for the
section title, the resulting identifiers will be the same.
"=over indentlevel"
"=item stuff..."
"=back"
"=over" ... "=back" regions are somewhat complex, in that they can
lead to a variety of DocBook constructs. In every case,
*indentlevel* is ignored by Pod::2::DocBook, since that's best left
to your stylesheets.
An "=over" ... "=back" region with no "=item"s represents indented
text and maps directly to a DocBook *blockquote* element. Given this
source:
=over 4
This text should be indented.
=back
Pod::2::DocBook will produce DocBook markup similar to this:
<blockquote>
<para>
This text should be indented.
</para>
</blockquote>
Inside an "=over" ... "=back" region, "=item" commands generate
lists. The text that follows the first "=item" determines the type
of list that will be output:
* "*" (an asterisk) produces <itemizedlist>
* "1" or "1." produces <orderedlist numeration="arabic">
* "a" or "a." produces <orderedlist numeration="loweralpha">
* "A" or "A." produces <orderedlist numeration="upperalpha">
* "i" or "i." produces <orderedlist numeration="lowerroman">
* "I" or "I." produces <orderedlist numeration="upperroman">
* anything else produces <variablelist>
Since the output from each of these is relatively verbose, the best
way to see examples is to actually render some Pod into DocBook.
"=pod"
"=cut"
Pod::Parser recognizes these commands, and, therefore, so does
Pod::2::DocBook, but they don't produce any output.
"=begin formatname"
"=end formatname"
"=for formatname text..."
Pod::2::DocBook supports two formats: docbook, explained in
"Embedded DocBook Markup", and table, explained in "Simple Tables".
"=encoding encodingname"
This command is currently not supported. If Pod::2::DocBook
encounters a document that contains "=encoding", it will ignore the
command and report an error ("unknown command `%s' at line %d in
file %s").
Embedded DocBook Markup
There are a wide range of DocBook structures for which there is no Pod
equivalent. For these, you will have to provide your own markup using
=begin docbook ... =end docbook or =for docbook .... Pod::2::DocBook
will directly output whatever text you provide, unprocessed, so it's up
to you to ensure that it's valid DocBook.
Images, footnotes and many inline elements are obvious candidates for
embedded markup. Another possible use is nesting sections more than
four-deep. For example, given this source:
=head1 1
This is Section 1
=head2 1.1
This is Section 1.1
=head3 1.1.1
This is Section 1.1.1
=head4 1.1.1.1
This is Section 1.1.1.1
=begin docbook
<section>
<title>1.1.1.1.1</title>
<para>This is Section 1.1.1.1.1</para>
</section>
=end docbook
Pod::2::DocBook will generate DocBook markup similar to this:
<section id="article-My-Article-1"><title>1</title>
<para>
This is Section 1
</para>
<section id="article-My-Article-1-1"><title>1.1</title>
<para>
This is Section 1.1
</para>
<section id="article-My-Article-1-1-1"><title>1.1.1</title>
<para>
This is Section 1.1.1
</para>
<section id="article-My-Article-1-1-1-1"><title>1.1.1.1</title>
<para>
This is Section 1.1.1.1
</para>
<section>
<title>1.1.1.1.1</title>
<para>This is Section 1.1.1.1.1</para>
</section>
</section>
</section>
</section>
</section>
Simple Tables
Pod::2::DocBook also provides a mechanism for generating basic tables
with =begin table and =end docbook. If you have simple tabular data or a
CSV file exported from some application, Pod::2::DocBook makes it easy
to generate a table from your data. The syntax is intended to be simple,
so DocBook's entire table feature set is not represented, but even if
you do need more complex table markup than Pod::2::DocBook produces, you
can rapidly produce some markup which you can hand-edit and then embed
directly in your Pod with =begin docbook ... =end docbook. Each table
definition spans multiple lines, so there is no equivalent =for table
command.
The first line of a table definition gives the table's title. The second
line gives a list of comma-separated column specifications (really just
column alignments), each of which can be left, center or right. The
third line is a list of comma-separated column headings, and every
subsequent line consists of comma-separated row data. If any of your
data actually contain commas, you can enclose them in double quotes; if
they also contain double quotes, you must escape the inner quotes with
backslashes (typical CSV stuff).
Here's an example:
=begin table
Sample Table
left,center,right
Powers of Ten,Planets,Dollars
10,Earth,$1
100,Mercury,$5
1000,Mars,$10
10000,Venus,$20
100000,"Jupiter, Saturn",$50
=end table
And here's what Pod::2::DocBook would do with it:
<table>
<title>Sample Table</title>
<tgroup cols="3">
<colspec align="left">
<colspec align="center">
<colspec align="right">
<thead>
<row>
<entry>Powers of Ten</entry>
<entry>Planets</entry>
<entry>Dollars</entry>
</row>
</thead>
<tbody>
<row>
<entry>10</entry>
<entry>Earth</entry>
<entry>$1</entry>
</row>
<row>
<entry>100</entry>
<entry>Mercury</entry>
<entry>$5</entry>
</row>
<row>
<entry>1000</entry>
<entry>Mars</entry>
<entry>$10</entry>
</row>
<row>
<entry>10000</entry>
<entry>Venus</entry>
<entry>$20</entry>
</row>
<row>
<entry>100000</entry>
<entry>Jupiter, Saturn</entry>
<entry>$50</entry>
</row>
</tbody>
</tgroup>
</table>
Formatting Codes
Pod formatting codes render directly into DocBook as inline elements:
* "I<text>"
<emphasis role="italic">text</emphasis>
* "B<text>"
<emphasis role="bold">text</emphasis>
* "C<code>"
<literal role="code"><![CDATA[code]] ></literal>
* "L<name>"
<citerefentry><refentrytitle>name</refentrytitle></citerefentry>
* "L<name(n)>"
<citerefentry><refentrytitle>name</refentrytitle>
<manvolnum>n</manvolnum></citerefentry>
* "L<name/"sec">" or "L<name/sec>"
<quote>sec</quote> in <citerefentry>
<refentrytitle>name</refentrytitle></citerefentry>
* "L<name(n)/"sec">" or "L<name(n)/sec>"
<quote>sec</quote> in <citerefentry>
<refentrytitle>name</refentrytitle><manvolnum>n</manvolnum>
</citerefentry>
* "L</"sec">" or "L</sec>" or "L<"sec">"
<link linkend="article-My-Article-sec"><quote>sec</quote></link>
* "L<text|name>"
text
* "L<text|name/"sec">" or "L<text|name/sec>"
text
* "L<text|/"sec">" or "L<text|/sec>" or "L<text|"sec">"
<link linkend="article-My-Article-sec"><quote>text</quote></link>
* "L<scheme:...>"
<ulink url="scheme:...">scheme:...</ulink>
* "E<verbar>"
|
* "E<sol>"
/
* "E<number>"
&#number;
* any other "E<escape>"
&escape;
* "F<filename>"
<filename>filename</filename>
* "S<text with spaces>"
text with spaces
* "X<topic name>"
<indexterm><primary>topic name</primary></indexterm>
DIAGNOSTICS
Pod::2::DocBook makes every possible effort to produce valid DocBook
markup, even with malformed POD source. Any processing errors will be
noted in comments at the end of the output document. Even when errors
occur, Pod::2::DocBook always reads the entire input document and never
exits with a non-zero status.
unknown command `%s' at line %d in file %s
See "Command Paragraph" in perlpod for a list of valid commands. The
command referenced in the error message was ignored.
formatting code `%s' nested within `%s' at line %d in file %s
See "Formatting Codes" in perlpod for details on which formatting
codes can be nested. The offending code was translated into the
output document as the raw text inside its angle brackets.
unknown formatting code `%s' at line in file %s
The input contained a formatting code not listed in perlpod; it was
translated into the output document as the raw text inside the angle
brackets.
empty L<> at line %d in file %s
Self-explanatory.
invalid escape `%s' at line %d in file %s
Self-explanatory; it was translated into the output document as the
raw text inside the angle brackets.
=item must be inside an =over ... =back section at line %d in file %s
Self-explanatory. The `=item' referenced in the error was ignored.
`=end %s' found but current region opened with `=begin %s'
The closest `=end' command to the referenced `=begin' didn't match;
processing continued as if the mismatched `=end' wasn't there.
no matching `=end' for `=begin %s'
Pod::2::DocBook reached the end of its input without finding an
`=end' command to match the `=begin' referenced in the error;
end-of-file processing continued.
unknown colspec `%s' in table at line %d in file %s
See "Simple Tables" for a list of supported column specifications.
encountered unknown state `%s' (this should never happen)
The state referred to is an internal variable used to properly
manage nested DocBook structures. You should indeed never see this
message, but if you do, you should contact the module's author.
SEE ALSO
pod2docbook, perlpod, Pod::DocBook, SVN repo -
<https://cle.sk/repos/pub/cpan/Pod-2-DocBook/>,
<http://www.ohloh.net/projects/pod-2-docbook>, doc/ +
examples/pod2docbook-docbook/ for Pod::2::DocBook DocBook documentation
DocBook related links: <http://www.docbook.org/>,
<http://www.sagehill.net/docbookxsl/>,
<http://developers.cogentrts.com/cogent/prepdoc/pd-axfrequentlyuseddocbo
oktags.html>
AUTHOR
Alligator Descartes <descarte@symbolstone.org> wrote a module called
Pod::2::DocBook, which was later maintained by Jan Iven
<jan.iven@cern.ch>. That module was based on the original pod2html by
Tom Christiansen <tchrist@mox.perl.com>.
Nandu Shah <nandu@zvolve.com> wrote Pod::DocBook, which is unrelated to
the previous module (even though they both perform the same function).
(<http://search.cpan.org/~nandu/Pod-DocBook-1.2/>)
Jozef Kutej <jkutej@cpan.org> renamed the module to Pod::2::DocBook
because Nandus version was buried in the CPAN archive as an
"UNAUTHORIZED RELEASE".
COPYRIGHT
Copyright 2004, Nandu Shah <nandu@zvolve.com>
Copyright 2008, Jozef Kutej <jkutej@cpan.org>
This library is free software; you may redistribute it and/or modify it
under the same terms as Perl itself
|