<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>SWI-Prolog SGML/XML parser</title>
</head>
<body>
<center>
<h1>SWI-Prolog SGML/XML parser<br>
<font size="+0"><em>Version 1.0.14, March 2001</em></font></h1>
<br>
<a href="mailto:jan@swi.psy.uva.nl"><em>Jan Wielemaker</em></a><br>
<em> SWI, <br>
University of Amsterdam <br>
The Netherlands
</em>
</center>
<blockquote>
<hr>
Markup languages are an increasingly important method for
data-representation and exchange. This article documents the package
<b>sgml2pl</b>, a foreign library for SWI-Prolog to parse SGML and XML
documents, returning information on both the document and the document's
DTD. The parser is designed to be small, fast and flexible.
<hr>
</blockquote>
<h1>Table of Contents</h1>
<ul>
<li><a href="#sec:sec-1">Introduction</a>
<li><a href="#sec:sec-2">Bluffer's Guide</a>
<ul>
<li><a href="#sec:sec-2.1">`Goodies' Predicates</a>
</ul>
<li><a href="#sec:sec-3">Predicate Reference</a>
<ul>
<li><a href="#sec:sec-3.1">Loading Structured Documents</a>
<li><a href="#sec:space">Handling white-space</a>
<li><a href="#sec:xml">XML documents</a>
<ul>
<li><a href="#sec:xmlns">XML Namespaces</a>
</ul>
<li><a href="#sec:sec-3.4">DTD-Handling</a>
<ul>
<li><a href="#sec:sec-3.4.1">The DOCTYPE declaration</a>
</ul>
<li><a href="#sec:implicitdtd">Extracting a DTD</a>
<li><a href="#sec:sec-3.6">Parsing Primitives</a>
<ul>
<li><a href="#sec:sec-3.6.1">Partial Parsing</a>
</ul>
</ul>
<li><a href="#sec:indexaccess">Processing Indexed Files</a>
<li><a href="#sec:sec-5">External entities</a>
<li><a href="#sec:sec-6">Unsupported features</a>
<li><a href="#sec:sec-7">Installation</a>
<ul>
<li><a href="#sec:sec-7.1">Unix systems</a>
</ul>
<li><a href="#sec:sec-8">Acknowledgements</a>
</ul>
<p>
<h1><a name="sec:sec-1">Introduction</a></h1>
<p>
Markup languages have recently regained popularity for two reasons. The
first is document exchange, which is largely based on HTML, an instance
of SGML. The second is data exchange between programs, which is often
based on XML, a language that can be considered a simplified and
rationalised version of SGML.
<p>
James Clark's SP parser is a flexible SGML and XML parser. Unfortunately,
it has some drawbacks. It is very big, not very fast, cannot work on
event-driven input and is generally hard to program beyond the scope of
its well-designed generic interface. That generic interface, however,
does not provide access to the DTD, and allows neither flexible handling
of input nor parsing the DTD independently of a document instance.
<p>
The parser described in this document is small (the executable is less
than 50 Kbytes on a Pentium and 80 Kbytes on a SPARC), fast (between 2
and 5 times faster than SP), provides access to the DTD, and provides
flexible input handling.
<p>
The document output is identical to the output produced by <b><em>xml2pl</em></b>,
an SP interface to SWI-Prolog written by Anjo Anjewierden.
<p>
<h1><a name="sec:sec-2">Bluffer's Guide</a></h1>
<p>
This package allows you to parse SGML, XML and HTML data into a Prolog
data structure. The high-level interface defined in <b>sgml</b>
provides access at the file level, while the low-level interface defined
in the foreign module works with Prolog streams. Please use the source
of <b>sgml.pl</b> as a starting point for dealing with data from
sources other than files, such as SWI-Prolog resources, network sockets,
character strings, <em>etc.</em> The first example below loads an HTML file.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Demo&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;h1 align=center&gt;This is a demo&lt;/h1&gt;
&lt;p&gt;Paragraphs in HTML need not be closed.
&lt;p&gt;This is called `omitted-tag' handling.
&lt;/body&gt;
&lt;/html&gt;
</pre>
</td></tr>
</table>
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
?- load_html_file('test.html', Term),
pretty_print(Term).
[ element(html,
[],
[ element(head,
[],
[ element(title,
[],
[ 'Demo'
])
]),
element(body,
[],
[ '\n',
element(h1,
[ align = center
],
[ 'This is a demo'
]),
'\n\n',
element(p,
[],
[ 'Paragraphs in HTML need not be closed.\n'
]),
element(p,
[],
[ 'This is called `omitted-tag\' handling.'
])
])
])
].
</pre>
</td></tr>
</table>
<p>
The document is represented as a list, each member being either an atom
representing <a name="const:CDATA"><b><tt>CDATA</tt></b></a> or a term <code>element(Name, Attributes, Content)</code>.
Entities (e.g. <code>&amp;lt;</code>) are returned as part of <a name="const:CDATA%2"><b><tt>CDATA</tt></b></a>,
unless they cannot be represented. See <a href="#load_structure/3"><b>load_structure/3</b></a> for
details.
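<p>
As a minimal sketch of working with this representation, the predicate
below collects all <tt>CDATA</tt> of a content list in document order,
descending into <code>element/3</code> terms and skipping any other items.
The names <code>content_text/2</code>, <code>cdata//1</code> and
<code>item//1</code> are hypothetical helpers, not part of this package;
only standard SWI-Prolog built-ins are used.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
% content_text(+Content, -Text): hypothetical helper that concatenates
% all CDATA atoms occurring anywhere in Content, in document order.
content_text(Content, Text) :-
    phrase(cdata(Content), Atoms),
    atomic_list_concat(Atoms, Text).

% cdata//1 emits the CDATA atoms of a content list.
cdata([]) --> [].
cdata([H|T]) --> item(H), cdata(T).

% Descend into elements, keep atoms, and skip everything else
% (entity/1, sdata/1, ndata/1, pi/1, ...).
item(element(_, _, Content)) --> !, cdata(Content).
item(Atom) --> { atom(Atom) }, !, [Atom].
item(_) --> [].
</pre>
</td></tr>
</table>
<p>
Applied to the content of the <tt>h1</tt> element in the demo above,
<code>content_text/2</code> yields <code>'This is a demo'</code>.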
<p>
<h2><a name="sec:sec-2.1">`Goodies' Predicates</a></h2>
<p>
These predicates are for basic use of the library, converting entire and
self-contained files in SGML, HTML, or XML into a structured term. They
are based on <a href="#load_structure/3"><b>load_structure/3</b></a>.
<dl>
<dt> <br>
<b><a name="load_sgml_file/2">load_sgml_file(<var>+File, -ListOfContent</var>)</a></b><dd>
Same as <code>load_structure(File, ListOfContent, [dialect(sgml)])</code>.
<dt> <br>
<b><a name="load_xml_file/2">load_xml_file(<var>+File, -ListOfContent</var>)</a></b><dd>
Same as <code>load_structure(File, ListOfContent, [dialect(xml)])</code>.
<dt> <br>
<b><a name="load_html_file/2">load_html_file(<var>+File, -Content</var>)</a></b><dd>
Load <var>File</var> and parse as HTML. Implemented as:
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
load_html_file(File, Term) :-
dtd(html, DTD),
load_structure(File, Term,
[ dtd(DTD),
dialect(sgml)
]).
</pre>
</td></tr>
</table>
</dl>
<p>
<h1><a name="sec:sec-3">Predicate Reference</a></h1>
<p>
<h2><a name="sec:sec-3.1">Loading Structured Documents</a></h2>
<p>
SGML or XML files are loaded through the common predicate
<a href="#load_structure/3"><b>load_structure/3</b></a>. This is a predicate with many options. For
simplicity a number of commonly used shorthands are provided: <a href="#load_sgml_file/2"><b>load_sgml_file/2</b></a>, <a href="#load_xml_file/2"><b>load_xml_file/2</b></a>, and
<a href="#load_html_file/2"><b>load_html_file/2</b></a>.
<dl>
<dt> <br>
<b><a name="load_structure/3">load_structure(<var>+File, -ListOfContent, +Options</var>)</a></b><dd>
Load the SGML or XML file <var>File</var> and return the resulting structure in <var>ListOfContent</var>. <var>Options</var> is a list of options controlling the
conversion process.
<p>
A proper XML document contains only a single toplevel element whose name
matches the document type. Nevertheless, a list is returned for
consistency with the representation of element content. The <var>ListOfContent</var> consists of the following types:
<dl>
<dt> <br>
<b><var>Atom</var></b><dd>
Atoms are used to represent <a name="const:CDATA%3"><b><tt>CDATA</tt></b></a>. Note that
this is feasible in SWI-Prolog, as there is no length limit on atoms and
atom garbage collection is provided.
<dt> <br>
<b>element(<var>Name, ListOfAttributes, ListOfContent</var>)</b><dd>
<var>Name</var> is the name of the element. As SGML is case-insensitive,
element names are returned as lowercase atoms when parsing SGML.
<p>
<var>ListOfAttributes</var> is a list of <var>Name</var>=<var>Value</var> pairs for
attributes that appeared in the source. Whether <a name="const:fixed"><b><tt>fixed</tt></b></a> and <a name="const:default"><b><tt>default</tt></b></a> values from the DTD
are added for other attributes is controlled by the <code>defaults(Bool)</code> option described below. See
<a href="#dtd_property/2"><b>dtd_property/2</b></a> for accessing this information in the DTD itself.
Attributes of type <a name="const:CDATA%4"><b><tt>CDATA</tt></b></a> are returned literally. Multi-valued
attributes (<a name="const:NAMES"><b><tt>NAMES</tt></b></a>, <em>etc.</em>) are returned as a list of atoms.
Handling attributes of the types <a name="const:NUMBER"><b><tt>NUMBER</tt></b></a> and <a name="const:NUMBERS"><b><tt>NUMBERS</tt></b></a> depends on
the setting of the <code>number(+NumberMode)</code> option of
<a href="#set_sgml_parser/2"><b>set_sgml_parser/2</b></a> or <a href="#load_structure/3"><b>load_structure/3</b></a>. By
default they are returned as atoms, but automatic conversion to Prolog
integers is supported. <var>ListOfContent</var> defines the content of the
element.
<dt> <br>
<b>entity(<var>Code</var>)</b><dd>
If a character entity (e.g. <code>&amp;#913;</code>) is encountered that
cannot be represented in the Prolog character set, this term is
returned, representing the referred character code.
<dt> <br>
<b>entity(<var>Name</var>)</b><dd>
If an entity refers to a character entity holding a single character,
but this character cannot be represented in the Prolog character set,
this term is returned. For example, the HTML input text
<code>&amp;Alpha; &amp;lt; &amp;Beta;</code>
is returned as below. Please note that entity names are case-sensitive
in both SGML and XML.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
[ entity('Alpha'), ' &lt; ', entity('Beta') ]
</pre>
</td></tr>
</table>
<p>
This is a special case of <code>entity(Code)</code>, intended to handle
special symbols by their name rather than character code.
<dt> <br>
<b>sdata(<var>Text</var>)</b><dd>
If an entity with declared content-type <a name="const:SDATA"><b><tt>SDATA</tt></b></a> is encountered, this
term is returned holding the data in <var>Text</var>.
<dt> <br>
<b>ndata(<var>Text</var>)</b><dd>
If an entity with declared content-type <a name="const:NDATA"><b><tt>NDATA</tt></b></a> is encountered, this
term is returned holding the data in <var>Text</var>.
<dt> <br>
<b>pi(<var>Text</var>)</b><dd>
If a processing instruction is encountered (<code>&lt;?...?&gt;</code>), <var>Text</var> holds the text of the processing instruction. Please note that the
<code>&lt;?xml ...?&gt;</code> instruction is handled internally.
</dl>
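<p>
Because these terms are ordinary Prolog data, standard list predicates
suffice to query a parsed document. The sketch below uses the
hypothetical helper names <code>find_element/3</code> and
<code>attribute_value/3</code> (they are not part of this package) to
enumerate elements with a given name at any depth and to read an
attribute from the <var>Name</var>=<var>Value</var> list.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
% find_element(+Name, +Content, -Element): Element is an element/3 term
% named Name anywhere in Content, found depth-first in document order.
% Further matches are enumerated on backtracking.
find_element(Name, Content, Element) :-
    member(Item, Content),
    Item = element(ItemName, _, Children),
    (   ItemName == Name,
        Element = Item
    ;   find_element(Name, Children, Element)
    ).

% attribute_value(+Element, +Attribute, -Value): look up Attribute in
% the Name=Value list of Element; fails if the attribute is absent.
attribute_value(element(_, Attributes, _), Attribute, Value) :-
    memberchk(Attribute=Value, Attributes).
</pre>
</td></tr>
</table>
<p>
For the demo document, <code>find_element(h1, Doc, H1),
attribute_value(H1, align, A)</code> would bind <var>A</var> to
<code>center</code>.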
<p>
The <var>Options</var> list controls the conversion process. Currently
defined options are:
<dl>
<dt> <br>
<b>dtd(<var>?DTD</var>)</b><dd>
Reference to a DTD object. If specified, the <code>&lt;!DOCTYPE ...&gt;</code>
declaration is ignored and the document is parsed and validated against
the provided DTD. If provided as a variable, the created DTD is
returned. See <a href="#sec:implicitdtd">Extracting a DTD</a>.
<dt> <br>
<b>dialect(<var>+Dialect</var>)</b><dd>
Specify the parsing dialect. Supported are <a name="const:sgml"><b><tt>sgml</tt></b></a> (default), <a name="const:xml"><b><tt>xml</tt></b></a>
and <a name="const:xmlns"><b><tt>xmlns</tt></b></a>. See <a href="#sec:xml">XML documents</a> for details on the differences.
<dt> <br>
<b>shorttag(<var>+Bool</var>)</b><dd>
Defines whether SHORTTAG abbreviations are accepted. The default is true
for SGML mode and false for the XML modes. Without SHORTTAG, a <a name="const:/"><b><tt>/</tt></b></a>
is accepted with a warning as part of an unquoted attribute value, though
<a name="const:/>"><b><tt>/></tt></b></a> still closes the element-tag in XML mode. It may be set to
false for parsing HTML documents to allow for unquoted URLs containing
<a name="const:/%2"><b><tt>/</tt></b></a>.
<dt> <br>
<b>space(<var>+SpaceMode</var>)</b><dd>
Sets the `space-handling-mode' for the initial environment. This mode is
inherited by nested environments, which can override the inherited
value using the XML reserved attribute <b><tt>xml:space</tt></b>. See <a href="#sec:space">Handling white-space</a>.
<dt> <br>
<b>number(<var>+NumberMode</var>)</b><dd>
Determines how attributes of type <a name="const:NUMBER%2"><b><tt>NUMBER</tt></b></a> and <a name="const:NUMBERS%2"><b><tt>NUMBERS</tt></b></a> are handled.
If <a name="const:token"><b><tt>token</tt></b></a> (default) they are passed as an atom. If <a name="const:integer"><b><tt>integer</tt></b></a> the
parser attempts to convert the value to an integer. If successful, the
attribute is passed as a Prolog integer. Otherwise it is still passed
as an atom. Note that SGML defines a numeric attribute to be a sequence
of digits. The <a name="const:-"><b><tt>-</tt></b></a> sign is not allowed and <code>1</code> is different from
<code>01</code>. For this reason the default is to handle numeric attributes as
tokens. If conversion to integer is enabled, negative values are silently
accepted.
<dt> <br>
<b>defaults(<var>+Bool</var>)</b><dd>
Determines how default and fixed values from the DTD are used. By
default, defaults are included in the output if they do not appear in
the source. If <a name="const:false"><b><tt>false</tt></b></a>, only the attributes occurring in the source
are emitted.
<dt> <br>
<b>file(<var>+Name</var>)</b><dd>
Sets the file name used when reporting errors and resets the line
number to 1.
<dt> <br>
<b>line(<var>+Line</var>)</b><dd>
Sets the starting line-number for reporting errors.
<dt> <br>
<b>max_errors(<var>+Max</var>)</b><dd>
Sets the maximum number of errors. If this number is reached, an
exception of the format below is raised. The default is 50.
<blockquote>
<code>error(limit_exceeded(max_errors, Max), _)</code>
</blockquote>
</dl>
</dl>
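<p>
As a sketch of combining several of these options, the hypothetical
wrapper <code>load_data_file/2</code> below parses a data-oriented XML
file with white space removed and turns an exceeded error limit into a
warning and an empty result. It assumes <code>library(sgml)</code> is
available and that the options behave as documented above; the limit of
10 errors is an arbitrary choice.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
:- use_module(library(sgml)).

% load_data_file(+File, -DOM): hypothetical wrapper around load_structure/3.
load_data_file(File, DOM) :-
    catch(load_structure(File, DOM,
                         [ dialect(xml),     % parse using the XML rules
                           space(remove),    % drop white-space-only CDATA
                           max_errors(10)    % give up after 10 errors
                         ]),
          error(limit_exceeded(max_errors, Max), _),
          (   print_message(warning,
                            format("~w: gave up after ~w errors", [File, Max])),
              DOM = []
          )).
</pre>
</td></tr>
</table>
<p>
For a file holding <code>&lt;a&gt;&lt;b&gt; hi &lt;/b&gt;&lt;/a&gt;</code>
this should yield <code>[element(a, [], [element(b, [], [hi])])]</code>.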
<p>
<h2><a name="sec:space">Handling white-space</a></h2>
<p>
SGML2PL has four modes for handling white-space. The initial mode can be
switched using the <code>space(SpaceMode)</code> option to
<a href="#load_structure/3"><b>load_structure/3</b></a> and <a href="#set_sgml_parser/2"><b>set_sgml_parser/2</b></a>. In XML
mode, the mode is further controlled by the <b><tt>xml:space</tt></b> attribute,
which may be specified both in the DTD and in the document. The defined
modes are:
<dl>
<dt> <br>
<b>space(<var>sgml</var>)</b><dd>
In SGML, newlines at the start and end of an element are removed.<a href="#fn-1" name="txt:fn-1"><sup>1</sup></a> This is the default mode for
the SGML dialect.
<dt> <br>
<b>space(<var>preserve</var>)</b><dd>
White space is passed literally to the application. This mode leaves all
white space handling to the application. This is the default mode for
the XML dialect.
<dt> <br>
<b>space(<var>default</var>)</b><dd>
In addition to <a name="const:sgml%2"><b><tt>sgml</tt></b></a> space-mode, all consecutive white-space is
reduced to a single space-character. This mode canonises all white
space.
<dt> <br>
<b>space(<var>remove</var>)</b><dd>
In addition to <a name="const:default%2"><b><tt>default</tt></b></a>, all leading and trailing white-space is
removed from <a name="const:CDATA%5"><b><tt>CDATA</tt></b></a> objects. If, as a result, the <a name="const:CDATA%6"><b><tt>CDATA</tt></b></a>
becomes empty, nothing is passed to the application. This mode is
especially handy for processing `data-oriented' documents, such as RDF.
It is not suitable for normal text documents. Consider the HTML
fragment below. When processed in this mode, the spaces between the
three modified words are lost. This mode is not part of any standard;
XML 1.0 allows only <a name="const:default%3"><b><tt>default</tt></b></a> and <a name="const:preserve"><b><tt>preserve</tt></b></a>.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
Consider adjacent &lt;b&gt;bold&lt;/b&gt; &lt;u&gt;and&lt;/u&gt; &lt;i&gt;italic&lt;/i&gt; words.
</pre>
</td></tr>
</table>
</dl>
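<p>
The effect of the <code>default</code> and <code>remove</code> modes on a
single <tt>CDATA</tt> atom can be approximated in plain Prolog as below.
This illustrates the rules above under simplifying assumptions and is not
the parser's implementation: it ignores the newline rule of the
<code>sgml</code> mode and works on one atom at a time. The predicate
names are hypothetical; <code>normalize_space/2</code> and
<code>code_type/2</code> are SWI-Prolog built-ins.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
% collapse_space(+Atom, -Collapsed): approximates space(default) on one
% CDATA atom by reducing every run of white space to one space character.
collapse_space(Atom, Collapsed) :-
    atom_codes(Atom, Codes),
    collapse(Codes, Out),
    atom_codes(Collapsed, Out).

collapse([], []).
collapse([C|T0], [32|T]) :-          % 32 is the space character code
    code_type(C, space), !,
    skip_space(T0, T1),
    collapse(T1, T).
collapse([C|T0], [C|T]) :-
    collapse(T0, T).

% skip_space(+Codes, -Rest): drop leading white space from Codes.
skip_space([C|T0], T) :-
    code_type(C, space), !,
    skip_space(T0, T).
skip_space(L, L).

% remove_space(+Atom, -Stripped): approximates space(remove): collapse as
% above and also strip leading and trailing white space. An empty result
% corresponds to CDATA that the parser would not pass on at all.
remove_space(Atom, Stripped) :-
    normalize_space(atom(Stripped), Atom).
</pre>
</td></tr>
</table>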
<p>
<h2><a name="sec:xml">XML documents</a></h2>
<p>
The parser can operate in two modes: <a name="const:sgml%3"><b><tt>sgml</tt></b></a> mode and <a name="const:xml%2"><b><tt>xml</tt></b></a> mode, as
defined by the <code>dialect(Dialect)</code> option. Regardless of this
option, if the first line of the document reads as below, the parser is
switched automatically into XML mode.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
&lt;?xml ... ?&gt;
</pre>
</td></tr>
</table>
<p>
Currently switching to XML mode implies:
<ul>
<li><em>XML empty elements</em><br>
The construct <code>&lt;element [attribute...] /&gt;</code> is recognised as
an empty element.
<li><em>Predefined entities</em><br>
The following entities are predefined: <a name="const:lt"><b><tt>lt</tt></b></a> (<code>&lt;</code>), <a name="const:gt"><b><tt>gt</tt></b></a>
(<code>&gt;</code>), <a name="const:amp"><b><tt>amp</tt></b></a> (<code>&amp;</code>), <a name="const:apos"><b><tt>apos</tt></b></a> (<code>'</code>)
and <a name="const:quot"><b><tt>quot</tt></b></a> (<code>"</code>).
<li><em>Case sensitivity</em><br>
In XML mode, names are case-sensitive, except for the DTD
reserved names (i.e. <code>ELEMENT</code>, <em>etc.</em>).
<li><em>Character classes</em><br>
In XML mode, underscores (<code>_</code>) and colons (<code>:</code>) are
allowed in names.
<li><em>White-space handling</em><br>
White space mode is set to <a name="const:preserve%2"><b><tt>preserve</tt></b></a>. In addition to setting
white-space handling at the toplevel, the XML reserved attribute
<b><tt>xml:space</tt></b> is honoured. It may appear both in the document and the
DTD. The <a name="const:remove"><b><tt>remove</tt></b></a> extension is honoured as an <b><tt>xml:space</tt></b> value. For
example, the DTD statement below ensures that the <b><tt>pre</tt></b> element
preserves space, regardless of the default processing mode.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
&lt;!ATTLIST pre xml:space nmtoken #fixed preserve&gt;
</pre>
</td></tr>
</table>
</ul>
<p>
<h3><a name="sec:xmlns">XML Namespaces</a></h3>
<p>
Using the <b><em>dialect</em></b> <a name="const:xmlns%2"><b><tt>xmlns</tt></b></a>, the parser will interpret XML
namespaces. In this case, the names of elements are returned as a term
of the format
<blockquote>
<var>URL</var><a name="const::"><b><tt>:</tt></b></a><var>LocalName</var>
</blockquote>
<p>
If an identifier has no namespace and there is no default namespace it
is returned as a simple atom. If an identifier has a namespace but this
namespace is undeclared, the namespace name rather than the related URL
is returned.
<p>
Attributes declaring namespaces (<code>xmlns:<var>ns</var>=<var>url</var></code>)
are reported as if <a name="const:xmlns%3"><b><tt>xmlns</tt></b></a> were not a defined resource.
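<p>
When an application does not need the namespace URLs at all, an
alternative to the call-backs described below is to post-process the
parsed term. The following sketch, using the hypothetical predicates
<code>local_name/2</code> and <code>strip_namespaces/2</code>, replaces
each <var>URL</var>:<var>LocalName</var> element name by its local name
and leaves attribute names untouched. Note that this merges equal local
names from different namespaces, which the call-back approach keeps apart.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
% local_name(+Name, -Local): strip the namespace part of a name as
% returned by the xmlns dialect; plain atoms are returned unchanged.
local_name(_URL:Local, Local) :- !.
local_name(Name, Name).

% strip_namespaces(+Content0, -Content): rewrite all element names in a
% content list to local names, recursively; other items are copied.
strip_namespaces([], []).
strip_namespaces([element(Name0, Attrs, Sub0)|T0],
                 [element(Name, Attrs, Sub)|T]) :- !,
    local_name(Name0, Name),
    strip_namespaces(Sub0, Sub),
    strip_namespaces(T0, T).
strip_namespaces([Item|T0], [Item|T]) :-
    strip_namespaces(T0, T).
</pre>
</td></tr>
</table>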
<p>
In many cases, getting attribute-names as <code><var>url</var>:<var>name</var></code>
is not desirable. Such terms are hard to unify and sometimes multiple
URLs may be mapped to the same identifier. This may happen due to poor
version management, poor standardisation or because the application
doesn't care too much about versions. This package defines two
call-backs that can be set using <a href="#set_sgml_parser/2"><b>set_sgml_parser/2</b></a> to deal
with this problem.
<p>
The call-back <a name="const:xmlns%4"><b><tt>xmlns</tt></b></a> is called as XML namespaces are noticed.
It can be used to extend a canonical mapping for later use
by the <a name="const:urlns"><b><tt>urlns</tt></b></a> call-back. The following illustrates this behaviour.
Any namespace containing <a name="const:rdf-syntax"><b><tt>rdf-syntax</tt></b></a> in its URL or that is used as
<a name="const:rdf"><b><tt>rdf</tt></b></a> namespace is canonised to <a name="const:rdf%2"><b><tt>rdf</tt></b></a>. This implies that any
attribute and element name from the RDF namespace appears as
<code>rdf:<var>name</var></code>.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
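:- dynamic
    xmlns/3.

% Called for each namespace declaration: map the rdf prefix, or any
% namespace URL containing 'rdf-syntax', to the canonical prefix rdf.
on_xmlns(rdf, URL, _Parser) :- !,
    asserta(xmlns(URL, rdf, _)).
on_xmlns(_, URL, _Parser) :-
    sub_atom(URL, _, _, _, 'rdf-syntax'), !,
    asserta(xmlns(URL, rdf, _)).

% Parse with namespace processing; the urlns call-back consults the
% xmlns/3 facts recorded by on_xmlns/3.
load_rdf_xml(File, Term) :-
    load_structure(File, Term,
                   [ dialect(xmlns),
                     call(xmlns, on_xmlns),
                     call(urlns, xmlns)
                   ]).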
</pre>
</td></tr>
</table>
<p>
<h2><a name="sec:sec-3.4">DTD-Handling</a></h2>
<p>
The DTD (<b>D</b>ocument <b>T</b>ype <b>D</b>efinition) is a separate
entity in sgml2pl, which can be created, freed, defined and inspected.
Like the parser itself, it is filled by opening it as a Prolog output
stream and sending data to it. This section summarises the predicates
for handling the DTD.
<dl>
<dt> <br>
<b><a name="new_dtd/2">new_dtd(<var>+DocType, -DTD</var>)</a></b><dd>
Creates an empty DTD for the named <var>DocType</var>. The returned
DTD-reference is an opaque term that can be used in the other predicates
of this package.
<dt> <br>
<b><a name="free_dtd/1">free_dtd(<var>+DTD</var>)</a></b><dd>
Deallocate all resources associated with the DTD. Further use of <var>DTD</var>
is invalid.
<dt> <br>
<b><a name="load_dtd/2">load_dtd(<var>+DTD, +File</var>)</a></b><dd>
Define the DTD by loading the SGML-DTD file <var>File</var>. This predicate
is defined using the low-level <a href="#open_dtd/3"><b>open_dtd/3</b></a> predicate:
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
load_dtd(DTD, DtdFile) :-
open_dtd(DTD, [], DtdOut),
open(DtdFile, read, DtdIn),
copy_stream_data(DtdIn, DtdOut),
close(DtdIn),
close(DtdOut).
</pre>
</td></tr>
</table>
<dt> <br>
<b><a name="open_dtd/3">open_dtd(<var>+DTD, +Options, -OutStream</var>)</a></b><dd>
Open a DTD as an output stream. No options are currently defined, so
<var>Options</var> should be <code>[]</code>. See <a href="#load_dtd/2"><b>load_dtd/2</b></a> for an example.
<dt> <br>
<b><a name="dtd/2">dtd(<var>+DocType, -DTD</var>)</a></b><dd>
Find the DTD representing the indicated <b><em>doctype</em></b>. This predicate
uses a cache of DTD objects. If no DTD is cached for the doctype yet, it
searches for a DTD file using the file search path <code>dtd</code> through the call:
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
...,
absolute_file_name(dtd(Type),
[ extensions([dtd]),
access(read)
], DtdFile),
...
</pre>
</td></tr>
</table>
<dt> <br>
<b><a name="dtd_property/2">dtd_property(<var>+DTD, ?Property</var>)</a></b><dd>
This predicate is used to examine the content of a DTD. Property is one
of:
<dl>
<dt> <br>
<b>doctype(<var>DocType</var>)</b><dd>
An atom representing the document-type defined by this DTD.
<dt> <br>
<b>elements(<var>ListOfElements</var>)</b><dd>
A list of atoms representing the names of the elements in this DTD.
<dt> <br>
<b>element(<var>Name, Omit, Content</var>)</b><dd>
The DTD contains an element with the given name. <var>Omit</var> is a term
of the format <code>omit(OmitOpen, OmitClose)</code>, where both
arguments are booleans (<a name="const:true"><b><tt>true</tt></b></a> or <a name="const:false%2"><b><tt>false</tt></b></a>) representing whether the
open- or close-tag may be omitted. <var>Content</var> is the content-model
of the element represented as a Prolog term. This term takes the
following form:
<dl>
<dt> <br>
<b><a name="const:empty"><b><tt>empty</tt></b></a></b><dd>
The element has no content.
<dt> <br>
<b><a name="const:cdata"><b><tt>cdata</tt></b></a></b><dd>
The element contains non-parsed character data. All data up to the
matching end-tag is included in the data (<b><em>declared content</em></b>).
<dt> <br>
<b><a name="const:rcdata"><b><tt>rcdata</tt></b></a></b><dd>
As <a name="const:cdata%2"><b><tt>cdata</tt></b></a>, but entity-references are expanded.
<dt> <br>
<b><a name="const:any"><b><tt>any</tt></b></a></b><dd>
The element may contain any number of any element from the DTD in
any order.
<dt> <br>
<b><a name="const:#pcdata"><b><tt>#pcdata</tt></b></a></b><dd>
The element contains parsed character data.
<dt> <br>
<b><var>element</var></b><dd>
An element with this name.
<dt> <br>
<b>*(<var>SubModel</var>)</b><dd>
0 or more appearances.
<dt> <br>
<b>?(<var>SubModel</var>)</b><dd>
0 or one appearance.
<dt> <br>
<b>+(<var>SubModel</var>)</b><dd>
1 or more appearances.
<dt> <br>
<b>,(<var>SubModel1, SubModel2</var>)</b><dd>
<var>SubModel1</var> followed by <var>SubModel2</var>.
<dt> <br>
<b>&(<var>SubModel1, SubModel2</var>)</b><dd>
<var>SubModel1</var> and <var>SubModel2</var> in any order.
<dt> <br>
<b>|(<var>SubModel1, SubModel2</var>)</b><dd>
<var>SubModel1</var> or <var>SubModel2</var>.
</dl>
<dt> <br>
<b>attributes(<var>Element, ListOfAttributes</var>)</b><dd>
<var>ListOfAttributes</var> is a list of atoms representing the attributes
of the element <var>Element</var>.
<dt> <br>
<b>attribute(<var>Element, Attribute, Type, Default</var>)</b><dd>
Query the declaration of attribute <var>Attribute</var> of element <var>Element</var>. <var>Type</var> is one of <a name="const:cdata%3"><b><tt>cdata</tt></b></a>, <a name="const:entity"><b><tt>entity</tt></b></a>, <a name="const:id"><b><tt>id</tt></b></a>,
<a name="const:idref"><b><tt>idref</tt></b></a>, <a name="const:name"><b><tt>name</tt></b></a>, <a name="const:nmtoken"><b><tt>nmtoken</tt></b></a>, <a name="const:notation"><b><tt>notation</tt></b></a>, <a name="const:number"><b><tt>number</tt></b></a> or
<a name="const:nutoken"><b><tt>nutoken</tt></b></a>. For DTD types that allow for a list, the notation
<code>list(Type)</code> is used. Finally, the DTD construct
<code>(a|b|...)</code> is mapped to the term
<code>nameof(ListOfValues)</code>.
<p>
<var>Default</var> describes the SGML default. It is one of <a name="const:required"><b><tt>required</tt></b></a>,
<a name="const:current"><b><tt>current</tt></b></a>, <a name="const:conref"><b><tt>conref</tt></b></a> or <a name="const:implied"><b><tt>implied</tt></b></a>. If a real default is present, it
is one of <code>default(Value)</code> or <code>fixed(Value)</code>.
<dt> <br>
<b>entities(<var>ListOfEntities</var>)</b><dd>
<var>ListOfEntities</var> is a list of atoms representing the names of the
defined entities.
<dt> <br>
<b>entity(<var>Name, Value</var>)</b><dd>
<var>Name</var> is the name of an entity and <var>Value</var> is its value. <var>Value</var> is one of
<dl>
<dt> <br>
<b><var>Atom</var></b><dd>
If the value is atomic, it represents the
literal value of the entity.
<dt> <br>
<b>system(<var>Url</var>)</b><dd>
<var>Url</var> is the URL of the system external entity.
<dt> <br>
<b>public(<var>Id, Url</var>)</b><dd>
For external public entities, <var>Id</var> is the identifier. If a URL is
provided, it is returned in <var>Url</var>. Otherwise this argument is
unbound.
</dl>
<dt> <br>
<b>notations(<var>ListOfNotations</var>)</b><dd>
Returns a list holding the names of all <a name="const:NOTATION"><b><tt>NOTATION</tt></b></a> declarations.
<dt> <br>
<b>notation(<var>Name, Decl</var>)</b><dd>
Unify <var>Decl</var> with a list of <code>system(+File)</code> and/or
<code>public(+PublicId)</code>.
</dl>
</dl>
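<p>
As an illustration of the queries above, the sketch below loads a small DTD and asks for the declaration of a single attribute. It is not part of the library: <code>attribute_info/6</code> is a hypothetical helper, and the sketch assumes a recent SWI-Prolog providing <b>tmp_file_stream/3</b> and <b>setup_call_cleanup/3</b>. Because <a href="#load_dtd/2"><b>load_dtd/2</b></a> reads from a file, the DTD text is first written to a temporary file.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
:- use_module(library(sgml)).

%   attribute_info(+DtdText, +DocType, +Element, +Attr, -Type, -Default)
%
%   Hypothetical helper (not part of library(sgml)): load DtdText
%   into a fresh DTD object and return the Type and Default of the
%   declaration of attribute Attr on element Element.
attribute_info(DtdText, DocType, Element, Attr, Type, Default) :-
    tmp_file_stream(text, File, Out),   % load_dtd/2 reads from a file
    write(Out, DtdText),
    close(Out),
    setup_call_cleanup(
        new_dtd(DocType, DTD),
        ( load_dtd(DTD, File),
          once(dtd_property(DTD, attribute(Element, Attr, Type, Default)))
        ),
        ( free_dtd(DTD),                % release the DTD object and
          delete_file(File)             % the temporary file in all cases
        )).
</pre>
</td></tr>
</table>
<p>
For a DTD declaring <code>kind (a|b) "a"</code> on element <code>doc</code>, the query should yield the <code>nameof/1</code> type and the <code>default/1</code> default described above. The cleanup goal runs whether the query succeeds, fails or raises an exception.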
<p>
<h3><a name="sec:sec-3.4.1">The DOCTYPE declaration</a></h3>
<p>
As this parser allows for processing partial documents and processing the
DTD separately, the DOCTYPE declaration plays a special role.
<p>
If a document has no DOCTYPE declaration, the parser returns a list
holding all elements and CDATA found. If the document has a DOCTYPE
declaration, the parser will open the element defined in the DOCTYPE as
soon as the first real data is encountered.
<p>
<h2><a name="sec:implicitdtd">Extracting a DTD</a></h2>
<p>
Some documents have no DTD. One of the neat facilities of this library
is that it builds a DTD while parsing a document with an <b><em>implicit</em></b> DTD. The resulting DTD contains all elements encountered in
the document. For each element the content model is a disjunction of
elements and possibly <code>#PCDATA</code> that can be repeated. Thus,
if we found element <b><tt>y</tt></b> and CDATA in element <b><tt>x</tt></b>, the model
is:
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
<!ELEMENT x - - (y|#PCDATA)*>
</pre>
</td></tr>
</table>
<p>
Any encountered attribute is added to the attribute list with the type
<a name="const:CDATA%7"><b><tt>CDATA</tt></b></a> and default <a name="const:#IMPLIED"><b><tt>#IMPLIED</tt></b></a>.
<p>
The example below extracts the elements used in an unknown XML document.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
elements_in_xml_document(File, Elements) :-
    load_structure(File, _,
                   [ dialect(xml),
                     dtd(DTD)
                   ]),
    dtd_property(DTD, elements(Elements)),
    free_dtd(DTD).
</pre>
</td></tr>
</table>
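<p>
The implicit DTD also records the attributes found on each element. The following sketch, using the hypothetical name <code>attributes_in_xml/2</code> and assuming that load_structure/3 accepts a <code>stream(Stream)</code> source as in recent SWI-Prolog versions, returns a list of <code>Element-Attributes</code> pairs for a document read from a stream.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
:- use_module(library(sgml)).
:- use_module(library(lists)).

%   attributes_in_xml(+In, -Pairs)
%
%   Hypothetical helper: parse the XML document on stream In with an
%   implicit DTD and return one Element-Attributes pair per element
%   type, where Attributes holds the attribute names encountered.
attributes_in_xml(In, Pairs) :-
    load_structure(stream(In), _, [ dialect(xml), dtd(DTD) ]),
    dtd_property(DTD, elements(Elements)),
    findall(E-As,
            ( member(E, Elements),
              dtd_property(DTD, attributes(E, As))
            ),
            Pairs),
    free_dtd(DTD).
</pre>
</td></tr>
</table>
<p>
Since every attribute in such a DTD has type CDATA and default #IMPLIED, only the attribute names carry information here.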
<p>
<h2><a name="sec:sec-3.6">Parsing Primitives</a></h2>
<dl>
<dt> <br>
<b><a name="new_sgml_parser/2">new_sgml_parser(<var>-Parser, +Options</var>)</a></b><dd>
Creates a new parser. A parser can be used one or multiple times for
parsing documents or parts thereof. It may be bound to a DTD or the DTD
may be left implicit, in which case it is created from the document
prologue or parsing is performed without a DTD. Options:
<dl>
<dt> <br>
<b>dtd(<var>?DTD</var>)</b><dd>
If specified with an initialised DTD, this DTD is used for parsing the
document, regardless of the document prologue. If specified as a
variable, a reference to the created DTD is returned. This DTD may be
created from the document prologue or built implicitly from the
document's content.
</dl>
<dt> <br>
<b><a name="free_sgml_parser/1">free_sgml_parser(<var>+Parser</var>)</a></b><dd>
Destroy all resources related to the parser. This does not destroy the
DTD if the parser was created using the <code>dtd(DTD)</code> option.
<dt> <br>
<b><a name="set_sgml_parser/2">set_sgml_parser(<var>+Parser, +Option</var>)</a></b><dd>
Set attributes of the parser. Currently defined attributes:
<dl>
<dt> <br>
<b>file(<var>File</var>)</b><dd>
Sets the file for reporting errors and warnings. Sets the line to 1.
<dt> <br>
<b>line(<var>Line</var>)</b><dd>
Sets the current line. Useful if the stream is not at the start of the
(file) object for generating proper line-numbers.
<dt> <br>
<b>charpos(<var>Offset</var>)</b><dd>
Sets the current character location. See also the <code>file(File)</code>
option.
<dt> <br>
<b>dialect(<var>Dialect</var>)</b><dd>
Set the markup dialect. Known dialects:
<dl>
<dt> <br>
<b><a name="const:sgml%4"><b><tt>sgml</tt></b></a></b><dd>
The default dialect is to process as SGML. This implies markup is
case-insensitive and standard SGML abbreviation is allowed (abbreviated
attributes and omitted tags).
<dt> <br>
<b><a name="const:xml%3"><b><tt>xml</tt></b></a></b><dd>
This dialect is selected automatically if the processing instruction
<code><?xml ...></code> is encountered. See <a href="#sec:xml">xml</a> for details.
<dt> <br>
<b><a name="const:xmlns%5"><b><tt>xmlns</tt></b></a></b><dd>
Process file as XML file with namespace support. See <a href="#sec:xmlns">xmlns</a> for
details.
</dl>
<dt> <br>
<b>space(<var>SpaceMode</var>)</b><dd>
Define the initial handling of white-space in PCDATA. This attribute is
described in <a href="#sec:space">space</a>.
<dt> <br>
<b>number(<var>NumberMode</var>)</b><dd>
If <a name="const:token%2"><b><tt>token</tt></b></a> (default), attributes of type number are passed as a Prolog atom.
If <a name="const:integer%2"><b><tt>integer</tt></b></a>, such attributes are translated into Prolog integers. If
the conversion fails (e.g. due to overflow) a warning is issued and the
value is passed as an atom.
<dt> <br>
<b>doctype(<var>Element</var>)</b><dd>
Defines the toplevel element expected. If a <code><!DOCTYPE</code>
declaration has been parsed, the default is the defined doctype. The
parser can be instructed to accept the first element encountered as the
toplevel using <code>doctype(_)</code>. This feature is especially
useful when parsing part of a document (see the <a name="const:parse"><b><tt>parse</tt></b></a> option to
<a href="#sgml_parse/2"><b>sgml_parse/2</b></a>).
</dl>
<dt> <br>
<b><a name="get_sgml_parser/2">get_sgml_parser(<var>+Parser, -Option</var>)</a></b><dd>
Retrieve information on the current status of the parser. This is notably
useful if the parser is used in call-back mode. Currently defined options:
<dl>
<dt> <br>
<b>file(<var>-File</var>)</b><dd>
Current file-name. Note that this may be different from the provided
file if an external entity is being loaded.
<dt> <br>
<b>charpos(<var>-CharPos</var>)</b><dd>
Offset from where the parser started its processing in the file-object.
See <a href="#sec:indexaccess">indexaccess</a>.
<dt> <br>
<b>charpos(<var>-Start, -End</var>)</b><dd>
Character offsets of the start and end of the source processed causing the
current call-back. Used in <b>PceEmacs</b> for colouring text in
SGML and XML modes.
<dt> <br>
<b>source(<var>-Stream</var>)</b><dd>
Prolog stream being processed. May be used in the <a name="const:on_begin"><b><tt>on_begin</tt></b></a>, <em>etc.</em>
callbacks from <a href="#sgml_parse/2"><b>sgml_parse/2</b></a>.
<dt> <br>
<b>dialect(<var>-Dialect</var>)</b><dd>
Return the current dialect used by the parser (<a name="const:sgml%5"><b><tt>sgml</tt></b></a>, <a name="const:xml%4"><b><tt>xml</tt></b></a> or <a name="const:xmlns%6"><b><tt>xmlns</tt></b></a>).
<dt> <br>
<b>event_class(<var>-Class</var>)</b><dd>
The <b><em>event class</em></b> can be requested in call-back events. It
denotes the cause of the event, providing useful information for syntax
highlighting. Defined values are:
<dl>
<dt> <br>
<b><a name="const:explicit"><b><tt>explicit</tt></b></a></b><dd>
The code generating this event is explicitly present in the
document.
<dt> <br>
<b><a name="const:omitted"><b><tt>omitted</tt></b></a></b><dd>
The current event is caused by the insertion of an omitted tag.
This may be a normal event in SGML mode or an error in XML mode.
<dt> <br>
<b><a name="const:shorttag"><b><tt>shorttag</tt></b></a></b><dd>
The current event (<a name="const:begin"><b><tt>begin</tt></b></a> or <a name="const:end"><b><tt>end</tt></b></a>) is caused by an
element written down using the <b><em>shorttag</em></b> notation
(<code><tag/value/></code>).
<dt> <br>
<b><a name="const:shortref"><b><tt>shortref</tt></b></a></b><dd>
The current event is caused by the expansion of a
<b><em>shortref</em></b>. This allows for highlighting shortref strings
in the source-text.
</dl>
<dt> <br>
<b>doctype(<var>-Element</var>)</b><dd>
Return the defined document-type (= toplevel element). See also
<a href="#set_sgml_parser/2"><b>set_sgml_parser/2</b></a>.
<dt> <br>
<b>dtd(<var>-DTD</var>)</b><dd>
Return the currently used DTD. See <a href="#dtd_property/2"><b>dtd_property/2</b></a> for obtaining information
on the DTD such as element and attribute properties.
<dt> <br>
<b>context(<var>-StackOfElements</var>)</b><dd>
Returns the stack of currently open elements as a list. The head of this
list is the current element. This can be used to determine the context
of, for example, CDATA events in call-back mode. The elements
are passed as atoms. Currently no access to the attributes is provided.
<dt> <br>
<b>allowed(<var>-Elements</var>)</b><dd>
Determines which elements may be inserted at the current location. This
information is returned as a list of element-names. If character data is
allowed in the current location, <a name="const:#pcdata%2"><b><tt>#pcdata</tt></b></a> is part of <var>Elements</var>.
If no element is open, the <b><em>doctype</em></b> is returned.
<p>
This option is intended to support syntax-sensitive editors. Such an
editor should load the DTD, find an appropriate starting point and then
feed all data between the starting point and the caret into the parser.
Next it can use this option to determine the elements allowed at this
point. Below is a code fragment illustrating this use given a parser
with loaded DTD, an input stream and a start-location.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
    ...,
    seek(In, Start, bof, _),
    set_sgml_parser(Parser, charpos(Start)),
    set_sgml_parser(Parser, doctype(_)),
    Len is Caret - Start,
    sgml_parse(Parser,
               [ source(In),
                 content_length(Len),
                 parse(input)            % do not complete document
               ]),
    get_sgml_parser(Parser, allowed(Allowed)),
    ...
</pre>
</td></tr>
</table>
</dl>
<p>
<dt> <br>
<b><a name="sgml_parse/2">sgml_parse(<var>+Parser, +Options</var>)</a></b><dd>
Parse an XML file. The parser can operate in two input and two output
modes. Output is either a structured term as described with
<a href="#load_structure/3"><b>load_structure/3</b></a> or call-backs on predefined events. The
first is especially suitable for manipulating not-too-large documents,
while the latter provides a primitive means for handling very large
documents.
<p>
Input is a stream. A full description of the option-list is below.
<dl>
<dt> <br>
<b>document(<var>-Term</var>)</b><dd>
A variable that will be unified with a list describing the content of
the document (see <a href="#load_structure/3"><b>load_structure/3</b></a>).
<dt> <br>
<b>source(<var>+Stream</var>)</b><dd>
An input stream that is read. This option <em>must</em> be given.
<dt> <br>
<b>content_length(<var>+Characters</var>)</b><dd>
Stop parsing after <var>Characters</var>. This option is useful to parse
input embedded in <em>envelopes</em>, such as the HTTP protocol.
<dt> <br>
<b>parse(<var>Unit</var>)</b><dd>
Defines how much of the input is parsed. This option is used to parse
only parts of a file.
<dl>
<dt> <br>
<b><a name="const:file"><b><tt>file</tt></b></a></b><dd>
Default. Parse everything up to the end of the input.
<dt> <br>
<b><a name="const:element"><b><tt>element</tt></b></a></b><dd>
The parser stops after reading the first element. Using <code>source(Stream)</code>, this implies reading is stopped as soon as the
element is complete, and another call may be issued on the same stream
to read the next element.
<dt> <br>
<b><a name="const:content"><b><tt>content</tt></b></a></b><dd>
The value <a name="const:content%2"><b><tt>content</tt></b></a> is like <a name="const:element%2"><b><tt>element</tt></b></a> but assumes the element has
already been opened. It may be used in an <a name="const:on_begin%2"><b><tt>on_begin</tt></b></a> call-back, registered
using <code>call(begin, Pred)</code>, to parse individual elements after
validating their headers.
<dt> <br>
<b><a name="const:declaration"><b><tt>declaration</tt></b></a></b><dd>
This may be used to stop the parser after reading the first
declaration. This is especially useful to parse only the <code>doctype</code>
declaration.
<dt> <br>
<b><a name="const:input"><b><tt>input</tt></b></a></b><dd>
This option is intended to be used in conjunction with the
<code>allowed(Elements)</code> option of <a href="#get_sgml_parser/2"><b>get_sgml_parser/2</b></a>.
It disables the parser's default to complete the parse-tree by closing
all open elements.
</dl>
<dt> <br>
<b>max_errors(<var>+MaxErrors</var>)</b><dd>
Set the maximum number of errors. If this number is exceeded, further
writes to the stream will yield an I/O error exception. Printing of
errors is suppressed after reaching this value. The default is 100.
<dt> <br>
<b>syntax_errors(<var>+ErrorMode</var>)</b><dd>
Defines how syntax errors are handled.
<dl>
<dt> <br>
<b>print</b><dd>
Default. Pass messages to <b>print_message/2</b>.
<dt> <br>
<b>quiet</b><dd>
Suppress all messages.
</dl>
<dt> <br>
<b>call(<var>+Event, :PredicateName</var>)</b><dd>
Issue call-backs on the specified events. <var>PredicateName</var> is the
name of the predicate to call on this event, possibly prefixed with a
module identifier. The defined events are:
<dl>
<dt> <br>
<b>begin</b><dd>
An open-tag has been parsed. The named handler is called with three
arguments: <code><var>Handler</var>(+Tag, +Attributes, +Parser)</code>.
<dt> <br>
<b>end</b><dd>
A close-tag has been parsed. The named handler is called with two
arguments: <code><var>Handler</var>(+Tag, +Parser)</code>.
<dt> <br>
<b>cdata</b><dd>
CDATA has been parsed. The named handler is called with two arguments: <code><var>Handler</var>(+CDATA, +Parser)</code>, where CDATA is an atom representing
the data.
<dt> <br>
<b>entity</b><dd>
An entity that cannot be represented as CDATA has been parsed. The named
handler is called with two arguments: <code><var>Handler</var>(+NameOrCode,
+Parser)</code>.
<dt> <br>
<b>pi</b><dd>
A processing instruction has been parsed. The named handler is called
with two arguments: <code><var>Handler</var>(+Text, +Parser)</code>, where
<var>Text</var> is the text of the processing instruction.
<dt> <br>
<b>decl</b><dd>
A declaration (<code><!...></code>) has been read. The named handler is
called with two arguments: <code><var>Handler</var>(+Text, +Parser)</code>,
where <var>Text</var> is the text of the declaration with comments removed.
<p>
This option is especially useful for highlighting declarations and comments in
editor support, where the location of the declaration is extracted using
<a href="#get_sgml_parser/2"><b>get_sgml_parser/2</b></a>.
<dt> <br>
<b>error</b><dd>
An error has been encountered. The named handler is called with three
arguments: <code><var>Handler</var>(+Severity, +Message, +Parser)</code>, where
<var>Severity</var> is one of <a name="const:warning"><b><tt>warning</tt></b></a> or <a name="const:error"><b><tt>error</tt></b></a> and <var>Message</var> is
an atom representing the diagnostic message. The location of the error
can be determined using <a href="#get_sgml_parser/2"><b>get_sgml_parser/2</b></a>.
<p>
If this option is present, errors and warnings are not reported using
<b>print_message/2</b>.
<dt> <br>
<b>xmlns</b><dd>
When parsing in <a name="const:xmlns%7"><b><tt>xmlns</tt></b></a> mode, a new namespace declaration is
pushed on the environment. The named handler is called with three
arguments: <code><var>Handler</var>(+NameSpace, +URL, +Parser)</code>.
See <a href="#sec:xmlns">xmlns</a> for details.
<dt> <br>
<b>urlns</b><dd>
When parsing in <a name="const:xmlns%8"><b><tt>xmlns</tt></b></a> mode, this predicate can be used to map a
URL into either a canonical URL for this namespace or another internal
identifier. See <a href="#sec:xmlns">xmlns</a> for details.
</dl>
</dl>
</dl>
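<p>
The primitives above can be combined as follows. This is a minimal sketch of the call-back output mode, assuming a recent SWI-Prolog; <code>tags_in_xml/2</code>, <code>record_begin/3</code> and the dynamic predicate <code>seen/1</code> are hypothetical names. It creates a parser, selects the XML dialect, records the tag of every <code>begin</code> event and frees the parser even if parsing raises an exception.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
:- use_module(library(sgml)).

:- dynamic
    seen/1.                             % hypothetical tag store

%   tags_in_xml(+In, -Tags)
%
%   Hypothetical helper: collect the names of all elements of the
%   XML document on stream In, in document order, using call-backs
%   instead of building the document term.
tags_in_xml(In, Tags) :-
    retractall(seen(_)),
    setup_call_cleanup(
        new_sgml_parser(Parser, []),
        ( set_sgml_parser(Parser, dialect(xml)),
          sgml_parse(Parser,
                     [ source(In),
                       call(begin, record_begin)
                     ])
        ),
        free_sgml_parser(Parser)),      % always release the parser
    findall(T, retract(seen(T)), Tags).

%   Handler for the begin event: Handler(+Tag, +Attributes, +Parser).
record_begin(Tag, _Attributes, _Parser) :-
    assertz(seen(Tag)).
</pre>
</td></tr>
</table>
<p>
Storing the tags in a dynamic predicate keeps the sketch short, but it also means the sketch is not safe to run concurrently in several threads.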
<p>
<h3><a name="sec:sec-3.6.1">Partial Parsing</a></h3>
<p>
In some cases, only part of a document needs to be parsed. One option is to
use <a href="#load_structure/3"><b>load_structure/3</b></a> or one of its variations and extract
the desired elements from the returned structure. This is a clean
solution, especially for small and medium-sized documents. It is, however,
unsuitable for parsing very large documents. Such documents can only be
handled with the call-back output interface realised by the
<code>call(Event, Action)</code> option of <a href="#sgml_parse/2"><b>sgml_parse/2</b></a>.
Event-driven processing is not very natural in Prolog.
<p>
The SGML2PL library allows for a mixed approach. Consider the case where
we want to process all descriptions from RDF elements in a document. The
code below calls <code>process_rdf_description(Element)</code> on each element
that is directly inside an RDF element.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
:- dynamic
    in_rdf/0.

load_rdf(File) :-
    retractall(in_rdf),
    open(File, read, In),
    new_sgml_parser(Parser, []),
    set_sgml_parser(Parser, file(File)),
    set_sgml_parser(Parser, dialect(xml)),
    sgml_parse(Parser,
               [ source(In),
                 call(begin, on_begin),
                 call(end, on_end)
               ]),
    free_sgml_parser(Parser),
    close(In).

on_end('RDF', _) :-
    !,
    retractall(in_rdf).
on_end(_, _).                           % other close-tags: nothing to do

on_begin('RDF', _, _) :-
    !,
    assertz(in_rdf).
on_begin(Tag, Attr, Parser) :-
    in_rdf, !,
    sgml_parse(Parser,
               [ document(Content),
                 parse(content)
               ]),
    process_rdf_description(element(Tag, Attr, Content)).
on_begin(_, _, _).                      % elements outside RDF are skipped
</pre>
</td></tr>
</table>
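<p>
The same mixed approach can be tried without a file. In the sketch below, the names <code>items_in_xml/2</code>, <code>item_begin/3</code> and <code>item/1</code> are hypothetical. Every <code>item</code> element is parsed into a complete term using <code>parse(content)</code> from within the <code>begin</code> call-back, while all other elements are only seen as events. The catch-all clause of <code>item_begin/3</code> makes the call-back succeed for those other elements.
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
:- use_module(library(sgml)).

:- dynamic
    item/1.                             % hypothetical result store

%   items_in_xml(+In, -Items)
%
%   Hypothetical helper: return all item elements of the XML document
%   on stream In as element(item, Attributes, Content) terms.
items_in_xml(In, Items) :-
    retractall(item(_)),
    setup_call_cleanup(
        new_sgml_parser(Parser, []),
        ( set_sgml_parser(Parser, dialect(xml)),
          sgml_parse(Parser,
                     [ source(In),
                       call(begin, item_begin)
                     ])
        ),
        free_sgml_parser(Parser)),
    findall(I, retract(item(I)), Items).

%   Parse the content of each item into a term; ignore other elements.
item_begin(item, Attributes, Parser) :-
    !,
    sgml_parse(Parser,
               [ document(Content),
                 parse(content)
               ]),
    assertz(item(element(item, Attributes, Content))).
item_begin(_, _, _).
</pre>
</td></tr>
</table>
<p>
In recent SWI-Prolog versions the input stream can be created from an atom or string using <b>open_string/2</b>, which is convenient for experimenting with this technique.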
<p>
<h1><a name="sec:indexaccess">Processing Indexed Files</a></h1>
<p>
In some cases applications wish to process small portions of large
SGML, XML or RDF files. For example, the <em>OpenDirectory</em> project
by Netscape has produced a 90MB RDF file representing the main index.
The parser described here can process this document as a unit, but
loading takes 85 seconds on a Pentium-II 450 and the resulting term
requires about 70MB global stack. One option is to process the entire
document and output it as a Prolog fact-base of RDF triples, but in
many cases this is undesirable. Another example is a large SGML file
containing online documentation. The application normally wishes to
provide only small portions at a time to the user. Loading the entire
document into memory is then undesirable.
<p>
Using the <code>parse(element)</code> option, we open a file, seek
(using <b>seek/4</b>) to the position of the element and
read the desired element.
<p>
The index can be built using the call-back interface of
<a href="#sgml_parse/2"><b>sgml_parse/2</b></a>. For example, the following code makes an
index of the <b>structure.rdf</b> file of the OpenDirectory
project:
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
:- dynamic
    location/3.                         % Id, File, Offset

rdf_index(File) :-
    retractall(location(_,_,_)),
    open(File, read, In, [type(binary)]),
    new_sgml_parser(Parser, []),
    set_sgml_parser(Parser, file(File)),
    set_sgml_parser(Parser, dialect(xml)),
    sgml_parse(Parser,
               [ source(In),
                 call(begin, index_on_begin)
               ]),
    free_sgml_parser(Parser),
    close(In).

index_on_begin(_Element, Attributes, Parser) :-
    memberchk('r:id'=Id, Attributes), !,
    get_sgml_parser(Parser, charpos(Offset)),
    get_sgml_parser(Parser, file(File)),
    assertz(location(Id, File, Offset)).
index_on_begin(_, _, _).                % element without an r:id
</pre>
</td></tr>
</table>
<p>
The following code extracts the RDF element with a given id:
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
rdf_element(Id, Term) :-
    location(Id, File, Offset),
    load_structure(File, Term,
                   [ dialect(xml),
                     offset(Offset),
                     parse(element)
                   ]).
</pre>
</td></tr>
</table>
<p>
<h1><a name="sec:sec-5">External entities</a></h1>
<p>
While an SGML document is being processed, it may refer to external
data. This occurs in three places: external parameter entities, normal
external entities and the <a name="const:DOCTYPE"><b><tt>DOCTYPE</tt></b></a> declaration. The current version
of this tool deals rather primitively with external data. External
entities can only be loaded from a file and the mapping between the
entity names and the file is done using a <b><em>catalog</em></b> file in a
format compatible with that used by James Clark's SP Parser,
based on the SGML Open (now OASIS) specification.
<p>
Catalog files can be specified using two primitives: the predicate
<a href="#sgml_register_catalog_file/2"><b>sgml_register_catalog_file/2</b></a> or the environment variable
<b>SGML_CATALOG_FILES</b> (compatible with the SP package).
<dl>
<dt> <br>
<b><a name="sgml_register_catalog_file/2">sgml_register_catalog_file(<var>+File, +Location</var>)</a></b><dd>
Register the indicated <var>File</var> as a catalog file. <var>Location</var> is
either <a name="const:start"><b><tt>start</tt></b></a> or <a name="const:end%2"><b><tt>end</tt></b></a> and defines whether the catalog is
considered first or last. This predicate has no effect if <var>File</var> is
already part of the catalog.
<p>
If no files are registered using this predicate, the first query on the
catalog examines <b>SGML_CATALOG_FILES</b> and fills the catalog with
all files in this path.
</dl>
<p>
Two types of lines are used by this package.
<p>
<pre>
<a name="const:DOCTYPE%2"><b><tt>DOCTYPE</tt></b></a> <var>doctype</var> <var>file</var>
<a name="const:PUBLIC"><b><tt>PUBLIC</tt></b></a> <code>"</code><var>Id</var><code>"</code> <var>file</var>
</pre>
<p>
The specified <var>file</var> path is taken relative to the location of the
catalog file. For the <a name="const:DOCTYPE%3"><b><tt>DOCTYPE</tt></b></a> declaration, <b>sgml2pl</b> first makes
an attempt to resolve the <a name="const:SYSTEM"><b><tt>SYSTEM</tt></b></a> or <a name="const:PUBLIC%2"><b><tt>PUBLIC</tt></b></a> identifier. If this
fails it tries to resolve the <var>doctype</var> using the provided catalog
files.
<p>
Strictly speaking, <b>sgml2pl</b> breaks the rules for XML, where
system identifiers must be Uniform Resource Identifiers, not local file
names. Simple uses of relative URIs will work correctly under UNIX
and Windows.
<p>
In the future we will design a call-back mechanism for locating and
processing external entities, so Prolog-based file-location and Prolog
resources can be used to store external entities.
<p>
<h1><a name="sec:sec-6">Unsupported features</a></h1>
<p>
The current parser is rather limited. While it is able to deal with many
serious documents, it omits several less-used features of SGML and XML.
Known missing SGML features include
<ul>
<li><em>NOTATION on entities</em><br>
Though notation is parsed, notation attributes on external entity
declarations are not handed to the user.
<li><em>NOTATION attributes</em><br>
SGML notations may have attributes, declared using
<code><!ATTLIST #NOTATION name attributes></code>. Those data attributes
are provided when you declare an external CDATA, NDATA, or SDATA entity.
<p>
XML does not include external CDATA, NDATA, or SDATA entities,
nor any of the other uses to which data attributes are put in SGML,
so it doesn't include data attributes for notations either.
<p>
Sgml2pl does not support this feature and is unlikely to;
you should be aware that SGML documents using this feature cannot
be converted faithfully to XML.
<li><em>SHORTTAG</em><br>
The SGML SHORTTAG syntax is only partially implemented. Currently,
<code><tag/content/</code> is a valid abbreviation for
<code><tag>content</tag></code>, which can also be written as
<code><tag>content</></code>.
Empty start tags (<code><></code>), unclosed start tags
(<code><a<b</code>) and unclosed end tags (<code></a<b</code>) are not
supported.
<li><em>SGML declaration</em><br>
The `SGML declaration' is fixed, though most of the parameters are
handled through indirections in the implementation.
<li><em>The DATATAG feature</em><br>
It is regarded as superseded by SHORTREF, which is supported.
(SP does not support it either.)
<li><em>The RANK feature</em><br>
It is regarded as obsolete.
<li><em>The LINK feature</em><br>
It is regarded as too complicated.
<li><em>The CONCUR feature</em><br>
Concurrent markup allows a document to be tagged according to more than
one DTD at the same time. It is not supported.
</ul>
<p>
In XML mode the parser recognises SGML constructs that are not allowed
in XML. Also various extensions of XML over SGML are not yet realised.
In particular, XInclude is not implemented because the designers of
XInclude can't make up their minds whether to base it on elements or
attributes yet, let alone details.
<p>
<h1><a name="sec:sec-7">Installation</a></h1>
<p>
<h2><a name="sec:sec-7.1">Unix systems</a></h2>
<p>
Installation on Unix systems uses the commonly found <code>configure</code>,
<code>make</code> and <code>make install</code> sequence. SWI-Prolog should be
installed before building this package. If SWI-Prolog is not installed
as <b>pl</b>, the environment variable <b>PL</b> must be set to
the name of the SWI-Prolog executable. Installation is now accomplished
using:
<p>
<table width="90%" align="center" border="2" bgcolor="#f0f0f0">
<tr><td>
<pre>
% ./configure
% make
% make install
</pre>
</td></tr>
</table>
<p>
This installs the foreign libraries in <b>$PLBASE/lib/$PLARCH</b> and
the Prolog library files in <b>$PLBASE/library</b>, where
<b>$PLBASE</b> refers to the SWI-Prolog `home-directory'.
<p>
<h1><a name="sec:sec-8">Acknowledgements</a></h1>
<p>
The Prolog representation for parsed documents is based on the
SWI-Prolog interface to SP by Anjo Anjewierden.
<p>
Richard O'Keefe has put a lot of effort into testing and providing bug
reports consisting of an illustrative example and explanation of the
standard. He also made many suggestions for improving this document.
<h1><a name="Summary">Summary of Predicates</a></h1>
<table border="2">
<tr><td><a href="#dtd/2"><b>dtd/2</b></a></td><td>Find or build a DTD for a document type</td></tr>
<tr><td><a href="#dtd_property/2"><b>dtd_property/2</b></a></td><td>Query elements, entities and attributes in a DTD</td></tr>
<tr><td><a href="#free_dtd/1"><b>free_dtd/1</b></a></td><td>Free a DTD object</td></tr>
<tr><td><a href="#free_sgml_parser/1"><b>free_sgml_parser/1</b></a></td><td>Destroy a parser</td></tr>
<tr><td><a href="#get_sgml_parser/2"><b>get_sgml_parser/2</b></a></td><td>Get parser options</td></tr>
<tr><td><a href="#load_dtd/2"><b>load_dtd/2</b></a></td><td>Read DTD information from a file</td></tr>
<tr><td><a href="#load_html_file/2"><b>load_html_file/2</b></a></td><td>Parse HTML file into Prolog term</td></tr>
<tr><td><a href="#load_sgml_file/2"><b>load_sgml_file/2</b></a></td><td>Parse SGML file into Prolog term </td></tr>
<tr><td><a href="#load_structure/3"><b>load_structure/3</b></a></td><td>Parse XML/SGML/HTML file into Prolog term</td></tr>
<tr><td><a href="#load_xml_file/2"><b>load_xml_file/2</b></a></td><td>Parse XML file into Prolog term</td></tr>
<tr><td><a href="#new_dtd/2"><b>new_dtd/2</b></a></td><td>Create a DTD object</td></tr>
<tr><td><a href="#new_sgml_parser/2"><b>new_sgml_parser/2</b></a></td><td>Create a new parser</td></tr>
<tr><td><a href="#open_dtd/3"><b>open_dtd/3</b></a></td><td>Open a DTD object as an output stream</td></tr>
<tr><td><a href="#set_sgml_parser/2"><b>set_sgml_parser/2</b></a></td><td>Set parser options (dialect, source, <em>etc.</em>)</td></tr>
<tr><td><a href="#sgml_parse/2"><b>sgml_parse/2</b></a></td><td>Parse the input</td></tr>
<tr><td><a href="#sgml_register_catalog_file/2"><b>sgml_register_catalog_file/2</b></a></td><td>Register a catalog file</td></tr>
</table>
<h1>Footnotes</h1>
<dl>
<dt><a href="#txt:fn-1" name="fn-1">1</a><dd>
In
addition, newlines at the end of lines containing only markup should be
deleted. This is not yet implemented.
</dl>
<h1>Index</h1>
<ul>
<li><a href="#const:#IMPLIED"><b><tt>#IMPLIED</tt></b></a>
<li><a href="#const:#pcdata"><b><tt>#pcdata</tt></b></a> [ <a href="#const:#pcdata%2">2</a>]
<li><a href="#const:-"><b><tt>-</tt></b></a>
<li><a href="#const:/"><b><tt>/</tt></b></a> [ <a href="#const:/%2">2</a>]
<li><a href="#const:/>"><b><tt>/></tt></b></a>
<li><a href="#const::"><b><tt>:</tt></b></a>
<li><a href="#const:CDATA"><b><tt>CDATA</tt></b></a> [ <a href="#const:CDATA%2">2</a> <a href="#const:CDATA%3">3</a> <a href="#const:CDATA%4">4</a> <a href="#const:CDATA%5">5</a> <a href="#const:CDATA%6">6</a> <a href="#const:CDATA%7">7</a>]
<li><a href="#const:DOCTYPE"><b><tt>DOCTYPE</tt></b></a> [ <a href="#const:DOCTYPE%2">2</a> <a href="#const:DOCTYPE%3">3</a>]
<li><a href="#const:NAMES"><b><tt>NAMES</tt></b></a>
<li><a href="#const:NDATA"><b><tt>NDATA</tt></b></a>
<li><a href="#const:NOTATION"><b><tt>NOTATION</tt></b></a>
<li><a href="#const:NUMBER"><b><tt>NUMBER</tt></b></a> [ <a href="#const:NUMBER%2">2</a>]
<li><a href="#const:NUMBERS"><b><tt>NUMBERS</tt></b></a> [ <a href="#const:NUMBERS%2">2</a>]
<li><a href="#const:PUBLIC"><b><tt>PUBLIC</tt></b></a> [ <a href="#const:PUBLIC%2">2</a>]
<li><a href="#const:SDATA"><b><tt>SDATA</tt></b></a>
<li><a href="#const:SYSTEM"><b><tt>SYSTEM</tt></b></a>
<li><a href="#const:amp"><b><tt>amp</tt></b></a>
<li><a href="#const:any"><b><tt>any</tt></b></a>
<li><a href="#const:apos"><b><tt>apos</tt></b></a>
<li><a href="#const:begin"><b><tt>begin</tt></b></a>
<li><a href="#const:cdata"><b><tt>cdata</tt></b></a> [ <a href="#const:cdata%2">2</a> <a href="#const:cdata%3">3</a>]
<li><a href="#const:conref"><b><tt>conref</tt></b></a>
<li><a href="#const:content"><b><tt>content</tt></b></a> [ <a href="#const:content%2">2</a>]
<li><a href="#const:current"><b><tt>current</tt></b></a>
<li><a href="#const:declaration"><b><tt>declaration</tt></b></a>
<li><a href="#const:default"><b><tt>default</tt></b></a> [ <a href="#const:default%2">2</a> <a href="#const:default%3">3</a>]
<li><a href="#dtd/2"><b>dtd/2</b></a>
<li><a href="#dtd_property/2"><b>dtd_property/2</b></a>
<li><a href="#const:element"><b><tt>element</tt></b></a> [ <a href="#const:element%2">2</a>]
<li><a href="#const:empty"><b><tt>empty</tt></b></a>
<li><a href="#const:end"><b><tt>end</tt></b></a> [ <a href="#const:end%2">2</a>]
<li><a href="#const:entity"><b><tt>entity</tt></b></a>
<li><a href="#const:error"><b><tt>error</tt></b></a>
<li><a href="#const:explicit"><b><tt>explicit</tt></b></a>
<li><a href="#const:false"><b><tt>false</tt></b></a> [ <a href="#const:false%2">2</a>]
<li><a href="#const:file"><b><tt>file</tt></b></a>
<li><a href="#const:fixed"><b><tt>fixed</tt></b></a>
<li><a href="#free_dtd/1"><b>free_dtd/1</b></a>
<li><a href="#free_sgml_parser/1"><b>free_sgml_parser/1</b></a>
<li><a href="#get_sgml_parser/2"><b>get_sgml_parser/2</b></a>
<li><a href="#const:gt"><b><tt>gt</tt></b></a>
<li><a href="#const:id"><b><tt>id</tt></b></a>
<li><a href="#const:idref"><b><tt>idref</tt></b></a>
<li><a href="#const:implied"><b><tt>implied</tt></b></a>
<li><a href="#const:input"><b><tt>input</tt></b></a>
<li><a href="#const:integer"><b><tt>integer</tt></b></a> [ <a href="#const:integer%2">2</a>]
<li><a href="#load_dtd/2"><b>load_dtd/2</b></a>
<li><a href="#load_html_file/2"><b>load_html_file/2</b></a>
<li><a href="#load_sgml_file/2"><b>load_sgml_file/2</b></a>
<li><a href="#load_structure/3"><b>load_structure/3</b></a>
<li><a href="#load_xml_file/2"><b>load_xml_file/2</b></a>
<li><a href="#const:lt"><b><tt>lt</tt></b></a>
<li><a href="#const:name"><b><tt>name</tt></b></a>
<li><a href="#new_dtd/2"><b>new_dtd/2</b></a>
<li><a href="#new_sgml_parser/2"><b>new_sgml_parser/2</b></a>
<li><a href="#const:nmtoken"><b><tt>nmtoken</tt></b></a>
<li><a href="#const:notation"><b><tt>notation</tt></b></a>
<li><a href="#const:number"><b><tt>number</tt></b></a>
<li><a href="#const:nutoken"><b><tt>nutoken</tt></b></a>
<li><a href="#const:omitted"><b><tt>omitted</tt></b></a>
<li><a href="#const:on_begin"><b><tt>on_begin</tt></b></a> [ <a href="#const:on_begin%2">2</a>]
<li><a href="#open_dtd/3"><b>open_dtd/3</b></a>
<li><a href="#const:parse"><b><tt>parse</tt></b></a>
<li><a href="#const:preserve"><b><tt>preserve</tt></b></a> [ <a href="#const:preserve%2">2</a>]
<li><a href="#const:quot"><b><tt>quot</tt></b></a>
<li><a href="#const:rcdata"><b><tt>rcdata</tt></b></a>
<li><a href="#const:rdf"><b><tt>rdf</tt></b></a> [ <a href="#const:rdf%2">2</a>]
<li><a href="#const:rdf-syntax"><b><tt>rdf-syntax</tt></b></a>
<li><a href="#const:remove"><b><tt>remove</tt></b></a>
<li><a href="#const:required"><b><tt>required</tt></b></a>
<li><a href="#set_sgml_parser/2"><b>set_sgml_parser/2</b></a>
<li><a href="#const:sgml"><b><tt>sgml</tt></b></a> [ <a href="#const:sgml%2">2</a> <a href="#const:sgml%3">3</a> <a href="#const:sgml%4">4</a> <a href="#const:sgml%5">5</a>]
<li><a href="#sgml_parse/2"><b>sgml_parse/2</b></a>
<li><a href="#sgml_register_catalog_file/2"><b>sgml_register_catalog_file/2</b></a>
<li><a href="#const:shortref"><b><tt>shortref</tt></b></a>
<li><a href="#const:shorttag"><b><tt>shorttag</tt></b></a>
<li><a href="#const:start"><b><tt>start</tt></b></a>
<li><a href="#const:token"><b><tt>token</tt></b></a> [ <a href="#const:token%2">2</a>]
<li><a href="#const:true"><b><tt>true</tt></b></a>
<li><a href="#const:urlns"><b><tt>urlns</tt></b></a>
<li><a href="#const:warning"><b><tt>warning</tt></b></a>
<li><a href="#const:xml"><b><tt>xml</tt></b></a> [ <a href="#const:xml%2">2</a> <a href="#const:xml%3">3</a> <a href="#const:xml%4">4</a>]
<li><a href="#const:xmlns"><b><tt>xmlns</tt></b></a> [ <a href="#const:xmlns%2">2</a> <a href="#const:xmlns%3">3</a> <a href="#const:xmlns%4">4</a> <a href="#const:xmlns%5">5</a> <a href="#const:xmlns%6">6</a> <a href="#const:xmlns%7">7</a> <a href="#const:xmlns%8">8</a>]
</ul>
</body>
</html>