<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<link rel="stylesheet" href="style.css" type="text/css">
<meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type">
<link rel="Start" href="index.html">
<link rel="previous" href="Pxp_dtd.html">
<link rel="next" href="Pxp_core_types.html">
<link rel="Up" href="index.html">
<link title="Index of types" rel=Appendix href="index_types.html">
<link title="Index of exceptions" rel=Appendix href="index_exceptions.html">
<link title="Index of values" rel=Appendix href="index_values.html">
<link title="Index of class methods" rel=Appendix href="index_methods.html">
<link title="Index of classes" rel=Appendix href="index_classes.html">
<link title="Index of class types" rel=Appendix href="index_class_types.html">
<link title="Index of modules" rel=Appendix href="index_modules.html">
<link title="Index of module types" rel=Appendix href="index_module_types.html">
<link title="Pxp_dtd" rel="Chapter" href="Pxp_dtd.html">
<link title="Pxp_tree_parser" rel="Chapter" href="Pxp_tree_parser.html">
<link title="Pxp_core_types" rel="Chapter" href="Pxp_core_types.html">
<link title="Pxp_ev_parser" rel="Chapter" href="Pxp_ev_parser.html">
<link title="Pxp_event" rel="Chapter" href="Pxp_event.html">
<link title="Pxp_dtd_parser" rel="Chapter" href="Pxp_dtd_parser.html">
<link title="Pxp_codewriter" rel="Chapter" href="Pxp_codewriter.html">
<link title="Intro_trees" rel="Chapter" href="Intro_trees.html">
<link title="Intro_extensions" rel="Chapter" href="Intro_extensions.html">
<link title="Intro_namespaces" rel="Chapter" href="Intro_namespaces.html">
<link title="Intro_events" rel="Chapter" href="Intro_events.html">
<link title="Intro_resolution" rel="Chapter" href="Intro_resolution.html">
<link title="Intro_getting_started" rel="Chapter" href="Intro_getting_started.html">
<link title="Intro_advanced" rel="Chapter" href="Intro_advanced.html">
<link title="Intro_preprocessor" rel="Chapter" href="Intro_preprocessor.html">
<link title="Example_readme" rel="Chapter" href="Example_readme.html"><link title="ID indices" rel="Section" href="#1_IDindices">
<link title="Parsing functions" rel="Section" href="#1_Parsingfunctions">
<link title="Helpers" rel="Section" href="#1_Helpers">
<title>PXP Reference : Pxp_tree_parser</title>
</head>
<body>
<div class="navbar"><a class="pre" href="Pxp_dtd.html" title="Pxp_dtd">Previous</a>
<a class="up" href="index.html" title="Index">Up</a>
<a class="post" href="Pxp_core_types.html" title="Pxp_core_types">Next</a>
</div>
<h1>Module <a href="type_Pxp_tree_parser.html">Pxp_tree_parser</a></h1>
<pre><span class="keyword">module</span> Pxp_tree_parser: <code class="code"><span class="keyword">sig</span></code> <a href="Pxp_tree_parser.html">..</a> <code class="code"><span class="keyword">end</span></code></pre><div class="info module top">
Calling the parser in tree mode<br>
</div>
<hr width="100%">
<br>
The following functions return the parsed XML text as a tree, i.e.
as <code class="code"><span class="constructor">Pxp_document</span>.node</code> or <code class="code"><span class="constructor">Pxp_document</span>.document</code>.<br>
<br>
<h1 id="1_IDindices">ID indices</h1><br>
<br>
These indices are used to check that the values of attributes declared
as <code class="code"><span class="constructor">ID</span></code> are unique. The indices can also be used to quickly look up
the elements carrying such attributes.<br>
<pre><span id="EXCEPTIONID_not_unique"><span class="keyword">exception</span> ID_not_unique</span></pre>
<div class="info ">
Used inside <a href="Pxp_tree_parser.index-c.html"><code class="code"><span class="constructor">Pxp_tree_parser</span>.index</code></a> to indicate that the same ID is
attached to several nodes<br>
</div>
<pre><span id="TYPEindex"><span class="keyword">class type</span> <code class="type">[< clone : 'a; node : 'a Pxp_document.node;<br> set_node : 'a Pxp_document.node -> unit; .. ><br> as 'a]</code> <a href="Pxp_tree_parser.index-c.html">index</a></span> = <code class="code"><span class="keyword">object</span></code> <a href="Pxp_tree_parser.index-c.html">..</a> <code class="code"><span class="keyword">end</span></code></pre><div class="info">
The type of indexes over the ID attributes of the elements.
</div>
<pre><span name="TYPEhash_index"><span class="keyword">class</span> <code class="type">[< clone : 'a; node : 'a Pxp_document.node;<br> set_node : 'a Pxp_document.node -> unit; .. ><br> as 'a]</code> <a href="Pxp_tree_parser.hash_index-c.html">hash_index</a></span> : <code class="type"></code><code class="code"><span class="keyword">object</span></code> <a href="Pxp_tree_parser.hash_index-c.html">..</a> <code class="code"><span class="keyword">end</span></code></pre><div class="info">
This is a simple implementation of <a href="Pxp_tree_parser.index-c.html"><code class="code"><span class="constructor">Pxp_tree_parser</span>.index</code></a> using
a hash table.
</div>
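<p>
The following is a minimal sketch of how such an index might be passed to the
parser and queried afterwards. The file name <code class="code">doc.xml</code> and the ID value
<code class="code">intro</code> are hypothetical. The sketch assumes that the <code class="code">find</code> method of the index
raises <code class="code">Not_found</code> for unknown IDs, and that the coercion from <code class="code">hash_index</code>
to <code class="code">index</code> is accepted as written.
<p>
<pre class="codepre"><code class="code">(* Sketch only: "doc.xml" and the ID "intro" are placeholders. *)
let () =
  (* A fresh hash-table based index; the parser fills it while parsing. *)
  let idx = new Pxp_tree_parser.hash_index in
  let _doc =
    Pxp_tree_parser.parse_document_entity
      ~id_index:(idx :> _ Pxp_tree_parser.index)
      Pxp_types.default_config
      (Pxp_types.from_file "doc.xml")
      Pxp_tree_parser.default_spec
  in
  (* Look up the element whose ID attribute has the value "intro". *)
  match idx # find "intro" with
  | node -> print_endline (node # data)
  | exception Not_found -> prerr_endline "no element with ID \"intro\""
</code></pre>
<p>
If the same ID value occurs on several elements, the parse is expected to
fail, because the index signals <code class="code">ID_not_unique</code>.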
<br>
<h1 id="1_Parsingfunctions">Parsing functions</h1><br>
<br>
There are two types of XML texts one can parse:<ul>
<li>Closed XML documents</li>
<li>External XML entities</li>
</ul>
Usually, the functions for closed XML documents are the right ones.
The exact difference between the two types is subtle, as many texts
can be parsed either way. The idea, however, is that an external
XML entity is text from a different file that is included by reference
into a closed document. Some XML features are only meaningful for
the whole document, and are not available when only an external entity
is parsed. This includes:<ul>
<li>The DOCTYPE and the DTD declarations</li>
<li>The standalone declaration</li>
</ul>
It is a syntax error to use these features in an external XML entity.
<p>
An external entity is a file referenced by another XML text.
For example, this document includes "file.xml" as an external entity:
<p>
<pre class="codepre"><code class="code"> <?xml version=<span class="string">"1.0"</span><span class="keywordsign">?></span><br>
<!<span class="constructor">DOCTYPE</span> root [<br>
<!<span class="constructor">ENTITY</span> extref <span class="constructor">SYSTEM</span> <span class="string">"file.xml"</span>><br>
]><br>
<root><br>
<span class="keywordsign">&</span>extref;<br>
</root><br>
</code></pre>
<p>
(In contrast to this, an internal entity would give the definition
text immediately, e.g. <code class="code"><!<span class="constructor">ENTITY</span> intref <span class="string">"This is the entity text"</span>></code>.)
Of course, it does not make sense for the external entity to contain
another DOCTYPE declaration, and hence it is forbidden to use this
feature in "file.xml".
<p>
There is no function that parses a file like "file.xml" exactly
as if it were included in a bigger document. The closest behavior is provided by
<a href="Pxp_tree_parser.html#VALparse_content_entity"><code class="code"><span class="constructor">Pxp_tree_parser</span>.parse_content_entity</code></a> and
<a href="Pxp_tree_parser.html#VALparse_wfcontent_entity"><code class="code"><span class="constructor">Pxp_tree_parser</span>.parse_wfcontent_entity</code></a>. They impose the
additional constraint that the file must have a single top-most element.
<p>
The following functions also distinguish between validating and
well-formedness mode. In the latter mode, many formal document
constraints are not enforced. For instance, elements and
attributes need not be declared.
<p>
There are, unfortunately, a number of myths about well-formed XML
documents. One says that the declarations are completely
ignored. This is not true. For instance, the example shown above
includes the external XML entity "file.xml" by reference.
The <code class="code"><!<span class="constructor">ENTITY</span>></code> declaration is respected no matter in which mode
the parser is run. It is also not true that the presence of
<code class="code"><span class="constructor">DOCTYPE</span></code> indicates validating mode and its absence well-formedness
mode. The presence of <code class="code"><span class="constructor">DOCTYPE</span></code> is perfectly compatible with
well-formedness mode; the declarations are merely interpreted
in a different way.
<p>
If an attempt is made to parse a document in validating mode, but the
<code class="code"><span class="constructor">DOCTYPE</span></code> is missing, the parser will fail when the root element
is parsed, because its declaration is missing. This conforms to the
XML standard, and also follows the logic that the program calling
the parser is written with the expectation that the parsed file has been
validated. If this validation is missing, the program can run into
failed assertions (or worse).<br>
<pre><span id="VALparse_document_entity"><span class="keyword">val</span> parse_document_entity</span> : <code class="type">?transform_dtd:(<a href="Pxp_dtd.dtd-c.html">Pxp_dtd.dtd</a> -> <a href="Pxp_dtd.dtd-c.html">Pxp_dtd.dtd</a>) -><br> ?id_index:(< clone : 'a; node : 'a Pxp_document.node;<br> set_node : 'a Pxp_document.node -> unit; .. ><br> as 'a)<br> <a href="Pxp_tree_parser.index-c.html">index</a> -><br> Pxp_types.config -><br> Pxp_types.source -> 'a Pxp_document.spec -> 'a Pxp_document.document</code></pre><div class="info ">
Parse a closed document,
and validate the contents of the document against the DTD contained
and/or referenced in the document.
<p>
If the optional argument <code class="code">transform_dtd</code> is passed, the following
modification applies: After the DTD (both the internal and external
subsets) has been read, the function <code class="code">transform_dtd</code> is called,
and the resulting DTD is actually used to validate the document.
This makes it possible<ul>
<li>to check which DTD is used (e.g. by comparing <a href="Pxp_dtd.dtd-c.html#METHODid"><code class="code"><span class="constructor">Pxp_dtd</span>.dtd.id</code></a>
with a list of allowed IDs)</li>
<li>to apply modifications to the DTD before content parsing is started</li>
<li>to even switch to a built-in DTD, and to drop all user-defined
declarations.</li>
</ul>
If the optional argument <code class="code">transform_dtd</code> is missing, the parser
behaves in the same way as if the identity were passed as <code class="code">transform_dtd</code>,
i.e. the DTD is left unmodified.
<p>
If the optional argument <code class="code">id_index</code> is present, the parser adds
any ID attribute to the passed index. An index is required to detect
violations of the uniqueness of IDs.<br>
</div>
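<p>
As an illustration of the third use of <code class="code">transform_dtd</code> listed above, the
following sketch discards the DTD declared by the document and validates
against a DTD read from a local file instead. The file names
<code class="code">trusted.dtd</code> and <code class="code">doc.xml</code> and the function name <code class="code">use_trusted_dtd</code> are
hypothetical. The sketch assumes that <code class="code">Pxp_dtd_parser.parse_dtd_entity</code> is
a suitable way to obtain the replacement DTD.
<p>
<pre class="codepre"><code class="code">(* Sketch only: file names and the policy are placeholders. *)
let config = Pxp_types.default_config

(* Ignore whatever DTD the document declares and return a trusted DTD
   that is read from a local file. *)
let use_trusted_dtd _declared_dtd =
  Pxp_dtd_parser.parse_dtd_entity config (Pxp_types.from_file "trusted.dtd")

(* Parse and validate "doc.xml" against the trusted DTD. *)
let doc =
  Pxp_tree_parser.parse_document_entity
    ~transform_dtd:use_trusted_dtd
    config
    (Pxp_types.from_file "doc.xml")
    Pxp_tree_parser.default_spec
</code></pre>
<p>
A less drastic variant could inspect the declared DTD and return it
unchanged when it is acceptable.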
<pre><span id="VALparse_wfdocument_entity"><span class="keyword">val</span> parse_wfdocument_entity</span> : <code class="type">?transform_dtd:(<a href="Pxp_dtd.dtd-c.html">Pxp_dtd.dtd</a> -> <a href="Pxp_dtd.dtd-c.html">Pxp_dtd.dtd</a>) -><br> Pxp_types.config -><br> Pxp_types.source -><br> (< clone : 'a; node : 'a Pxp_document.node;<br> set_node : 'a Pxp_document.node -> unit; .. ><br> as 'a)<br> Pxp_document.spec -> 'a Pxp_document.document</code></pre><div class="info ">
Parse a closed document, but do not
validate it. Only checks on well-formedness are performed.
<p>
The option <code class="code">transform_dtd</code> works as for <code class="code">parse_document_entity</code>,
but the resulting DTD is not used for validation. It is merely
included in the returned document (useful, for example, for getting at entity
declarations).<br>
</div>
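<p>
A minimal sketch of a call in well-formedness mode might look as follows.
The file name <code class="code">doc.xml</code> is hypothetical. The sketch assumes that
<code class="code">Pxp_types.string_of_exn</code> is used to turn parser exceptions into
readable messages.
<p>
<pre class="codepre"><code class="code">(* Sketch only: "doc.xml" is a placeholder. *)
let () =
  try
    (* No validation is performed; only well-formedness is checked. *)
    let doc =
      Pxp_tree_parser.parse_wfdocument_entity
        Pxp_types.default_config
        (Pxp_types.from_file "doc.xml")
        Pxp_tree_parser.default_spec
    in
    (* Report the name of the root element. *)
    (match doc # root # node_type with
     | Pxp_document.T_element name ->
         print_endline ("root element: " ^ name)
     | _ ->
         print_endline "root is not an element")
  with e ->
    (* Print the error in readable form. *)
    prerr_endline (Pxp_types.string_of_exn e)
</code></pre>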
<pre><span id="VALparse_content_entity"><span class="keyword">val</span> parse_content_entity</span> : <code class="type">?id_index:(< clone : 'a; node : 'a Pxp_document.node;<br> set_node : 'a Pxp_document.node -> unit; .. ><br> as 'a)<br> <a href="Pxp_tree_parser.index-c.html">index</a> -><br> Pxp_types.config -><br> Pxp_types.source -><br> <a href="Pxp_dtd.dtd-c.html">Pxp_dtd.dtd</a> -> 'a Pxp_document.spec -> 'a Pxp_document.node</code></pre><div class="info ">
Parse a file representing a well-formed fragment of a document. The
fragment must be a single element (i.e. something like <code class="code"><a>...</a></code>;
not a sequence like <code class="code"><a>...</a><b>...</b></code>). The element is validated
against the passed DTD, but it is not checked whether the element is
the root element specified in the DTD. <b>This function is almost
always the wrong one to call. Consider <a href="Pxp_tree_parser.html#VALparse_document_entity"><code class="code"><span class="constructor">Pxp_tree_parser</span>.parse_document_entity</code></a> instead.</b>
<p>
Despite its name, this function <b>cannot</b> parse the <code class="code">content</code>
production defined in the XML specification! This is a misnomer
I'm sorry about. The <code class="code">content</code> production would allow parsing
a list of elements and other node kinds. Also, this function
corresponds to the event entry point <code class="code"><span class="keywordsign">`</span><span class="constructor">Entry_element_content</span></code> and
not <code class="code"><span class="keywordsign">`</span><span class="constructor">Entry_content</span></code>.
<p>
If the optional argument <code class="code">id_index</code> is present, the parser adds
any ID attribute to the passed index. An index is required to detect
violations of the uniqueness of IDs.<br>
</div>
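<p>
Since <code class="code">parse_content_entity</code> takes the DTD as an explicit argument, the DTD
has to come from somewhere else. The following sketch reads it from a
separate file with <code class="code">Pxp_dtd_parser.parse_dtd_entity</code>. The file names
<code class="code">sample.dtd</code> and <code class="code">fragment.xml</code> are hypothetical.
<p>
<pre class="codepre"><code class="code">(* Sketch only: file names are placeholders. *)
let config = Pxp_types.default_config

(* Read the DTD that the fragment is validated against. *)
let dtd =
  Pxp_dtd_parser.parse_dtd_entity config (Pxp_types.from_file "sample.dtd")

(* Parse a file containing a single element. The result is a node,
   not a document. *)
let node =
  Pxp_tree_parser.parse_content_entity
    config
    (Pxp_types.from_file "fragment.xml")
    dtd
    Pxp_tree_parser.default_spec
</code></pre>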
<pre><span id="VALparse_wfcontent_entity"><span class="keyword">val</span> parse_wfcontent_entity</span> : <code class="type">Pxp_types.config -><br> Pxp_types.source -><br> (< clone : 'a; node : 'a Pxp_document.node;<br> set_node : 'a Pxp_document.node -> unit; .. ><br> as 'a)<br> Pxp_document.spec -> 'a Pxp_document.node</code></pre><div class="info ">
Parse a file representing a well-formed fragment of a document.
The fragment is not validated, only checked for well-formedness.
See also the notes for <a href="Pxp_tree_parser.html#VALparse_content_entity"><code class="code"><span class="constructor">Pxp_tree_parser</span>.parse_content_entity</code></a>.<br>
</div>
<br>
<h1 id="1_Helpers">Helpers</h1><br>
<pre><span id="VALdefault_extension"><span class="keyword">val</span> default_extension</span> : <code class="type">'a Pxp_document.node Pxp_document.extension as 'a</code></pre><div class="info ">
A "null" extension; an extension that does not extend the functionality<br>
</div>
<pre><span id="VALdefault_spec"><span class="keyword">val</span> default_spec</span> : <code class="type">('a Pxp_document.node Pxp_document.extension as 'a) Pxp_document.spec</code></pre><div class="info ">
Specifies that you do not want to use extensions.<br>
</div>
<pre><span id="VALdefault_namespace_spec"><span class="keyword">val</span> default_namespace_spec</span> : <code class="type">('a Pxp_document.node Pxp_document.extension as 'a) Pxp_document.spec</code></pre><div class="info ">
Specifies that you want to use namespaces, but not extensions<br>
</div>
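<p>
As a rough sketch, <code class="code">default_namespace_spec</code> might be combined with a
configuration in which namespace processing is switched on. This assumes
that the <code class="code">enable_namespace_processing</code> field of <code class="code">Pxp_types.config</code> and
<code class="code">Pxp_dtd.create_namespace_manager</code> are used for this purpose. Neither is
described on this page, and the file name <code class="code">doc.xml</code> is hypothetical.
<p>
<pre class="codepre"><code class="code">(* Sketch only: assumes namespace processing is enabled through the
   configuration record; "doc.xml" is a placeholder. *)
let config =
  { Pxp_types.default_config with
      Pxp_types.enable_namespace_processing =
        Some (Pxp_dtd.create_namespace_manager ()) }

(* Parse with namespace-aware nodes, but without extensions. *)
let doc =
  Pxp_tree_parser.parse_wfdocument_entity
    config
    (Pxp_types.from_file "doc.xml")
    Pxp_tree_parser.default_namespace_spec
</code></pre>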
</body></html>