<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<link rel="stylesheet" href="style.css" type="text/css">
<meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type">
<link rel="Start" href="index.html">
<link rel="previous" href="Pxp_dtd.html">
<link rel="next" href="Pxp_core_types.html">
<link rel="Up" href="index.html">
<link title="Index of types" rel=Appendix href="index_types.html">
<link title="Index of exceptions" rel=Appendix href="index_exceptions.html">
<link title="Index of values" rel=Appendix href="index_values.html">
<link title="Index of class methods" rel=Appendix href="index_methods.html">
<link title="Index of classes" rel=Appendix href="index_classes.html">
<link title="Index of class types" rel=Appendix href="index_class_types.html">
<link title="Index of modules" rel=Appendix href="index_modules.html">
<link title="Index of module types" rel=Appendix href="index_module_types.html">
<link title="Pxp_dtd" rel="Chapter" href="Pxp_dtd.html">
<link title="Pxp_tree_parser" rel="Chapter" href="Pxp_tree_parser.html">
<link title="Pxp_core_types" rel="Chapter" href="Pxp_core_types.html">
<link title="Pxp_ev_parser" rel="Chapter" href="Pxp_ev_parser.html">
<link title="Pxp_event" rel="Chapter" href="Pxp_event.html">
<link title="Pxp_dtd_parser" rel="Chapter" href="Pxp_dtd_parser.html">
<link title="Pxp_codewriter" rel="Chapter" href="Pxp_codewriter.html">
<link title="Intro_trees" rel="Chapter" href="Intro_trees.html">
<link title="Intro_extensions" rel="Chapter" href="Intro_extensions.html">
<link title="Intro_namespaces" rel="Chapter" href="Intro_namespaces.html">
<link title="Intro_events" rel="Chapter" href="Intro_events.html">
<link title="Intro_resolution" rel="Chapter" href="Intro_resolution.html">
<link title="Intro_getting_started" rel="Chapter" href="Intro_getting_started.html">
<link title="Intro_advanced" rel="Chapter" href="Intro_advanced.html">
<link title="Intro_preprocessor" rel="Chapter" href="Intro_preprocessor.html">
<link title="Example_readme" rel="Chapter" href="Example_readme.html"><link title="ID indices" rel="Section" href="#1_IDindices">
<link title="Parsing functions" rel="Section" href="#1_Parsingfunctions">
<link title="Helpers" rel="Section" href="#1_Helpers">
<title>PXP Reference : Pxp_tree_parser</title>
</head>
<body>
<div class="navbar"><a class="pre" href="Pxp_dtd.html" title="Pxp_dtd">Previous</a>
&nbsp;<a class="up" href="index.html" title="Index">Up</a>
&nbsp;<a class="post" href="Pxp_core_types.html" title="Pxp_core_types">Next</a>
</div>
<h1>Module <a href="type_Pxp_tree_parser.html">Pxp_tree_parser</a></h1>

<pre><span class="keyword">module</span> Pxp_tree_parser: <code class="code"><span class="keyword">sig</span></code> <a href="Pxp_tree_parser.html">..</a> <code class="code"><span class="keyword">end</span></code></pre><div class="info module top">
Calling the parser in tree mode<br>
</div>
<hr width="100%">
<br>
The following functions return the parsed XML text as a tree, i.e.
    as <code class="code"><span class="constructor">Pxp_document</span>.node</code> or <code class="code"><span class="constructor">Pxp_document</span>.document</code>.<br>
<br>
<h1 id="1_IDindices">ID indices</h1><br>
<br>
These indices are used to check the uniqueness of the values of attributes declared
    as <code class="code"><span class="constructor">ID</span></code>. The indices can also be used to quickly look up
    the elements carrying such attributes.<br>

<pre><span id="EXCEPTIONID_not_unique"><span class="keyword">exception</span> ID_not_unique</span></pre>
<div class="info ">
Used inside <a href="Pxp_tree_parser.index-c.html"><code class="code"><span class="constructor">Pxp_tree_parser</span>.index</code></a> to indicate that the same ID is
      attached to several nodes<br>
</div>

<pre><span id="TYPEindex"><span class="keyword">class type</span> <code class="type">[< clone : 'a; node : 'a Pxp_document.node;<br>       set_node : 'a Pxp_document.node -> unit; .. ><br>     as 'a]</code> <a href="Pxp_tree_parser.index-c.html">index</a></span> = <code class="code"><span class="keyword">object</span></code> <a href="Pxp_tree_parser.index-c.html">..</a> <code class="code"><span class="keyword">end</span></code></pre><div class="info">
The type of indexes over the ID attributes of the elements.
</div>

<pre><span name="TYPEhash_index"><span class="keyword">class</span> <code class="type">[< clone : 'a; node : 'a Pxp_document.node;<br>       set_node : 'a Pxp_document.node -> unit; .. ><br>     as 'a]</code> <a href="Pxp_tree_parser.hash_index-c.html">hash_index</a></span> : <code class="type"></code><code class="code"><span class="keyword">object</span></code> <a href="Pxp_tree_parser.hash_index-c.html">..</a> <code class="code"><span class="keyword">end</span></code></pre><div class="info">
This is a simple implementation of <a href="Pxp_tree_parser.index-c.html"><code class="code"><span class="constructor">Pxp_tree_parser</span>.index</code></a> using
    a hash table.
</div>
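<p>
As a minimal sketch of how such an index might be used: the code below parses a small document in validating mode, passes a <code class="code">hash_index</code> as <code class="code">id_index</code>, and then looks up an element by its ID. It assumes that the findlib package "pxp" is linked and that the index offers a <code class="code">find</code> method raising <code class="code">Not_found</code> for unknown IDs; the document text, the attribute name <code class="code">key</code> and the helper <code class="code">text_of_id</code> are made up for illustration.

```ocaml
(* Sketch only: assumes the findlib package "pxp" is installed and linked. *)

(* Illustrative document: "key" is declared as an ID attribute. *)
let text =
  "<?xml version=\"1.0\"?>\n\
   <!DOCTYPE list [\n\
   <!ELEMENT list (item*)>\n\
   <!ELEMENT item (#PCDATA)>\n\
   <!ATTLIST item key ID #REQUIRED>\n\
   ]>\n\
   <list><item key=\"a\">first</item><item key=\"b\">second</item></list>"

(* Hypothetical helper: parse [text] in validating mode and return the
   character data of the element whose ID attribute equals [key]. *)
let text_of_id key =
  let index = new Pxp_tree_parser.hash_index in
  let _doc =
    Pxp_tree_parser.parse_document_entity
      (* hash_index has more methods than index, hence the coercion *)
      ~id_index:(index :> _ Pxp_tree_parser.index)
      Pxp_types.default_config
      (Pxp_types.from_string text)
      Pxp_tree_parser.default_spec
  in
  (* Assumption: find raises Not_found if the ID is unknown. *)
  (index#find key)#data

let () = print_endline (text_of_id "b")
```

The index is filled as a side effect of parsing, so it has to be created before the parser is called and kept for later lookups.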
<br>
<h1 id="1_Parsingfunctions">Parsing functions</h1><br>
<br>
There are two types of XML texts one can parse:<ul>
<li>Closed XML documents</li>
<li>External XML entities</li>
</ul>

    Usually, the functions for closed XML documents are the right ones.
    The exact difference between the two types is subtle, as many texts
    are parseable in both ways. The idea, however, is that an external
    XML entity is text from a different file that is included by reference
    into a closed document. Some XML features are only meaningful for
    the whole document, and are not available when only an external entity
    is parsed. This includes:<ul>
<li>The DOCTYPE and the DTD declarations</li>
<li>The standalone declaration</li>
</ul>

    It is a syntax error to use these features in an external XML entity.
<p>

    An external entity is a file referenced by another XML text.
    For example, this document includes "file.xml" as an external entity:
<p>

    <pre class="codepre"><code class="code">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;?xml&nbsp;version=<span class="string">"1.0"</span><span class="keywordsign">?&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;!<span class="constructor">DOCTYPE</span>&nbsp;root&nbsp;[<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;!<span class="constructor">ENTITY</span>&nbsp;extref&nbsp;<span class="constructor">SYSTEM</span>&nbsp;<span class="string">"file.xml"</span>&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;root&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">&amp;</span>extref;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;/root&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code></pre>
<p>

    (In contrast to this, an internal entity would give the definition
    text immediately, e.g. <code class="code">&lt;!<span class="constructor">ENTITY</span> intref <span class="string">"This is the entity text"</span>&gt;</code>.)
    Of course, it does not make sense for the external entity to have
    another DOCTYPE declaration, and hence it is forbidden to use this
    feature in "file.xml".
<p>

    There is no function that parses a file like "file.xml" exactly
    as if it were included into a larger document. The closest behavior is provided by
    <a href="Pxp_tree_parser.html#VALparse_content_entity"><code class="code"><span class="constructor">Pxp_tree_parser</span>.parse_content_entity</code></a> and 
    <a href="Pxp_tree_parser.html#VALparse_wfcontent_entity"><code class="code"><span class="constructor">Pxp_tree_parser</span>.parse_wfcontent_entity</code></a>. They impose the
    additional constraint that the file must have a single top-most element.
<p>

    The following functions also distinguish between validating and
    well-formedness mode. In the latter mode, many formal document
    constraints are not enforced. For instance, elements and
    attributes need not be declared.
<p>

    There are, unfortunately, a number of myths about well-formed XML
    documents. One says that the declarations are completely
    ignored. This is not true. For example, the document shown
    above includes the external XML entity "file.xml" by reference.
    The <code class="code">&lt;!<span class="constructor">ENTITY</span>&gt;</code> declaration is respected no matter in which mode
    the parser is run. It is also not true that the presence of
    <code class="code"><span class="constructor">DOCTYPE</span></code> indicates validating mode and its absence well-formedness
    mode. The presence of <code class="code"><span class="constructor">DOCTYPE</span></code> is perfectly compatible with
    well-formedness mode; the declarations are merely interpreted
    in a different way.
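<p>
The point about declarations in well-formedness mode can be illustrated with a small sketch. It assumes that the findlib package "pxp" is linked; the document text and the helper <code class="code">root_text</code> are made up for illustration. The document has a <code class="code">DOCTYPE</code> that declares an internal entity but no elements, and it is parsed with <code class="code">parse_wfdocument_entity</code>, so the entity declaration is used while the missing element declarations are not reported.

```ocaml
(* Sketch only: assumes the findlib package "pxp" is installed and linked. *)

(* A DOCTYPE with an entity declaration but without element declarations. *)
let text =
  "<?xml version=\"1.0\"?>\n\
   <!DOCTYPE root [ <!ENTITY greeting \"hello\"> ]>\n\
   <root><undeclared>&greeting;</undeclared></root>"

(* Hypothetical helper: parse in well-formedness mode and return the
   character data found below the root element. *)
let root_text () =
  let doc =
    Pxp_tree_parser.parse_wfdocument_entity
      Pxp_types.default_config
      (Pxp_types.from_string text)
      Pxp_tree_parser.default_spec
  in
  doc#root#data

let () = print_endline (root_text ())
```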
<p>

    If a document is parsed in validating mode but the
    <code class="code"><span class="constructor">DOCTYPE</span></code> is missing, this parser fails when the root element
    is parsed, because its declaration is missing. This conforms to the
    XML standard, and also follows the logic that the program calling
    the parser is written in the expectation that the parsed file has been
    validated. If this validation is missing, the program can run into
    failed assertions (or worse).<br>
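<p>
A caller that wants to handle such failures could wrap the parser call as sketched below. This is only an illustration: it assumes the findlib package "pxp" is linked, uses <code class="code">Pxp_types.string_of_exn</code> to render the exception, and the helper name <code class="code">validate</code> is hypothetical. The exact exception and message are deliberately not relied upon.

```ocaml
(* Sketch only: assumes the findlib package "pxp" is installed and linked. *)

(* Hypothetical helper: try to parse [text] in validating mode and turn
   any exception into a readable error message. *)
let validate text =
  try
    ignore
      (Pxp_tree_parser.parse_document_entity
         Pxp_types.default_config
         (Pxp_types.from_string text)
         Pxp_tree_parser.default_spec);
    Ok ()
  with e -> Error (Pxp_types.string_of_exn e)

let () =
  (* No DOCTYPE: the root element has no declaration to validate against. *)
  match validate "<root/>" with
  | Ok () -> print_endline "document is valid"
  | Error msg -> prerr_endline msg
```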

<pre><span id="VALparse_document_entity"><span class="keyword">val</span> parse_document_entity</span> : <code class="type">?transform_dtd:(<a href="Pxp_dtd.dtd-c.html">Pxp_dtd.dtd</a> -> <a href="Pxp_dtd.dtd-c.html">Pxp_dtd.dtd</a>) -><br>       ?id_index:(< clone : 'a; node : 'a Pxp_document.node;<br>                    set_node : 'a Pxp_document.node -> unit; .. ><br>                  as 'a)<br>                 <a href="Pxp_tree_parser.index-c.html">index</a> -><br>       Pxp_types.config -><br>       Pxp_types.source -> 'a Pxp_document.spec -> 'a Pxp_document.document</code></pre><div class="info ">
Parse a closed document,
 and validate the contents of the document against the DTD contained
 and/or referenced in the document.
<p>

 If the optional argument <code class="code">transform_dtd</code> is passed, the following 
 modification applies: After the DTD (both the internal and external
 subsets) has been read, the function <code class="code">transform_dtd</code> is called,
 and the resulting DTD is actually used to validate the document.
 This makes it possible<ul>
<li>to check which DTD is used (e.g. by comparing <a href="Pxp_dtd.dtd-c.html#METHODid"><code class="code"><span class="constructor">Pxp_dtd</span>.dtd.id</code></a>
   with a list of allowed IDs)</li>
<li>to apply modifications to the DTD before content parsing is started</li>
<li>to even switch to a built-in DTD, and to drop all user-defined
   declarations.</li>
</ul>

 If the optional argument <code class="code">transform_dtd</code> is missing, the parser
 behaves in the same way as if the identity were passed as <code class="code">transform_dtd</code>,
 i.e. the DTD is left unmodified.
<p>

 If the optional argument <code class="code">id_index</code> is present, the parser adds
 any ID attribute to the passed index. An index is required to detect
 violations of the uniqueness of IDs.<br>
</div>
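<p>
The first use of <code class="code">transform_dtd</code> listed above, checking which DTD is used, might look like the following sketch. It assumes the findlib package "pxp" is linked and that the <code class="code">Internal</code> constructor of the DTD identifier type is reachable through <code class="code">Pxp_types</code>. The policy (accept only DTDs given entirely in the internal subset) and the names <code class="code">require_internal_dtd</code> and <code class="code">parse</code> are hypothetical.

```ocaml
(* Sketch only: assumes the findlib package "pxp" is installed and linked. *)

(* Hypothetical policy: accept the DTD only if it consists of an internal
   subset; otherwise refuse to continue. The DTD is returned unmodified. *)
let require_internal_dtd (dtd : Pxp_dtd.dtd) =
  match dtd#id with
  | Some Pxp_types.Internal -> dtd
  | _ -> failwith "DTD rejected: only internal subsets are accepted"

(* Parse [text] in validating mode, checking the DTD before the content
   is parsed. *)
let parse text =
  Pxp_tree_parser.parse_document_entity
    ~transform_dtd:require_internal_dtd
    Pxp_types.default_config
    (Pxp_types.from_string text)
    Pxp_tree_parser.default_spec

let () =
  let doc = parse "<!DOCTYPE r [<!ELEMENT r EMPTY>]><r/>" in
  match doc#root#node_type with
  | Pxp_document.T_element name -> print_endline name
  | _ -> ()
```

How an exception raised inside <code class="code">transform_dtd</code> reaches the caller (directly or wrapped by the parser) is not specified here, so callers should not depend on its exact form.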

<pre><span id="VALparse_wfdocument_entity"><span class="keyword">val</span> parse_wfdocument_entity</span> : <code class="type">?transform_dtd:(<a href="Pxp_dtd.dtd-c.html">Pxp_dtd.dtd</a> -> <a href="Pxp_dtd.dtd-c.html">Pxp_dtd.dtd</a>) -><br>       Pxp_types.config -><br>       Pxp_types.source -><br>       (< clone : 'a; node : 'a Pxp_document.node;<br>          set_node : 'a Pxp_document.node -> unit; .. ><br>        as 'a)<br>       Pxp_document.spec -> 'a Pxp_document.document</code></pre><div class="info ">
Parse a closed document, but do not
 validate it. Only checks on well-formedness are performed.
<p>

 The option <code class="code">transform_dtd</code> works as for <code class="code">parse_document_entity</code>,
 but the resulting DTD is not used for validation. It is just
 included into the returned document (e.g. useful to get entity 
 declarations).<br>
</div>

<pre><span id="VALparse_content_entity"><span class="keyword">val</span> parse_content_entity</span> : <code class="type">?id_index:(< clone : 'a; node : 'a Pxp_document.node;<br>                    set_node : 'a Pxp_document.node -> unit; .. ><br>                  as 'a)<br>                 <a href="Pxp_tree_parser.index-c.html">index</a> -><br>       Pxp_types.config -><br>       Pxp_types.source -><br>       <a href="Pxp_dtd.dtd-c.html">Pxp_dtd.dtd</a> -> 'a Pxp_document.spec -> 'a Pxp_document.node</code></pre><div class="info ">
Parse a file representing a well-formed fragment of a document. The
 fragment must be a single element (i.e. something like <code class="code">&lt;a&gt;...&lt;/a&gt;</code>;
 not a sequence like <code class="code">&lt;a&gt;...&lt;/a&gt;&lt;b&gt;...&lt;/b&gt;</code>). The element is validated
 against the passed DTD, but it is not checked whether the element is
 the root element specified in the DTD. <b>This function is almost
 always the wrong one to call. Rather consider <a href="Pxp_tree_parser.html#VALparse_document_entity"><code class="code"><span class="constructor">Pxp_tree_parser</span>.parse_document_entity</code></a>.</b>
<p>

 Despite its name, this function <b>cannot</b> parse the <code class="code">content</code>
 production defined in the XML specification! This is a misnomer
 I'm sorry about. The <code class="code">content</code> production would allow parsing
 a list of elements and other node kinds. Also, this function
 corresponds to the event entry point <code class="code"><span class="keywordsign">`</span><span class="constructor">Entry_element_content</span></code> and
 not <code class="code"><span class="keywordsign">`</span><span class="constructor">Entry_content</span></code>.
<p>

 If the optional argument <code class="code">id_index</code> is present, the parser adds
 any ID attribute to the passed index. An index is required to detect
 violations of the uniqueness of IDs.<br>
</div>

<pre><span id="VALparse_wfcontent_entity"><span class="keyword">val</span> parse_wfcontent_entity</span> : <code class="type">Pxp_types.config -><br>       Pxp_types.source -><br>       (< clone : 'a; node : 'a Pxp_document.node;<br>          set_node : 'a Pxp_document.node -> unit; .. ><br>        as 'a)<br>       Pxp_document.spec -> 'a Pxp_document.node</code></pre><div class="info ">
Parse a file representing a well-formed fragment of a document.
 The fragment is not validated, only checked for well-formedness.
 See also the notes for <a href="Pxp_tree_parser.html#VALparse_content_entity"><code class="code"><span class="constructor">Pxp_tree_parser</span>.parse_content_entity</code></a>.<br>
</div>
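<p>
For illustration, a fragment consisting of a single element could be parsed as sketched below. The sketch assumes the findlib package "pxp" is linked; the helper <code class="code">describe</code> and the fragment text are made up. It returns a short description of the top-most node.

```ocaml
(* Sketch only: assumes the findlib package "pxp" is installed and linked. *)

(* Hypothetical helper: parse [text] as a single-element fragment in
   well-formedness mode and describe the resulting node. *)
let describe text =
  let node =
    Pxp_tree_parser.parse_wfcontent_entity
      Pxp_types.default_config
      (Pxp_types.from_string text)
      Pxp_tree_parser.default_spec
  in
  match node#node_type with
  | Pxp_document.T_element name ->
      Printf.sprintf "element %s with %d child node(s)"
        name (List.length node#sub_nodes)
  | _ -> "not an element"

let () = print_endline (describe "<a>some <b>text</b></a>")
```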
<br>
<h1 id="1_Helpers">Helpers</h1><br>

<pre><span id="VALdefault_extension"><span class="keyword">val</span> default_extension</span> : <code class="type">'a Pxp_document.node Pxp_document.extension as 'a</code></pre><div class="info ">
A "null" extension; an extension that does not extend the functionality<br>
</div>

<pre><span id="VALdefault_spec"><span class="keyword">val</span> default_spec</span> : <code class="type">('a Pxp_document.node Pxp_document.extension as 'a) Pxp_document.spec</code></pre><div class="info ">
Specifies that you do not want to use extensions.<br>
</div>

<pre><span id="VALdefault_namespace_spec"><span class="keyword">val</span> default_namespace_spec</span> : <code class="type">('a Pxp_document.node Pxp_document.extension as 'a) Pxp_document.spec</code></pre><div class="info ">
Specifies that you want to use namespaces, but not extensions<br>
</div>
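<p>
A sketch of how <code class="code">default_namespace_spec</code> might be combined with a namespace-enabled configuration follows. It assumes the findlib package "pxp" is linked, that <code class="code">Pxp_types.default_namespace_config</code> is available, and that namespace-aware nodes offer the methods <code class="code">namespace_uri</code> and <code class="code">localname</code>. The helper <code class="code">root_namespace</code> and the URI are made up for illustration.

```ocaml
(* Sketch only: assumes the findlib package "pxp" is installed and linked. *)

(* Hypothetical helper: parse [text] with namespace processing enabled and
   return the namespace URI and local name of the root element. *)
let root_namespace text =
  let doc =
    Pxp_tree_parser.parse_wfdocument_entity
      Pxp_types.default_namespace_config
      (Pxp_types.from_string text)
      Pxp_tree_parser.default_namespace_spec
  in
  let root = doc#root in
  (root#namespace_uri, root#localname)

let () =
  let uri, name = root_namespace "<x:a xmlns:x=\"urn:example\"/>" in
  Printf.printf "%s in namespace %s\n" name uri
```

The spec and the configuration have to agree: a namespace spec only makes sense when namespace processing is switched on in the configuration.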
</body></html>