<?xml version="1.0" standalone="no"?>
<!DOCTYPE s1 SYSTEM "../../style/dtd/document.dtd">
<!--
 * Copyright 2001-2004 The Apache Software Foundation.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
-->
<!-- $Id: xsltc_compiler.xml 337884 2004-02-17 19:29:35Z minchau $ -->

<s1 title="XSLTC Compiler Design">

  <ul>  
    <li><link anchor="overview">Compiler Overview</link></li>
    <li><link anchor="ast">Building the Abstract Syntax Tree</link></li>
    <li><link anchor="typecheck">Type-check and Cast Expressions</link></li>
    <li><link anchor="compile">JVM byte-code generation</link></li>
  </ul>

  <!--=================== OVERVIEW SECTION ===========================-->

  <anchor name="overview"/>
  <s2 title="Compiler overview">

    <p>The main component of the XSLTC compiler is the class</p>   
    <ul>
      <li><code>org.apache.xalan.xsltc.compiler.XSLTC</code></li>
    </ul>

    <p>This class uses three parsers to consume the input stylesheet(s):</p>

    <ul>
      <li><code>javax.xml.parsers.SAXParser</code></li>
    </ul>

    <p>is used to parse the stylesheet document and pass its contents to
    the compiler as basic SAX2 events.</p>

    <ul>
      <li><code>org.apache.xalan.xsltc.compiler.XPathParser</code></li>
    </ul>

    <p>is used to parse XPath expressions and patterns. This parser
    is generated using JavaCUP and JavaLEX from Princeton University.</p>

    <ul>
      <li><code>org.apache.xalan.xsltc.compiler.Parser</code></li>
    </ul>

    <p>is a wrapper for the other two parsers. This parser is responsible for
    using the other two parsers to build the compiler's abstract syntax tree
    (which is described in more detail in the next section of this document).
    </p>

  </s2>

  <!--============== ABSTRACT SYNTAX TREE SECTION ======================-->
  <anchor name="ast"/>
  <s2 title="Building an Abstract Syntax Tree">

    <p>An abstract syntax tree (AST) is a data structure commonly used by
    compilers to separate the parse phase from the later phases of the
    compilation. The AST has one node for each parsed token from the stylesheet
    and can easily be traversed during the type-checking and bytecode
    generation stages.</p>

    <ul>
      <li>
        <link anchor="mapping">Mapping stylesheet elements to AST nodes</link>
      </li>
      <li>
        <link anchor="domxsl">Building the AST from AST nodes</link>
      </li>
      <li>
        <link anchor="mapping">Mapping XPath expressions and patterns to additional AST nodes</link>
      </li>
    </ul>

    <p>The SAX parser passes the contents of the stylesheet to XSLTC's main
    parser. The SAX events represent a decomposition of the XML document that
    contains the stylesheet. The main parser needs to create one AST node from
    each node that it receives from the SAX parser. It also needs to use the
    XPath parser to decompose attributes that contain XPath expressions and
    patterns. Remember that XSLT is in effect two languages: XML and XPath,
    and one parser is needed for each of these languages. The SAX parser breaks
    down the stylesheet document, the XPath parser breaks down XPath expressions
    and patterns, and the main parser maps the decomposed elements into nodes
    in the abstract syntax tree.</p>

    <anchor name="mapping"/>
    <s3 title="Mapping stylesheet elements to AST nodes">

    <p>Every element that is defined in the XSLT 1.0 spec is represented by
    a class in the <code>org.apache.xalan.xsltc.compiler</code> package. The
    main parser class contains a <code>Hashtable</code> that maps XSL
    elements into Java classes that make up the nodes in the AST. These Java
    classes all reside in the <code>org.apache.xalan.xsltc.compiler</code>
    package and extend either the <code>TopLevelElement</code> or the
    <code>Instruction</code> classes. (Both these classes extend the
    <code>SyntaxTreeNode</code> class.)</p>

    <p>The mapping from XSL element names to Java classes/AST nodes is set up
    in the <code>initStdClasses()</code> method of the main parser:</p><source>
    private void initStdClasses() {
	try {
	    initStdClass("template",    "Template");
	    initStdClass("param",       "Param");
	    initStdClass("with-param",  "WithParam");
	    initStdClass("variable",    "Variable");
	    initStdClass("output",      "Output");
	    :
	    :
	    :
	}
	catch (ClassNotFoundException e) {
	    // error handling elided
	}
    }

    private void initStdClass(String elementName, String className)
	throws ClassNotFoundException {
	_classes.put(elementName,
		     Class.forName(COMPILER_PACKAGE + '.' + className));
    }</source>
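
    <p>The lookup idea can be tried out in isolation with the minimal sketch
    below. It is not the XSLTC source: the class <code>ElementRegistry</code>
    and its methods are hypothetical, and classes from <code>java.util</code>
    stand in for the compiler's node classes only so that the example runs
    without Xalan on the class path.</p>
<source><![CDATA[
```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: maps element names to classes resolved by name.
final class ElementRegistry {
    private final Map<String, Class<?>> classes = new HashMap<>();
    private final String packageName;

    ElementRegistry(String packageName) {
        this.packageName = packageName;
    }

    // Resolve packageName + '.' + className once and remember the result.
    void register(String elementName, String className) {
        try {
            classes.put(elementName, Class.forName(packageName + '.' + className));
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("No class for " + elementName, e);
        }
    }

    // Create a fresh instance for an element, or null if the name is unknown.
    Object instantiate(String elementName) {
        Class<?> cls = classes.get(elementName);
        if (cls == null) {
            return null;
        }
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Assumption: java.util stands in for the compiler package here.
        ElementRegistry registry = new ElementRegistry("java.util");
        registry.register("template", "ArrayList");
        registry.register("variable", "HashMap");
        System.out.println(registry.instantiate("template").getClass().getName());
        System.out.println(registry.instantiate("unknown"));
    }
}
```
]]></source>
    <p>Resolving the class once at registration time means that a misspelled
    class name is reported when the table is built rather than when the first
    matching element is encountered.</p>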

    </s3>

    <anchor name="domxsl"/>
    <s3 title="Building the AST from AST nodes">
    <p>The parser builds an AST from the various syntax tree nodes. Each node
    contains a reference to its parent node, a vector containing references
    to all child nodes and a structure containing all attribute nodes:</p><source>
    protected SyntaxTreeNode _parent; // Parent node
    private   Vector _contents;       // Child nodes
    protected Attributes _attributes; // Attributes of this element</source>


    <p>These variables should be accessed using these methods:</p><source>
    protected final SyntaxTreeNode getParent();
    protected final Vector getContents();
    protected String getAttribute(String qname);
    protected Attributes getAttributes();</source>

    <p>At this time the AST only contains nodes that represent the XSL elements
    from the stylesheet. A SAX parser is generic: it can only handle XML and
    will not break up or identify XPath patterns/expressions. Each XSL
    instruction gets its own node in the AST, and the XPath
    patterns/expressions are stored as attributes of these nodes. A stylesheet
    looking like this:</p><source>
    &lt;xsl:stylesheet .......&gt;
      &lt;xsl:template match="chapter"&gt;
        &lt;xsl:text&gt;Chapter&lt;/xsl:text&gt;
        &lt;xsl:value-of select="."/&gt;
      &lt;/xsl:template&gt;
    &lt;/xsl:stylesheet&gt;</source>

    <p>will be stored in the AST as indicated in the following picture:</p>
    <p><img src="ast_stage1.gif" alt="ast_stage1.gif"/></p>
    <p><ref>Figure 1: The AST in its first stage</ref></p>

    <p>All objects that make up the nodes in the initial AST have a
    <code>parseContents()</code> method. This method is responsible for:</p>

    <ul>
      <li>parsing the values of those attributes that contain XPath expressions
      or patterns, breaking each expression/pattern into AST nodes and inserting
      them into the tree.</li>
      <li>reading/checking all other required attributes</li>
      <li>propagating the <code>parseContents()</code> call down the tree</li>
    </ul>
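
    <p>The node layout and the downward propagation of
    <code>parseContents()</code> can be sketched as follows. The class
    <code>AstSketch</code>, its nested <code>Node</code> class and the
    <code>log</code> parameter are hypothetical stand-ins, not the XSLTC
    classes; attribute handling is reduced to recording each node's
    attributes.</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a syntax tree whose nodes propagate parseContents().
final class AstSketch {

    static final class Node {
        final String name;
        Node parent;                                     // parent node
        final List<Node> contents = new ArrayList<>();   // child nodes
        final Map<String, String> attributes = new LinkedHashMap<>();

        Node(String name) {
            this.name = name;
        }

        // Append a child and set its parent reference; returns this node.
        Node add(Node child) {
            child.parent = this;
            contents.add(child);
            return this;
        }

        // Record this node's own attributes, then pass the call down the tree.
        void parseContents(List<String> log) {
            log.add(name + attributes);
            for (Node child : contents) {
                child.parseContents(log);
            }
        }
    }

    // Builds a tree for the small stylesheet with one template.
    static Node sample() {
        Node template = new Node("template");
        template.attributes.put("match", "chapter");
        Node valueOf = new Node("value-of");
        valueOf.attributes.put("select", ".");
        template.add(new Node("text")).add(valueOf);
        return new Node("stylesheet").add(template);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        sample().parseContents(log);
        log.forEach(System.out::println);
    }
}
```
]]></source>
    <p>Running the sketch lists the nodes in document order, parent before
    children, which is the order in which the call travels down the tree.</p>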
    </s3>

    <s3 title="Mapping XPath expressions and patterns to additional AST nodes">

    <p>The nodes that represent the XPath expressions and patterns extend
    either the <code>Expression</code> or <code>Pattern</code> class
    respectively. These nodes are not appended to the <code>_contents</code>
    vector of each node, but rather stored as individual references in each
    AST element node. One example is the <code>ForEach</code> class that
    represents the <code>&lt;xsl:for-each&gt;</code> element. This class has
    a variable that contains a reference to the AST sub-tree that represents
    its <code>select</code> attribute:</p><source>
    private Expression _select;</source>
   
    <p>There is no standard way of storing these XPath expressions and each
    AST node that contains one or more XPath expression/pattern must handle
    that itself. This handling basically involves passing the attribute's
    value to the XPath parser and receiving back an AST sub-tree.</p>

    <p>With all XPath expressions/patterns expanded, the AST will look somewhat
    like this:</p>

    <p><img src="ast_stage2.gif" alt="ast_stage2.gif"/></p>
    <p><ref>Figure 2: The AST in its second stage</ref></p>

    </s3>
  </s2>

  <!--================= TYPE CONVERSION SECTION ========================-->

  <anchor name="typecheck"/>
  <s2 title="Type-check and Cast Expressions">

    <p>In many cases we will need to typecast the top node in the expression
    sub-tree to suit the expected result-type of the expression, or to typecast
    child nodes to suit the allowed types for the various operators in the
    expression. This is done by calling <code>typeCheck()</code> on the root
    node of the AST. Each <code>SyntaxTreeNode</code> is responsible for
    inserting type-cast nodes between itself and its child nodes or XPath
    nodes. These type-cast nodes will convert the output-types of the
    child/XPath nodes to the expected input-type of the parent node. Let us
    look at our AST again and the node that
    represents the <code>&lt;xsl:value-of&gt;</code> element. This element
    expects to receive a string from its <code>select</code> XPath expression,
    but the <code>Step</code> expression will return either a node-set or a
    single node. An extra node is inserted into the AST to perform the
    necessary type conversions:</p>

    <p><img src="ast_stage3.gif" alt="ast_stage3.gif"/></p>
    <p><ref>Figure 3: XPath expression type cast</ref></p>

    <p>The <code>typeCheck()</code> method of each SyntaxTreeNode object will
    call <code>typeCheck()</code> on each of its XPath expressions. This method
    will return the native type returned by the expression. The AST node will
    insert an additional type-conversion node if the return-type does not match
    the expected data-type. Each possible return type is represented by a class
    in the <code>org.apache.xalan.xsltc.compiler.util</code> package. These
    classes all contain methods that will generate bytecodes needed to perform
    the actual type conversions (at runtime). The type-cast nodes in the AST
    mainly consist of calls to these methods.</p>
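
    <p>The cast-insertion step can be illustrated with the minimal sketch
    below. All names in it (<code>TypeCheckSketch</code>, the
    <code>Type</code> enum, <code>Expr</code>, <code>Cast</code> and the
    simplified <code>Step</code> and <code>ValueOf</code> classes) are
    assumptions made for illustration; the real compiler works with its own
    type classes and generates bytecode for the conversion.</p>
<source><![CDATA[
```java
// Hypothetical sketch of type checking with cast insertion.
final class TypeCheckSketch {

    enum Type { STRING, NODE_SET }

    interface Expr {
        Type typeCheck();      // returns the type this expression yields
        String describe();
    }

    // Stand-in for a location step such as ".", which yields nodes.
    static final class Step implements Expr {
        public Type typeCheck() { return Type.NODE_SET; }
        public String describe() { return "Step"; }
    }

    // Wraps an expression and converts its result to the target type.
    static final class Cast implements Expr {
        final Expr inner;
        final Type target;

        Cast(Expr inner, Type target) {
            this.inner = inner;
            this.target = target;
        }

        public Type typeCheck() { return target; }

        public String describe() {
            return "Cast(" + inner.describe() + " -> " + target + ")";
        }
    }

    // Stand-in for an element that expects a string from its select expression.
    static final class ValueOf {
        Expr select;

        ValueOf(Expr select) {
            this.select = select;
        }

        // Insert a cast node when the native type is not the expected one.
        void typeCheck() {
            if (select.typeCheck() != Type.STRING) {
                select = new Cast(select, Type.STRING);
            }
        }
    }

    public static void main(String[] args) {
        ValueOf valueOf = new ValueOf(new Step());
        valueOf.typeCheck();
        System.out.println(valueOf.select.describe());
    }
}
```
]]></source>
    <p>Because the cast is an ordinary expression node, later phases do not
    need to know whether a conversion was inserted.</p>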
  </s2>

  <!--=============== BYTE-CODE GENERATION SECTION ======================-->

  <anchor name="compile"/>
  <s2 title="JVM byte-code generation">

    <ul>
      <li><link anchor="stylesheet">Compiling the stylesheet</link></li>
      <li><link anchor="toplevel">Compiling top-level elements</link></li>
      <li><link anchor="templates">Compiling template code</link></li>
      <li><link anchor="instructions">Compiling instructions, functions, expressions and patterns</link></li>
    </ul>

    <p>Every node in the AST extends the <code>SyntaxTreeNode</code> base class
    and implements the <code>translate()</code> method. This method is
    responsible for outputting the actual bytecodes that make up the
    functionality required for each element, function, expression or pattern.
    </p>

    <anchor name="stylesheet"/>
    <s3 title="Compiling the stylesheet">
    <p>Some nodes in the AST require more complex code than others. The best
    example is the <code>&lt;xsl:stylesheet&gt;</code> element. The code that
    represents this element has to tie together the code that is generated by
    all the other elements and generate the actual class definition for the main
    translet class. The <code>Stylesheet</code> class generates the translet's
    constructor and methods that handle all top-level elements.</p>
    </s3>

    <anchor name="toplevel"/>
    <s3 title="Compiling top-level elements">
    <p>The bytecode that handles top-level elements must be generated before any
    other code. The <code>translate()</code> methods of these classes are
    mainly called from these methods in the <code>Stylesheet</code> class:</p><source>
    private String compileBuildKeys(ClassGenerator);
    private String compileTopLevel(ClassGenerator, Enumeration);
    private void compileConstructor(ClassGenerator, Output);</source>

    <p>These methods handle most top-level elements, such as global variables
    and parameters, <code>&lt;xsl:output&gt;</code> and
    <code>&lt;xsl:decimal-format&gt;</code> instructions.</p>
    </s3>

    <anchor name="templates"/>
    <s3 title="Compiling template code">
    <p>All XPath patterns in <code>&lt;xsl:template&gt;</code>
    elements are converted into numeric values (known as the pattern's
    kernel 'type'). All templates with identical pattern kernel types are
    grouped together and inserted into a table known as a test sequence.
    (The table of test sequences is found in the Mode class in the compiler
    package. There will be one such table for each mode that is used in the
    stylesheet). This table is used to build a big <code>switch()</code>
    statement in the translet's <code>applyTemplates()</code> method. This
    method is initially called with the root node of the input document.</p>

    <p>The <code>applyTemplates()</code> method determines the node's type and
    passes this type to the <code>switch()</code> statement to look up the
    matching template. The test sequence code (the <code>TestSeq</code> class)
    is responsible for inserting bytecodes to find one matching template
    in cases where more than one template matches the current node type.</p>

    <p>There may be several templates that share the same pattern kernel type.
    Here are a few examples of templates with patterns that all have the same
    kernel type:</p><source>
    &lt;xsl:template match=&quot;A/C&quot;&gt;
    &lt;xsl:template match=&quot;A/B/C&quot;&gt;
    &lt;xsl:template match=&quot;A | C&quot;&gt;</source>

    <p>All these templates will be grouped under the type for
    <code>&lt;C&gt;</code> and will all get the same kernel type (the type for
    <code>"C"</code>). The last template will be grouped both under
    <code>"C"</code> and <code>"A"</code>, since it matches either element.
    If the type identifier for <code>"C"</code> in this case is 8, all these
    templates will be put under <code>case 8:</code> in
    <code>applyTemplates()</code>'s big <code>switch()</code> statement. The
    <code>TestSeq</code> class will insert some code under the
    <code>case 8:</code> statement (similar to if's and then's) in order to
    determine which of the three templates to trigger.</p>
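
    <p>The grouping step can be sketched as follows. This is a simplified
    illustration, not the <code>Mode</code> or <code>TestSeq</code> code: the
    class <code>KernelGroupingSketch</code> and its methods are hypothetical,
    the kernel is represented by the name of the last step instead of a
    numeric type, and only simple child-step patterns and unions are
    handled.</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: groups match patterns by the name of their last step.
final class KernelGroupingSketch {

    // "A/B/C" yields "C"; handles only simple child-step paths.
    static String kernel(String path) {
        String trimmed = path.trim();
        return trimmed.substring(trimmed.lastIndexOf('/') + 1);
    }

    // A union pattern such as "A | C" is filed under each alternative's kernel.
    static Map<String, List<String>> group(List<String> patterns) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String pattern : patterns) {
            for (String alternative : pattern.split("\\|")) {
                groups.computeIfAbsent(kernel(alternative), k -> new ArrayList<>())
                      .add(pattern);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList("A/C", "A/B/C", "A | C")));
    }
}
```
]]></source>
    <p>Each key of the resulting map corresponds to one <code>case</code> of
    the switch statement, and the list stored under it corresponds to the
    templates among which the test sequence has to choose.</p>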
    </s3>

    <anchor name="instructions"/>
    <s3 title="Compiling instructions, functions, expressions and patterns">

    <p>The template code is generated by calling <code>translate()</code> on
    each <code>Template</code> object in the abstract syntax tree. This call
    will be propagated down the abstract syntax tree and every element will
    output the bytecodes necessary to complete its task.</p>

    <p>The Java Virtual Machine is stack-based, which goes hand-in-hand with
    the tree structure of a stylesheet and the AST. A node in the AST will
    call <code>translate()</code> on its child nodes and any XPath nodes before
    it generates its own bytecodes. In that way the correct sequence of JVM
    instructions is generated. Each one of the child nodes is responsible for
    creating code that leaves the node's output value (if any) on the stack.
    The typical procedure for the parent node is to create JVM code that
    consumes these values off the stack and then leave its own output on the
    stack (for its parent).</p>
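
    <p>The children-first order can be illustrated with a small sketch. It
    does not emit real JVM bytecode: the class
    <code>StackTranslateSketch</code>, its node classes and the
    <code>push</code>/<code>add</code> mnemonics are invented for this
    example, and a tiny interpreter is included only to show that the emitted
    sequence leaves the expected value on the stack.</p>
<source><![CDATA[
```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: children emit their code before the parent does, so
// their results are on the stack when the parent's instruction executes.
final class StackTranslateSketch {

    interface Node {
        void translate(List<String> code);
    }

    // Leaf node: leaves its value on the stack.
    static final class Literal implements Node {
        final int value;

        Literal(int value) {
            this.value = value;
        }

        public void translate(List<String> code) {
            code.add("push " + value);
        }
    }

    // Parent node: consumes two values and leaves their sum for its own parent.
    static final class Add implements Node {
        final Node left;
        final Node right;

        Add(Node left, Node right) {
            this.left = left;
            this.right = right;
        }

        public void translate(List<String> code) {
            left.translate(code);
            right.translate(code);
            code.add("add");
        }
    }

    // Minimal interpreter for the two illustrative instructions.
    static int run(List<String> code) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instruction : code) {
            if (instruction.equals("add")) {
                stack.push(stack.pop() + stack.pop());
            } else {
                stack.push(Integer.parseInt(instruction.substring("push ".length())));
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        Node tree = new Add(new Literal(1), new Add(new Literal(2), new Literal(3)));
        List<String> code = new ArrayList<>();
        tree.translate(code);
        System.out.println(code);
        System.out.println(run(code));
    }
}
```
]]></source>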

    <p>The tree-structure of the stylesheet is in this way closely tied with
    the stack-based JVM. The design does not offer any obvious way of extending
    the compiler to output code for other non-stack-based VMs or processors.</p>
    </s3>

  </s2>

</s1>