File: xsl_key_design.xml

package info (click to toggle)
libxalan2-java 2.7.1-5
  • links: PTS, VCS
  • area: main
  • in suites: squeeze
  • size: 19,468 kB
  • ctags: 26,006
  • sloc: java: 175,784; xml: 28,073; sh: 164; jsp: 43; makefile: 43; sql: 6
file content (256 lines) | stat: -rw-r--r-- 11,938 bytes parent folder | download | duplicates (5)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
<?xml version="1.0" standalone="no"?>
<!DOCTYPE s1 SYSTEM "../../style/dtd/document.dtd">
<!--
 * Copyright 2001-2004 The Apache Software Foundation.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
-->
<!-- $Id: xsl_key_design.xml 337884 2004-02-17 19:29:35Z minchau $ -->
  <s1 title="&lt;xsl:key&gt; / key() / IdKeyPattern">

  <s2 title="Contents">
  <ul>
    <li><link anchor="functionality">Functionality</link></li>
    <li><link anchor="implementation">Implementation</link></li>    
  </ul>
  </s2>

  <anchor name="functionality"/>
  <s2 title="Functionality">

  <p>The <code>&lt;xsl:key&gt;</code> element is a top-level element that can be
  used to define a named index of nodes from the source XML tree(s). The
  element has three attributes:</p>

  <ul>
    <li>
      <code>name</code> - the name of the index
    </li>
    <li>
      <code>match</code> - a pattern that defines the nodeset we want
      indexed
    </li>
    <li>
      <code>use</code> - an expression that defines the value to be used
      as the index key value.
    </li>
  </ul>

  <p>A named index can be accessed using either the <code>key()</code> function
  or a KeyPattern. Both these methods address the index using its defined name
  (the "name" attribute above) and a parameter defining one or more lookup
  values for the index. The function or pattern returns a node set containing
  all nodes in the index whose key value match the parameter's value(s):</p>

  <source>
    &lt;xsl:key name="book-author" match="book" use="author"/&gt;
        :
        :
    &lt;xsl:for-each select="key('book-author', 'Mikhail Bulgakov')"&gt;
      &lt;xsl:value-of select="author"/&gt;
      &lt;xsl:text&gt;: &lt;/xsl:text&gt;
      &lt;xsl:value-of select="author"/&gt;
      &lt;xsl:text&gt;&amp;#xa;&lt;/xsl:text&gt;
    &lt;/xsl:for-each&gt;</source>

  <p>The KeyPattern can be used within an index definition to create unions
  and intersections of node sets:</p><source>
    &lt;xsl:key name="cultcies" match="farmer | fisherman" use="name"/&gt;</source>

  <p>This could have been done using regular <code>&lt;xsl:for-each&gt;</code>
  and <code>&lt;xsl:select&gt;</code> elements. However, if your stylesheet
  accesses the same selection of nodes over and over again, the transformation
  will be much more efficient using pre-indexed keys as shown above.</p>

  </s2>

  <anchor name="implementation"/>
  <s2 title="Implementation">

    <anchor name="indexing"/>
    <s3 title="Indexing">

    <p>The <code>AbstractTranslet</code> class has a global hashtable that holds
    an index for each named key in the stylesheet (hashing on the "name"
    attribute of <code>&lt;xsl:key&gt;</code>). <code>AbstractTranslet</code>
    has a couple of public methods for inserting and retrieving data from this
    hashtable:</p><source>
    public void buildKeyIndex(String keyName, int nodeID, String value);
    public KeyIndex KeyIndex getKeyIndex(String keyName);</source>

    <p>The <code>Key</code> class compiles code that traverses the input DOM and
    extracts nodes that match some given parameters (the <code>"match"</code>
    attribute of the <code>&lt;xsl:key&gt;</code> element). A new element is
    inserted into the named key's index. The nodes' DOM index and the value
    translated from the <code>"use"</code> attribute of the
    <code>&lt;xsl:key&gt;</code> element are stored in the new entry in the
    index.</p>

    <p>Something similar is done for indexing IDs. This index is generated from
    the <code>ID</code> and <code>IDREF</code> fields in the input document's
    DTD. This means that the code for generating this index cannot be generated
    at compile-time, which again means that the code has to be generic enough
    to handle all DTDs. The class that handles this is the
    <code>org.apache.xalan.xsltc.dom.DTDMonitor</code> class. This class
    implements the <code>org.xml.sax.XMLReader</code> and
    <code>org.xml.sax.DTDHandler</code> interfaces. A client application using
    the native API must instanciate a <code>DTDMonitor</code> and pass it to the
    translet code - if, and only if, it wants IDs indexed (one can improve
    performance by omitting this step). This is descrived in the
    <link idref="xsltc_native_api">XSLTC Native API reference</link>. The
    <code>DTDMonitor</code> class will use the same indexing as the code
    generated by the <code>Key</code> class. The index for ID's is called
    "##id". We assume that no stylesheets will contain a key with this name.</p>

    <p>The index itself is implemented in the
    <code>org.apache.xalan.xsltc.dom.KeyIndex</code> class. The index has an
    hashtable with all the values from the matching nodes (the part of the node
    used to generate this value is the one specified in the <code>"use"</code>
    attribute). For every matching value there is a bit-array (implemented in
    the <code>org.apache.xalan.xsltc.BitArray</code> class), holding a list of
    all node indexes for which this value gives a match:</p>
    <p><img src="key_relations.gif" alt="key_relations.gif"/></p>
    <p><ref>Figure 1: Indexing tables</ref></p>

    <p>The <code>KeyIndex</code> class implements the <code>NodeIterator</code>
    interface, so that it can be returned directly by the implementation of the
    <code>key()</code> function. This is how the index generated by
    <code>&lt;xsl:key&gt;</code> and the node-set returned by the
    <code>key()</code> and KeyPattern are tied together. You can see how this is
    done in the <code>translate()</code> method of the <code>KeyCall</code>
    class.</p>

    <p>The <code>key()</code> function can be called in two ways:</p><source>
    key('key-name','value')
    key('key-name','node-set')</source>

    <p>The first parameter is always the name of the key. We use this value to
    lookup our index from the _keyIndexes hashtable in AbstractTranslet:</p>
    <source>
    il.append(classGen.aloadThis());
    _name.translate(classGen, methodGen);
    il.append(new INVOKEVIRTUAL(getKeyIndex));</source>

    <p>This compiles into a call to
    <code>AbstractTranslet.getKeyIndex(String name)</code>, and it leaves a
    <code>KeyIndex</code> object on the stack. What we then need to do it to
    initialise the <code>KeyIndex</code> to give us nodes with the requested
    value. This is done by leaving the <code>KeyIndex</code> object on the stack
    and pushing the <code>"value"</code> parameter to <code>key()</code>, before
    calling <code>lookup()</code> on the index:</p><source>
    il.append(DUP);  // duplicate the KeyIndex obejct before return
    _value.translate(classGen, methodGen);
    il.append(new INVOKEVIRTUAL(lookup));</source>

    <p>This compiles into a call to <code>KeyIndex.lookup(String value)</code>.
    This will initialise the <code>KeyIndex</code> object to return nodes that
    match the given value, so the <code>KeyIndex</code> object can be left on
    the stack when we return. This because the <code>KeyIndex</code> object
    implements the <code>NodeIterator</code> interface.</p>

    <p>This matter is a bit more complex when the second parameter of
    <code>key()</code> is a node-set. In this case we need to traverse the nodes
    in the set and do a lookup for each node in the set. What I do is this:</p>

    <ul>
      <li>
        construct a <code>KeyIndex</code> object that will hold the
        return node-set
      </li>
      <li>
        find the named <code>KeyIndex</code> object from the hashtable in
        AbstractTranslet
      </li>
      <li>
        get an iterator for the node-set and do the folowing loop:</li>
        <ul>
          <li>get string value for current node</li>
          <li>do lookup in KeyIndex object for the named index</li>
          <li>merge the resulting node-set into the return node-set</li>
        </ul>
      <li>
        leave the return node-set on stack when done
      </li>
    </ul>

    </s3>

    <anchor name="improvements"/>
    <s3 title="Possible indexing improvements">

    <p>The indexing implementation can be very, very memeory exhaustive as there
    is one <code>BitArray</code> allocated for each value for every index. This
    is particulary bad for the index used for IDs ('##id'), where a single value
    should only map to one, single node. This means that a whole
    <code>BitArray</code> is used just to contain one node. The
    <code>KeyIndex</code> class should be updated to hold the first node for a
    value in an <code>Integer</code> object, and then replace that with a
    <code>BitArray</code> or a <code>Vector</code> only is a second node is
    added to the value. Here is an outline for <code>KeyIndex</code>:</p>
    <source>

    public void add(Object value, int node) {
        Object container;
	if ((container = (BitArray)_index.get(value)) == null) {
            _index.put(value, new Integer(node));
	}
	else {
            // Check if there is _one_ node for this value
            if (container instanceof Integer) {
                int first = ((Integer)container
                _nodes = new BitArray(_arraySize);
                _nodes.setMask(node &amp; 0xff000000);
                _nodes.setBit(first &amp; 0x00ffffff);
                _nodes.setBit(node &amp; 0x00ffffff);
                _index.put(value, _nodes);
            }
            // Otherwise add node to axisting bit array
            else {
                _nodex = (BitArray)container;
                _nodes.setBit(node &amp; 0x00ffffff);
            }
        }
    }</source>

    <p>Other methods inside the <code>KeyIndex</code> should be updated to
    reflect this.</p>

    </s3>

    <anchor name="patterns"/>
    <s3 title="Id and Key patterns">

    <p>As already mentioned, the challenge with the <code>id()</code> and
    <code>key()</code> patterns is that they have no specific node type. 
    Patterns are normally grouped based on their pattern's "kernel" node type.
    This is done in the <code>org.apache.xalan.xsltc.compiler.Mode</code> class.
    The <code>Mode</code> class has a method, <code>flatternAlaternative</code>,
    that does this grouping, and all templates with a common kernel node type
    will be inserted into a "test sequence". A test sequence is a set templates
    with the same kernel node type. The <code>TestSeq</code> class generates
    code that will figure out which pattern, amongst several patterns with the
    same kernel node type, that matches a certain node. This is used by the
    <code>Mode</code> class when generating the <code>applyTemplates</code>
    method in the translet. A test sequence is also generated for all templates
    whose pattern does not have a kernel node type. This is the case for all
    Id and KeyPatterns. This test sequence, if necessary, is put before the
    big <code>switch()</code> in the <code>applyTemplates()</code> mehtod. This
    test has to be done for every single node that is traversed, causing the
    transformation to slow down siginificantly. This is why we do <em>not</em>
    recommend using this type of patterns with XSLTC.</p>

    </s3>

  </s2>

</s1>