1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256
|
<?xml version="1.0" standalone="no"?>
<!DOCTYPE s1 SYSTEM "../../style/dtd/document.dtd">
<!--
* Copyright 2001-2004 The Apache Software Foundation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
-->
<!-- $Id: xsl_key_design.xml 337884 2004-02-17 19:29:35Z minchau $ -->
<s1 title="<xsl:key> / key() / IdKeyPattern">
<s2 title="Contents">
<ul>
<li><link anchor="functionality">Functionality</link></li>
<li><link anchor="implementation">Implementation</link></li>
</ul>
</s2>
<anchor name="functionality"/>
<s2 title="Functionality">
<p>The <code><xsl:key></code> element is a top-level element that can be
used to define a named index of nodes from the source XML tree(s). The
element has three attributes:</p>
<ul>
<li>
<code>name</code> - the name of the index
</li>
<li>
<code>match</code> - a pattern that defines the nodeset we want
indexed
</li>
<li>
<code>use</code> - an expression that defines the value to be used
as the index key value.
</li>
</ul>
<p>A named index can be accessed using either the <code>key()</code> function
or a KeyPattern. Both these methods address the index using its defined name
(the "name" attribute above) and a parameter defining one or more lookup
values for the index. The function or pattern returns a node set containing
all nodes in the index whose key value match the parameter's value(s):</p>
<source>
<xsl:key name="book-author" match="book" use="author"/>
:
:
<xsl:for-each select="key('book-author', 'Mikhail Bulgakov')">
<xsl:value-of select="author"/>
<xsl:text>: </xsl:text>
<xsl:value-of select="author"/>
<xsl:text>&#xa;</xsl:text>
</xsl:for-each></source>
<p>The KeyPattern can be used within an index definition to create unions
and intersections of node sets:</p><source>
<xsl:key name="cultcies" match="farmer | fisherman" use="name"/></source>
<p>This could have been done using regular <code><xsl:for-each></code>
and <code><xsl:select></code> elements. However, if your stylesheet
accesses the same selection of nodes over and over again, the transformation
will be much more efficient using pre-indexed keys as shown above.</p>
</s2>
<anchor name="implementation"/>
<s2 title="Implementation">
<anchor name="indexing"/>
<s3 title="Indexing">
<p>The <code>AbstractTranslet</code> class has a global hashtable that holds
an index for each named key in the stylesheet (hashing on the "name"
attribute of <code><xsl:key></code>). <code>AbstractTranslet</code>
has a couple of public methods for inserting and retrieving data from this
hashtable:</p><source>
public void buildKeyIndex(String keyName, int nodeID, String value);
public KeyIndex KeyIndex getKeyIndex(String keyName);</source>
<p>The <code>Key</code> class compiles code that traverses the input DOM and
extracts nodes that match some given parameters (the <code>"match"</code>
attribute of the <code><xsl:key></code> element). A new element is
inserted into the named key's index. The nodes' DOM index and the value
translated from the <code>"use"</code> attribute of the
<code><xsl:key></code> element are stored in the new entry in the
index.</p>
<p>Something similar is done for indexing IDs. This index is generated from
the <code>ID</code> and <code>IDREF</code> fields in the input document's
DTD. This means that the code for generating this index cannot be generated
at compile-time, which again means that the code has to be generic enough
to handle all DTDs. The class that handles this is the
<code>org.apache.xalan.xsltc.dom.DTDMonitor</code> class. This class
implements the <code>org.xml.sax.XMLReader</code> and
<code>org.xml.sax.DTDHandler</code> interfaces. A client application using
the native API must instanciate a <code>DTDMonitor</code> and pass it to the
translet code - if, and only if, it wants IDs indexed (one can improve
performance by omitting this step). This is descrived in the
<link idref="xsltc_native_api">XSLTC Native API reference</link>. The
<code>DTDMonitor</code> class will use the same indexing as the code
generated by the <code>Key</code> class. The index for ID's is called
"##id". We assume that no stylesheets will contain a key with this name.</p>
<p>The index itself is implemented in the
<code>org.apache.xalan.xsltc.dom.KeyIndex</code> class. The index has an
hashtable with all the values from the matching nodes (the part of the node
used to generate this value is the one specified in the <code>"use"</code>
attribute). For every matching value there is a bit-array (implemented in
the <code>org.apache.xalan.xsltc.BitArray</code> class), holding a list of
all node indexes for which this value gives a match:</p>
<p><img src="key_relations.gif" alt="key_relations.gif"/></p>
<p><ref>Figure 1: Indexing tables</ref></p>
<p>The <code>KeyIndex</code> class implements the <code>NodeIterator</code>
interface, so that it can be returned directly by the implementation of the
<code>key()</code> function. This is how the index generated by
<code><xsl:key></code> and the node-set returned by the
<code>key()</code> and KeyPattern are tied together. You can see how this is
done in the <code>translate()</code> method of the <code>KeyCall</code>
class.</p>
<p>The <code>key()</code> function can be called in two ways:</p><source>
key('key-name','value')
key('key-name','node-set')</source>
<p>The first parameter is always the name of the key. We use this value to
lookup our index from the _keyIndexes hashtable in AbstractTranslet:</p>
<source>
il.append(classGen.aloadThis());
_name.translate(classGen, methodGen);
il.append(new INVOKEVIRTUAL(getKeyIndex));</source>
<p>This compiles into a call to
<code>AbstractTranslet.getKeyIndex(String name)</code>, and it leaves a
<code>KeyIndex</code> object on the stack. What we then need to do it to
initialise the <code>KeyIndex</code> to give us nodes with the requested
value. This is done by leaving the <code>KeyIndex</code> object on the stack
and pushing the <code>"value"</code> parameter to <code>key()</code>, before
calling <code>lookup()</code> on the index:</p><source>
il.append(DUP); // duplicate the KeyIndex obejct before return
_value.translate(classGen, methodGen);
il.append(new INVOKEVIRTUAL(lookup));</source>
<p>This compiles into a call to <code>KeyIndex.lookup(String value)</code>.
This will initialise the <code>KeyIndex</code> object to return nodes that
match the given value, so the <code>KeyIndex</code> object can be left on
the stack when we return. This because the <code>KeyIndex</code> object
implements the <code>NodeIterator</code> interface.</p>
<p>This matter is a bit more complex when the second parameter of
<code>key()</code> is a node-set. In this case we need to traverse the nodes
in the set and do a lookup for each node in the set. What I do is this:</p>
<ul>
<li>
construct a <code>KeyIndex</code> object that will hold the
return node-set
</li>
<li>
find the named <code>KeyIndex</code> object from the hashtable in
AbstractTranslet
</li>
<li>
get an iterator for the node-set and do the folowing loop:</li>
<ul>
<li>get string value for current node</li>
<li>do lookup in KeyIndex object for the named index</li>
<li>merge the resulting node-set into the return node-set</li>
</ul>
<li>
leave the return node-set on stack when done
</li>
</ul>
</s3>
<anchor name="improvements"/>
<s3 title="Possible indexing improvements">
<p>The indexing implementation can be very, very memeory exhaustive as there
is one <code>BitArray</code> allocated for each value for every index. This
is particulary bad for the index used for IDs ('##id'), where a single value
should only map to one, single node. This means that a whole
<code>BitArray</code> is used just to contain one node. The
<code>KeyIndex</code> class should be updated to hold the first node for a
value in an <code>Integer</code> object, and then replace that with a
<code>BitArray</code> or a <code>Vector</code> only is a second node is
added to the value. Here is an outline for <code>KeyIndex</code>:</p>
<source>
public void add(Object value, int node) {
Object container;
if ((container = (BitArray)_index.get(value)) == null) {
_index.put(value, new Integer(node));
}
else {
// Check if there is _one_ node for this value
if (container instanceof Integer) {
int first = ((Integer)container
_nodes = new BitArray(_arraySize);
_nodes.setMask(node & 0xff000000);
_nodes.setBit(first & 0x00ffffff);
_nodes.setBit(node & 0x00ffffff);
_index.put(value, _nodes);
}
// Otherwise add node to axisting bit array
else {
_nodex = (BitArray)container;
_nodes.setBit(node & 0x00ffffff);
}
}
}</source>
<p>Other methods inside the <code>KeyIndex</code> should be updated to
reflect this.</p>
</s3>
<anchor name="patterns"/>
<s3 title="Id and Key patterns">
<p>As already mentioned, the challenge with the <code>id()</code> and
<code>key()</code> patterns is that they have no specific node type.
Patterns are normally grouped based on their pattern's "kernel" node type.
This is done in the <code>org.apache.xalan.xsltc.compiler.Mode</code> class.
The <code>Mode</code> class has a method, <code>flatternAlaternative</code>,
that does this grouping, and all templates with a common kernel node type
will be inserted into a "test sequence". A test sequence is a set templates
with the same kernel node type. The <code>TestSeq</code> class generates
code that will figure out which pattern, amongst several patterns with the
same kernel node type, that matches a certain node. This is used by the
<code>Mode</code> class when generating the <code>applyTemplates</code>
method in the translet. A test sequence is also generated for all templates
whose pattern does not have a kernel node type. This is the case for all
Id and KeyPatterns. This test sequence, if necessary, is put before the
big <code>switch()</code> in the <code>applyTemplates()</code> mehtod. This
test has to be done for every single node that is traversed, causing the
transformation to slow down siginificantly. This is why we do <em>not</em>
recommend using this type of patterns with XSLTC.</p>
</s3>
</s2>
</s1>
|