<?xml version="1.0" standalone="no"?>
<!DOCTYPE s1 SYSTEM "../../style/dtd/document.dtd">
<!--
 * Copyright 2001-2004 The Apache Software Foundation.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
-->
<!-- $Id: xsl_key_design.xml 337884 2004-02-17 19:29:35Z minchau $ -->
  <s1 title="&lt;xsl:key&gt; / key() / IdKeyPattern">

  <s2 title="Contents">
  <ul>
    <li><link anchor="functionality">Functionality</link></li>
    <li><link anchor="implementation">Implementation</link></li>    
  </ul>
  </s2>

  <anchor name="functionality"/>
  <s2 title="Functionality">

  <p>The <code>&lt;xsl:key&gt;</code> element is a top-level element that can be
  used to define a named index of nodes from the source XML tree(s). The
  element has three attributes:</p>

  <ul>
    <li>
      <code>name</code> - the name of the index
    </li>
    <li>
      <code>match</code> - a pattern that defines the node set to be
      indexed
    </li>
    <li>
      <code>use</code> - an expression that defines the value to be used
      as the index key value.
    </li>
  </ul>

  <p>A named index can be accessed using either the <code>key()</code> function
  or a KeyPattern. Both address the index by its defined name (the
  <code>name</code> attribute above) and take a parameter defining one or more
  lookup values for the index. The function or pattern returns a node set
  containing all nodes in the index whose key values match the parameter's
  value(s):</p>

  <source>
    &lt;xsl:key name="book-author" match="book" use="author"/&gt;
        :
        :
    &lt;xsl:for-each select="key('book-author', 'Mikhail Bulgakov')"&gt;
      &lt;xsl:value-of select="author"/&gt;
      &lt;xsl:text&gt;: &lt;/xsl:text&gt;
      &lt;xsl:value-of select="author"/&gt;
      &lt;xsl:text&gt;&amp;#xa;&lt;/xsl:text&gt;
    &lt;/xsl:for-each&gt;</source>

  <p>The KeyPattern can be used within an index definition to create unions
  and intersections of node sets:</p><source>
    &lt;xsl:key name="cultcies" match="farmer | fisherman" use="name"/&gt;</source>

  <p>The same nodes could have been selected with a regular
  <code>&lt;xsl:for-each&gt;</code> element and a predicate in its
  <code>select</code> attribute. However, if your stylesheet accesses the same
  selection of nodes over and over again, the transformation will be much more
  efficient using pre-indexed keys as shown above, because the source tree does
  not have to be searched again for each access.</p>
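
  <p>To try the <code>book-author</code> lookup end to end, the following is a
  minimal sketch that runs such a stylesheet through the standard JAXP API. The
  class name <code>KeyLookupDemo</code>, the sample document and its
  <code>title</code> elements are assumptions made for illustration only, and
  the processor used is whichever one the JDK provides by default:</p>
  <source><![CDATA[
```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class KeyLookupDemo {
    // Hypothetical input document; element names follow the book-author example.
    static final String XML =
        "<books>"
        + "<book><author>Mikhail Bulgakov</author><title>Heart of a Dog</title></book>"
        + "<book><author>Franz Kafka</author><title>The Trial</title></book>"
        + "<book><author>Mikhail Bulgakov</author><title>The Master and Margarita</title></book>"
        + "</books>";

    // Stylesheet that indexes books by author and prints the titles found by key().
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:key name='book-author' match='book' use='author'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select=\"key('book-author', 'Mikhail Bulgakov')\">"
        + "<xsl:value-of select='title'/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compile the stylesheet, apply it to the sample document, return the text output.
    static String transform() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints Heart of a Dog;The Master and Margarita;
        System.out.println(transform());
    }
}
```
]]></source>

  <p>Only the two books whose <code>author</code> value equals the lookup value
  are visited, in document order.</p>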

  </s2>

  <anchor name="implementation"/>
  <s2 title="Implementation">

    <anchor name="indexing"/>
    <s3 title="Indexing">

    <p>The <code>AbstractTranslet</code> class has a global hashtable that holds
    an index for each named key in the stylesheet (hashing on the
    <code>name</code> attribute of <code>&lt;xsl:key&gt;</code>).
    <code>AbstractTranslet</code> has two public methods for inserting data
    into and retrieving data from this hashtable:</p><source>
    public void buildKeyIndex(String keyName, int nodeID, String value);
    public KeyIndex getKeyIndex(String keyName);</source>

    <p>The <code>Key</code> class compiles code that traverses the input DOM and
    extracts the nodes that match the pattern given in the <code>"match"</code>
    attribute of the <code>&lt;xsl:key&gt;</code> element. For each matching
    node, a new entry is inserted into the named key's index. The entry stores
    the node's DOM index together with the value computed from the
    <code>"use"</code> attribute of the <code>&lt;xsl:key&gt;</code>
    element.</p>

    <p>Something similar is done for indexing IDs. This index is generated from
    the <code>ID</code> and <code>IDREF</code> fields in the input document's
    DTD. This means that the code for generating this index cannot be generated
    at compile-time, which in turn means that the code has to be generic enough
    to handle all DTDs. The class that handles this is the
    <code>org.apache.xalan.xsltc.dom.DTDMonitor</code> class. This class
    implements the <code>org.xml.sax.XMLReader</code> and
    <code>org.xml.sax.DTDHandler</code> interfaces. A client application using
    the native API must instantiate a <code>DTDMonitor</code> and pass it to the
    translet code - if, and only if, it wants IDs indexed (one can improve
    performance by omitting this step). This is described in the
    <link idref="xsltc_native_api">XSLTC Native API reference</link>. The
    <code>DTDMonitor</code> class will use the same indexing as the code
    generated by the <code>Key</code> class. The index for IDs is called
    "##id". We assume that no stylesheets will contain a key with this name.</p>

    <p>The index itself is implemented in the
    <code>org.apache.xalan.xsltc.dom.KeyIndex</code> class. The index has a
    hashtable with all the values from the matching nodes (the part of the node
    used to generate this value is the one specified in the <code>"use"</code>
    attribute). For every value there is a bit-array (implemented in
    the <code>org.apache.xalan.xsltc.dom.BitArray</code> class), with one bit
    set for each node index for which this value gives a match:</p>
    <p><img src="key_relations.gif" alt="key_relations.gif"/></p>
    <p><ref>Figure 1: Indexing tables</ref></p>
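
    <p>The relationship between values and node indexes can be sketched in a
    few lines. This is only an illustration of the data structure, not the
    actual <code>KeyIndex</code> code: the class name
    <code>SimpleKeyIndex</code> is hypothetical, and
    <code>java.util.BitSet</code> stands in for <code>BitArray</code>:</p>
    <source><![CDATA[
```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for KeyIndex: one bit set of node indexes per key value.
final class SimpleKeyIndex {
    private final Map<String, BitSet> index = new HashMap<>();

    // Record that the node with the given index has the given key value.
    void add(String value, int node) {
        index.computeIfAbsent(value, v -> new BitSet()).set(node);
    }

    // Return a copy of the node indexes stored for a value (empty if none).
    BitSet lookup(String value) {
        BitSet nodes = index.get(value);
        return nodes == null ? new BitSet() : (BitSet) nodes.clone();
    }

    public static void main(String[] args) {
        SimpleKeyIndex idx = new SimpleKeyIndex();
        idx.add("Mikhail Bulgakov", 3);
        idx.add("Franz Kafka", 7);
        idx.add("Mikhail Bulgakov", 11);
        System.out.println(idx.lookup("Mikhail Bulgakov")); // prints {3, 11}
    }
}
```
]]></source>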

    <p>The <code>KeyIndex</code> class implements the <code>NodeIterator</code>
    interface, so that it can be returned directly by the implementation of the
    <code>key()</code> function. This is how the index generated by
    <code>&lt;xsl:key&gt;</code> and the node-set returned by the
    <code>key()</code> function and the KeyPattern are tied together. You can
    see how this is done in the <code>translate()</code> method of the
    <code>KeyCall</code> class.</p>

    <p>The <code>key()</code> function can be called in two ways:</p><source>
    key('key-name','value')
    key('key-name','node-set')</source>

    <p>The first parameter is always the name of the key. We use this value to
    look up our index in the <code>_keyIndexes</code> hashtable in
    <code>AbstractTranslet</code>:</p>
    <source>
    il.append(classGen.aloadThis());
    _name.translate(classGen, methodGen);
    il.append(new INVOKEVIRTUAL(getKeyIndex));</source>

    <p>This compiles into a call to
    <code>AbstractTranslet.getKeyIndex(String name)</code>, and it leaves a
    <code>KeyIndex</code> object on the stack. What we then need to do is to
    initialise the <code>KeyIndex</code> to give us nodes with the requested
    value. This is done by leaving the <code>KeyIndex</code> object on the stack
    and pushing the <code>"value"</code> parameter to <code>key()</code>, before
    calling <code>lookup()</code> on the index:</p><source>
    il.append(DUP);  // duplicate the KeyIndex object before return
    _value.translate(classGen, methodGen);
    il.append(new INVOKEVIRTUAL(lookup));</source>

    <p>This compiles into a call to <code>KeyIndex.lookup(String value)</code>.
    This will initialise the <code>KeyIndex</code> object to return nodes that
    match the given value, so the <code>KeyIndex</code> object can be left on
    the stack when we return. This works because the <code>KeyIndex</code>
    object implements the <code>NodeIterator</code> interface.</p>

    <p>Matters are a bit more complex when the second parameter of
    <code>key()</code> is a node-set. In this case we need to traverse the nodes
    in the set and do a lookup for each of them. The generated code does the
    following:</p>

    <ul>
      <li>
        construct a <code>KeyIndex</code> object that will hold the
        return node-set
      </li>
      <li>
        find the named <code>KeyIndex</code> object from the hashtable in
        <code>AbstractTranslet</code>
      </li>
      <li>
        get an iterator for the node-set and do the following loop:</li>
        <ul>
          <li>get string value for current node</li>
          <li>do lookup in KeyIndex object for the named index</li>
          <li>merge the resulting node-set into the return node-set</li>
        </ul>
      <li>
        leave the return node-set on stack when done
      </li>
    </ul>
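
    <p>The steps above amount to taking the union of one lookup per node. A
    minimal sketch, assuming the named index is modelled as a map from value to
    <code>java.util.BitSet</code> and the node-set is already reduced to the
    string values of its nodes (the class and method names are hypothetical,
    not part of XSLTC):</p>
    <source><![CDATA[
```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NodeSetKeyLookup {
    // Look up the string value of every node in the named index and merge the hits.
    static BitSet lookupAll(Map<String, BitSet> namedIndex, List<String> nodeValues) {
        BitSet result = new BitSet();              // holds the return node-set
        for (String value : nodeValues) {          // string value of the current node
            BitSet hits = namedIndex.get(value);   // lookup in the named index
            if (hits != null) {
                result.or(hits);                   // merge into the return node-set
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, BitSet> index = new HashMap<>();
        index.computeIfAbsent("red", k -> new BitSet()).set(2);
        index.computeIfAbsent("red", k -> new BitSet()).set(9);
        index.computeIfAbsent("blue", k -> new BitSet()).set(4);
        List<String> values = Arrays.asList("blue", "green", "red");
        System.out.println(lookupAll(index, values)); // prints {2, 4, 9}
    }
}
```
]]></source>

    <p>Values with no entry in the index (such as "green" here) simply
    contribute nothing to the result.</p>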

    </s3>

    <anchor name="improvements"/>
    <s3 title="Possible indexing improvements">

    <p>The indexing implementation can be very memory-intensive, as there
    is one <code>BitArray</code> allocated for each value in every index. This
    is particularly bad for the index used for IDs ('##id'), where a single
    value should map to only one node. This means that a whole
    <code>BitArray</code> is used just to contain one node. The
    <code>KeyIndex</code> class should be updated to hold the first node for a
    value in an <code>Integer</code> object, and then replace that with a
    <code>BitArray</code> or a <code>Vector</code> only if a second node is
    added to the value. Here is an outline for <code>KeyIndex</code>:</p>
    <source>

    public void add(Object value, int node) {
        Object container = _index.get(value);
        if (container == null) {
            // First node for this value: store it as a single Integer
            _index.put(value, new Integer(node));
        }
        else {
            // Check if there is _one_ node for this value
            if (container instanceof Integer) {
                int first = ((Integer)container).intValue();
                _nodes = new BitArray(_arraySize);
                _nodes.setMask(node &amp; 0xff000000);
                _nodes.setBit(first &amp; 0x00ffffff);
                _nodes.setBit(node &amp; 0x00ffffff);
                _index.put(value, _nodes);
            }
            // Otherwise add node to existing bit array
            else {
                _nodes = (BitArray)container;
                _nodes.setBit(node &amp; 0x00ffffff);
            }
        }
    }</source>

    <p>Other methods in the <code>KeyIndex</code> class should be updated to
    handle both kinds of container.</p>

    </s3>

    <anchor name="patterns"/>
    <s3 title="Id and Key patterns">

    <p>The challenge with the <code>id()</code> and
    <code>key()</code> patterns is that they have no specific node type.
    Patterns are normally grouped based on their "kernel" node type.
    This is done in the <code>org.apache.xalan.xsltc.compiler.Mode</code> class.
    The <code>Mode</code> class has a method, <code>flattenAlternative</code>,
    that does this grouping, and all templates with a common kernel node type
    will be inserted into a "test sequence". A test sequence is a set of
    templates with the same kernel node type. The <code>TestSeq</code> class
    generates code that figures out which pattern, amongst several patterns
    with the same kernel node type, matches a certain node. This is used by the
    <code>Mode</code> class when generating the <code>applyTemplates()</code>
    method in the translet. A test sequence is also generated for all templates
    whose pattern does not have a kernel node type. This is the case for all
    Id and KeyPatterns. This test sequence, if necessary, is put before the
    big <code>switch()</code> in the <code>applyTemplates()</code> method. This
    test has to be done for every single node that is traversed, causing the
    transformation to slow down significantly. This is why we do <em>not</em>
    recommend using these types of pattern with XSLTC.</p>
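
    <p>The cost can be seen in a rough sketch of the shape such a dispatch
    method might take. This is an illustration only, not the code XSLTC
    generates: the class name, the node type constants and the template names
    are all hypothetical, and a <code>java.util.BitSet</code> stands in for the
    key's node set:</p>
    <source><![CDATA[
```java
import java.util.BitSet;

final class ApplyTemplatesSketch {
    static final int ELEMENT = 1;   // hypothetical node type codes
    static final int TEXT = 3;

    // Returns the name of the template chosen for a node (names are illustrative).
    static String applyTemplates(int node, int nodeType, BitSet keyMatches) {
        // Test sequence for patterns without a kernel node type:
        // it cannot be folded into the switch, so it runs for every node.
        if (keyMatches.get(node)) {
            return "key-pattern template";
        }
        // Patterns grouped by kernel node type are reached through the switch.
        switch (nodeType) {
            case ELEMENT: return "element template";
            case TEXT:    return "text template";
            default:      return "built-in rule";
        }
    }

    public static void main(String[] args) {
        BitSet keyMatches = new BitSet();
        keyMatches.set(5);  // pretend node 5 is in the key's node set
        System.out.println(applyTemplates(5, ELEMENT, keyMatches)); // prints key-pattern template
        System.out.println(applyTemplates(6, ELEMENT, keyMatches)); // prints element template
    }
}
```
]]></source>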

    </s3>

  </s2>

</s1>