      <html>
<head>
<title>ns_xml 2.0 Requirements and Design</title>
</head>
<body>
<h2><code>ns_xml</code> 2.0 Requirements and Design</h2>
by <a href=mailto:jmileham@berklee.edu>John Mileham</a>
<hr>
<h3>Motivation</h3>
<p>
<code>ns_xml</code> as it exists currently is an integral part of OpenACS 4.x, but its implementation
fails to expose several important aspects of <code>libxml</code>'s tree manipulation API.
As a result, it is impossible to create some legal XML documents using
<code>ns_xml</code>.  The shortcoming lies in <code>ns_xml</code>'s inability
to create node content through the use of <code>text</code> nodes.  In 
<code>ns_xml</code>, the only way to manipulate the content of a node is
either to initialize it at creation time (e.g. through a
<code>node new_child</code> call) or to set it explicitly through
<code>node setcontent</code>.  Neither approach can produce mixed content,
where text and child elements are interleaved, as in this simple XML snippet:
</p>
<blockquote>
<pre>&lt;sentence&gt;Here is &lt;keyword&gt;my&lt;/keyword&gt; sentence&lt;/sentence&gt;</pre>
</blockquote>
<p>
That is only part of the challenge, however.  <code>ns_xml</code> is also
limited to the linear creation of a document.  This is great if your goal is
to programmatically serialize data in a database, which is the obvious use
of <code>ns_xml</code> within OpenACS.  If you hope to mutate the structure of
an existing document or create a document in a random-access fashion (e.g. through a
user interface), however, you're in trouble.  The following shortcomings exist:
</p>
<ul>
<li>Nodes can not be inserted before other existing nodes at the same level.</li>
<li>Nodes can not be deleted.</li>
<li>Nodes can not be moved.</li>
<li>Nodes can not be copied.</li>
<li>You can't find the parent of a node, so tree traversal is one-way (top-down).</li>
<li>An attribute can not be unset.  It can be set to null, but not made to disappear.</li>
</ul>
<h3>The Requirements</h3>
<ol>
<li>Support for all legacy (<code>ns_xml</code> 1.x) Tcl code.</li>
<li>Support for <code>text</code> node creation.</li>
<li>Support for "insert as previous" node creation for both text and element nodes.</li>
<li>Support for node deletion (recursive).</li>
<li>Support for node relocation, including cross-document relocation.  Nodes moved between documents of different persistence states
will inherit the persistence of the new document (i.e. a node moved from a persistent document to a transient document will vanish at the
end of the Tcl interpreter's session).</li>
<li>Support for in-place node duplication (recursive).  The duplicate will be instantiated as the sibling following the cloned node.</li>
<li>Support for bottom-up tree traversal, from child to parent.</li>
<li>Support for unsetting attributes.</li>
</ol>
And from the "While we're at it" department:
<ol>
<li>Support rendering of any node as a stand-alone XML document (recursive).</li>
<li>Get rid of the <code>ns_xml doc free</code> naming convention.  This is Tcl.  We don't want to think about freeing memory.  We're just deleting persistent documents here. :)</li>
</ol>
<h3>The Design</h3>
<p>
Creating this functionality means adding several commands to the
<code>ns_xml</code> API.  The naming scheme of <code>ns_xml</code>'s existing
calls is very intuitive to the Tcl developer.  However, it quickly became
apparent that the naming scheme wasn't general enough to encompass the new
functionality cleanly.  If an attempt were made to add the functionality
described above within the existing naming scheme, all transparency would be
lost.  Tcl developers would find themselves referring to reference material
constantly.  Luckily, the naming scheme proposed below does not conflict with
the existing scheme (except in cases where the commands are identical between
the two schemes).
  Thus, all the deprecated calls can be mapped to their 2.0
equivalents, and a user can run legacy Tcl code unmodified.</p>
<p>In the new scheme, commands are divided into four functional buckets which
largely coincide with their first word. All calls follow the
grammatical convention:
</p>
<blockquote>
<pre>subject verb [object]</pre>
</blockquote>
<p>
with the sole exception of <code>ns_xml create xml</code>, which is unique in
that its job is to conjure up an object with no relationship to anything that
exists.
</p>
<p>
Note that the Node Interaction bucket is subdivided into several categories due to its
complexity.</p>
<h3>The 2.0 API</h3>
<ul>
<li>Document Instantiation</li>
<ul>
<li><code>set <i>xml_doc_id</i> [ns_xml string parse xml ?-persist? ?-validate? <i>string</i>]</code></li>
<li><code>set <i>xsl_doc_id</i> [ns_xml string parse xsl ?-persist? ?-validate? <i>string</i>]</code></li>
<li><code>set <i>xml_doc_id</i> [ns_xml create xml ?-persist? ?doc-version?]</code></li>
</ul>
<li>Document Transformation</li>
<ul>
<li><code>set <i>new_xml_doc_id</i> [ns_xml transform ?-persist? <i>xml_doc_id</i> <i>xsl_doc_id</i>]</code></li>
</ul>
<li>Document Interaction</li>
<ul>
<li><code>set <i>node_id</i> [ns_xml doc get root <i>doc_id</i>]</code></li>
<li><code>set <i>node_id</i> [ns_xml doc create root <i>doc_id</i> <i>node_name</i> <i>node_content</i>]</code></li>
<li><code>set <i>string</i> [ns_xml doc render <i>doc_id</i>]</code></li>
<li><code>ns_xml doc delete <i>doc_id</i></code></li>
<li><code>ns_xml doc cleanup <i>doc_id</i></code> (tolerant of already deleted documents)</li>
</ul>
<li>Node Interaction</li>
<ul>
<li>Tree Traversal</li>
<ul>
<li><code>set <i>node_id_list</i> [ns_xml node get children <i>node_id</i>]</code></li>
<li><code>set <i>node_id</i> [ns_xml node get parent <i>node_id</i>]</code></li>
</ul>
<li>Node Instantiation</li>
<ul>
<li><code>set <i>node_id</i> [ns_xml node create child_node <i>node_id</i> <i>name</i> <i>content</i>]</code></li>
<li><code>set <i>node_id</i> [ns_xml node create child_text <i>node_id</i> <i>content</i>]</code></li>
<li><code>set <i>node_id</i> [ns_xml node create prev_sibling_node <i>node_id</i> <i>name</i> <i>content</i>]</code></li>
<li><code>set <i>node_id</i> [ns_xml node create prev_sibling_text <i>node_id</i> <i>content</i>]</code></li>
<li><code>set <i>node_id</i> [ns_xml node create next_sibling_node <i>node_id</i> <i>name</i> <i>content</i>]</code></li>
<li><code>set <i>node_id</i> [ns_xml node create next_sibling_text <i>node_id</i> <i>content</i>]</code></li>
</ul>
<li>Node Duplication</li>
<ul>
<li><code>set <i>new_node_id</i> [ns_xml node clone <i>node_id</i>]</code></li>
</ul>
<li>Node Relocation - <i>note that relinked nodes receive new node_ids.  The old ids become defunct.</i></li>
<ul>
<li><code>set <i>new_node_id</i> [ns_xml node relink_as child <i>node_id</i> <i>new_parent_node_id</i>]</code></li>
<li><code>set <i>new_node_id</i> [ns_xml node relink_as prev_sibling <i>node_id</i> <i>new_next_sibling_node_id</i>]</code></li>
<li><code>set <i>new_node_id</i> [ns_xml node relink_as next_sibling <i>node_id</i> <i>new_prev_sibling_node_id</i>]</code></li>
</ul>
<li>Node Removal</li>
<ul>
<li><code>ns_xml node delete <i>node_id</i></code></li>
</ul>
<li>Value Retrieval</li>
<ul>
<li><code>set <i>string</i> [ns_xml node get attr <i>node_id</i> <i>prop_name</i>]</code></li>
<li><code>set <i>string</i> [ns_xml node get name <i>node_id</i>]</code></li>
<li><code>set <i>string</i> [ns_xml node get type <i>node_id</i>]</code></li>
<li><code>set <i>string</i> [ns_xml node get content <i>node_id</i>]</code></li>
</ul>
<li>Value Manipulation</li>
<ul>
<li><code>ns_xml node set attr <i>node_id</i> <i>prop_name</i> <i>string</i></code></li>
<li><code>ns_xml node unset attr <i>node_id</i> <i>prop_name</i></code></li>
<li><code>ns_xml node set content <i>node_id</i> <i>string</i></code></li>
</ul>
<li>Node Serialization</li>
<ul>
<li><code>set <i>string</i> [ns_xml node render <i>node_id</i>]</code></li>
</ul>
</ul>
</ul>
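<p>
As a rough sketch of how these calls are meant to fit together, the mixed-content
snippet from the Motivation section could be assembled as follows.  It uses only the
commands proposed in the list above, so it will not run against <code>ns_xml</code> 1.x.
The variable names and the empty-string root content are illustrative assumptions.
</p>
<blockquote>
<pre>
# Sketch against the proposed 2.0 API; not runnable under ns_xml 1.x.
set doc  [ns_xml create xml]

# Root element with no initial content (passing "" here is an assumption).
set root [ns_xml doc create root $doc sentence ""]

# Interleave text nodes and an element node to get mixed content.
ns_xml node create child_text $root "Here is "
ns_xml node create child_node $root keyword "my"
ns_xml node create child_text $root " sentence"

# Serialize the whole document to a Tcl string.
set xml [ns_xml doc render $doc]

# The document was created without -persist, so per the requirements it is
# transient and should vanish at the end of the interpreter's session.
</pre>
</blockquote>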
<h3>Why'd you do that? (a.k.a. Design Considerations)</h3>
<ul>
<li>Why <code>ns_xml string parse</code> and not just <code>ns_xml parse</code>?
<p>
Because future support may be added for parsing filehandles in addition to Tcl in-memory strings.
</p>
</li>
<li>Why <code>ns_xml create xml</code> and not just <code>ns_xml create</code>?
<p>
This leaves the door open for dynamic creation of other documents like XSL or DTDs in the future.  Note that XSL can currently only
be parsed and applied to XML documents as-is.
</p>
</li>
<li>Why implement both <code>ns_xml node create child</code> and <code>ns_xml node create next_sibling</code> when you can
just create a new child of a given node's parent? (or similar questions)
<p>
This API walks a fine line between developer usability and cleanliness.  The number of
calls could have been drastically reduced if we offered a lower-level API requiring developers to
first instantiate nodes and then link them as they chose.  But that would make the developer's life
harder and would cost performance by moving work from C into Tcl, both of which we're trying to avoid.
We're also trying to get a fair amount of interoperability with the 1.x API, whose lack of a <code>get parent</code>
command mandated the use of the <code>create next_sibling</code> command in cases where the parent was inconvenient or
impossible to derive.  So deprecating the whole <code>create sibling</code> concept seemed a bit harsh,
especially since it's speedier than the Tcl workaround.
</p>
</li>
<li>Why does the XSL transformation command have the opposite arg order to the old one?
<p>Because the old ordering goes against every other command in the API.</p>
</li>
<li>Why is there an <code>ns_xml node render</code> command?  You could clone the node, relink the copy to a new document and render that!
<p>Partly because it creates parallelism with the document itself in that a node can be created, futzed with, serialized and deleted.  Partly because it's more efficient than the alternative.  Partly because
it's useful in creating a nice UI for editing complicated XML documents.  Partly because I say so. ;-)</p>
</li>
</ul>
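<p>
To make the answers above a little more concrete, here is a hedged sketch of a short
editing session using the proposed commands.  The sample document, element names and
variable names are invented for illustration, and the behaviour described in the comments
is taken from the requirements rather than from a tested implementation.
</p>
<blockquote>
<pre>
# Sketch against the proposed 2.0 API; not runnable under ns_xml 1.x.
set doc    [ns_xml string parse xml {&lt;list&gt;&lt;item id="a"&gt;one&lt;/item&gt;&lt;item id="b"&gt;two&lt;/item&gt;&lt;/list&gt;}]
set root   [ns_xml doc get root $doc]
set items  [ns_xml node get children $root]
set first  [lindex $items 0]
set second [lindex $items 1]

# Insert a sibling directly, without looking up the parent first.
set mid [ns_xml node create next_sibling_node $first item "one and a half"]

# Clone a node; the copy should become the sibling following the original.
set copy [ns_xml node clone $second]

# Move a node.  Relinking returns a new id and the old id becomes defunct,
# so the variable is overwritten with the new one.
set first [ns_xml node relink_as next_sibling $first $copy]

# Bottom-up traversal from child to parent.
set parent [ns_xml node get parent $mid]

# Remove an attribute entirely instead of setting it to an empty value.
ns_xml node unset attr $second id

# Serialize a single subtree without cloning it into a scratch document.
set fragment [ns_xml node render $second]

# Delete a node (recursively).
ns_xml node delete $copy
</pre>
</blockquote>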
<hr>
<address><a href=mailto:jmileham@berklee.edu>John Mileham</a></address>
 