<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>REXML Tutorial</title>
<style media="all" type="text/css">@import "http://www.germane-software.com/~ser/Software/style.css";</style>
</head>
<body>
<div id="banner">
<h1>REXML Tutorial</h1>
</div>
<h4 align="center">$Revision: 1.1.2.1 $</h4>
<div id="centercontent">
<h2>Overview</h2>
<h3>Abstract</h3>
<p>This is a tutorial for using <a href="http://www.germane-software.com/~ser/software/rexml">REXML</a>, a pure-Ruby XML processor.</p>
<h3>Introduction</h3>
<p>REXML was inspired by the Electric XML library for Java, which features an easy-to-use API, small size, and speed. Hopefully, REXML, designed with the same philosophy, has these same features. I've tried to keep the API as intuitive as possible, and have followed the Ruby methodology for method naming and code flow, rather than mirroring the Java API.
</p>
<p>REXML supports both tree and stream document parsing. Stream parsing is extremely fast (about 1.5 thousand times as fast as tree parsing). However, with stream parsing, you don't get access to features such as XPath.</p>
<h3>Tree Parsing XML and accessing Elements</h3>
<p>We'll start by parsing an XML document:</p>
<div class="example">
<pre>require "rexml/document"
file = File.new( "mydoc.xml" )
doc = REXML::Document.new file</pre>
</div>
<p>Line 3 creates a new document and parses the supplied file. You can also parse a string directly:</p>
<div class="example">
<pre>require "rexml/document"
include REXML # so that we don't have to prefix everything with REXML::...
string = <<EOF
<mydoc>
<someelement attribute="nanoo">Text, text, text</someelement>
</mydoc>
EOF
doc = Document.new string</pre>
</div>
<p>So parsing a string is just as easy as parsing a file. For future examples, I'm going to omit both the <TT>require</TT> and <TT>include</TT> lines.</p>
<p>Once you have a document, you can access elements in that document in a number of ways:</p>
<ul>
<li>The <TT>Element</TT> class itself has <TT>each_element_with_attribute</TT>, a common way of accessing elements.</li>
<li>The attribute <TT>Element.elements</TT> is an <TT>Elements</TT> class instance which has the <TT>each</TT> and <TT>[]</TT> methods for accessing elements. Both methods can be supplied with an XPath for filtering, which makes them very powerful.</li>
<li>Since <TT>Element</TT> is a subclass of Parent, you can also access the element's children directly through the Array-like methods <TT>Element[], Element.each, Element.find, Element.delete</TT>. This is the fastest way of accessing children, but note that, because this is a true array, XPath searches are not supported, and the array contains all of the element's children (text, comments, and so on), not just the Element children.</li>
</ul>
<p>Here are a few examples using these methods. First is the source document used in the examples:</p>
<div class="example">
<div class="exampletitle">The source document</div>
<pre><inventory title="OmniCorp Store #45x10^3">
<section name="health">
<item upc="123456789" stock="12">
<name>Invisibility Cream</name>
<price>14.50</price>
<description>Makes you invisible</description>
</item>
<item upc="445322344" stock="18">
<name>Levitation Salve</name>
<price>23.99</price>
<description>Levitate yourself for up to 3 hours per application</description>
</item>
</section>
<section name="food">
<item upc="485672034" stock="653">
<name>Blork and Freen Instameal</name>
<price>4.95</price>
<description>A tasty meal in a tablet; just add water</description>
</item>
<item upc="132957764" stock="44">
<name>Grob winglets</name>
<price>3.56</price>
<description>Tender winglets of Grob. Just add water</description>
</item>
</section>
</inventory></pre>
</div>
<div class="example">
<div class="exampletitle">Accessing Elements</div>
<pre>doc = Document.new File.new("mydoc.xml")
doc.elements.each("inventory/section") { |element| puts element.attributes["name"] }
# -> health
# -> food
doc.elements.each("*/section/item") { |element| puts element.attributes["upc"] }
# -> 123456789
# -> 445322344
# -> 485672034
# -> 132957764
root = doc.root
puts root.attributes["title"]
# -> OmniCorp Store #45x10^3
puts root.elements["section/item[@stock='44']"].attributes["upc"]
# -> 132957764
puts root.elements["section"].attributes["name"]
# -> health (returns the first encountered matching element)
puts root.elements[1].attributes["name"]
# -> health (returns the FIRST child element)
root.detect {|node|
node.kind_of? Element and
node.attributes["name"] == "food"
}</pre>
</div>
<p>The last statement finds the first child element whose <TT>name</TT> attribute is "food". As you can see in this example, accessing attributes is also straightforward.
</p>
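<p>The three access styles listed earlier can be compared side by side. The following is a minimal sketch rather than part of the original examples: the inline document is a cut-down, hypothetical version of the inventory above, and the variable names are made up for illustration.</p>

```ruby
require "rexml/document"

# A cut-down inventory, assumed here so the sketch is self-contained.
xml = <<~XML
  <inventory title="OmniCorp Store">
    <!-- stock list -->
    <section name="health">
      <item upc="123456789" stock="12"><name>Invisibility Cream</name></item>
    </section>
    <section name="food">
      <item upc="132957764" stock="44"><name>Grob winglets</name></item>
    </section>
  </inventory>
XML
root = REXML::Document.new(xml).root

# 1. each_element_with_attribute: child elements filtered by attribute value.
food_sections = []
root.each_element_with_attribute("name", "food") { |el| food_sections << el }

# 2. Element#elements: 1-based index, or an XPath relative to this element.
first_section = root.elements[1]
stocked_item  = root.elements["section/item[@stock='44']"]

# 3. Array-like access: every child node, not only Elements.
child_classes = root.to_a.map(&:class).uniq

puts first_section.attributes["name"]   # -> health
puts stocked_item.attributes["upc"]     # -> 132957764
```

<p>Note that the array-like view in step 3 also contains the comment node, which the <TT>elements</TT> accessor never returns.</p>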
<p>You can also evaluate XPath expressions directly via the XPath class:</p>
<div class="example">
<div class="exampletitle">Using XPath</div>
<pre># The invisibility cream is the first <item>
invisibility = XPath.first( doc, "//item" )
# Prints out all of the prices
XPath.each( doc, "//price") { |element| puts element.text }
# Gets an array of all of the "name" elements in the document.
names = XPath.match( doc, "//name" )
</pre>
</div>
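<p>As a self-contained variation on the example above, the sketch below runs <TT>XPath.first</TT>, <TT>XPath.match</TT>, and <TT>XPath.each</TT> against a small inline document. The document and the variable names are assumptions made for illustration.</p>

```ruby
require "rexml/document"

doc = REXML::Document.new(<<~XML)
  <inventory>
    <item stock="12"><name>Invisibility Cream</name><price>14.50</price></item>
    <item stock="44"><name>Grob winglets</name><price>3.56</price></item>
  </inventory>
XML

# XPath.first returns the first matching node, or nil when nothing matches.
low_stock = REXML::XPath.first(doc, "//item[@stock='12']/name")
missing   = REXML::XPath.first(doc, "//item[@stock='0']")

# XPath.match returns an Array, so the usual Array methods apply.
names = REXML::XPath.match(doc, "//name").map(&:text)

# XPath.each yields each matching node to the block.
prices = []
REXML::XPath.each(doc, "//price") { |el| prices << el.text }

puts low_stock.text   # -> Invisibility Cream
```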
<p>Another way of getting an array of matching nodes is through
Element.elements.to_a(). The name of this method is somewhat misleading:
it returns an array of the objects that match the XPath, and an XPath
can match more than just Elements.</p>
<div class="example">
<div class="exampletitle">Using to_a()</div>
<pre>all_elements = doc.elements.to_a
all_children = doc.to_a
all_upc_strings = doc.elements.to_a( "//item/attribute::upc" )
all_name_elements = doc.elements.to_a( "//name" )</pre>
</div>
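<p>To illustrate that an XPath can select nodes other than Elements, here is a small sketch. It deliberately uses <TT>XPath.match</TT> rather than <TT>to_a()</TT>, on the assumption that how <TT>to_a()</TT> filters its results may differ between REXML versions; check the behaviour of your own version. The inline document is hypothetical.</p>

```ruby
require "rexml/document"

doc = REXML::Document.new(
  '<inventory>' \
  '<item upc="123"><name>Cream</name></item>' \
  '<item upc="456"><name>Salve</name></item>' \
  '</inventory>'
)

# The attribute axis selects Attribute nodes, not Elements.
upc_nodes  = REXML::XPath.match(doc, "//item/@upc")
upc_values = upc_nodes.map(&:value)

# text() selects Text nodes.
name_texts = REXML::XPath.match(doc, "//name/text()")

puts upc_nodes.first.class    # -> REXML::Attribute
puts name_texts.first.class   # -> REXML::Text
```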
<h3>Creating XML documents</h3>
<p>Again, there are a couple of mechanisms for creating XML documents in REXML. Adding elements by hand is faster than the convenience method, but which you use will probably be a matter of aesthetics.</p>
<div class="example">
<div class="exampletitle">Creating elements</div>
<pre>el = someelement.add_element "myel"
# creates an element named "myel", adds it to "someelement", and returns it
el2 = el.add_element "another", {"id"=>"10"}
# does the same, but also sets attribute "id" of el2 to "10"
el3 = Element.new "blah"
el.elements << el3
el3.attributes["myid"] = "sean"
# creates el3 "blah", adds it to el, then sets attribute "myid" to "sean"</pre>
</div>
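<p>The fragment above assumes an existing <TT>someelement</TT>. A self-contained sketch that builds a small document from scratch, using both the convenience method and the by-hand route, might look like this (the element and attribute names are illustrative only):</p>

```ruby
require "rexml/document"

doc  = REXML::Document.new
root = doc.add_element("inventory", { "title" => "OmniCorp Store" })

# Convenience route: add_element creates the child, attaches it, and returns it.
section = root.add_element("section", { "name" => "health" })

# By-hand route: build the element first, then attach it to its parent.
item = REXML::Element.new("item")
item.attributes["upc"] = "123456789"
section.elements << item

puts doc.elements["inventory/section/item"].attributes["upc"]   # -> 123456789
```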
<p>If you want to add text to an element, you can do it either by creating Text objects and adding them to the element, or by using the convenience method <TT>text=</TT>:
</p>
<div class="example">
<div class="exampletitle">Adding text</div>
<pre>el1 = Element.new "myelement"
el1.text = "Hello world!"
# -> <myelement>Hello world!</myelement>
el1.add_text "Hello dolly"
# -> <myelement>Hello world!Hello dolly</myelement>
el1.add Text.new("Goodbye")
# -> <myelement>Hello world!Hello dollyGoodbye</myelement>
el1 << Text.new(" cruel world")
# -> <myelement>Hello world!Hello dollyGoodbye cruel world</myelement></pre>
</div>
<p>But note that each of these text objects is still stored as a separate object; <TT>el1.text</TT> will return "Hello world!"; <TT>el1[2]</TT> will return a Text object with the contents "Goodbye".</p>
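<p>The separate storage of Text children can be checked with a short sketch. It only uses <TT>text=</TT> and <TT>&lt;&lt;</TT>, so it has two Text children rather than the four in the example above; <TT>texts</TT> is used here to collect all Text children.</p>

```ruby
require "rexml/document"

el = REXML::Element.new("myelement")
el.text = "Hello world!"            # sets the first Text child
el << REXML::Text.new("Goodbye")    # appends a second, separate Text child

first_text = el.text                        # value of the first Text child only
all_text   = el.texts.map(&:value).join     # values of every Text child, joined

puts first_text   # -> Hello world!
puts all_text     # -> Hello world!Goodbye
```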
<p>If you want to insert an element between two elements, you can use either the standard Ruby array notation, or <TT>Parent.insert_before</TT> and <TT>Parent.insert_after</TT>.</p>
<div class="example">
<div class="exampletitle">Inserts</div>
<pre>doc = Document.new "<a><one/><three/></a>"
doc.root[1,0] = Element.new "two"
# -> <a><one/><two/><three/></a>
three = doc.elements["a/three"]
doc.root.insert_after three, Element.new("four")
# -> <a><one/><two/><three/><four/></a>
# A convenience method allows you to insert before/after an XPath:
doc.root.insert_after( "//one", Element.new("one-five") )
# -> <a><one/><one-five/><two/><three/><four/></a>
# Another convenience method allows you to insert after/before an element:
four = doc.elements["//four"]
four.previous_sibling = Element.new("three-five")
# -> <a><one/><one-five/><two/><three/><three-five/><four/></a></pre>
</div>
<p>You may want to give REXML text, and have it left alone. You
may, for example, want to have "&amp;" left as it is, so that
you can do your own processing of entities.</p>
<div class="example">
<div class="exampletitle">Raw text</div>
<pre>text = Text.new "Cats &amp; dogs", false, true
puts text.string # -> "Cats &amp; dogs"</pre>
</div>
<p>You can also tell REXML to set the Text children of given
elements to raw automatically, on parsing or creating:</p>
<div class="example">
<div class="exampletitle">Automatic raw text handling</div>
<pre>doc = REXML::Document.new( source, {
:raw => %w{ tag1 tag2 tag3 }
} )</pre>
</div>
<p>In this example, all tags named "tag1", "tag2", or "tag3" will
have any Text children set to raw text. If you want to have all
of the text processed as raw text, pass in the :all tag:</p>
<div class="example">
<div class="exampletitle">Raw documents</div>
<pre>doc = REXML::Document.new( source, { :raw => :all } )</pre>
</div>
<h3>Writing a tree</h3>
<p>There isn't much simpler than writing a REXML tree. Simply pass an object that supports <TT><<( String )</TT> to the <TT>write</TT> method of any object. In Ruby, both IO instances (File) and String instances support <<.</p>
<div class="example">
<pre>doc.write $stdout
output = ""
doc.write output</pre>
</div>
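<p>As a quick check that writing to a String works as described, the following sketch writes a tiny document into a buffer and parses the result again. The document content and variable names are assumptions for illustration.</p>

```ruby
require "rexml/document"

doc = REXML::Document.new("<a><b>text</b></a>")

# Any object that responds to << can receive the output; here a String.
buffer = String.new
doc.write(buffer)

# Parsing the written output gives back an equivalent tree.
reparsed = REXML::Document.new(buffer)
puts reparsed.root.name   # -> a
```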
<p>By default, REXML formats the output with indentation. If you want REXML to not format the output, pass <TT>write()</TT> an indent of -1:</p>
<div class="example">
<div class="exampletitle">Write with no indent</div>
<pre>doc.write $stdout, -1</pre>
</div>
<h3>Iterating</h3>
<p>There are four main ways of iterating over children: <TT>Element.each</TT>, which iterates over all the children; <TT>Element.elements.each</TT>, which iterates over just the child Elements; <TT>Element.next_element</TT> and <TT>Element.previous_element</TT>, which fetch the next and previous Element siblings; and <TT>Element.next_sibling</TT> and <TT>Element.previous_sibling</TT>, which fetch the next and previous siblings, regardless of type.</p>
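<p>A small sketch of these iteration styles follows. The inline document mixes text, elements, and a comment so that the difference between "all children" and "child Elements" is visible; it is a made-up example.</p>

```ruby
require "rexml/document"

doc  = REXML::Document.new("<a>text<b/><!-- note --><d/></a>")
root = doc.root

# Element#each walks every child node, whatever its type.
child_classes = root.map { |node| node.class }

# Element#elements.each walks only the child Elements.
element_names = []
root.elements.each { |el| element_names << el.name }

b = root.elements["b"]
d = root.elements["d"]

puts b.next_element.name       # -> d   (skips the comment)
puts b.next_sibling.class      # -> REXML::Comment
puts d.previous_element.name   # -> b
```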
<h3>Stream Parsing</h3>
<p>REXML stream parsing requires you to supply a Listener class. When REXML encounters events in a document (tag start, text, etc.) it notifies your listener class of the event. You can supply any subset of the methods, but make sure you implement method_missing if you don't implement them all. A StreamListener module has been supplied as a template for you to use.</p>
<div class="example">
<div class="exampletitle">Stream parsing</div>
<pre>list = MyListener.new
source = File.new "mydoc.xml"
REXML::Document.parse_stream(source, list)</pre>
</div>
<p>Stream parsing in REXML is much like SAX, where events are
generated when the parser encounters them in the process of
parsing the document. When a tag is encountered, the stream
listener's <TT>tag_start()</TT> method is called. When the
tag end is encountered, <TT>tag_end()</TT> is called. When
text is encountered, <TT>text()</TT> is called, and so on,
until the end of the stream is reached. One other note: the
method <TT>entity()</TT> is called when an
<TT>&entity;</TT> is encountered in text, and only
then.</p>
<p>Please look at the <a href="api/rexml/StreamListener.html">StreamListener API</a> for more information.</p>
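<p>To make the event flow concrete, here is a minimal listener sketch. <TT>EventCollector</TT> is a hypothetical class written for this example; it mixes in the supplied StreamListener module so that any events it does not override are ignored, and it records only tag and text events.</p>

```ruby
require "rexml/document"
require "rexml/parsers/streamparser"
require "rexml/streamlistener"

# Hypothetical listener: records the events it receives, in order.
class EventCollector
  include REXML::StreamListener

  attr_reader :events

  def initialize
    @events = []
  end

  def tag_start(name, attrs)
    @events << [:start, name, attrs]
  end

  def tag_end(name)
    @events << [:end, name]
  end

  def text(text)
    @events << [:text, text]
  end
end

listener = EventCollector.new
REXML::Document.parse_stream('<a id="1"><b>hi</b></a>', listener)

listener.events.each { |event| p event }
```

<p>Because the listener only receives events, nothing like a tree is built; anything you need later must be stored by the listener itself, as <TT>@events</TT> does here.</p>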
<h3>Whitespace</h3>
<p>In many applications, you want the parser to respect whitespace
in your document. In these cases, you have to tell the parser
which elements you want to respect whitespace in by passing a
context to the parser:</p>
<div class="example">
<div class="exampletitle">Respecting whitespace</div>
<pre>doc = REXML::Document.new( source, {
:respect_whitespace => %w{ tag1 tag2 tag3 }
} )</pre>
</div>
<p>Whitespace for tags "tag1", "tag2", and "tag3" will be
respected; all other tags will have their whitespace
compressed. Like :raw, you can set :respect_whitespace to :all,
and have all elements have their whitespace respected.</p>
<h3>Automatic Entity Processing</h3>
<p>REXML does some automatic processing of entities for your
convenience. The processed entities are &, <, >,
", and '. If REXML finds any of these characters in
Text or Attribute values, it automatically turns them into entity
references when it writes them out. Additionally, when REXML
finds any of these entity references in a document source, it
converts them to their character equivalents. All other entity
references are left unprocessed. If REXML finds an &, <,
or > in the document source, it will generate a parsing
error.</p>
<div class="example">
<div class="exampletitle">Entity processing</div>
<pre>bad_source = "<a>Cats & dogs</a>"
good_source = "<a>Cats &amp; &#100;ogs</a>"
doc = REXML::Document.new bad_source # Generates a parse error
doc = REXML::Document.new good_source
puts doc.root.text # -> "Cats & &#100;ogs"
doc.root.write $stdout # -> "<a>Cats &amp; &#100;ogs</a>"
doc.root.attributes["m"] = "x'y\"z"
puts doc.root.attributes["m"] # -> "x'y\"z"
doc.root.write $stdout # -> "<a m='x&apos;y&quot;z'>Cats &amp; &#100;ogs</a>"</pre>
</div>
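<p>The round trip for the <TT>&amp;</TT> case can be verified with a short sketch. It is limited to the ampersand, since handling of other references may depend on the REXML version; <TT>well_formed?</TT> is a hypothetical helper, and it assumes the parse error surfaces as <TT>REXML::ParseException</TT>.</p>

```ruby
require "rexml/document"

# Parsing converts the entity reference to its character...
doc = REXML::Document.new("<a>Cats &amp; dogs</a>")
parsed_text = doc.root.text

# ...and writing the tree turns it back into a reference.
serialized = doc.root.to_s

# Text set programmatically is escaped on output as well.
el = REXML::Element.new("a")
el.text = "Cats & dogs"
escaped = el.to_s

# Hypothetical helper: true when the source parses, false on a parse error.
def well_formed?(source)
  REXML::Document.new(source)
  true
rescue REXML::ParseException
  false
end

puts parsed_text   # -> Cats & dogs
puts serialized    # -> <a>Cats &amp; dogs</a>
puts escaped       # -> <a>Cats &amp; dogs</a>
```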
<h2>Credits</h2>
<p>Among the people who've contributed to this document are:</p>
<ul>
<li>
<a href="mailto:deicher@sandia.gov">Eichert, Diana</a> (bug fix)</li>
</ul>
</div>
<div id="footer">
<div style="float:left;">
<a href="http://www.germane-software.com/~ser">[ Home ]</a>
</div>
<div style="float:right;">
<a href="mailto:ser@germane-software.com">[ EMail ]</a>
</div>
<a href="http://www.germane-software.com/~ser/Software">[ Software ]</a>
</div>
</body>
</html>