<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type="text/xsl" href="http://www.germane-software.com/~ser/Software/documentation.xsl"?>
<!DOCTYPE tutorial>
<documentation>
<head>
<title>REXML Tutorial</title>
<version>$Revision: 1.1.2.1 $</version>
<date>*2001-296+594</date>
<home>http://www.germane-software.com/~ser/software/rexml</home>
<language>ruby</language>
<author email="ser@germane-software.com" href="http://www.germane-software.com/~ser">Sean Russell</author>
</head>
<overview>
<purpose lang="en">
<p>This is a tutorial for using <link href="http://www.germane-software.com/~ser/software/rexml">REXML</link>, a pure-Ruby XML processor.</p>
</purpose>
<general>
<p>REXML was inspired by the Electric XML library for Java, which features an easy-to-use API, small size, and speed. Hopefully, REXML, designed with the same philosophy, has these same features. I've tried to keep the API as intuitive as possible, and have followed the Ruby methodology for method naming and code flow, rather than mirroring the Java API.
</p>
<p>REXML supports both tree and stream document parsing. Stream parsing is faster (about 1.5 times as fast as tree parsing). However, with stream parsing, you don't get access to features such as XPath.</p>
<subsection title="Tree Parsing XML and accessing Elements">
<p>We'll start with parsing an XML document:</p>
<example>require "rexml/document"
file = File.new( "mydoc.xml" )
doc = REXML::Document.new file</example>
<p>Line 3 creates a new document and parses the supplied file. You can also parse a string:</p>
<example>require "rexml/document"
include REXML # so that we don't have to prefix everything with REXML::...
<![CDATA[string = <<EOF
<mydoc>
<someelement attribute="nanoo">Text, text, text</someelement>
</mydoc>
EOF]]>
doc = Document.new string</example>
<p>So parsing a string is just as easy as parsing a file. For future examples, I'm going to omit both the <code>require</code> and <code>include</code> lines.</p>
<p>Once you have a document, you can access elements in that document in a number of ways:</p>
<list>
<item>The <code>Element</code> class itself has <code>each_element_with_attribute</code>, a common way of accessing elements.</item>
<item>The attribute <code>Element.elements</code> is an <code>Elements</code> class instance which has the <code>each</code> and <code>[]</code> methods for accessing elements. Both methods can be supplied with an XPath for filtering, which makes them very powerful.</item>
<item>Since <code>Element</code> is a subclass of Parent, you can also access the element's children directly through the Array-like methods <code>Element[]</code>, <code>Element.each</code>, <code>Element.find</code>, and <code>Element.delete</code>. This is the fastest way of accessing children, but note two caveats: because these methods operate on a plain array of children, XPath searches are not supported, and the array contains all of the element's child nodes (Text nodes, for example), not just the child Elements.</item>
</list>
<p>Here are a few examples using these methods. First is the source document used in the examples:</p>
<example title="The source document"><![CDATA[<inventory title="OmniCorp Store #45x10^3">
<section name="health">
<item upc="123456789" stock="12">
<name>Invisibility Cream</name>
<price>14.50</price>
<description>Makes you invisible</description>
</item>
<item upc="445322344" stock="18">
<name>Levitation Salve</name>
<price>23.99</price>
<description>Levitate yourself for up to 3 hours per application</description>
</item>
</section>
<section name="food">
<item upc="485672034" stock="653">
<name>Blork and Freen Instameal</name>
<price>4.95</price>
<description>A tasty meal in a tablet; just add water</description>
</item>
<item upc="132957764" stock="44">
<name>Grob winglets</name>
<price>3.56</price>
<description>Tender winglets of Grob. Just add water</description>
</item>
</section>
</inventory>]]></example>
<example title="Accessing Elements"><![CDATA[doc = Document.new File.new("mydoc.xml")
doc.elements.each("inventory/section") { |element| puts element.attributes["name"] }
# -> health
# -> food
doc.elements.each("*/section/item") { |element| puts element.attributes["upc"] }
# -> 123456789
# -> 445322344
# -> 485672034
# -> 132957764
root = doc.root
puts root.attributes["title"]
# -> OmniCorp Store #45x10^3
puts root.elements["section/item[@stock='44']"].attributes["upc"]
# -> 132957764
puts root.elements["section"].attributes["name"]
# -> health (returns the first encountered matching element)
puts root.elements[1].attributes["name"]
# -> health (returns the FIRST child element; XPath indexes start at 1)
root.detect {|node|
node.kind_of? Element and
node.attributes["name"] == "food"
}]]></example>
<p>The last statement finds the first child element whose <code>name</code> attribute is "food". As you can see in this example, accessing attributes is also straightforward.
</p>
<p>You can also access xpaths directly via the XPath class:</p>
<example title="Using XPath"><![CDATA[# The invisibility cream is the first <item>
invisibility = XPath.first( doc, "//item" )
# Prints out all of the prices
XPath.each( doc, "//price") { |element| puts element.text }
# Gets an array of all of the "name" elements in the document.
names = XPath.match( doc, "//name" )
]]></example>
<p>Another way of getting an array of matching nodes is through
<code>Element.elements.to_a()</code>. The name is somewhat misleading:
the method returns an array of the objects that match the XPath, and
an XPath can match more than just Elements.</p>
<example title="Using to_a()"><![CDATA[all_elements = doc.elements.to_a
all_children = doc.to_a
all_upc_strings = doc.elements.to_a( "//item/attribute::upc" )
all_name_elements = doc.elements.to_a( "//name" )]]></example>
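<p>To illustrate that an XPath can match nodes other than Elements, here is a minimal, self-contained sketch. It uses <code>XPath.match</code> rather than <code>to_a()</code>, and the inline document is a made-up two-item inventory, not the source document above. It assumes a reasonably recent REXML release; older versions may behave differently.</p>
<example title="XPath results that are not Elements (sketch)"><![CDATA[require "rexml/document"
include REXML

# Hypothetical sample document, used only for this sketch
doc = Document.new <<EOF
<inventory>
  <item upc="111"><name>Cream</name></item>
  <item upc="222"><name>Salve</name></item>
</inventory>
EOF

# The attribute axis selects Attribute nodes, not Elements
upcs = XPath.match( doc, "//item/attribute::upc" )
upcs.each { |attr| puts "#{attr.class}: #{attr.value}" }

# text() selects the Text children of each <name>
name_texts = XPath.match( doc, "//name/text()" )
name_texts.each { |text| puts "#{text.class}: #{text.value}" }]]></example>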
</subsection>
<subsection title="Creating XML documents">
<p>Again, there are a couple of mechanisms for creating XML documents in REXML. Adding elements by hand is faster than the convenience method, but which you use will probably be a matter of aesthetics.</p>
<example title="Creating elements"><![CDATA[el = someelement.add_element "myel"
# creates an element named "myel", adds it to "someelement", and returns it
el2 = el.add_element "another", {"id"=>"10"}
# does the same, but also sets attribute "id" of el2 to "10"
el3 = Element.new "blah"
el.elements << el3
el3.attributes["myid"] = "sean"
# creates el3 "blah", adds it to el, then sets attribute "myid" to "sean"]]></example>
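<p>Putting both mechanisms together, the following is a small sketch that builds a document from scratch. The element names and attribute values are made up for illustration; it is one possible arrangement, not the only way to build a tree.</p>
<example title="Building a small document (sketch)"><![CDATA[require "rexml/document"
include REXML

doc = Document.new
# Convenience method: create, attach, and return the new element
inventory = doc.add_element "inventory", {"title"=>"OmniCorp Store"}
section = inventory.add_element "section", {"name"=>"health"}

# By hand: create the element first, then attach it
item = Element.new "item"
item.attributes["upc"] = "123456789"
section.elements << item

# add_element returns the new element, so text can be set right away
item.add_element("name").text = "Invisibility Cream"

puts doc.root.elements["section/item/name"].text]]></example>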
<p>If you want to add text to an element, you can do it either by creating Text objects and adding them to the element, or by using the convenience method <code>text=</code>:</p>
<example title="Adding text"><![CDATA[el1 = Element.new "myelement"
el1.text = "Hello world!"
# -> <myelement>Hello world!</myelement>
el1.add_text "Hello dolly"
# -> <myelement>Hello world!Hello dolly</myelement>
el1.add Text.new("Goodbye")
# -> <myelement>Hello world!Hello dollyGoodbye</myelement>
el1 << Text.new(" cruel world")
# -> <myelement>Hello world!Hello dollyGoodbye cruel world</myelement>]]></example>
<p>But note that each of these text objects is still stored as a separate object; <code>el1.text</code> will return "Hello world!", and <code>el1[2]</code> will return a Text object with the contents "Goodbye".</p>
<p>If you want to insert an element between two elements, you can use either the standard Ruby array notation, or <code>Parent.insert_before</code> and <code>Parent.insert_after</code>.</p>
<example title="Inserts"><![CDATA[doc = Document.new "<a><one/><three/></a>"
doc.root[1,0] = Element.new "two"
# -> <a><one/><two/><three/></a>
three = doc.elements["a/three"]
doc.root.insert_after( three, Element.new("four") )
# -> <a><one/><two/><three/><four/></a>
# A convenience method allows you to insert before/after an XPath:
doc.root.insert_after( "//one", Element.new("one-five") )
# -> <a><one/><one-five/><two/><three/><four/></a>
# Another convenience method allows you to insert after/before an element:
four = doc.elements["//four"]
four.previous_sibling = Element.new("three-five")
# -> <a><one/><one-five/><two/><three/><three-five/><four/></a>]]></example>
<p>You may want to give REXML text, and have it left alone. You
may, for example, want to have "&amp;amp;" left as it is, so that
you can do your own processing of entities.</p>
<example title="Raw text"><![CDATA[text = Text.new "Cats &amp; dogs", false, true
puts text.string # -> "Cats &amp; dogs"]]></example>
<p>You can also tell REXML to set the Text children of given
elements to raw automatically, on parsing or creating:</p>
<example title="Automatic raw text handling">doc = REXML::Document.new( source, {
:raw => %w{ tag1 tag2 tag3 }
} )</example>
<p>In this example, all tags named "tag1", "tag2", or "tag3" will
have any Text children set to raw text. If you want to have all
of the text processed as raw text, pass in <code>:all</code> instead:</p>
<example title="Raw documents">doc = REXML::Document.new( source, { :raw => :all } )</example>
</subsection>
<subsection title="Writing a tree">
<p>There isn't much simpler than writing a REXML tree. Simply pass an object that supports <code><![CDATA[<<( String )]]></code> to the <code>write</code> method of any object. In Ruby, both IO instances (File) and String instances support <![CDATA[<<]]>.</p>
<example>doc.write $stdout
output = ""
doc.write output</example>
<p>By default, REXML formats the output with indentation. If you want REXML to not format the output, pass <code>write()</code> an indent of -1:</p>
<example title="Write with no indent">doc.write $stdout, -1</example>
</subsection>
<subsection title="Iterating">
<p>There are four main methods of iterating over children: <code>Element.each</code>, which iterates over all the children; <code>Element.elements.each</code>, which iterates over just the child Elements; <code>Element.next_element</code> and <code>Element.previous_element</code>, which fetch the next and previous Element siblings; and <code>Element.next_sibling</code> and <code>Element.previous_sibling</code>, which fetch the next and previous siblings, regardless of type.</p>
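<p>The difference between these methods is easiest to see on an element with mixed content. The following is a minimal sketch; the tiny inline document is made up for illustration.</p>
<example title="Iterating over children (sketch)"><![CDATA[require "rexml/document"
include REXML

# <a> has four children: Text, <b/>, <c/>, Text
doc = Document.new "<a>text<b/><c/>more</a>"
root = doc.root

# Element.each visits every child node, Text included
child_classes = []
root.each { |child| child_classes << child.class }
p child_classes

# Element.elements.each visits only the child Elements
element_names = []
root.elements.each { |el| element_names << el.name }
p element_names

b = root.elements["b"]
puts b.next_element.name        # the next sibling that is an Element
puts b.previous_sibling.class   # the previous sibling of any type
p b.previous_element            # nil: no Element sibling precedes <b/>]]></example>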
</subsection>
<subsection title="Stream Parsing">
<p>REXML stream parsing requires you to supply a Listener class. When REXML encounters events in a document (tag start, text, etc.) it notifies your listener class of the event. You can supply any subset of the methods, but make sure you implement method_missing if you don't implement them all. A StreamListener module has been supplied as a template for you to use.</p>
<example title="Stream parsing">list = MyListener.new
source = File.new "mydoc.xml"
REXML::Document.parse_stream( source, list )</example>
<p>Stream parsing in REXML is much like SAX, where events are
generated when the parser encounters them in the process of
parsing the document. When a tag is encountered, the stream
listener's <code>tag_start()</code> method is called. When the
tag end is encountered, <code>tag_end()</code> is called. When
text is encountered, <code>text()</code> is called, and so on,
until the end of the stream is reached. One other note: the
method <code>entity()</code> is called when an
<code>&amp;entity;</code> is encountered in text, and only
then.</p>
<p>Please look at the <link href="api/rexml/StreamListener.html">StreamListener API</link> for more information.</p>
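<p>As a rough sketch of what a listener might look like, the class below mixes in <code>StreamListener</code>, so that unimplemented callbacks fall back to empty defaults, and records the events it receives. The class body and the <code>events</code> accessor are made up for illustration, and the callback signatures shown assume a reasonably recent REXML release.</p>
<example title="A minimal listener (sketch)"><![CDATA[require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener that records events in the order they arrive
class MyListener
  include REXML::StreamListener

  attr_reader :events

  def initialize
    @events = []
  end

  # Called for each start tag, with the tag name and its attributes
  def tag_start( name, attrs )
    @events << [:tag_start, name, attrs]
  end

  # Called for each end tag (and for the end of an empty tag)
  def tag_end( name )
    @events << [:tag_end, name]
  end

  # Called for each run of text
  def text( text )
    @events << [:text, text]
  end
end

list = MyListener.new
REXML::Document.parse_stream( '<a x="1">hi<b/></a>', list )
list.events.each { |event| p event }]]></example>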
</subsection>
<subsection title="Whitespace">
<p>In many applications, you want the parser to respect whitespace
in your document. In these cases, you have to tell the parser
which elements you want to respect whitespace in by passing a
context to the parser:</p>
<example title="Respecting whitespace">doc = REXML::Document.new( source, {
:respect_whitespace => %w{ tag1 tag2 tag3 }
} )</example>
<p>Whitespace for tags "tag1", "tag2", and "tag3" will be
respected; all other tags will have their whitespace
compressed. Like :raw, you can set :respect_whitespace to :all,
and have all elements have their whitespace respected.</p>
</subsection>
<subsection title="Automatic Entity Processing">
<p>REXML does some automatic processing of entities for your
convenience. The processed entities are &amp;, &lt;, &gt;,
&quot;, and &apos;. If REXML finds any of these characters in
Text or Attribute values, it automatically turns them into entity
references when it writes them out. Additionally, when REXML
finds any of these entity references in a document source, it
converts them to their character equivalents. All other entity
references are left unprocessed. If REXML finds an &amp;, &lt;,
or &gt; in the document source, it will generate a parsing
error.</p>
<example title="Entity processing"><![CDATA[bad_source = "<a>Cats & dogs</a>"
good_source = "<a>Cats &amp; dogs</a>"
doc = REXML::Document.new bad_source # Generates a parse error
doc = REXML::Document.new good_source
puts doc.root.text # -> "Cats & dogs"
doc.root.write $stdout # -> "<a>Cats &amp; dogs</a>"
doc.root.attributes["m"] = "x'y\"z"
puts doc.root.attributes["m"] # -> "x'y\"z"
doc.root.write $stdout # -> "<a m='x&apos;y&quot;z'>Cats &amp; dogs</a>"]]></example>
</subsection>
</general>
</overview>
<credits>
<p>Among the people who've contributed to this document are:</p>
<list>
<item><link href="mailto:deicher@sandia.gov">Eichert, Diana</link> (bug fix)</item>
</list>
</credits>
</documentation>