<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type="text/xsl" href="http://www.germane-software.com/~ser/Software/documentation.xsl"?>
<!DOCTYPE tutorial>
<documentation> 
	<head> 
		<title>REXML Tutorial</title>
		<version>$Revision: 1.1.2.1 $</version> 
		<date>*2001-296+594</date>
		<home>http://www.germane-software.com/~ser/software/rexml</home>
		<language>ruby</language>
		<author email="ser@germane-software.com" href="http://www.germane-software.com/~ser">Sean Russell</author>
	</head>
	<overview> 
		<purpose lang="en"> 
			<p>This is a tutorial for using <link href="http://www.germane-software.com/~ser/software/rexml">REXML</link>, a pure-Ruby XML processor.</p>
		</purpose> 
		<general>
			<p>REXML was inspired by the Electric XML library for Java, which features an easy-to-use API, small size, and speed.  Hopefully, REXML, designed with the same philosophy, has these same features.  I've tried to keep the API as intuitive as possible, and have followed the Ruby methodology for method naming and code flow, rather than mirroring the Java API.
			</p>
			<p>REXML supports both tree and stream document parsing.  Stream parsing is faster than tree parsing (about 1.5 times as fast).  However, with stream parsing, you don't get access to features such as XPath.</p>
			<subsection title="Tree Parsing XML and accessing Elements">
				<p>We'll start by parsing an XML document:</p>
				<example>require "rexml/document"
file = File.new( "mydoc.xml" )
doc = REXML::Document.new file</example>
				<p>Line 3 creates a new document and parses the supplied file.  You can also parse a string:</p>
				<example>require "rexml/document"
include REXML	# so that we don't have to prefix everything with REXML::...
<![CDATA[string = <<EOF
	<mydoc>
		<someelement attribute="nanoo">Text, text, text</someelement>
	</mydoc>
EOF]]>
doc = Document.new string</example>
			<p>So parsing a string is just as easy as parsing a file.  For future examples, I'm going to omit both the <code>require</code> and <code>include</code> lines.</p>
			<p>Once you have a document, you can access elements in that document in a number of ways:</p>
			<list>
			<item>The <code>Element</code> class itself has <code>each_element_with_attribute</code>, a common way of accessing elements.</item>
			<item>The attribute <code>Element.elements</code> is an <code>Elements</code> class instance which has the <code>each</code> and <code>[]</code> methods for accessing elements.  Both methods can be supplied with an XPath for filtering, which makes them very powerful.</item>
			<item>Since <code>Element</code> is a subclass of Parent, you can also access the element's children directly through the Array-like methods <code>Element[], Element.each, Element.find, Element.delete</code>.  This is the fastest way of accessing children, but note that, because this is a true array, XPath searches are not supported, and that all of the element's child nodes are contained in this array, not just the Element children.</item>
			</list>
			<p>Here are a few examples using these methods.  First is the source document used in the examples:</p>
			<example title="The source document"><![CDATA[<inventory title="OmniCorp Store #45x10^3">
   <section name="health">
      <item upc="123456789" stock="12">
         <name>Invisibility Cream</name>
         <price>14.50</price>
         <description>Makes you invisible</description>
      </item>
      <item upc="445322344" stock="18">
         <name>Levitation Salve</name>
         <price>23.99</price>   
         <description>Levitate yourself for up to 3 hours per application</description>
      </item>
   </section>
   <section name="food">
      <item upc="485672034" stock="653">
         <name>Blork and Freen Instameal</name>
         <price>4.95</price>
         <description>A tasty meal in a tablet; just add water</description>
      </item>
      <item upc="132957764" stock="44">
         <name>Grob winglets</name>
         <price>3.56</price>
         <description>Tender winglets of Grob.  Just add water</description>
      </item>
   </section>
</inventory>]]></example>
			<example title="Accessing Elements"><![CDATA[doc = Document.new File.new("mydoc.xml")
doc.elements.each("inventory/section") { |element| puts element.attributes["name"] }
# -> health
# -> food
doc.elements.each("*/section/item") { |element| puts element.attributes["upc"] }
# -> 123456789
# -> 445322344
# -> 485672034
# -> 132957764
root = doc.root
puts root.attributes["title"]
# -> OmniCorp Store #45x10^3
puts root.elements["section/item[@stock='44']"].attributes["upc"]
# -> 132957764
puts root.elements["section"].attributes["name"]
# -> health    (returns the first encountered matching element)
puts root.elements[1].attributes["name"]
# -> health    (returns the FIRST child element)
root.detect {|node|
   node.kind_of? Element and
   node.attributes["name"] == "food"
}]]></example>
				<p>The last statement finds the first child element whose "name" attribute is "food".  As you can see in this example, accessing attributes is also straightforward.
				</p>
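				<p>As a rough sketch, the three access routes listed earlier can be tried side by side on a cut-down inventory.  The constant <code>INVENTORY_XML</code> and the helper method names are made up for this illustration, and the code assumes a current REXML release; behaviour under older versions may differ.</p>
				<example title="Comparing the access methods (sketch)"><![CDATA[require "rexml/document"

# Hypothetical, cut-down version of the inventory document.
INVENTORY_XML = <<EOF
<inventory>
  <section name="health"><item upc="123456789"/></section>
  <section name="food"><item upc="485672034"/><item upc="132957764"/></section>
</inventory>
EOF

# 1. Element#each_element_with_attribute: child elements that carry "name".
def names_of_named_children(element)
  names = []
  element.each_element_with_attribute("name") { |child| names << child.attributes["name"] }
  names
end

# 2. Elements#[] and Elements#each, both filtered with an XPath.
def upcs_in_section(root, section_name)
  section = root.elements["section[@name='#{section_name}']"]
  upcs = []
  section.elements.each("item") { |item| upcs << item.attributes["upc"] }
  upcs
end

# 3. Array-like access: every child node, including whitespace Text nodes.
def child_node_classes(element)
  element.map { |node| node.class.name }.uniq
end

root = REXML::Document.new(INVENTORY_XML).root
p names_of_named_children(root)   # -> ["health", "food"]
p upcs_in_section(root, "food")   # -> ["485672034", "132957764"]
p child_node_classes(root)        # -> ["REXML::Text", "REXML::Element"]]]></example>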
				<p>You can also evaluate XPath expressions directly via the XPath class:</p>
				<example title="Using XPath"><![CDATA[# The invisibility cream is the first <item>
invisibility = XPath.first( doc, "//item" )
# Prints out all of the prices
XPath.each( doc, "//price") { |element| puts element.text }
# Gets an array of all of the "name" elements in the document.
names = XPath.match( doc, "//name" )
]]></example>
				<p>Another way of getting an array of matching nodes is through
					<code>Element.elements.to_a()</code>.  The name is somewhat
					misleading: it returns an array of the objects that match the
					XPath, and an XPath can match more than just Elements.</p>
				<example title="Using to_a()"><![CDATA[all_elements = doc.elements.to_a
all_children = doc.to_a
all_upc_strings = doc.elements.to_a( "//item/attribute::upc" )
all_name_elements = doc.elements.to_a( "//name" )]]></example>
			</subsection>
			<subsection title="Creating XML documents">
				<p>Again, there are a couple of mechanisms for creating XML documents in REXML. Adding elements by hand is faster than the convenience method, but which you use will probably be a matter of aesthetics.</p>
				<example title="Creating elements"><![CDATA[el1 = someelement.add_element "myel"
# creates an element named "myel", adds it to "someelement", and returns it
el2 = el1.add_element "another", {"id"=>"10"}
# does the same, adding to el1, but also sets attribute "id" of el2 to "10"
el3 = Element.new "blah"
el1.elements << el3
el3.attributes["myid"] = "sean"
# creates el3 "blah", adds it to el1, then sets attribute "myid" to "sean"]]></example>
				<p>If you want to add text to an element, you can do it either by creating Text objects and adding them to the element, or by using the convenience method <code>text=</code>:</p>
				<example title="Adding text"><![CDATA[el1 = Element.new "myelement"
el1.text = "Hello world!"
# -> <myelement>Hello world!</myelement>
el1.add_text "Hello dolly"
# -> <myelement>Hello world!Hello dolly</myelement>
el1.add Text.new("Goodbye")
# -> <myelement>Hello world!Hello dollyGoodbye</myelement>
el1 << Text.new(" cruel world")
# -> <myelement>Hello world!Hello dollyGoodbye cruel world</myelement>]]></example>
				<p>But note that each of these text objects is still stored as a separate object; <code>el1.text</code> will return "Hello world!"; <code>el1[2]</code> will return a Text object with the contents "Goodbye".</p>
				<p>If you want to insert an element between two elements, you can use either the standard Ruby array notation, or <code>Parent.insert_before</code> and <code>Parent.insert_after</code>.</p>
				<example title="Inserts"><![CDATA[doc = Document.new "<a><one/><three/></a>"
doc.root[1,0] = Element.new "two"
# -> <a><one/><two/><three/></a>
three = doc.elements["a/three"]
doc.root.insert_after( three, Element.new("four") )
# -> <a><one/><two/><three/><four/></a>
# A convenience method allows you to insert before/after an XPath:
doc.root.insert_after( "//one", Element.new("one-five") )
# -> <a><one/><one-five/><two/><three/><four/></a>
# Another convenience method allows you to insert after/before an element:
four = doc.elements["//four"]
four.previous_sibling = Element.new("three-five")
# -> <a><one/><one-five/><two/><three/><three-five/><four/></a>]]></example>
				<p>You may want to give REXML text, and have it left alone.  You
					may, for example, want to have "&amp;amp;" left as it is, so that
					you can do your own processing of entities.</p>
				<example title="Raw text"><![CDATA[text = Text.new "Cats &amp; dogs", false, true
puts text.string      # -> "Cats &amp; dogs"]]></example>
				<p>You can also tell REXML to set the Text children of given
					elements to raw automatically, on parsing or creating:</p>
				<example title="Automatic raw text handling">doc = REXML::Document.new( source, {
   :raw => %w{ tag1 tag2 tag3 }
} )</example>
				<p>In this example, all tags named "tag1", "tag2", or "tag3" will
					have any Text children set to raw text.  If you want to have all
					of the text processed as raw text, pass in <code>:all</code>
					instead of a list of tag names:</p>
				<example title="Raw documents">doc = REXML::Document.new( source, { :raw => :all } )</example>
			</subsection>


			<subsection title="Writing a tree">
				<p>There isn't much simpler than writing a REXML tree.  Simply pass an object that supports <code><![CDATA[<<( String )]]></code> to the <code>write</code> method of any object.  In Ruby, both IO instances (File) and String instances support <![CDATA[<<]]>.</p>
				<example>doc.write $stdout
output = ""
doc.write output</example>
				<p>By default, REXML formats the output with indentation.  If you don't want REXML to format the output, pass <code>write()</code> an indent of -1:</p>
				<example title="Write with no indent">doc.write $stdout, -1</example>

			</subsection>
			<subsection title="Iterating">
				<p>There are four main ways of iterating over children: <code>Element.each</code>, which iterates over all the children; <code>Element.elements.each</code>, which iterates over just the child Elements; <code>Element.next_element</code> and <code>Element.previous_element</code>, which fetch the next and previous Element siblings; and <code>Element.next_sibling</code> and <code>Element.previous_sibling</code>, which fetch the next and previous siblings, regardless of type.</p>
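				<p>A small sketch of these iteration styles follows.  The helper method names and the sample document are invented for the illustration, and the code assumes a current REXML release, so treat the exact behaviour under older versions as unverified.</p>
				<example title="Iterating over children (sketch)"><![CDATA[require "rexml/document"

# Class names of every child node, Text nodes included (Element.each).
def child_node_kinds(element)
  kinds = []
  element.each { |node| kinds << node.class.name }
  kinds
end

# Names of the child Elements only (Element.elements.each).
def child_element_names(element)
  names = []
  element.elements.each { |child| names << child.name }
  names
end

root = REXML::Document.new("<a>text<one/><two/>more<three/></a>").root

p child_node_kinds(root)
# -> ["REXML::Text", "REXML::Element", "REXML::Element", "REXML::Text", "REXML::Element"]
p child_element_names(root)
# -> ["one", "two", "three"]

one = root.elements["one"]
puts one.next_element.name                          # -> two
puts one.previous_sibling.class                     # -> REXML::Text
puts root.elements["three"].previous_element.name   # -> two (skips the Text node)]]></example>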
			</subsection>
			<subsection title="Stream Parsing">
				<p>REXML stream parsing requires you to supply a Listener class.  When REXML encounters events in a document (tag start, text, etc.) it notifies your listener class of the event.  You can supply any subset of the methods, but make sure you implement method_missing if you don't implement them all.  A StreamListener module has been supplied as a template for you to use.</p>
				<example title="Stream parsing">list = MyListener.new
source = File.new "mydoc.xml"
REXML::Document.parse_stream( source, list )</example>
				<p>Stream parsing in REXML is much like SAX, where events are
					generated when the parser encounters them in the process of
					parsing the document.  When a tag is encountered, the stream
					listener's <code>tag_start()</code> method is called.  When the
					tag end is encountered, <code>tag_end()</code> is called.  When
					text is encountered, <code>text()</code> is called, and so on,
					until the end of the stream is reached.  One other note: the
					method <code>entity()</code> is called when an
					<code>&amp;entity;</code> is encountered in text, and only
					then.</p>
				<p>Please look at the <link href="api/rexml/StreamListener.html">StreamListener API</link> for more information.</p>
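				<p>The <code>MyListener</code> class used above is not defined in this tutorial.  One possible shape for it is sketched here, assuming a current REXML release in which <code>StreamListener</code> lives in <code>rexml/streamlistener</code> and supplies empty versions of every callback; the <code>events</code> accessor is an invention of this example, not part of REXML.</p>
				<example title="A minimal listener (sketch)"><![CDATA[require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener that records the events it is notified of.
class MyListener
  include REXML::StreamListener   # callbacks we don't override do nothing

  attr_reader :events

  def initialize
    @events = []
  end

  # Called for each start tag; the attributes are ignored here.
  def tag_start(name, attributes)
    @events << "start #{name}"
  end

  # Called for each end tag.
  def tag_end(name)
    @events << "end #{name}"
  end

  # Called for text content; whitespace-only text is skipped.
  def text(text)
    @events << "text #{text}" unless text.strip.empty?
  end
end

list = MyListener.new
source = "<inventory><item>Levitation Salve</item></inventory>"
REXML::Document.parse_stream(source, list)
puts list.events
# -> start inventory
# -> start item
# -> text Levitation Salve
# -> end item
# -> end inventory]]></example>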
			</subsection>

			<subsection title="Whitespace">
				<p>In many applications, you want the parser to respect whitespace
					in your document.  In these cases, you have to tell the parser
					which elements you want to respect whitespace in by passing a
					context to the parser:</p>
				<example title="Respecting whitespace">doc = REXML::Document.new( source, {
   :respect_whitespace => %w{ tag1 tag2 tag3 }
} )</example>
				<p>Whitespace for tags "tag1", "tag2", and "tag3" will be
					respected; all other tags will have their whitespace
					compressed.  Like :raw, you can set :respect_whitespace to :all,
					and have all elements have their whitespace respected.</p>
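				<p>To see the effect, the following sketch parses the same text inside a respected and a non-respected element.  The helper <code>parse_respecting</code> and the sample source are made up for the illustration, and the output shown is what a current REXML release is expected to produce; older versions may compress whitespace differently.</p>
				<example title="Respected and compressed whitespace (sketch)"><![CDATA[require "rexml/document"

# Parses source, respecting whitespace only in the elements named in tags.
def parse_respecting(source, tags)
  REXML::Document.new(source, { :respect_whitespace => tags })
end

source = "<doc><tag1>a   b</tag1><other>a   b</other></doc>"
doc = parse_respecting(source, %w{ tag1 })

p doc.root.elements["tag1"].text    # -> "a   b"  (kept as written)
p doc.root.elements["other"].text   # -> "a b"    (runs of spaces compressed)]]></example>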
			</subsection>

			<subsection title="Automatic Entity Processing">
				<p>REXML does some automatic processing of entities for your
					convenience.  The processed entities are &amp;, &lt;, &gt;,
					&quot;, and &apos;.  If REXML finds any of these characters in
					Text or Attribute values, it automatically turns them into entity
					references when it writes them out.  Additionally, when REXML
					finds any of these entity references in a document source, it
					converts them to their character equivalents.  All other entity
					references are left unprocessed.  If REXML finds an &amp;, &lt;,
					or &gt; in the document source, it will generate a parsing
					error.</p>
				<example title="Entity processing"><![CDATA[bad_source = "<a>Cats & dogs</a>"
good_source = "<a>Cats &amp; &#100;ogs</a>"
doc = REXML::Document.new bad_source     # Generates a parse error
doc = REXML::Document.new good_source
puts doc.root.text                       # -> "Cats & &#100;ogs"
doc.root.write $stdout                   # -> "<a>Cats &amp; &#100;ogs</a>"
doc.root.attributes["m"] = "x'y\"z"
puts doc.root.attributes["m"]            # -> "x'y\"z"
doc.root.write $stdout                   # -> "<a m='x&apos;y&quot;z'>Cats &amp; &#100;ogs</a>"]]></example>
			</subsection>
		</general>
	</overview>
	<credits>
		<p>Among the people who've contributed to this document are:</p>
		<list>
			<item><link href="mailto:deicher@sandia.gov">Eichert, Diana</link> (bug fix)</item>
		</list>
	</credits>
</documentation>