File: README.rdoc

package info (click to toggle)
ruby-libxml 2.3.2-1
  • links: PTS, VCS
  • area: main
  • in suites: wheezy
  • size: 1,812 kB
  • sloc: xml: 9,628; ruby: 7,119; ansic: 6,665; makefile: 2
file content (189 lines) | stat: -rw-r--r-- 7,061 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
= LibXML Ruby

== Overview
The libxml gem provides Ruby language bindings for GNOME's Libxml2
XML toolkit. It is free software, released under the MIT License.

We think libxml-ruby is the best XML library for Ruby because:

* Speed - Its much faster than REXML and Hpricot
* Features - It provides an amazing number of featues
* Conformance - It passes all 1800+ tests from the OASIS XML Tests Suite

== Requirements
libxml-ruby requires Ruby 1.8.4 or higher.  It is dependent on 
the following libraries to function properly:

* libm      (math routines: very standard)
* libz      (zlib)
* libiconv
* libxml2

If you are running Linux or Unix you'll need a C compiler so the
extension can be compiled when it is installed.  If you are running
Windows, then install the Windows specific RubyGem which
includes an already built extension.

== INSTALLATION
The easiest way to install libxml-ruby is via Ruby Gems.  To install:

<tt>gem install libxml-ruby</tt>

If you are running Windows, make sure to install the Win32 RubyGem
which includes prebuilt extensions for Ruby 1.8 and Ruby 1.9.  These
extensions are built with MinGW32 against libxml2 version 2.7.8,
iconv version 1.13 and zlib version 1.2.5.  Note these binaries
are available in the lib\libs directory.  To use them, put them
someplace on your path.

The gem also includes a Microsoft VC++ 2010
solution (useful for debugging).

== Getting Started
Using libxml is easy. First decide what parser you want to use:

* Generally you'll want to use the LibXML::XML::Parser which provides a tree based API.
* For larger documents that don't fit into memory, or if you prefer an input based API, use the LibXML::XML::Reader.
* To parse HTML files use LibXML::XML::HTMLParser.
* If you are masochistic, then use the LibXML::XML::SaxParser, which provides a callback API.

Once you have chosen a parser, choose a datasource.  Libxml can parse files, strings, URIs
and IO streams.  For each data source you can specify an LibXML::XML::Encoding, a base uri and
various parser options.  For more information, refer the LibXML::XML::Parser.document,
LibXML::XML::Parser.file, LibXML::XML::Parser.io or LibXML:::XML::Parser.string methods (the
same methods are defined on all four parser classes).

== Advanced Functionality
Beyond the basics of parsing and processing XML and HTML documents,
libxml provides a wealth of additional functionality.

Most commonly, you'll want to use its LibXML::XML::XPath support, which makes
it easy to find data inside a XML document.  Although not as popular,
LibXML::XML::XPointer provides another API for finding data inside an XML document.

Often times you'll need to validate data before processing it. For example,
if you accept user generated content submitted over the Web, you'll
want to verify that it does not contain malicious code such as embedded scripts.
This can be done using libxml's powerful set of validators:

* DTDs (LibXML::XML::Dtd)
* Relax Schemas (LibXML::XML::RelaxNG)
* XML Schema (LibXML::XML::Schema)

Finally, if you'd like to use XSL Transformations to process data,
then install the libxslt gem which is available at
http://rubyforge.org/projects/libxsl/.

== Usage
For information about using libxml-ruby please refer to its documentation at
http://xml4r.github.com/libxml-ruby/rdoc/index.html Some tutorials are also
available at https://github.com/xml4r/libxml-ruby/wiki.

All libxml classes are in the LibXML::XML module. The easiest
way to use libxml is to require 'xml'.  This will mixin
the LibXML module into the global namespace, allowing you to
write code like this:

  require 'xml'
  document = XML::Document.new

However, when creating an application or library you plan to
redistribute, it is best to not add the LibXML module to the global
namespace, in which case you can either write your code like this:

  require 'libxml'
  document = LibXML::XML::Document.new

Or you can utilize a namespace for your own work and include LibXML into it.
For example:

  require 'libxml'

  module MyApplication
    include LibXML

    class MyClass
      def some_method
        document = XML::Document.new
      end
    end
  end

For simplicity's sake, the documentation uses the xml module in its examples.

== Memory Management
libxml-ruby automatically manages memory associated with the
underlying libxml2 library.  There is however one corner case that
your code must handle.  If a node is imported into a document, but not
added to the document, a segmentation fault may occur on program termination.

  # Do NOT do this
  require 'xml'
  doc1 = XML::Document.string("test1")
  doc2 = XML::Document.string("test2")
  node = doc2.import(doc1.root)

If doc2 is freed before node2 a segmentatin fault will occur since
node2 references the document.  To avoid this, simply make sure to add the
node to the document:

  # DO this instead
  doc1 = XML::Document.string("test1")
  doc2 = XML::Document.string("test2")
  doc2.root << doc2.import(doc1.root)

Alternatively, you can call node2.remove! to disassociate node2 from doc2.

== Threading
libxml-ruby fully supports native, background Ruby threads.  This of course
only applies to Ruby 1.9.x and higher since earlier versions of Ruby do not
support native threads.

== Performance
In addition to being feature rich and conformation, the main reason
people use libxml-ruby is for performance.  Here are the results 
of a couple simple benchmarks recently blogged about on the
Web (you can find them in the benchmark directory of the 
libxml distribution).

From http://depixelate.com/2008/4/23/ruby-xml-parsing-benchmarks

               user     system      total        real
 libxml    0.032000   0.000000   0.032000 (  0.031000)
 Hpricot   0.640000   0.031000   0.671000 (  0.890000)
 REXML     1.813000   0.047000   1.860000 (  2.031000)

From https://svn.concord.org/svn/projects/trunk/common/ruby/xml_benchmarks/

               user     system      total        real
 libxml    0.641000   0.031000   0.672000 (  0.672000)
 hpricot   5.359000   0.062000   5.421000 (  5.516000)
 rexml    22.859000   0.047000  22.906000 ( 23.203000) 


== Documentation
Documentation is available via rdoc, and is installed automatically with the
gem.

libxml-ruby's online documentation is generated using Hanna.  To generate
documentation from source:

  gem install hanna
  rake rdoc

Note that older versions of Rdoc, which ship with Ruby 1.8.x, will report
a number of errors.  To avoid them, install Rdoc 2.1 or higher from
RubyForge (http://rdoc.rubyforge.org/).  Once you have installed the gem,
you'll have to disable the version of Rdoc that Ruby 1.8.x includes.  An
easy way to do that is rename the directory uby/lib/ruby/1.8/rdoc to
ruby/lib/ruby/1.8/rdoc_old.

== Support

If you have any questions about using libxml-ruby, please send them to
libxml-devel@rubyforge.org.  If you have found any bugs in libxml-devel,
or have developed new patches, please submit them to Git Hub at
https://github.com/xml4r/libxml-ruby/issues.

== License
See LICENSE for license information.