1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297
|
<?xml version="1.0" encoding="UTF-8"?>
<!-- Reviewed: no -->
<sect1 id="zend.dom.query">
<title>Zend_Dom_Query</title>
<para>
<classname>Zend_Dom_Query</classname> provides mechanisms for querying
<acronym>XML</acronym> and (X)<acronym>HTML</acronym> documents utilizing either XPath or
<acronym>CSS</acronym> selectors. It was developed to aid with functional testing of
<acronym>MVC</acronym> applications, but could also be used for rapid development of screen
scrapers.
</para>
<para>
<acronym>CSS</acronym> selector notation is provided as a simpler and more familiar
notation for web developers to utilize when querying documents with <acronym>XML</acronym>
structures. The notation should be familiar to anybody who has developed
Cascading Style Sheets or who utilizes Javascript toolkits that provide
functionality for selecting nodes utilizing <acronym>CSS</acronym> selectors
(<ulink url="http://prototypejs.org/api/utility/dollar-dollar">Prototype's
$$()</ulink> and
<ulink url="http://api.dojotoolkit.org/jsdoc/dojo/HEAD/dojo.query">Dojo's
dojo.query</ulink> were both inspirations for the component).
</para>
<sect2 id="zend.dom.query.operation">
<title>Theory of Operation</title>
<para>
To use <classname>Zend_Dom_Query</classname>, you instantiate a
<classname>Zend_Dom_Query</classname> object, optionally passing a document to
query (a string). Once you have a document, you can use either the
<methodname>query()</methodname> or <methodname>queryXpath()</methodname> methods; each
method will return a <classname>Zend_Dom_Query_Result</classname> object with
any matching nodes.
</para>
<para>
The primary difference between <classname>Zend_Dom_Query</classname> and using
DOMDocument + DOMXPath is the ability to select against <acronym>CSS</acronym>
selectors. You can utilize any of the following, in any combination:
</para>
<itemizedlist>
<listitem>
<para>
<emphasis>element types</emphasis>: provide an element type to
match: 'div', 'a', 'span', 'h2', etc.
</para>
</listitem>
<listitem>
<para>
<emphasis>style attributes</emphasis>: <acronym>CSS</acronym> style attributes
to match: '<command>.error</command>', '<command>div.error</command>',
'<command>label.required</command>', etc. If an
element defines more than one style, this will match as long as
the named style is present anywhere in the style declaration.
</para>
</listitem>
<listitem>
<para>
<emphasis>id attributes</emphasis>: element ID attributes to
match: '#content', 'div#nav', etc.
</para>
</listitem>
<listitem>
<para>
<emphasis>arbitrary attributes</emphasis>: arbitrary element
attributes to match. Three different types of matching are
provided:
</para>
<itemizedlist>
<listitem>
<para>
<emphasis>exact match</emphasis>: the attribute exactly
matches the string: 'div[bar="baz"]' would match a div
element with a "bar" attribute that exactly matches the
value "baz".
</para>
</listitem>
<listitem>
<para>
<emphasis>word match</emphasis>: the attribute contains
a word matching the string: 'div[bar~="baz"]' would match a div
element with a "bar" attribute that contains the
word "baz". '<div bar="foo baz">' would match, but '<div
bar="foo bazbat">' would not.
</para>
</listitem>
<listitem>
<para>
<emphasis>substring match</emphasis>: the attribute contains
the string: 'div[bar*="baz"]' would match a div
element with a "bar" attribute that contains the
string "baz" anywhere within it.
</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>
<emphasis>direct descendents</emphasis>: utilize '>' between
selectors to denote direct descendents. 'div > span' would
select only 'span' elements that are direct descendents of a
'div'. Can also be used with any of the selectors above.
</para>
</listitem>
<listitem>
<para>
<emphasis>descendents</emphasis>: string together
multiple selectors to indicate a hierarchy along which
to search. '<command>div .foo span #one</command>' would select an element
of id 'one' that is a descendent of arbitrary depth
beneath a 'span' element, which is in turn a descendent
of arbitrary depth beneath an element with a class of
'foo', that is an descendent of arbitrary depth beneath
a 'div' element. For example, it would match the link to
the word 'One' in the listing below:
</para>
<programlisting language="html"><![CDATA[
<div>
<table>
<tr>
<td class="foo">
<div>
Lorem ipsum <span class="bar">
<a href="/foo/bar" id="one">One</a>
<a href="/foo/baz" id="two">Two</a>
<a href="/foo/bat" id="three">Three</a>
<a href="/foo/bla" id="four">Four</a>
</span>
</div>
</td>
</tr>
</table>
</div>
]]></programlisting>
</listitem>
</itemizedlist>
<para>
Once you've performed your query, you can then work with the result
object to determine information about the nodes, as well as to pull
them and/or their content directly for examination and manipulation.
<classname>Zend_Dom_Query_Result</classname> implements <classname>Countable</classname>
and <classname>Iterator</classname>, and store the results internally as
DOMNodes and DOMElements. As an example, consider the following call,
that selects against the <acronym>HTML</acronym> above:
</para>
<programlisting language="php"><![CDATA[
$dom = new Zend_Dom_Query($html);
$results = $dom->query('.foo .bar a');
$count = count($results); // get number of matches: 4
foreach ($results as $result) {
// $result is a DOMElement
}
]]></programlisting>
<para>
<classname>Zend_Dom_Query</classname> also allows straight XPath queries
utilizing the <methodname>queryXpath()</methodname> method; you can pass any
valid XPath query to this method, and it will return a
<classname>Zend_Dom_Query_Result</classname> object.
</para>
</sect2>
<sect2 id="zend.dom.query.methods">
<title>Methods Available</title>
<para>
The <classname>Zend_Dom_Query</classname> family of classes have the following
methods available.
</para>
<sect3 id="zend.dom.query.methods.zenddomquery">
<title>Zend_Dom_Query</title>
<para>
The following methods are available to
<classname>Zend_Dom_Query</classname>:
</para>
<itemizedlist>
<listitem>
<para>
<methodname>setDocumentXml($document)</methodname>: specify an
<acronym>XML</acronym> string to query against.
</para>
</listitem>
<listitem>
<para>
<methodname>setDocumentXhtml($document)</methodname>: specify an
<acronym>XHTML</acronym> string to query against.
</para>
</listitem>
<listitem>
<para>
<methodname>setDocumentHtml($document)</methodname>: specify an
<acronym>HTML</acronym> string to query against.
</para>
</listitem>
<listitem>
<para>
<methodname>setDocument($document)</methodname>: specify a
string to query against; <classname>Zend_Dom_Query</classname> will
then attempt to autodetect the document type.
</para>
</listitem>
<listitem>
<para>
<methodname>getDocument()</methodname>: retrieve the original document
string provided to the object.
</para>
</listitem>
<listitem>
<para>
<methodname>getDocumentType()</methodname>: retrieve the document
type of the document provided to the object; will be one of
the <constant>DOC_XML</constant>, <constant>DOC_XHTML</constant>, or
<constant>DOC_HTML</constant> class constants.
</para>
</listitem>
<listitem>
<para>
<methodname>query($query)</methodname>: query the document using
<acronym>CSS</acronym> selector notation.
</para>
</listitem>
<listitem>
<para>
<methodname>queryXpath($xPathQuery)</methodname>: query the document
using XPath notation.
</para>
</listitem>
</itemizedlist>
</sect3>
<sect3 id="zend.dom.query.methods.zenddomqueryresult">
<title>Zend_Dom_Query_Result</title>
<para>
As mentioned previously, <classname>Zend_Dom_Query_Result</classname>
implements both <classname>Iterator</classname> and
<classname>Countable</classname>, and as such can be used in a
<methodname>foreach()</methodname> loop as well as with the
<methodname>count()</methodname> function. Additionally, it exposes the
following methods:
</para>
<itemizedlist>
<listitem>
<para>
<methodname>getCssQuery()</methodname>: return the <acronym>CSS</acronym>
selector query used to produce the result (if any).
</para>
</listitem>
<listitem>
<para>
<methodname>getXpathQuery()</methodname>: return the XPath query
used to produce the result. Internally,
<classname>Zend_Dom_Query</classname> converts <acronym>CSS</acronym>
selector queries to XPath, so this value will always be populated.
</para>
</listitem>
<listitem>
<para>
<methodname>getDocument()</methodname>: retrieve the DOMDocument the
selection was made against.
</para>
</listitem>
</itemizedlist>
</sect3>
</sect2>
</sect1>
<!--
vim:se ts=4 sw=4 et:
-->
|