<?xml version="1.0" encoding="UTF-8"?>
<!-- Reviewed: no -->
<sect1 id="zend.dom.query">
    <title>Zend_Dom_Query</title>

    <para>
        <classname>Zend_Dom_Query</classname> provides mechanisms for querying
        <acronym>XML</acronym> and (X)<acronym>HTML</acronym> documents utilizing either XPath or
        <acronym>CSS</acronym> selectors. It was developed to aid with functional testing of
        <acronym>MVC</acronym> applications, but could also be used for rapid development of screen
        scrapers.
    </para>

    <para>
        <acronym>CSS</acronym> selector notation is provided as a simpler and more familiar
        notation for web developers to utilize when querying documents with <acronym>XML</acronym>
        structures. The notation should be familiar to anybody who has developed
        Cascading Style Sheets or who utilizes JavaScript toolkits that provide
        functionality for selecting nodes utilizing <acronym>CSS</acronym> selectors
        (<ulink url="http://prototypejs.org/api/utility/dollar-dollar">Prototype's
            $$()</ulink> and
        <ulink url="http://api.dojotoolkit.org/jsdoc/dojo/HEAD/dojo.query">Dojo's
            dojo.query</ulink> were both inspirations for the component).
    </para>

    <sect2 id="zend.dom.query.operation">
        <title>Theory of Operation</title>

        <para>
            To use <classname>Zend_Dom_Query</classname>, you instantiate a
            <classname>Zend_Dom_Query</classname> object, optionally passing a document to
            query (a string). Once you have a document, you can use either the
            <methodname>query()</methodname> or the <methodname>queryXpath()</methodname> method; each
            returns a <classname>Zend_Dom_Query_Result</classname> object containing
            any matching nodes.
        </para>
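
        <para>
            As a minimal sketch of that workflow, assuming Zend Framework 1 is available on the
            <acronym>PHP</acronym> include path; the markup in <varname>$html</varname> is a
            made-up document used only for illustration:
        </para>

        <programlisting language="php"><![CDATA[
require_once 'Zend/Dom/Query.php';

// Hypothetical document, used only for this illustration
$html = '<html><body><div id="content"><p class="intro">Hello</p></div></body></html>';

// Pass the document string to the constructor...
$dom = new Zend_Dom_Query($html);

// ...then query it with a CSS selector or with XPath
$byCss   = $dom->query('#content p.intro');
$byXpath = $dom->queryXpath('//div[@id="content"]/p');

// Both calls return a Zend_Dom_Query_Result, which is countable
echo count($byCss), ' ', count($byXpath), PHP_EOL;
]]></programlisting>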

        <para>
            The primary difference between <classname>Zend_Dom_Query</classname> and using
            DOMDocument + DOMXPath directly is the ability to select nodes with
            <acronym>CSS</acronym> selectors. You can utilize any of the following, in any combination:
        </para>

        <itemizedlist>
            <listitem>
                <para>
                    <emphasis>element types</emphasis>: provide an element type to
                    match: 'div', 'a', 'span', 'h2', etc.
                </para>
            </listitem>

            <listitem>
                <para>
                    <emphasis>class attributes</emphasis>: <acronym>CSS</acronym> class names
                    to match: '<command>.error</command>', '<command>div.error</command>',
                    '<command>label.required</command>', etc. If an
                    element defines more than one class, this will match as long as
                    the named class is present anywhere in the class attribute.
                </para>
            </listitem>

            <listitem>
                <para>
                    <emphasis>id attributes</emphasis>: element ID attributes to
                    match: '#content', 'div#nav', etc.
                </para>
            </listitem>

            <listitem>
                <para>
                    <emphasis>arbitrary attributes</emphasis>: arbitrary element
                    attributes to match. Three different types of matching are
                    provided:
                </para>

                <itemizedlist>
                    <listitem>
                        <para>
                            <emphasis>exact match</emphasis>: the attribute exactly
                            matches the string: 'div[bar="baz"]' would match a div
                            element with a "bar" attribute that exactly matches the
                            value "baz".
                        </para>
                    </listitem>

                    <listitem>
                        <para>
                            <emphasis>word match</emphasis>: the attribute contains
                            a word matching the string: 'div[bar~="baz"]' would match a div
                            element with a "bar" attribute that contains the
                            word "baz". '&lt;div bar="foo baz"&gt;' would match, but '&lt;div
                            bar="foo bazbat"&gt;' would not.
                        </para>
                    </listitem>

                    <listitem>
                        <para>
                            <emphasis>substring match</emphasis>: the attribute contains
                            the string: 'div[bar*="baz"]' would match a div
                            element with a "bar" attribute that contains the
                            string "baz" anywhere within it.
                        </para>
                    </listitem>
                </itemizedlist>
            </listitem>

            <listitem>
                <para>
                    <emphasis>direct descendants</emphasis>: utilize '&gt;' between
                    selectors to denote direct descendants. 'div &gt; span' would
                    select only 'span' elements that are direct descendants of a
                    'div'. This can also be combined with any of the selectors above.
                </para>
            </listitem>

            <listitem>
                <para>
                    <emphasis>descendants</emphasis>: string together
                    multiple selectors to indicate a hierarchy along which
                    to search. '<command>div .foo span #one</command>' would select an element
                    with the id 'one' that is a descendant of arbitrary depth
                    beneath a 'span' element, which is in turn a descendant
                    of arbitrary depth beneath an element with a class of
                    'foo', which is itself a descendant of arbitrary depth beneath
                    a 'div' element. For example, it would match the link
                    containing the word 'One' in the listing below:
                </para>

                <programlisting language="html"><![CDATA[
<div>
<table>
    <tr>
        <td class="foo">
            <div>
                Lorem ipsum <span class="bar">
                    <a href="/foo/bar" id="one">One</a>
                    <a href="/foo/baz" id="two">Two</a>
                    <a href="/foo/bat" id="three">Three</a>
                    <a href="/foo/bla" id="four">Four</a>
                </span>
            </div>
        </td>
    </tr>
</table>
</div>
]]></programlisting>
            </listitem>
        </itemizedlist>
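
        <para>
            The three attribute-matching modes described above can be compared side by side.
            The following is only a sketch: it assumes Zend Framework 1 is on the include path,
            and the "bar" attribute and its values are invented for the illustration.
        </para>

        <programlisting language="php"><![CDATA[
require_once 'Zend/Dom/Query.php';

// Hypothetical markup exercising the three attribute-matching modes
$html = '<html><body>'
      . '<div bar="baz">first</div>'
      . '<div bar="foo baz">second</div>'
      . '<div bar="foo bazbat">third</div>'
      . '</body></html>';

$dom = new Zend_Dom_Query($html);

$exact     = $dom->query('div[bar="baz"]');  // value is exactly "baz"
$word      = $dom->query('div[bar~="baz"]'); // value contains the word "baz"
$substring = $dom->query('div[bar*="baz"]'); // value contains "baz" anywhere

// Each successive mode is looser than the one before it
printf("%d %d %d\n", count($exact), count($word), count($substring));
]]></programlisting>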

        <para>
            Once you've performed your query, you can then work with the result
            object to determine information about the nodes, as well as to pull
            them and/or their content directly for examination and manipulation.
            <classname>Zend_Dom_Query_Result</classname> implements <classname>Countable</classname>
            and <classname>Iterator</classname>, and stores the results internally as
            DOMNodes and DOMElements. As an example, consider the following call,
            which selects against the <acronym>HTML</acronym> above:
        </para>

        <programlisting language="php"><![CDATA[
$dom = new Zend_Dom_Query($html);
$results = $dom->query('.foo .bar a');

$count = count($results); // get number of matches: 4
foreach ($results as $result) {
    // $result is a DOMElement
}
]]></programlisting>

        <para>
            <classname>Zend_Dom_Query</classname> also allows straight XPath queries
            utilizing the <methodname>queryXpath()</methodname> method; you can pass any
            valid XPath query to this method, and it will return a
            <classname>Zend_Dom_Query_Result</classname> object.
        </para>
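
        <para>
            A brief sketch of an XPath query follows. It assumes Zend Framework 1 is on the
            include path and uses an abbreviated, hypothetical variant of the markup shown
            earlier; the XPath expression is just one of many that would select the links.
        </para>

        <programlisting language="php"><![CDATA[
require_once 'Zend/Dom/Query.php';

// Abbreviated, hypothetical variant of the markup shown earlier
$html = '<div><table><tr><td class="foo"><div>'
      . '<span class="bar">'
      . '<a href="/foo/bar" id="one">One</a>'
      . '<a href="/foo/baz" id="two">Two</a>'
      . '</span></div></td></tr></table></div>';

$dom = new Zend_Dom_Query($html);

// Select every link directly inside a span whose class attribute is "bar"
$results = $dom->queryXpath('//span[@class="bar"]/a');

$ids = array();
foreach ($results as $node) {
    // Each $node is a DOMElement
    $ids[] = $node->getAttribute('id');
    echo $node->getAttribute('id'), ': ', $node->nodeValue, PHP_EOL;
}
]]></programlisting>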
    </sect2>

    <sect2 id="zend.dom.query.methods">
        <title>Methods Available</title>

        <para>
            The <classname>Zend_Dom_Query</classname> family of classes has the following
            methods available.
        </para>

        <sect3 id="zend.dom.query.methods.zenddomquery">
            <title>Zend_Dom_Query</title>

            <para>
                The following methods are available to
                <classname>Zend_Dom_Query</classname>:
            </para>

            <itemizedlist>
                <listitem>
                    <para>
                        <methodname>setDocumentXml($document)</methodname>: specify an
                        <acronym>XML</acronym> string to query against.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>setDocumentXhtml($document)</methodname>: specify an
                        <acronym>XHTML</acronym> string to query against.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>setDocumentHtml($document)</methodname>: specify an
                        <acronym>HTML</acronym> string to query against.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>setDocument($document)</methodname>: specify a
                        string to query against; <classname>Zend_Dom_Query</classname> will
                        then attempt to autodetect the document type.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>getDocument()</methodname>: retrieve the original document
                        string provided to the object.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>getDocumentType()</methodname>: retrieve the document
                        type of the document provided to the object; will be one of
                        the <constant>DOC_XML</constant>, <constant>DOC_XHTML</constant>, or
                        <constant>DOC_HTML</constant> class constants.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>query($query)</methodname>: query the document using
                        <acronym>CSS</acronym> selector notation.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>queryXpath($xPathQuery)</methodname>: query the document
                        using XPath notation.
                    </para>
                </listitem>
            </itemizedlist>
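
            <para>
                As an illustrative sketch of the document setters, assuming Zend Framework 1
                is on the include path (the feed and paragraph markup below are made up):
            </para>

            <programlisting language="php"><![CDATA[
require_once 'Zend/Dom/Query.php';

$dom = new Zend_Dom_Query();

// Hypothetical XML document; setDocument() is left to detect the type
$dom->setDocument('<?xml version="1.0"?><feed><entry id="a"/><entry id="b"/></feed>');

$type = $dom->getDocumentType();
if ($type === Zend_Dom_Query::DOC_XML) {
    echo 'Detected an XML document', PHP_EOL;
}
$entries = $dom->query('feed entry');
echo count($entries), PHP_EOL;

// The explicit setters state the document type instead of detecting it
$dom->setDocumentHtml('<p class="note">Hi</p>');
$notes = $dom->query('p.note');
echo count($notes), PHP_EOL;
]]></programlisting>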
        </sect3>

        <sect3 id="zend.dom.query.methods.zenddomqueryresult">
            <title>Zend_Dom_Query_Result</title>

            <para>
                As mentioned previously, <classname>Zend_Dom_Query_Result</classname>
                implements both <classname>Iterator</classname> and
                <classname>Countable</classname>, and as such can be used in a
                <methodname>foreach()</methodname> loop as well as with the
                <methodname>count()</methodname> function. Additionally, it exposes the
                following methods:
            </para>

            <itemizedlist>
                <listitem>
                    <para>
                        <methodname>getCssQuery()</methodname>: return the <acronym>CSS</acronym>
                        selector query used to produce the result (if any).
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>getXpathQuery()</methodname>: return the XPath query
                        used to produce the result. Internally,
                        <classname>Zend_Dom_Query</classname> converts <acronym>CSS</acronym>
                        selector queries to XPath, so this value will always be populated.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>getDocument()</methodname>: retrieve the DOMDocument the
                        selection was made against.
                    </para>
                </listitem>
            </itemizedlist>
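
            <para>
                The accessors above might be used as in the following sketch, which assumes
                Zend Framework 1 is on the include path and queries a made-up list. The exact
                XPath string produced for a selector is an implementation detail, so it is
                printed rather than assumed.
            </para>

            <programlisting language="php"><![CDATA[
require_once 'Zend/Dom/Query.php';

// Hypothetical markup for illustration
$dom     = new Zend_Dom_Query('<ul><li class="on">A</li><li>B</li></ul>');
$results = $dom->query('li.on');

echo $results->getCssQuery(), PHP_EOL;   // the selector as passed to query()
echo $results->getXpathQuery(), PHP_EOL; // the XPath it was translated to

// The underlying DOMDocument can be reused, e.g. to serialize the first match
$document   = $results->getDocument();
$serialized = null;
foreach ($results as $node) {
    $serialized = $document->saveXML($node);
    echo $serialized, PHP_EOL;
    break;
}
]]></programlisting>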
        </sect3>
    </sect2>
</sect1>
<!--
vim:se ts=4 sw=4 et:
-->