<?xml version="1.0" encoding="UTF-8"?>
<!-- Reviewed: no -->
<sect1 id="zend.dom.query">
    <title>Zend_Dom_Query</title>

    <para>
        <classname>Zend_Dom_Query</classname> provides mechanisms for querying
        <acronym>XML</acronym> and (X)<acronym>HTML</acronym> documents utilizing either XPath or
        <acronym>CSS</acronym> selectors. It was developed to aid with functional testing of
        <acronym>MVC</acronym> applications, but could also be used for rapid development of screen
        scrapers.
    </para>

    <para>
        <acronym>CSS</acronym> selector notation is provided as a simpler and more familiar
        notation for web developers to use when querying <acronym>XML</acronym>-structured
        documents. The notation should be familiar to anybody who has written
        Cascading Style Sheets or who uses JavaScript toolkits that provide
        functionality for selecting nodes with <acronym>CSS</acronym> selectors
        (<ulink url="http://prototypejs.org/api/utility/dollar-dollar">Prototype's
            $$()</ulink> and
        <ulink url="http://api.dojotoolkit.org/jsdoc/dojo/HEAD/dojo.query">Dojo's
            dojo.query</ulink> were both inspirations for the component).
    </para>

    <sect2 id="zend.dom.query.operation">
        <title>Theory of Operation</title>

        <para>
            To use <classname>Zend_Dom_Query</classname>, you instantiate a
            <classname>Zend_Dom_Query</classname> object, optionally passing a document to
            query (a string). Once you have a document, you can use either the
            <methodname>query()</methodname> or the <methodname>queryXpath()</methodname> method;
            each returns a <classname>Zend_Dom_Query_Result</classname> object containing
            any matching nodes.
        </para>
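
        <para>
            As a minimal sketch of this workflow, the following listing queries the same
            markup once with a <acronym>CSS</acronym> selector and once with XPath. The
            markup and variable names are illustrative assumptions, and the listing assumes
            Zend Framework 1 is available on the include path.
        </para>

        <programlisting language="php"><![CDATA[
// Assumes Zend Framework 1 is on the include_path.
require_once 'Zend/Dom/Query.php';

// Hypothetical markup, used only for illustration.
$html = '<div id="nav"><a href="/">Home</a><a href="/about">About</a></div>';

// Pass the document string to the constructor...
$dom = new Zend_Dom_Query($html);

// ...then query it with a CSS selector or with XPath.
$byCss   = $dom->query('#nav a');
$byXpath = $dom->queryXpath('//div[@id="nav"]/a');

// Both calls return a Zend_Dom_Query_Result; here each should hold
// the two anchors inside the "nav" div.
echo count($byCss), "\n";
echo count($byXpath), "\n";
]]></programlisting>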

        <para>
            The primary difference between <classname>Zend_Dom_Query</classname> and using
            DOMDocument + DOMXPath directly is the ability to select nodes using
            <acronym>CSS</acronym> selectors. You can use any of the following, in any
            combination:
        </para>

        <itemizedlist>
            <listitem>
                <para>
                    <emphasis>element types</emphasis>: provide an element type to
                    match: 'div', 'a', 'span', 'h2', etc.
                </para>
            </listitem>

            <listitem>
                <para>
                    <emphasis>class attributes</emphasis>: <acronym>CSS</acronym> class names
                    to match: '<command>.error</command>', '<command>div.error</command>',
                    '<command>label.required</command>', etc. If an
                    element defines more than one class, this will match as long as
                    the named class is present anywhere in the class attribute.
                </para>
            </listitem>

            <listitem>
                <para>
                    <emphasis>id attributes</emphasis>: element ID attributes to
                    match: '#content', 'div#nav', etc.
                </para>
            </listitem>

            <listitem>
                <para>
                    <emphasis>arbitrary attributes</emphasis>: arbitrary element
                    attributes to match. Three different types of matching are
                    provided:
                </para>

                <itemizedlist>
                    <listitem>
                        <para>
                            <emphasis>exact match</emphasis>: the attribute exactly
                            matches the string: 'div[bar="baz"]' would match a div
                            element with a "bar" attribute that exactly matches the
                            value "baz".
                        </para>
                    </listitem>

                    <listitem>
                        <para>
                            <emphasis>word match</emphasis>: the attribute contains
                            a word matching the string: 'div[bar~="baz"]' would match a div
                            element with a "bar" attribute that contains the
                            word "baz". '&lt;div bar="foo baz"&gt;' would match, but '&lt;div
                            bar="foo bazbat"&gt;' would not.
                        </para>
                    </listitem>

                    <listitem>
                        <para>
                            <emphasis>substring match</emphasis>: the attribute contains
                            the string: 'div[bar*="baz"]' would match a div
                            element with a "bar" attribute that contains the
                            string "baz" anywhere within it.
                        </para>
                    </listitem>
                </itemizedlist>
            </listitem>

            <listitem>
                <para>
                    <emphasis>direct descendants</emphasis>: use '&gt;' between
                    selectors to denote direct descendants. 'div &gt; span' would
                    select only 'span' elements that are direct descendants of a
                    'div'. This can also be combined with any of the selectors above.
                </para>
            </listitem>

            <listitem>
                <para>
                    <emphasis>descendants</emphasis>: string together
                    multiple selectors to indicate a hierarchy along which
                    to search. '<command>div .foo span #one</command>' would select an element
                    with the id 'one' that is a descendant of arbitrary depth
                    beneath a 'span' element, which is in turn a descendant
                    of arbitrary depth beneath an element with a class of
                    'foo', which is itself a descendant of arbitrary depth beneath
                    a 'div' element. For example, it would match the link
                    containing the word 'One' in the listing below:
                </para>

                <programlisting language="html"><![CDATA[
<div>
<table>
    <tr>
        <td class="foo">
            <div>
                Lorem ipsum <span class="bar">
                    <a href="/foo/bar" id="one">One</a>
                    <a href="/foo/baz" id="two">Two</a>
                    <a href="/foo/bat" id="three">Three</a>
                    <a href="/foo/bla" id="four">Four</a>
                </span>
            </div>
        </td>
    </tr>
</table>
</div>
]]></programlisting>
            </listitem>
        </itemizedlist>
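
        <para>
            The selector rules above can be compared side by side in a short sketch. The
            markup is a hypothetical fragment built around the "bar" attribute values used
            in the descriptions, and the counts in the comments are what those rules imply
            rather than captured output.
        </para>

        <programlisting language="php"><![CDATA[
// Assumes Zend Framework 1 is on the include_path.
require_once 'Zend/Dom/Query.php';

// Hypothetical markup: three divs with different "bar" values.
$html = '<div bar="baz"><span>direct</span></div>'
      . '<div bar="foo baz"><p><span>nested</span></p></div>'
      . '<div bar="foo bazbat"></div>';

$dom = new Zend_Dom_Query($html);

// Exact match: only bar="baz" should qualify (1 element).
echo count($dom->query('div[bar="baz"]')), "\n";

// Word match: "baz" and "foo baz" should qualify, "foo bazbat" should not (2).
echo count($dom->query('div[bar~="baz"]')), "\n";

// Substring match: all three values contain "baz" (3).
echo count($dom->query('div[bar*="baz"]')), "\n";

// Direct descendants: only the span immediately inside a div (1).
echo count($dom->query('div > span')), "\n";

// Descendants of any depth: both spans (2).
echo count($dom->query('div span')), "\n";
]]></programlisting>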

        <para>
            Once you've performed your query, you can work with the result
            object to determine information about the nodes, as well as to pull
            them and/or their content directly for examination and manipulation.
            <classname>Zend_Dom_Query_Result</classname> implements <classname>Countable</classname>
            and <classname>Iterator</classname>, and stores the results internally as
            DOMNodes and DOMElements. As an example, consider the following call,
            which selects against the <acronym>HTML</acronym> above:
        </para>

        <programlisting language="php"><![CDATA[
$dom = new Zend_Dom_Query($html);
$results = $dom->query('.foo .bar a');

$count = count($results); // get number of matches: 4
foreach ($results as $result) {
    // $result is a DOMElement
}
]]></programlisting>
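
        <para>
            Because each result is a DOMElement, the standard <acronym>DOM</acronym>
            accessors can be used inside the loop. The following sketch reads attributes
            and text content from each match; it uses a shortened, hypothetical version of
            the markup so that it stands on its own.
        </para>

        <programlisting language="php"><![CDATA[
// Assumes Zend Framework 1 is on the include_path.
require_once 'Zend/Dom/Query.php';

// Shortened, hypothetical variant of the markup shown earlier.
$html = '<div class="foo"><span class="bar">'
      . '<a href="/foo/bar" id="one">One</a>'
      . '<a href="/foo/baz" id="two">Two</a>'
      . '</span></div>';

$dom     = new Zend_Dom_Query($html);
$results = $dom->query('.foo .bar a');

foreach ($results as $link) {
    // getAttribute() and nodeValue are standard DOMElement members.
    echo $link->getAttribute('id'), ' => ',
         $link->getAttribute('href'), ' (',
         $link->nodeValue, ")\n";
}
]]></programlisting>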

        <para>
            <classname>Zend_Dom_Query</classname> also allows straight XPath queries
            utilizing the <methodname>queryXpath()</methodname> method; you can pass any
            valid XPath query to this method, and it will return a
            <classname>Zend_Dom_Query_Result</classname> object.
        </para>
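
        <para>
            XPath is useful for conditions that the selector types listed above do not
            cover. As a sketch, the following query uses the XPath 1.0
            <methodname>starts-with()</methodname> function to select links by the
            beginning of their "href" value; the markup is again a hypothetical fragment.
        </para>

        <programlisting language="php"><![CDATA[
// Assumes Zend Framework 1 is on the include_path.
require_once 'Zend/Dom/Query.php';

// Hypothetical markup for illustration.
$html = '<p>'
      . '<a href="/foo/bar">One</a>'
      . '<a href="/foo/baz">Two</a>'
      . '<a href="/foo/bla">Four</a>'
      . '</p>';

$dom = new Zend_Dom_Query($html);

// Select anchors whose href begins with "/foo/ba".
$results = $dom->queryXpath('//a[starts-with(@href, "/foo/ba")]');

foreach ($results as $link) {
    // "/foo/bar" and "/foo/baz" satisfy the predicate; "/foo/bla" does not.
    echo $link->getAttribute('href'), "\n";
}
]]></programlisting>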
    </sect2>

    <sect2 id="zend.dom.query.methods">
        <title>Methods Available</title>

        <para>
            The <classname>Zend_Dom_Query</classname> family of classes has the following
            methods available.
        </para>

        <sect3 id="zend.dom.query.methods.zenddomquery">
            <title>Zend_Dom_Query</title>

            <para>
                The following methods are available to
                <classname>Zend_Dom_Query</classname>:
            </para>

            <itemizedlist>
                <listitem>
                    <para>
                        <methodname>setDocumentXml($document)</methodname>: specify an
                        <acronym>XML</acronym> string to query against.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>setDocumentXhtml($document)</methodname>: specify an
                        <acronym>XHTML</acronym> string to query against.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>setDocumentHtml($document)</methodname>: specify an
                        <acronym>HTML</acronym> string to query against.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>setDocument($document)</methodname>: specify a
                        string to query against; <classname>Zend_Dom_Query</classname> will
                        then attempt to autodetect the document type.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>getDocument()</methodname>: retrieve the original document
                        string provided to the object.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>getDocumentType()</methodname>: retrieve the document
                        type of the document provided to the object; will be one of
                        the <constant>DOC_XML</constant>, <constant>DOC_XHTML</constant>, or
                        <constant>DOC_HTML</constant> class constants.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>query($query)</methodname>: query the document using
                        <acronym>CSS</acronym> selector notation.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>queryXpath($xPathQuery)</methodname>: query the document
                        using XPath notation.
                    </para>
                </listitem>
            </itemizedlist>
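
            <para>
                As a brief sketch of how the document setters and
                <methodname>getDocumentType()</methodname> fit together, the following
                listing sets a document explicitly and then relies on autodetection. The
                document strings are illustrative assumptions, and the autodetection result
                noted in the comment reflects the expectation that a string beginning with
                an <acronym>XML</acronym> declaration is treated as <acronym>XML</acronym>.
            </para>

            <programlisting language="php"><![CDATA[
// Assumes Zend Framework 1 is on the include_path.
require_once 'Zend/Dom/Query.php';

$dom = new Zend_Dom_Query();

// Explicitly declare the document as HTML.
$dom->setDocumentHtml('<p class="note">Hello</p>');
var_dump($dom->getDocumentType() === Zend_Dom_Query::DOC_HTML);

// Let the component autodetect the type; a leading XML declaration
// is expected to be detected as DOC_XML.
$dom->setDocument('<?xml version="1.0"?><root><item/></root>');
var_dump($dom->getDocumentType() === Zend_Dom_Query::DOC_XML);

// getDocument() returns the original string, not a DOMDocument.
echo $dom->getDocument(), "\n";

// The replaced document can now be queried as usual.
echo count($dom->queryXpath('//item')), "\n";
]]></programlisting>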
        </sect3>

        <sect3 id="zend.dom.query.methods.zenddomqueryresult">
            <title>Zend_Dom_Query_Result</title>

            <para>
                As mentioned previously, <classname>Zend_Dom_Query_Result</classname>
                implements both <classname>Iterator</classname> and
                <classname>Countable</classname>, and as such can be used in a
                <methodname>foreach()</methodname> loop as well as with the
                <methodname>count()</methodname> function. Additionally, it exposes the
                following methods:
            </para>

            <itemizedlist>
                <listitem>
                    <para>
                        <methodname>getCssQuery()</methodname>: return the <acronym>CSS</acronym>
                        selector query used to produce the result (if any).
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>getXpathQuery()</methodname>: return the XPath query
                        used to produce the result. Internally,
                        <classname>Zend_Dom_Query</classname> converts <acronym>CSS</acronym>
                        selector queries to XPath, so this value will always be populated.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>getDocument()</methodname>: retrieve the DOMDocument the
                        selection was made against.
                    </para>
                </listitem>
            </itemizedlist>
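
            <para>
                These accessors can be handy when debugging a selector. The sketch below
                prints the original <acronym>CSS</acronym> selector and the XPath expression
                generated from it; the exact XPath string depends on the converter, so it
                is not reproduced here. The markup is a hypothetical fragment.
            </para>

            <programlisting language="php"><![CDATA[
// Assumes Zend Framework 1 is on the include_path.
require_once 'Zend/Dom/Query.php';

$dom     = new Zend_Dom_Query('<p class="note">Hello</p>');
$results = $dom->query('p.note');

// The selector as it was passed to query().
echo $results->getCssQuery(), "\n";

// The XPath expression that the selector was converted to.
echo $results->getXpathQuery(), "\n";

// The DOMDocument the nodes belong to, e.g. for further DOM work.
$document = $results->getDocument();
echo get_class($document), "\n";

// For a result produced by queryXpath(), no CSS selector is recorded.
$xpathResults = $dom->queryXpath('//p');
var_dump($xpathResults->getCssQuery());
]]></programlisting>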
        </sect3>
    </sect2>
</sect1>
<!--
vim:se ts=4 sw=4 et:
-->