<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>3.Extended Zebra RPN Features</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot"><link rel="home" href="index.html" title="Zebra - User's Guide and Reference"><link rel="up" href="querymodel.html" title="Chapter5.Query Model"><link rel="prev" href="querymodel-rpn.html" title="2.RPN queries and semantics"><link rel="next" href="querymodel-cql-to-pqf.html" title="4.Server Side CQL to PQF Query Translation"></head><body><link rel="stylesheet" type="text/css" href="common/style1.css"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">3.Extended <span class="application">Zebra</span> <acronym class="acronym">RPN</acronym> Features</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="querymodel-rpn.html">Prev</a></td><th width="60%" align="center">Chapter5.Query Model</th><td width="20%" align="right"><a accesskey="n" href="querymodel-cql-to-pqf.html">Next</a></td></tr></table><hr></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="querymodel-zebra"></a>3.Extended <span class="application">Zebra</span> <acronym class="acronym">RPN</acronym> Features</h2></div></div></div><p>
    The <span class="application">Zebra</span> internal query engine has been extended to cover specific needs
    not covered by the <code class="literal">bib-1</code> attribute set query
    model. These extensions are <span class="emphasis"><em>non-standard</em></span>
    and <span class="emphasis"><em>non-portable</em></span>: most functional extensions
    are modeled over the <code class="literal">bib-1</code> attribute set,
    defining type 7 and higher values.
    There are also the special
    <code class="literal">string</code> type index names for the
    <code class="literal">idxpath</code> attribute set.
   </p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="querymodel-zebra-attr-allrecords"></a>3.1.<span class="application">Zebra</span> specific retrieval of all records</h3></div></div></div><p>
     <span class="application">Zebra</span> defines a hardwired <code class="literal">string</code> index name
     called <code class="literal">_ALLRECORDS</code>. It matches any record
     contained in the database, if used in conjunction with
     the relation attribute
     <code class="literal">AlwaysMatches (103)</code>.
    </p><p>
     The <code class="literal">_ALLRECORDS</code> index name is used for total database
     export. The search term is ignored and may be empty.
     </p><pre class="screen">
      Z&gt; find @attr 1=_ALLRECORDS @attr 2=103 ""
     </pre><p>
    </p><p>
     It can be combined with queries on other indexes. For example, to
     find all records which are <span class="emphasis"><em>not</em></span> indexed in
     the <code class="literal">Title</code> register, issue one of the two
     equivalent queries:
     </p><pre class="screen">
      Z&gt; find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=Title @attr 2=103 ""
      Z&gt; find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=4 @attr 2=103 ""
     </pre><p>
    </p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>
      The special string index <code class="literal">_ALLRECORDS</code> is
      experimental, and the provided functionality and syntax may very
      well change in future releases of <span class="application">Zebra</span>.
     </p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="querymodel-zebra-attr-search"></a>3.2.<span class="application">Zebra</span> specific Search Extensions to all Attribute Sets</h3></div></div></div><p>
     <span class="application">Zebra</span> extends the <acronym class="acronym">BIB-1</acronym> attribute types, and these extensions are
     recognized regardless of attribute
     set used in a <code class="literal">search</code> operation query.
    </p><div class="table"><a name="querymodel-zebra-attr-search-table"></a><p class="title"><b>Table5.9.<span class="application">Zebra</span> Search Attribute Extensions</b></p><div class="table-contents"><table class="table" summary="Zebra Search Attribute Extensions" border="1"><colgroup><col><col><col><col></colgroup><thead><tr><th>Name</th><th>Value</th><th>Operation</th><th><span class="application">Zebra</span> version</th></tr></thead><tbody><tr><td>Embedded Sort</td><td>7</td><td>search</td><td>1.1</td></tr><tr><td>Term Set</td><td>8</td><td>search</td><td>1.1</td></tr><tr><td>Rank Weight</td><td>9</td><td>search</td><td>1.1</td></tr><tr><td>Term Reference</td><td>10</td><td>search</td><td>1.4</td></tr><tr><td>Local Approx Limit</td><td>11</td><td>search</td><td>1.4</td></tr><tr><td>Global Approx Limit</td><td>12</td><td>search</td><td>2.0.8</td></tr><tr><td>Maximum number of truncated terms (truncmax)</td><td>13</td><td>search</td><td>2.0.10</td></tr><tr><td>
	 Specifies whether un-indexed fields should be ignored.
	 A zero value (default) throws a diagnostic when an un-indexed
	 field is specified. A non-zero value makes it return 0 hits.
	</td><td>14</td><td>search</td><td>2.0.16</td></tr></tbody></table></div></div><br class="table-break"><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-zebra-attr-sorting"></a>3.2.1.<span class="application">Zebra</span> Extension Embedded Sort Attribute (type 7)</h4></div></div></div><p>
      The embedded sort is a way to specify sort within a query - thus
      removing the need to send a Sort Request separately. It is both
      faster and does not require clients to deal with the Sort
      Facility.
     </p><p>
      All ordering operations are based on a lexicographical ordering,
      <span class="emphasis"><em>except</em></span> when the
      <code class="literal">structure attribute numeric (109)</code> is used. In
      this case, ordering is numerical. See
      <a class="xref" href="querymodel-rpn.html#querymodel-bib1-structure" title="2.4.3.Structure Attributes (type 4)">Section2.4.3, &#8220;Structure Attributes (type 4)&#8221;</a>.
     </p><p>
      The possible values after attribute <code class="literal">type 7</code> are
      <code class="literal">1</code> ascending and
      <code class="literal">2</code> descending.
      The attributes+term (<acronym class="acronym">APT</acronym>) node is separate from the
      rest and must be <code class="literal">@or</code>'ed.
      The term associated with <acronym class="acronym">APT</acronym> is the sorting level in integers,
      where <code class="literal">0</code> means primary sort,
      <code class="literal">1</code> means secondary sort, and so forth.
      See also <a class="xref" href="administration-ranking.html" title="9.Relevance Ranking and Sorting of Result Sets">Section9, &#8220;Relevance Ranking and Sorting of Result Sets&#8221;</a>.
     </p><p>
      For example, to search for water and sort by title (ascending):
      </p><pre class="screen">
       Z&gt; find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
      </pre><p>
     </p><p>
      Or, to search for water and sort by title ascending, then by date descending:
      </p><pre class="screen">
       Z&gt; find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
      </pre><p>
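      The way sort <acronym class="acronym">APT</acronym> nodes are attached to a query can be sketched in a
      few lines of Python. This is only an illustration and not part of
      <span class="application">Zebra</span>: the helper name <code class="literal">with_embedded_sort</code> and
      its list of (use attribute, direction) pairs are made up for this example.
      </p><pre class="programlisting">
def with_embedded_sort(query, sort_keys):
    """Append embedded sort APTs (attribute type 7) to a PQF query string.

    sort_keys is a list of (use_attribute, direction) pairs, highest
    priority first; direction is "asc" (7=1) or "desc" (7=2).
    """
    directions = {"asc": 1, "desc": 2}
    pqf = query
    for level, (use, direction) in enumerate(sort_keys):
        # Each sort APT is @or'ed onto the query; its term is the sort level.
        sort_apt = "@attr 7={} @attr 1={} {}".format(directions[direction], use, level)
        pqf = "@or {} {}".format(pqf, sort_apt)
    return pqf


print(with_embedded_sort("@attr 1=1016 water", [(4, "asc"), (30, "desc")]))
      </pre><p>
      Called as shown, the helper reproduces the second query above, with
      title as the primary and date as the secondary sort key.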
     </p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-zebra-attr-weight"></a>3.2.2.<span class="application">Zebra</span> Extension Rank Weight Attribute (type 9)</h4></div></div></div><p>
      Rank weight is a way to pass a value to a ranking algorithm, so
      that one <acronym class="acronym">APT</acronym> has one value while another has a different one.
      See also <a class="xref" href="administration-ranking.html" title="9.Relevance Ranking and Sorting of Result Sets">Section9, &#8220;Relevance Ranking and Sorting of Result Sets&#8221;</a>.
     </p><p>
      For example, searching for utah in title with weight 30, as well
      as in any field with weight 20:
      </p><pre class="screen">
       Z&gt; find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
      </pre><p>
     </p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-zebra-attr-termref"></a>3.2.3.<span class="application">Zebra</span> Extension Term Reference Attribute (type 10)</h4></div></div></div><p>
      <span class="application">Zebra</span> supports the searchResult-1 facility.
      If the Term Reference Attribute (type 10) is
      given, that specifies a subqueryId value returned as part of the
      search result. It is a way for a client to name an <acronym class="acronym">APT</acronym> part of a
      query.
     </p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>
       Experimental. Do not use in production code.
      </p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-zebra-local-attr-limit"></a>3.2.4.Local Approximative Limit Attribute (type 11)</h4></div></div></div><p>
      <span class="application">Zebra</span> computes - unless otherwise configured -
      the exact hit count for every <acronym class="acronym">APT</acronym>
      (leaf) in the query tree. These hit counts are returned as part of
      the searchResult-1 facility in the binary encoded <acronym class="acronym">Z39.50</acronym> search
      response packages.
     </p><p>
      When an estimation limit is set on the result set of an <acronym class="acronym">APT</acronym>
      leaf, <span class="application">Zebra</span> stops processing that result set once the limit
      is reached.
      Hit counts under this limit are still precise, but hit counts over it
      are estimated using the statistics gathered from the chopped
      result set.
     </p><p>
      Specifying a limit of <code class="literal">0</code> results in exact hit counts.
     </p><p>
      For example, we might be interested in exact hit count for a, but
      for b we allow hit count estimates for 1000 and higher.
      </p><pre class="screen">
       Z&gt; find @and a @attr 11=1000 b
      </pre><p>
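      How an estimate might be derived from a chopped result set can be
      illustrated with a toy model. The estimator actually used by
      <span class="application">Zebra</span> is not described in this section; the linear
      extrapolation over the record number space below, and the names
      <code class="literal">estimated_hit_count</code> and <code class="literal">total_records</code>,
      are assumptions made purely for illustration.
      </p><pre class="programlisting">
def estimated_hit_count(postings, limit, total_records):
    """Count hits exactly up to limit, then extrapolate linearly.

    postings is a sorted list of record numbers in range(total_records).
    This is a hypothetical model, not Zebra's implementation.
    """
    if limit == 0:
        return len(postings)  # a limit of 0 asks for exact counts
    seen = postings[:limit]
    if len(seen) == len(postings):
        return len(postings)  # the list ended at or before the limit: exact
    # Fraction of the record space covered when processing stopped.
    covered = (seen[-1] + 1) / total_records
    return round(len(seen) / covered)


every_second = list(range(0, 10000, 2))
print(estimated_hit_count(every_second, 1000, 10000))
print(estimated_hit_count([1, 5, 9], 1000, 10000))
      </pre><p>
      In this model a short list is counted exactly, while a long list is
      only read up to the limit and its size is then extrapolated.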
     </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
       The estimated hit count facility makes searches faster, as one
       only needs to process large hit lists partially.
       It is mostly used in huge databases, where you want to trade
       exactness of hit counts for speed of execution.
      </p></div><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>
       Do not use approximative hit count limits
       in conjunction with relevance ranking, as re-sorting of the
       result set only works when the entire result set has
       been processed.
      </p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-zebra-global-attr-limit"></a>3.2.5.Global Approximative Limit Attribute (type 12)</h4></div></div></div><p>
      By default <span class="application">Zebra</span> computes precise hit counts for a query as
      a whole. Setting attribute 12 makes it perform approximative
      hit counts instead. It has the same semantics as
      <code class="literal">estimatehits</code>, described in <a class="xref" href="zebra-cfg.html" title="2.The Zebra Configuration File">Section2, &#8220;The <span class="application">Zebra</span> Configuration File&#8221;</a>.
     </p><p>
      The attribute (12) can occur anywhere in the query tree.
      Unlike regular attributes it does not relate to the leaf (<acronym class="acronym">APT</acronym>)
      - but to the whole query.
     </p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>
       Do not use approximative hit count limits
       in conjunction with relevance ranking, as re-sorting of the
       result set only works when the entire result set has
       been processed.
      </p></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="querymodel-zebra-attr-scan"></a>3.3.<span class="application">Zebra</span> specific Scan Extensions to all Attribute Sets</h3></div></div></div><p>
     <span class="application">Zebra</span> extends the <acronym class="acronym">BIB-1</acronym> attribute types, and these extensions are
     recognized regardless of attribute
     set used in a scan operation query.
    </p><div class="table"><a name="querymodel-zebra-attr-scan-table"></a><p class="title"><b>Table5.10.<span class="application">Zebra</span> Scan Attribute Extensions</b></p><div class="table-contents"><table class="table" summary="Zebra Scan Attribute Extensions" border="1"><colgroup><col><col><col><col></colgroup><thead><tr><th>Name</th><th>Type</th><th>Operation</th><th><span class="application">Zebra</span> version</th></tr></thead><tbody><tr><td>Result Set Narrow</td><td>8</td><td>scan</td><td>1.3</td></tr><tr><td>Approximative Limit</td><td>12</td><td>scan</td><td>2.0.20</td></tr></tbody></table></div></div><br class="table-break"><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-zebra-attr-narrow"></a>3.3.1.<span class="application">Zebra</span> Extension Result Set Narrow (type 8)</h4></div></div></div><p>
      If attribute Result Set Narrow (type 8)
      is given for scan, the value is the name of a
      result set. Each hit count in scan is
      <code class="literal">@and</code>'ed with the result set given.
     </p><p>
      Consider for example
      the case of scanning all title fields around the
      scanterm <span class="emphasis"><em>mozart</em></span>, then refining the scan by
      issuing a filtering query for <span class="emphasis"><em>amadeus</em></span> to
      restrict the scan to the result set of the query:
      </p><pre class="screen">
       Z&gt; scan @attr 1=4 mozart
       ...
       * mozart (43)
       mozartforskningen (1)
       mozartiana (1)
       mozarts (16)
       ...
       Z&gt; f @attr 1=4 amadeus
       ...
       Number of hits: 15, setno 2
       ...
       Z&gt; scan @attr 1=4 @attr 8=2 mozart
       ...
       * mozart (14)
       mozartforskningen (0)
       mozartiana (0)
       mozarts (1)
       ...
      </pre><p>
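      The effect on the hit counts can be modelled as a set intersection.
      The following Python sketch uses made-up record numbers and a
      hypothetical function name <code class="literal">narrowed_scan</code>; it describes
      the semantics only, not how <span class="application">Zebra</span> stores or processes
      its indexes.
      </p><pre class="programlisting">
def narrowed_scan(term_postings, result_set):
    """Return scan hit counts restricted to records in result_set.

    term_postings maps each scan term to the set of record ids containing it.
    """
    return {
        term: len(records.intersection(result_set))
        for term, records in term_postings.items()
    }


# Toy data, not taken from a real database.
title_index = {
    "mozart": {1, 2, 3, 4},
    "mozartiana": {5},
    "mozarts": {2, 6},
}
amadeus_hits = {1, 2, 3, 7}
print(narrowed_scan(title_index, amadeus_hits))
      </pre><p>
      With this toy data the counts become 3, 0 and 1: terms that occur
      only outside the result set stay in the scan list with a zero count.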
     </p><p>
      <span class="application">Zebra</span> 2.0.2 and later is able to skip 0 hit counts. This, however,
      is known not to scale if the number of terms to skip is high.
      This is most likely to happen if the result set is small (and
      thus results in many 0 hits).
     </p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-zebra-attr-approx"></a>3.3.2.<span class="application">Zebra</span> Extension Approximative Limit (type 12)</h4></div></div></div><p>
      The <span class="application">Zebra</span> Extension Approximative Limit (type 12) is a way to
      enable approximate hit counts for scan hit counts, in the same
      way as for search hit counts.
     </p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="querymodel-idxpath"></a>3.4.<span class="application">Zebra</span> special <acronym class="acronym">IDXPATH</acronym> Attribute Set for <acronym class="acronym">GRS-1</acronym> indexing</h3></div></div></div><p>
     The attribute-set <code class="literal">idxpath</code> consists of a single
     Use (type 1) attribute. All non-use attributes behave as normal.
    </p><p>
     This feature is enabled when defining the
     <code class="literal">xpath enable</code> option in the <acronym class="acronym">GRS-1</acronym> filter
     <code class="filename">*.abs</code> configuration files. If one wants to use
     the special <code class="literal">idxpath</code> numeric attribute set, the
     main <span class="application">Zebra</span> configuration file <code class="filename">zebra.cfg</code>
     directive <code class="literal">attset: idxpath.att</code> must be enabled.
    </p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>
      The <code class="literal">idxpath</code> is deprecated, may not be
      supported in future <span class="application">Zebra</span> versions, and should definitely
      not be used in production code.
     </p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-idxpath-use"></a>3.4.1.<acronym class="acronym">IDXPATH</acronym> Use Attributes (type = 1)</h4></div></div></div><p>
      This attribute set allows one to search <acronym class="acronym">GRS-1</acronym> filter indexed
      records by <acronym class="acronym">XPATH</acronym> like structured index names.
     </p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>
       The <code class="literal">idxpath</code> option defines hard-coded
       index names, which might clash with your own index names.
      </p></div><div class="table"><a name="querymodel-idxpath-use-table"></a><p class="title"><b>Table5.11.<span class="application">Zebra</span> specific <acronym class="acronym">IDXPATH</acronym> Use Attributes (type 1)</b></p><div class="table-contents"><table class="table" summary="Zebra specific IDXPATH Use Attributes (type 1)" border="1"><colgroup><col><col><col><col></colgroup><thead><tr><th><acronym class="acronym">IDXPATH</acronym></th><th>Value</th><th>String Index</th><th>Notes</th></tr></thead><tbody><tr><td><acronym class="acronym">XPATH</acronym> Begin</td><td>1</td><td>_XPATH_BEGIN</td><td>deprecated</td></tr><tr><td><acronym class="acronym">XPATH</acronym> End</td><td>2</td><td>_XPATH_END</td><td>deprecated</td></tr><tr><td><acronym class="acronym">XPATH</acronym> CData</td><td>1016</td><td>_XPATH_CDATA</td><td>deprecated</td></tr><tr><td><acronym class="acronym">XPATH</acronym> Attribute Name</td><td>3</td><td>_XPATH_ATTR_NAME</td><td>deprecated</td></tr><tr><td><acronym class="acronym">XPATH</acronym> Attribute CData</td><td>1015</td><td>_XPATH_ATTR_CDATA</td><td>deprecated</td></tr></tbody></table></div></div><br class="table-break"><p>
      See <code class="filename">tab/idxpath.att</code> for more information.
     </p><p>
      Search for all documents starting with root element
      <code class="literal">/root</code> (either using the numeric or the string
      use attributes):
      </p><pre class="screen">
       Z&gt; find @attrset idxpath @attr 1=1 @attr 4=3 root/
       Z&gt; find @attr idxpath 1=1 @attr 4=3 root/
       Z&gt; find @attr 1=_XPATH_BEGIN @attr 4=3 root/
      </pre><p>
     </p><p>
      Search for all documents where specific nested <acronym class="acronym">XPATH</acronym>
      <code class="literal">/c1/c2/../cn</code> exists. Notice the very
      counter-intuitive <span class="emphasis"><em>reverse</em></span> notation!
      </p><pre class="screen">
       Z&gt; find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/
       Z&gt; find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/
      </pre><p>
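      Because the reverse notation is easy to get wrong, a client may
      prefer to derive the search term from an ordinary path. The Python
      helper below is a minimal sketch; the name
      <code class="literal">idxpath_begin_term</code> is made up for this example.
      </p><pre class="programlisting">
def idxpath_begin_term(xpath):
    """Turn /c1/c2/cn into the reversed term cn/c2/c1/ used with _XPATH_BEGIN."""
    steps = [step for step in xpath.split("/") if step]
    # Innermost element first, each step followed by a slash.
    return "".join(step + "/" for step in reversed(steps))


print(idxpath_begin_term("/root"))          # root/
print(idxpath_begin_term("/record/title"))  # title/record/
      </pre><p>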
     </p><p>
      Search for CDATA string <span class="emphasis"><em>text</em></span> in any element:
      </p><pre class="screen">
       Z&gt; find @attrset idxpath @attr 1=1016 text
       Z&gt; find @attr 1=_XPATH_CDATA text
      </pre><p>
     </p><p>
      Search for CDATA string <span class="emphasis"><em>anothertext</em></span> in any
      attribute:
      </p><pre class="screen">
       Z&gt; find @attrset idxpath @attr 1=1015 anothertext
       Z&gt; find @attr 1=_XPATH_ATTR_CDATA anothertext
      </pre><p>
     </p><p>
      Search for all documents which have an <acronym class="acronym">XML</acronym> element node
      including an <acronym class="acronym">XML</acronym> attribute named <span class="emphasis"><em>creator</em></span>:
      </p><pre class="screen">
       Z&gt; find @attrset idxpath @attr 1=3 @attr 4=3 creator
       Z&gt; find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
      </pre><p>
     </p><p>
      Combining usual <code class="literal">bib-1</code> attribute set searches
      with <code class="literal">idxpath</code> attribute set searches:
      </p><pre class="screen">
       Z&gt; find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart
       Z&gt; find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart
      </pre><p>
     </p><p>
      Scanning is supported on all <code class="literal">idxpath</code>
      indexes, both specified as numeric use attributes, or as string
      index names.
      </p><pre class="screen">
       Z&gt; scan  @attrset idxpath @attr 1=1016 text
       Z&gt; scan  @attr 1=_XPATH_ATTR_CDATA anothertext
       Z&gt; scan  @attrset idxpath @attr 1=3 @attr 4=3 ''
      </pre><p>
     </p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="querymodel-pqf-apt-mapping"></a>3.5.Mapping from <acronym class="acronym">PQF</acronym> atomic <acronym class="acronym">APT</acronym> queries to <span class="application">Zebra</span> internal
     register indexes</h3></div></div></div><p>
     The rules for <acronym class="acronym">PQF</acronym> <acronym class="acronym">APT</acronym> mapping are rather tricky to grasp in the
     first place. We deal first with the rules for deciding which
     internal register or string index to use, according to the use
     attribute or access point specified in the query. Thereafter we
     deal with the rules for determining the correct structure type of
     the named register.
    </p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-pqf-apt-mapping-accesspoint"></a>3.5.1.Mapping of <acronym class="acronym">PQF</acronym> <acronym class="acronym">APT</acronym> access points</h4></div></div></div><p>
      <span class="application">Zebra</span> understands four fundamentally different types of access
      points, of which only the
      <span class="emphasis"><em>numeric use attribute</em></span> type access points
      are defined by the  <a class="ulink" href="https://www.loc.gov/z3950/agency/" target="_top"><acronym class="acronym">Z39.50</acronym></a>
      standard.
      All other access point types are <span class="application">Zebra</span> specific, and non-portable.
      </p><div class="table"><a name="querymodel-zebra-mapping-accesspoint-types"></a><p class="title"><b>Table5.12.Access point name mapping</b></p><div class="table-contents"><table class="table" summary="Access point name mapping" border="1"><colgroup><col><col><col><col></colgroup><thead><tr><th>Access Point</th><th>Type</th><th>Grammar</th><th>Notes</th></tr></thead><tbody><tr><td>Use attribute</td><td>numeric</td><td>[1-9][0-9]*</td><td>directly mapped to string index name</td></tr><tr><td>String index name</td><td>string</td><td>[a-zA-Z](\-?[a-zA-Z0-9])*</td><td>normalized name is used as internal string index name</td></tr><tr><td><span class="application">Zebra</span> internal index name</td><td>zebra</td><td>_[a-zA-Z](_?[a-zA-Z0-9])*</td><td>hardwired internal string index name</td></tr><tr><td><acronym class="acronym">XPATH</acronym> special index</td><td>XPath</td><td>/.*</td><td>special xpath search for <acronym class="acronym">GRS-1</acronym> indexed records</td></tr></tbody></table></div></div><br class="table-break"><p>
      <code class="literal">Attribute set names</code> and
      <code class="literal">string index names</code> are normalized
      according to the following rules: all <span class="emphasis"><em>single</em></span>
      hyphens <code class="literal">'-'</code> are stripped, and all upper case
      letters are folded to lower case.
     </p><p>
      <span class="emphasis"><em>Numeric use attributes</em></span> are mapped
      to the <span class="application">Zebra</span> internal
      string index according to the attribute set definition in use.
      The default attribute set is <acronym class="acronym">BIB-1</acronym>, and may be
      omitted in the <acronym class="acronym">PQF</acronym> query.
     </p><p>
      According to normalization and numeric
      use attribute mapping, it follows that the following
      <acronym class="acronym">PQF</acronym> queries are considered equivalent (assuming the default
      configuration has not been altered):
      </p><pre class="screen">
       Z&gt; find  @attr 1=Body-of-text serenade
       Z&gt; find  @attr 1=bodyoftext serenade
       Z&gt; find  @attr 1=BodyOfText serenade
       Z&gt; find  @attr 1=bO-d-Y-of-tE-x-t serenade
       Z&gt; find  @attr 1=1010 serenade
       Z&gt; find  @attrset bib1 @attr 1=1010 serenade
       Z&gt; find  @attrset Bib1 @attr 1=1010 serenade
       Z&gt; find  @attrset b-I-b-1 @attr 1=1010 serenade
      </pre><p>
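      The name normalization behind these equivalences can be sketched
      in Python. The function name <code class="literal">normalize_name</code> is made up
      for this example, and leaving runs of two or more hyphens untouched
      is an assumption about what <span class="emphasis"><em>single</em></span> means in the
      rule; the mapping of numeric use attributes to index names is not
      modelled here.
      </p><pre class="programlisting">
import re


def normalize_name(name):
    """Strip single hyphens and fold to lower case."""
    # A run of exactly one hyphen is removed; longer runs are left alone,
    # which is an assumption about the rule, not documented behaviour.
    stripped = re.sub(r"-+", lambda m: "" if len(m.group()) == 1 else m.group(), name)
    return stripped.lower()


for name in ["Body-of-text", "BodyOfText", "bO-d-Y-of-tE-x-t", "b-I-b-1"]:
    print(name, "becomes", normalize_name(name))
      </pre><p>
      The first three names all normalize to
      <code class="literal">bodyoftext</code>, and the last one to
      <code class="literal">bib1</code>.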
     </p><p>
      The <span class="emphasis"><em>numerical</em></span>
      <code class="literal">use attributes (type 1)</code>
      are interpreted according to the
      attribute sets which have been loaded in the
      <code class="literal">zebra.cfg</code> file, and are matched against specific
      fields as specified in the <code class="literal">.abs</code> file which
      describes the profile of the records which have been loaded.
      If no use attribute is provided, a default of
      <acronym class="acronym">BIB-1</acronym> Use Any (1016) is assumed.
      The predefined use attribute sets
      can be reconfigured by  tweaking the configuration files
      <code class="filename">tab/*.att</code>, and
      new attribute sets can be defined by adding similar files in the
      configuration path <code class="literal">profilePath</code> of the server.
     </p><p>
      String indexes can be accessed directly,
      independently of which attribute set is in use; the attribute set
      is simply ignored. The above mentioned name normalization applies.
      String index names are defined in the
      used indexing  filter configuration files, for example in the
      <acronym class="acronym">GRS-1</acronym>
      <code class="filename">*.abs</code> configuration files, or in the
      <code class="literal">alvis</code> filter <acronym class="acronym">XSLT</acronym> indexing stylesheets.
     </p><p>
      <span class="application">Zebra</span> internal indexes can be accessed directly,
      according to the same rules as the user defined
      string indexes. The only difference is that
      <span class="application">Zebra</span> internal index names are hardwired,
      all uppercase and
      must start with the character <code class="literal">'_'</code>.
     </p><p>
      Finally, <acronym class="acronym">XPATH</acronym> access points are only
      available using the <acronym class="acronym">GRS-1</acronym> filter for indexing.
      These access point names must start with the character
      <code class="literal">'/'</code>, they are <span class="emphasis"><em>not
       normalized</em></span>, but passed unaltered to the <span class="application">Zebra</span> internal
      <acronym class="acronym">XPATH</acronym> engine. See <a class="xref" href="querymodel-rpn.html#querymodel-use-xpath" title="2.1.6.Zebra's special access point of type 'XPath' for GRS-1 filters">Section2.1.6, &#8220;<span class="application">Zebra</span>'s special access point of type 'XPath'
      for <acronym class="acronym">GRS-1</acronym> filters&#8221;</a>.
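      </p><p>
      The four access point types can be told apart by matching the
      value of the use attribute against the grammars listed in the
      access point table. The Python sketch below is an illustration
      only, with made-up names; the numeric pattern is written so that
      values containing zeros, such as <code class="literal">1016</code>, are accepted,
      and the order in which the patterns are tried is an assumption.
      </p><pre class="programlisting">
import re

# Patterns follow the grammar column of the access point table; the numeric
# pattern is written so that values containing zeros, such as 1016, match.
ACCESS_POINT_PATTERNS = [
    ("numeric", re.compile(r"[1-9][0-9]*")),
    ("string", re.compile(r"[a-zA-Z](-?[a-zA-Z0-9])*")),
    ("zebra", re.compile(r"_[a-zA-Z](_?[a-zA-Z0-9])*")),
    ("XPath", re.compile(r"/.*")),
]


def access_point_type(value):
    """Classify the value of a type 1 attribute, or return None if none fits."""
    for kind, pattern in ACCESS_POINT_PATTERNS:
        if pattern.fullmatch(value):
            return kind
    return None


for value in ["1010", "Body-of-text", "_ALLRECORDS", "/record/title"]:
    print(value, "is of type", access_point_type(value))
      </pre><p>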

     </p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-pqf-apt-mapping-structuretype"></a>3.5.2.Mapping of <acronym class="acronym">PQF</acronym> <acronym class="acronym">APT</acronym> structure and completeness to
      register type</h4></div></div></div><p>
      Internally <span class="application">Zebra</span> has in its default configuration several
      different types of registers or indexes, whose tokenization and
      character normalization rules differ. This reflects the fact that
      searching fundamentally different tokens like dates, numbers,
      bitfields and string based text needs different rule sets.
     </p><div class="table"><a name="querymodel-zebra-mapping-structure-types"></a><p class="title"><b>Table5.13.Structure and completeness mapping to register types</b></p><div class="table-contents"><table class="table" summary="Structure and completeness mapping to register types" border="1"><colgroup><col><col><col><col></colgroup><thead><tr><th>Structure</th><th>Completeness</th><th>Register type</th><th>Notes</th></tr></thead><tbody><tr><td>
          phrase (@attr 4=1), word (@attr 4=2),
          word-list (@attr 4=6),
          free-form-text  (@attr 4=105), or document-text (@attr 4=106)
         </td><td>Incomplete field (@attr 6=1)</td><td>Word ('w')</td><td>Traditional tokenized and character normalized word index</td></tr><tr><td>
          phrase (@attr 4=1), word (@attr 4=2),
          word-list (@attr 4=6),
          free-form-text  (@attr 4=105), or document-text (@attr 4=106)
         </td><td>Complete field (@attr 6=3)</td><td>Phrase ('p')</td><td>Character normalized, but not tokenized index for phrase
          matches
         </td></tr><tr><td>urx (@attr 4=104)</td><td>ignored</td><td>URX/URL ('u')</td><td>Special index for URL web addresses</td></tr><tr><td>numeric (@attr 4=109)</td><td>ignored</td><td>Numeric ('n')</td><td>Special index for digital numbers</td></tr><tr><td>key (@attr 4=3)</td><td>ignored</td><td>Null bitmap ('0')</td><td>Used for non-tokenized and non-normalized bit sequences</td></tr><tr><td>year (@attr 4=4)</td><td>ignored</td><td>Year ('y')</td><td>Non-tokenized and non-normalized 4 digit numbers</td></tr><tr><td>date (@attr 4=5)</td><td>ignored</td><td>Date ('d')</td><td>Non-tokenized and non-normalized ISO date strings</td></tr><tr><td>ignored</td><td>ignored</td><td>Sort ('s')</td><td>Used with special sort attribute set (@attr 7=1, @attr 7=2)</td></tr><tr><td>overruled</td><td>overruled</td><td>special</td><td>Internal record ID register, used whenever
	  Relation Always Matches (@attr 2=103) is specified</td></tr></tbody></table></div></div><br class="table-break"><p>
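      The table above can be read as a small lookup function. The Python
      sketch below is an illustration with made-up names
      (<code class="literal">register_type</code>, <code class="literal">WORD_LIKE</code>,
      <code class="literal">DEDICATED</code>); it covers only the combinations listed in
      the table, leaves out the sort register row, and rejects everything
      else instead of guessing.
      </p><pre class="programlisting">
WORD_LIKE = {1, 2, 6, 105, 106}  # phrase, word, word-list, free-form-text, document-text
DEDICATED = {104: "u", 109: "n", 3: "0", 4: "y", 5: "d"}


def register_type(structure, completeness=1, relation=None):
    """Return the register type for an APT, following the mapping table."""
    if relation == 103:
        return "special"  # AlwaysMatches overrules structure and completeness
    if structure in DEDICATED:
        return DEDICATED[structure]  # completeness is ignored for these
    if structure in WORD_LIKE and completeness in (1, 3):
        return "p" if completeness == 3 else "w"
    raise ValueError("combination not covered by the table")


print(register_type(1, 3))   # p
print(register_type(1))      # w
print(register_type(109))    # n
      </pre><p>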
      If a <span class="emphasis"><em>Structure</em></span> attribute of
      <span class="emphasis"><em>Phrase</em></span> is used in conjunction with a
      <span class="emphasis"><em>Completeness</em></span> attribute of
      <span class="emphasis"><em>Complete (Sub)field</em></span>, the term is matched
      against the contents of the phrase (long word) register, if one
      exists for the given <span class="emphasis"><em>Use</em></span> attribute.
      A phrase register is created for those fields in the
      <acronym class="acronym">GRS-1</acronym> <code class="filename">*.abs</code> file that contain a
      <code class="literal">p</code>-specifier.
      </p><pre class="screen">
       Z&gt; scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven
       ...
       bayreuther festspiele (1)
       * beethoven bibliography database (1)
       benny carter (1)
       ...
       Z&gt; find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography"
       ...
       Number of hits: 0, setno 5
       ...
       Z&gt; find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography database"
       ...
       Number of hits: 1, setno 6
      </pre><p>
     </p><p>
      If <span class="emphasis"><em>Structure</em></span>=<span class="emphasis"><em>Phrase</em></span> is
      used in conjunction with <span class="emphasis"><em>Incomplete Field</em></span> (the
      default value for <span class="emphasis"><em>Completeness</em></span>), the
      search is directed against the normal word registers, but if the term
      contains multiple words, the term will only match if all of the words
      are found immediately adjacent, and in the given order.
      The word search is performed on those fields that are indexed as
      type <code class="literal">w</code> in the <acronym class="acronym">GRS-1</acronym> <code class="filename">*.abs</code> file.
      </p><pre class="screen">
       Z&gt; scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven
       ...
       beefheart (1)
       * beethoven (18)
       beethovens (7)
       ...
       Z&gt; find @attr 1=Title @attr 4=1 @attr 6=1 beethoven
       ...
       Number of hits: 18, setno 1
       ...
       Z&gt; find @attr 1=Title @attr 4=1 @attr 6=1 "beethoven  bibliography"
       ...
       Number of hits: 2, setno 2
       ...
      </pre><p>
     </p><p>
      If the <span class="emphasis"><em>Structure</em></span> attribute is
      <span class="emphasis"><em>Word List</em></span>,
      <span class="emphasis"><em>Free-form Text</em></span>, or
      <span class="emphasis"><em>Document Text</em></span>, the term is treated as a
      natural-language, relevance-ranked query.
      This search type uses the word register, i.e. those fields
      that are indexed as type <code class="literal">w</code> in the
      <acronym class="acronym">GRS-1</acronym> <code class="filename">*.abs</code> file.
     </p><p>
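      As a minimal sketch, assuming the same <code class="literal">Title</code>
      index used in the examples above, the three structures could be
      requested as follows. The values 6, 105 and 106 are the
      <code class="literal">bib-1</code> codes for
      <span class="emphasis"><em>Word List</em></span>,
      <span class="emphasis"><em>Free-form Text</em></span> and
      <span class="emphasis"><em>Document Text</em></span>; the search terms are
      purely illustrative, and no output is shown because it depends on the
      database.
      </p><pre class="screen">
       Z&gt; find @attr 1=Title @attr 4=6 "beethoven bibliography"
       Z&gt; find @attr 1=Title @attr 4=105 "beethoven bibliography"
       Z&gt; find @attr 1=Title @attr 4=106 "beethoven bibliography"
      </pre><p>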
      If the <span class="emphasis"><em>Structure</em></span> attribute is
      <span class="emphasis"><em>Numeric String</em></span>, the term is treated as an integer.
      The search is performed on those fields that are indexed
      as type <code class="literal">n</code> in the <acronym class="acronym">GRS-1</acronym>
      <code class="filename">*.abs</code> file.
     </p><p>
      If the <span class="emphasis"><em>Structure</em></span> attribute is
      <span class="emphasis"><em>URX</em></span>, the term is treated as a URX (URL) entity.
      The search is performed on those fields that are indexed as type
      <code class="literal">u</code> in the <code class="filename">*.abs</code> file.
     </p><p>
      If the <span class="emphasis"><em>Structure</em></span> attribute is
      <span class="emphasis"><em>Local Number</em></span>, the term is treated as a
      native <span class="application">Zebra</span> record identifier.
     </p><p>
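      The <span class="emphasis"><em>Numeric String</em></span>,
      <span class="emphasis"><em>URX</em></span> and
      <span class="emphasis"><em>Local Number</em></span> structures might be
      selected as in the following sketch. The index names
      <code class="literal">Price</code> and <code class="literal">Homepage</code>
      are hypothetical and would have to be defined as type
      <code class="literal">n</code> and type <code class="literal">u</code>
      indexes in your own configuration; the terms are illustrative only.
      The value 107 is the <code class="literal">bib-1</code> code for
      <span class="emphasis"><em>Local Number</em></span>, and the
      <span class="emphasis"><em>Use</em></span> attribute is omitted in the
      last query on the assumption that it plays no role in a record
      identifier lookup.
      </p><pre class="screen">
       Z&gt; find @attr 1=Price @attr 4=109 42
       Z&gt; find @attr 1=Homepage @attr 4=104 "http://www.example.org/"
       Z&gt; find @attr 4=107 10
      </pre><p>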
      If the <span class="emphasis"><em>Relation</em></span> attribute is
      <span class="emphasis"><em>Equals</em></span> (default), the term is matched
      in a normal fashion (modulo truncation and processing of
      individual words, if required).
      If <span class="emphasis"><em>Relation</em></span> is <span class="emphasis"><em>Less Than</em></span>,
      <span class="emphasis"><em>Less Than or Equal</em></span>,
      <span class="emphasis"><em>Greater Than</em></span>, or <span class="emphasis"><em>Greater Than or
       Equal</em></span>, the term is assumed to be numerical, and a
      standard regular expression is constructed to match the given
      expression.
      If <span class="emphasis"><em>Relation</em></span> is <span class="emphasis"><em>Relevance</em></span>,
      the standard natural-language query processor is invoked.
     </p><p>
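      A sketch of queries using non-default relations follows. It assumes
      that <code class="literal">bib-1</code> Use attribute 31 (Date of
      publication) is indexed in the target database, which may not be the
      case for yours; the terms are illustrative. In
      <code class="literal">bib-1</code>, relation 4 is
      <span class="emphasis"><em>Greater Than or Equal</em></span>, relation 1 is
      <span class="emphasis"><em>Less Than</em></span>, and relation 102 is
      <span class="emphasis"><em>Relevance</em></span>.
      </p><pre class="screen">
       Z&gt; find @attr 1=31 @attr 2=4 1990
       Z&gt; find @attr 1=31 @attr 2=1 2000
       Z&gt; find @attr 1=Title @attr 2=102 "beethoven bibliography"
      </pre><p>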
      For the <span class="emphasis"><em>Truncation</em></span> attribute,
      <span class="emphasis"><em>No Truncation</em></span> is the default.
      <span class="emphasis"><em>Left Truncation</em></span> is not supported.
      <span class="emphasis"><em>Process # in search term</em></span> is supported, as is
      <span class="emphasis"><em>Regxp-1</em></span>.
      <span class="emphasis"><em>Regxp-2</em></span> enables fault-tolerant (fuzzy)
      search. By default, a single error (deletion, insertion, or
      replacement) is accepted when terms are matched against the register
      contents.
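      </p><p>
      The supported truncation modes could be exercised as in the sketch
      below, again assuming a <code class="literal">Title</code> index. The
      value 101 is the <code class="literal">bib-1</code> code for
      <span class="emphasis"><em>Process # in search term</em></span>, where
      <code class="literal">#</code> is assumed to stand for an arbitrary
      sequence of characters. The misspelled term in the last query differs
      from <code class="literal">beethoven</code> by one replacement, so it
      would be expected to fall within the default error tolerance.
      </p><pre class="screen">
       Z&gt; find @attr 1=Title @attr 5=101 "beethov#"
       Z&gt; find @attr 1=Title @attr 5=102 "beethoven.*"
       Z&gt; find @attr 1=Title @attr 5=103 "beethowen"
      </pre><p>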
     </p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="querymodel-regular"></a>3.6.&nbsp;<span class="application">Zebra</span> Regular Expressions in Truncation Attribute (type = 5)</h3></div></div></div><p>
     Each term in a query is interpreted as a regular expression if
     the truncation value is either <span class="emphasis"><em>Regxp-1 (@attr 5=102)</em></span>
     or <span class="emphasis"><em>Regxp-2 (@attr 5=103)</em></span>.
     Both query types share the same syntax, which is built from the following operands:
    </p><div class="table"><a name="querymodel-regular-operands-table"></a><p class="title"><b>Table&nbsp;5.14.&nbsp;Regular Expression Operands</b></p><div class="table-contents"><table class="table" summary="Regular Expression Operands" border="1"><colgroup><col><col></colgroup><tbody><tr><td><code class="literal">x</code></td><td>Matches the character <code class="literal">x</code>.</td></tr><tr><td><code class="literal">.</code></td><td>Matches any character.</td></tr><tr><td><code class="literal">[ .. ]</code></td><td>Matches the set of characters specified;
         such as <code class="literal">[abc]</code> or <code class="literal">[a-c]</code>.</td></tr></tbody></table></div></div><br class="table-break"><p>
     The above operands can be combined with the following operators:
    </p><div class="table"><a name="querymodel-regular-operators-table"></a><p class="title"><b>Table&nbsp;5.15.&nbsp;Regular Expression Operators</b></p><div class="table-contents"><table class="table" summary="Regular Expression Operators" border="1"><colgroup><col><col></colgroup><tbody><tr><td><code class="literal">x*</code></td><td>Matches <code class="literal">x</code> zero or more times.
         Priority: high.</td></tr><tr><td><code class="literal">x+</code></td><td>Matches <code class="literal">x</code> one or more times.
         Priority: high.</td></tr><tr><td><code class="literal">x?</code></td><td>Matches <code class="literal">x</code> zero or one time.
         Priority: high.</td></tr><tr><td><code class="literal">xy</code></td><td>Matches <code class="literal">x</code>, then <code class="literal">y</code>.
         Priority: medium.</td></tr><tr><td><code class="literal">x|y</code></td><td>Matches either <code class="literal">x</code> or <code class="literal">y</code>.
         Priority: low.</td></tr><tr><td><code class="literal">( )</code></td><td>The order of evaluation may be changed by using parentheses.</td></tr></tbody></table></div></div><br class="table-break"><p>
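      The priorities determine how an expression is grouped. The following
      sketch, which assumes a <code class="literal">Title</code> index and
      uses illustrative terms, shows one query per priority level. In the
      first, <code class="literal">?</code> binds only to the preceding
      <code class="literal">s</code>, so the final letter is optional. In the
      second, concatenation binds more tightly than
      <code class="literal">|</code>, so the alternatives are the two whole
      words. In the third, parentheses restrict the alternation to a single
      letter.
      </p><pre class="screen">
       Z&gt; find @attr 1=Title @attr 5=102 "beethovens?"
       Z&gt; find @attr 1=Title @attr 5=102 "bach|brahms"
       Z&gt; find @attr 1=Title @attr 5=102 "gr(a|e)y"
      </pre><p>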
     If the first character of the <code class="literal">Regxp-2</code> query
     is a plus character (<code class="literal">+</code>), it marks the
     beginning of a section with non-standard specifiers.
     The next plus character marks the end of the section.
     Currently <span class="application">Zebra</span> supports only one specifier, the error tolerance,
     which consists of one digit.
    </p><p>
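     As a sketch of this syntax, assuming a <code class="literal">Title</code>
     index and an illustrative term, the first query below relies on the
     default tolerance, while the second uses a specifier section to
     request a tolerance of two errors:
     </p><pre class="screen">
      Z&gt; find @attr 1=Title @attr 5=103 "beethoven"
      Z&gt; find @attr 1=Title @attr 5=103 "+2+beethoven"
     </pre><p>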
     Since the plus operator is normally a suffix operator, a standard
     regular expression cannot begin with it, so this addition to
     the query syntax does not conflict with the syntax for standard regular
     expressions.
    </p><p>
     For example, a phrase search with regular expressions in
     the title-register is performed like this:
     </p><pre class="screen">
      Z&gt; find @attr 1=4 @attr 5=102 "informat.* retrieval"
     </pre><p>
    </p><p>
     Combinations with other attributes are possible. For example, a
     ranked search with a regular expression:
     </p><pre class="screen">
      Z&gt; find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
     </pre><p>
    </p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="querymodel-rpn.html">Prev</a></td><td width="20%" align="center"><a accesskey="u" href="querymodel.html">Up</a></td><td width="40%" align="right"><a accesskey="n" href="querymodel-cql-to-pqf.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">2.<acronym class="acronym">RPN</acronym> queries and semantics</td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top">4.Server Side <acronym class="acronym">CQL</acronym> to <acronym class="acronym">PQF</acronym> Query Translation</td></tr></table></div></body></html>