<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>3.Extended Zebra RPN Features</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot"><link rel="home" href="index.html" title="Zebra - User's Guide and Reference"><link rel="up" href="querymodel.html" title="Chapter5.Query Model"><link rel="prev" href="querymodel-rpn.html" title="2.RPN queries and semantics"><link rel="next" href="querymodel-cql-to-pqf.html" title="4.Server Side CQL to PQF Query Translation"></head><body><link rel="stylesheet" type="text/css" href="common/style1.css"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">3.Extended <span class="application">Zebra</span> <acronym class="acronym">RPN</acronym> Features</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="querymodel-rpn.html">Prev</a></td><th width="60%" align="center">Chapter5.Query Model</th><td width="20%" align="right"><a accesskey="n" href="querymodel-cql-to-pqf.html">Next</a></td></tr></table><hr></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="querymodel-zebra"></a>3.Extended <span class="application">Zebra</span> <acronym class="acronym">RPN</acronym> Features</h2></div></div></div><p>
The <span class="application">Zebra</span> internal query engine has been extended to cover specific needs
not covered by the <code class="literal">bib-1</code> attribute set query
model. These extensions are <span class="emphasis"><em>non-standard</em></span>
and <span class="emphasis"><em>non-portable</em></span>: most functional extensions
are modeled over the <code class="literal">bib-1</code> attribute set,
defining type 7 and higher values.
There are also the special
<code class="literal">string</code> type index names for the
<code class="literal">idxpath</code> attribute set.
</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="querymodel-zebra-attr-allrecords"></a>3.1.<span class="application">Zebra</span> specific retrieval of all records</h3></div></div></div><p>
<span class="application">Zebra</span> defines a hardwired <code class="literal">string</code> index name
called <code class="literal">_ALLRECORDS</code>. It matches any record
contained in the database, if used in conjunction with
the relation attribute
<code class="literal">AlwaysMatches (103)</code>.
</p><p>
The <code class="literal">_ALLRECORDS</code> index name is used for total database
export. The search term is ignored and may be empty.
</p><pre class="screen">
Z> find @attr 1=_ALLRECORDS @attr 2=103 ""
</pre><p>
</p><p>
<code class="literal">_ALLRECORDS</code> can be combined with other indexes. For example, to
find all records which are <span class="emphasis"><em>not</em></span> indexed in
the <code class="literal">Title</code> register, issue one of the two
equivalent queries:
</p><pre class="screen">
Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=Title @attr 2=103 ""
Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=4 @attr 2=103 ""
</pre><p>
</p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>
The special string index <code class="literal">_ALLRECORDS</code> is
experimental, and the provided functionality and syntax may very
well change in future releases of <span class="application">Zebra</span>.
</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="querymodel-zebra-attr-search"></a>3.2.<span class="application">Zebra</span> specific Search Extensions to all Attribute Sets</h3></div></div></div><p>
<span class="application">Zebra</span> extends the <acronym class="acronym">BIB-1</acronym> attribute types, and these extensions are
recognized regardless of attribute
set used in a <code class="literal">search</code> operation query.
</p><div class="table"><a name="querymodel-zebra-attr-search-table"></a><p class="title"><b>Table5.9.<span class="application">Zebra</span> Search Attribute Extensions</b></p><div class="table-contents"><table class="table" summary="Zebra Search Attribute Extensions" border="1"><colgroup><col><col><col><col></colgroup><thead><tr><th>Name</th><th>Value</th><th>Operation</th><th><span class="application">Zebra</span> version</th></tr></thead><tbody><tr><td>Embedded Sort</td><td>7</td><td>search</td><td>1.1</td></tr><tr><td>Term Set</td><td>8</td><td>search</td><td>1.1</td></tr><tr><td>Rank Weight</td><td>9</td><td>search</td><td>1.1</td></tr><tr><td>Term Reference</td><td>10</td><td>search</td><td>1.4</td></tr><tr><td>Local Approx Limit</td><td>11</td><td>search</td><td>1.4</td></tr><tr><td>Global Approx Limit</td><td>12</td><td>search</td><td>2.0.8</td></tr><tr><td>Maximum number of truncated terms (truncmax)</td><td>13</td><td>search</td><td>2.0.10</td></tr><tr><td>
Specifies whether un-indexed fields should be ignored.
A zero value (default) throws a diagnostic when an un-indexed
field is specified. A non-zero value makes it return 0 hits.
</td><td>14</td><td>search</td><td>2.0.16</td></tr></tbody></table></div></div><br class="table-break"><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-zebra-attr-sorting"></a>3.2.1.<span class="application">Zebra</span> Extension Embedded Sort Attribute (type 7)</h4></div></div></div><p>
Embedded sort specifies the sort order within the query itself,
removing the need to send a separate Sort Request. It is
faster and does not require clients to deal with the Sort
Facility.
</p><p>
All ordering operations are based on a lexicographical ordering,
<span class="emphasis"><em>except</em></span> when the
<code class="literal">structure attribute numeric (109)</code> is used. In
this case, ordering is numerical. See
<a class="xref" href="querymodel-rpn.html#querymodel-bib1-structure" title="2.4.3.Structure Attributes (type 4)">Section2.4.3, “Structure Attributes (type 4)”</a>.
</p><p>
The possible values after attribute <code class="literal">type 7</code> are
<code class="literal">1</code> ascending and
<code class="literal">2</code> descending.
The attributes+term (<acronym class="acronym">APT</acronym>) node is separate from the
rest and must be <code class="literal">@or</code>'ed.
The term associated with <acronym class="acronym">APT</acronym> is the sorting level in integers,
where <code class="literal">0</code> means primary sort,
<code class="literal">1</code> means secondary sort, and so forth.
See also <a class="xref" href="administration-ranking.html" title="9.Relevance Ranking and Sorting of Result Sets">Section9, “Relevance Ranking and Sorting of Result Sets”</a>.
</p><p>
For example, searching for water, sort by title (ascending)
</p><pre class="screen">
Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
</pre><p>
</p><p>
Or, searching for water, sort by title ascending, then date descending
</p><pre class="screen">
Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
</pre><p>
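As a sketch of the numeric ordering rule described above, adding the
structure attribute numeric (109) to the sort <acronym class="acronym">APT</acronym> should make the
ordering numerical instead of lexicographical. This assumes that the
register behind use attribute 30 holds values suitable for numeric
comparison; here the result is sorted by date, descending:
</p><pre class="screen">
Z> find @or @attr 1=1016 water @attr 7=2 @attr 4=109 @attr 1=30 0
</pre><p>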
</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-zebra-attr-weight"></a>3.2.2.<span class="application">Zebra</span> Extension Rank Weight Attribute (type 9)</h4></div></div></div><p>
Rank weight passes a value to the ranking algorithm, so
that one <acronym class="acronym">APT</acronym> can carry one weight while another carries a different one.
See also <a class="xref" href="administration-ranking.html" title="9.Relevance Ranking and Sorting of Result Sets">Section9, “Relevance Ranking and Sorting of Result Sets”</a>.
</p><p>
For example, searching for utah in title with weight 30 as well
as any with weight 20:
</p><pre class="screen">
Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
</pre><p>
</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-zebra-attr-termref"></a>3.2.3.<span class="application">Zebra</span> Extension Term Reference Attribute (type 10)</h4></div></div></div><p>
<span class="application">Zebra</span> supports the searchResult-1 facility.
If the Term Reference Attribute (type 10) is
given, its value specifies the subqueryId returned as part of the
search result. It is a way for a client to name an <acronym class="acronym">APT</acronym> part of a
query.
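</p><p>
A minimal sketch, assuming that numeric reference values are accepted:
the two <acronym class="acronym">APT</acronym>s below are named <code class="literal">1</code> and
<code class="literal">2</code>, so that a client can tell their entries apart in the
searchResult-1 information of the response. The terms and access
points are illustrative only.
</p><pre class="screen">
Z> find @and @attr 10=1 @attr 1=4 mozart @attr 10=2 @attr 1=1003 amadeus
</pre><p>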
</p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>
Experimental. Do not use in production code.
</p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-zebra-local-attr-limit"></a>3.2.4.Local Approximative Limit Attribute (type 11)</h4></div></div></div><p>
<span class="application">Zebra</span> computes - unless otherwise configured -
the exact hit count for every <acronym class="acronym">APT</acronym>
(leaf) in the query tree. These hit counts are returned as part of
the searchResult-1 facility in the binary encoded <acronym class="acronym">Z39.50</acronym> search
response packages.
</p><p>
When an estimation limit is set on an <acronym class="acronym">APT</acronym>
leaf, <span class="application">Zebra</span> stops processing that leaf's result set once the
limit is reached.
Hit counts under this limit are still precise, but hit counts over it
are estimated using the statistics gathered from the chopped
result set.
</p><p>
Specifying a limit of <code class="literal">0</code> results in exact hit counts.
</p><p>
For example, we might be interested in the exact hit count for a, but
for b we accept estimated hit counts at 1000 and above.
</p><pre class="screen">
Z> find @and a @attr 11=1000 b
</pre><p>
</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
The estimated hit count facility makes searches faster, as one
only needs to process large hit lists partially.
It is mostly used in huge databases, where you want to trade
exactness of hit counts for speed of execution.
</p></div><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>
Do not use approximative hit count limits
in conjunction with relevance ranking, as re-sorting of the
result set only works when the entire result set has
been processed.
</p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-zebra-global-attr-limit"></a>3.2.5.Global Approximative Limit Attribute (type 12)</h4></div></div></div><p>
By default <span class="application">Zebra</span> computes precise hit counts for a query as
a whole. Setting attribute 12 makes it perform approximative
hit counts instead. It has the same semantics as
<code class="literal">estimatehits</code> described in <a class="xref" href="zebra-cfg.html" title="2.The Zebra Configuration File">Section2, “The <span class="application">Zebra</span> Configuration File”</a>.
</p><p>
The attribute (12) can occur anywhere in the query tree.
Unlike regular attributes it does not relate to the leaf (<acronym class="acronym">APT</acronym>)
- but to the whole query.
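</p><p>
For illustration, and assuming the attribute value is a limit
interpreted like <code class="literal">estimatehits</code>, the following two queries
should request the same approximative counting for the query as a
whole, even though the attribute is attached at different places in
the tree:
</p><pre class="screen">
Z> find @attr 12=1000 @and a b
Z> find @and a @attr 12=1000 b
</pre><p>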
</p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>
Do not use approximative hit count limits
in conjunction with relevance ranking, as re-sorting of the
result set only works when the entire result set has
been processed.
</p></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="querymodel-zebra-attr-scan"></a>3.3.<span class="application">Zebra</span> specific Scan Extensions to all Attribute Sets</h3></div></div></div><p>
<span class="application">Zebra</span> extends the <acronym class="acronym">BIB-1</acronym> attribute types, and these extensions are
recognized regardless of attribute
set used in a scan operation query.
</p><div class="table"><a name="querymodel-zebra-attr-scan-table"></a><p class="title"><b>Table5.10.<span class="application">Zebra</span> Scan Attribute Extensions</b></p><div class="table-contents"><table class="table" summary="Zebra Scan Attribute Extensions" border="1"><colgroup><col><col><col><col></colgroup><thead><tr><th>Name</th><th>Type</th><th>Operation</th><th><span class="application">Zebra</span> version</th></tr></thead><tbody><tr><td>Result Set Narrow</td><td>8</td><td>scan</td><td>1.3</td></tr><tr><td>Approximative Limit</td><td>12</td><td>scan</td><td>2.0.20</td></tr></tbody></table></div></div><br class="table-break"><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-zebra-attr-narrow"></a>3.3.1.<span class="application">Zebra</span> Extension Result Set Narrow (type 8)</h4></div></div></div><p>
If attribute Result Set Narrow (type 8)
is given for scan, the value is the name of a
result set. Each hit count in scan is
<code class="literal">@and</code>'ed with the result set given.
</p><p>
Consider for example
the case of scanning all title fields around the
scanterm <span class="emphasis"><em>mozart</em></span>, then refining the scan by
issuing a filtering query for <span class="emphasis"><em>amadeus</em></span> to
restrict the scan to the result set of the query:
</p><pre class="screen">
Z> scan @attr 1=4 mozart
...
* mozart (43)
mozartforskningen (1)
mozartiana (1)
mozarts (16)
...
Z> f @attr 1=4 amadeus
...
Number of hits: 15, setno 2
...
Z> scan @attr 1=4 @attr 8=2 mozart
...
* mozart (14)
mozartforskningen (0)
mozartiana (0)
mozarts (1)
...
</pre><p>
</p><p>
<span class="application">Zebra</span> 2.0.2 and later is able to skip terms with 0 hit counts. This, however,
is known not to scale if the number of terms to skip is high,
which is most likely when the result set is small (and
thus yields many 0 hits).
</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-zebra-attr-approx"></a>3.3.2.<span class="application">Zebra</span> Extension Approximative Limit (type 12)</h4></div></div></div><p>
The <span class="application">Zebra</span> Extension Approximative Limit (type 12) is a way to
enable approximate hit counts for scan hit counts, in the same
way as for search hit counts.
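</p><p>
A sketch of such a scan, reusing the title scan from the Result Set
Narrow example and assuming a limit value of 1000:
</p><pre class="screen">
Z> scan @attr 1=4 @attr 12=1000 mozart
</pre><p>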
</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="querymodel-idxpath"></a>3.4.<span class="application">Zebra</span> special <acronym class="acronym">IDXPATH</acronym> Attribute Set for <acronym class="acronym">GRS-1</acronym> indexing</h3></div></div></div><p>
The attribute-set <code class="literal">idxpath</code> consists of a single
Use (type 1) attribute. All non-use attributes behave as normal.
</p><p>
This feature is enabled when defining the
<code class="literal">xpath enable</code> option in the <acronym class="acronym">GRS-1</acronym> filter
<code class="filename">*.abs</code> configuration files. If one wants to use
the special <code class="literal">idxpath</code> numeric attribute set, the
main <span class="application">Zebra</span> configuration file <code class="filename">zebra.cfg</code>
directive <code class="literal">attset: idxpath.att</code> must be enabled.
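</p><p>
A sketch of the two settings named above, with all other content of
both files omitted. In <code class="filename">zebra.cfg</code>:
</p><pre class="screen">
attset: idxpath.att
</pre><p>
And in the <acronym class="acronym">GRS-1</acronym> <code class="filename">*.abs</code> file of the record type
in question:
</p><pre class="screen">
xpath enable
</pre><p>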
</p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>
The <code class="literal">idxpath</code> is deprecated, may not be
supported in future <span class="application">Zebra</span> versions, and should definitely
not be used in production code.
</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-idxpath-use"></a>3.4.1.<acronym class="acronym">IDXPATH</acronym> Use Attributes (type = 1)</h4></div></div></div><p>
This attribute set allows one to search <acronym class="acronym">GRS-1</acronym> filter indexed
records by <acronym class="acronym">XPATH</acronym> like structured index names.
</p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>
The <code class="literal">idxpath</code> option defines hard-coded
index names, which might clash with your own index names.
</p></div><div class="table"><a name="querymodel-idxpath-use-table"></a><p class="title"><b>Table5.11.<span class="application">Zebra</span> specific <acronym class="acronym">IDXPATH</acronym> Use Attributes (type 1)</b></p><div class="table-contents"><table class="table" summary="Zebra specific IDXPATH Use Attributes (type 1)" border="1"><colgroup><col><col><col><col></colgroup><thead><tr><th><acronym class="acronym">IDXPATH</acronym></th><th>Value</th><th>String Index</th><th>Notes</th></tr></thead><tbody><tr><td><acronym class="acronym">XPATH</acronym> Begin</td><td>1</td><td>_XPATH_BEGIN</td><td>deprecated</td></tr><tr><td><acronym class="acronym">XPATH</acronym> End</td><td>2</td><td>_XPATH_END</td><td>deprecated</td></tr><tr><td><acronym class="acronym">XPATH</acronym> CData</td><td>1016</td><td>_XPATH_CDATA</td><td>deprecated</td></tr><tr><td><acronym class="acronym">XPATH</acronym> Attribute Name</td><td>3</td><td>_XPATH_ATTR_NAME</td><td>deprecated</td></tr><tr><td><acronym class="acronym">XPATH</acronym> Attribute CData</td><td>1015</td><td>_XPATH_ATTR_CDATA</td><td>deprecated</td></tr></tbody></table></div></div><br class="table-break"><p>
See <code class="filename">tab/idxpath.att</code> for more information.
</p><p>
Search for all documents starting with root element
<code class="literal">/root</code> (either using the numeric or the string
use attributes):
</p><pre class="screen">
Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/
Z> find @attr idxpath 1=1 @attr 4=3 root/
Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/
</pre><p>
</p><p>
Search for all documents where specific nested <acronym class="acronym">XPATH</acronym>
<code class="literal">/c1/c2/../cn</code> exists. Notice the very
counter-intuitive <span class="emphasis"><em>reverse</em></span> notation!
</p><pre class="screen">
Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/
Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/
</pre><p>
</p><p>
Search for CDATA string <span class="emphasis"><em>text</em></span> in any element
</p><pre class="screen">
Z> find @attrset idxpath @attr 1=1016 text
Z> find @attr 1=_XPATH_CDATA text
</pre><p>
</p><p>
Search for CDATA string <span class="emphasis"><em>anothertext</em></span> in any
attribute:
</p><pre class="screen">
Z> find @attrset idxpath @attr 1=1015 anothertext
Z> find @attr 1=_XPATH_ATTR_CDATA anothertext
</pre><p>
</p><p>
Search for all documents which have an <acronym class="acronym">XML</acronym> element node
including an <acronym class="acronym">XML</acronym> attribute named <span class="emphasis"><em>creator</em></span>
</p><pre class="screen">
Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator
Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
</pre><p>
</p><p>
Combining usual <code class="literal">bib-1</code> attribute set searches
with <code class="literal">idxpath</code> attribute set searches:
</p><pre class="screen">
Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart
Z> find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart
</pre><p>
</p><p>
Scanning is supported on all <code class="literal">idxpath</code>
indexes, both specified as numeric use attributes, or as string
index names.
</p><pre class="screen">
Z> scan @attrset idxpath @attr 1=1016 text
Z> scan @attr 1=_XPATH_ATTR_CDATA anothertext
Z> scan @attrset idxpath @attr 1=3 @attr 4=3 ''
</pre><p>
</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="querymodel-pqf-apt-mapping"></a>3.5.Mapping from <acronym class="acronym">PQF</acronym> atomic <acronym class="acronym">APT</acronym> queries to <span class="application">Zebra</span> internal
register indexes</h3></div></div></div><p>
The rules for <acronym class="acronym">PQF</acronym> <acronym class="acronym">APT</acronym> mapping can be tricky to grasp.
We deal first with the rules for deciding which
internal register or string index to use, according to the use
attribute or access point specified in the query. Thereafter we
deal with the rules for determining the correct structure type of
the named register.
</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-pqf-apt-mapping-accesspoint"></a>3.5.1.Mapping of <acronym class="acronym">PQF</acronym> <acronym class="acronym">APT</acronym> access points</h4></div></div></div><p>
<span class="application">Zebra</span> understands four fundamentally different types of access
points, of which only the
<span class="emphasis"><em>numeric use attribute</em></span> type access points
are defined by the <a class="ulink" href="https://www.loc.gov/z3950/agency/" target="_top"><acronym class="acronym">Z39.50</acronym></a>
standard.
All other access point types are <span class="application">Zebra</span> specific, and non-portable.
</p><div class="table"><a name="querymodel-zebra-mapping-accesspoint-types"></a><p class="title"><b>Table5.12.Access point name mapping</b></p><div class="table-contents"><table class="table" summary="Access point name mapping" border="1"><colgroup><col><col><col><col></colgroup><thead><tr><th>Access Point</th><th>Type</th><th>Grammar</th><th>Notes</th></tr></thead><tbody><tr><td>Use attribute</td><td>numeric</td><td>[1-9][0-9]*</td><td>directly mapped to string index name</td></tr><tr><td>String index name</td><td>string</td><td>[a-zA-Z](\-?[a-zA-Z0-9])*</td><td>normalized name is used as internal string index name</td></tr><tr><td><span class="application">Zebra</span> internal index name</td><td>zebra</td><td>_[a-zA-Z](_?[a-zA-Z0-9])*</td><td>hardwired internal string index name</td></tr><tr><td><acronym class="acronym">XPATH</acronym> special index</td><td>XPath</td><td>/.*</td><td>special xpath search for <acronym class="acronym">GRS-1</acronym> indexed records</td></tr></tbody></table></div></div><br class="table-break"><p>
<code class="literal">Attribute set names</code> and
<code class="literal">string index names</code> are normalized
according to the following rules: all <span class="emphasis"><em>single</em></span>
hyphens <code class="literal">'-'</code> are stripped, and all upper case
letters are folded to lower case.
</p><p>
<span class="emphasis"><em>Numeric use attributes</em></span> are mapped
to the <span class="application">Zebra</span> internal
string index according to the attribute set definition in use.
The default attribute set is <acronym class="acronym">BIB-1</acronym>, and may be
omitted in the <acronym class="acronym">PQF</acronym> query.
</p><p>
According to normalization and numeric
use attribute mapping, it follows that the following
<acronym class="acronym">PQF</acronym> queries are considered equivalent (assuming the default
configuration has not been altered):
</p><pre class="screen">
Z> find @attr 1=Body-of-text serenade
Z> find @attr 1=bodyoftext serenade
Z> find @attr 1=BodyOfText serenade
Z> find @attr 1=bO-d-Y-of-tE-x-t serenade
Z> find @attr 1=1010 serenade
Z> find @attrset bib1 @attr 1=1010 serenade
Z> find @attrset Bib1 @attr 1=1010 serenade
Z> find @attrset b-I-b-1 @attr 1=1010 serenade
</pre><p>
</p><p>
The <span class="emphasis"><em>numerical</em></span>
<code class="literal">use attributes (type 1)</code>
are interpreted according to the
attribute sets which have been loaded in the
<code class="literal">zebra.cfg</code> file, and are matched against specific
fields as specified in the <code class="literal">.abs</code> file which
describes the profile of the records which have been loaded.
If no use attribute is provided, a default of
<acronym class="acronym">BIB-1</acronym> Use Any (1016) is assumed.
The predefined use attribute sets
can be reconfigured by tweaking the configuration files
<code class="filename">tab/*.att</code>, and
new attribute sets can be defined by adding similar files in the
configuration path <code class="literal">profilePath</code> of the server.
</p><p>
String indexes can be accessed directly,
regardless of which attribute set is in use; the attribute set is
simply ignored. The above mentioned name normalization applies.
String index names are defined in the
used indexing filter configuration files, for example in the
<acronym class="acronym">GRS-1</acronym>
<code class="filename">*.abs</code> configuration files, or in the
<code class="literal">alvis</code> filter <acronym class="acronym">XSLT</acronym> indexing stylesheets.
</p><p>
<span class="application">Zebra</span> internal indexes can be accessed directly,
according to the same rules as the user defined
string indexes. The only difference is that
<span class="application">Zebra</span> internal index names are hardwired,
all uppercase and
must start with the character <code class="literal">'_'</code>.
</p><p>
Finally, <acronym class="acronym">XPATH</acronym> access points are only
available using the <acronym class="acronym">GRS-1</acronym> filter for indexing.
These access point names must start with the character
<code class="literal">'/'</code>, they are <span class="emphasis"><em>not
normalized</em></span>, but passed unaltered to the <span class="application">Zebra</span> internal
<acronym class="acronym">XPATH</acronym> engine. See <a class="xref" href="querymodel-rpn.html#querymodel-use-xpath" title="2.1.6.Zebra's special access point of type 'XPath' for GRS-1 filters">Section2.1.6, “<span class="application">Zebra</span>'s special access point of type 'XPath'
for <acronym class="acronym">GRS-1</acronym> filters”</a>.
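</p><p>
A minimal sketch, assuming <acronym class="acronym">GRS-1</acronym> indexed records that contain a
hypothetical <code class="literal">/record/title</code> element and that
<acronym class="acronym">XPATH</acronym> indexing is enabled for them. The path is passed on as
written, so case and hyphens are preserved:
</p><pre class="screen">
Z> find @attr 1=/record/title mozart
</pre><p>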
</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="querymodel-pqf-apt-mapping-structuretype"></a>3.5.2.Mapping of <acronym class="acronym">PQF</acronym> <acronym class="acronym">APT</acronym> structure and completeness to
register type</h4></div></div></div><p>
Internally <span class="application">Zebra</span> has in its default configuration several
different types of registers or indexes, whose tokenization and
character normalization rules differ. This reflects the fact that
searching fundamentally different tokens like dates, numbers,
bitfields and string based text needs different rule sets.
</p><div class="table"><a name="querymodel-zebra-mapping-structure-types"></a><p class="title"><b>Table5.13.Structure and completeness mapping to register types</b></p><div class="table-contents"><table class="table" summary="Structure and completeness mapping to register types" border="1"><colgroup><col><col><col><col></colgroup><thead><tr><th>Structure</th><th>Completeness</th><th>Register type</th><th>Notes</th></tr></thead><tbody><tr><td>
phrase (@attr 4=1), word (@attr 4=2),
word-list (@attr 4=6),
free-form-text (@attr 4=105), or document-text (@attr 4=106)
</td><td>Incomplete field (@attr 6=1)</td><td>Word ('w')</td><td>Traditional tokenized and character normalized word index</td></tr><tr><td>
phrase (@attr 4=1), word (@attr 4=2),
word-list (@attr 4=6),
free-form-text (@attr 4=105), or document-text (@attr 4=106)
</td><td>Complete field (@attr 6=3)</td><td>Phrase ('p')</td><td>Character normalized, but not tokenized index for phrase
matches
</td></tr><tr><td>urx (@attr 4=104)</td><td>ignored</td><td>URX/URL ('u')</td><td>Special index for URL web addresses</td></tr><tr><td>numeric (@attr 4=109)</td><td>ignored</td><td>Numeric ('n')</td><td>Special index for digital numbers</td></tr><tr><td>key (@attr 4=3)</td><td>ignored</td><td>Null bitmap ('0')</td><td>Used for non-tokenized and non-normalized bit sequences</td></tr><tr><td>year (@attr 4=4)</td><td>ignored</td><td>Year ('y')</td><td>Non-tokenized and non-normalized 4 digit numbers</td></tr><tr><td>date (@attr 4=5)</td><td>ignored</td><td>Date ('d')</td><td>Non-tokenized and non-normalized ISO date strings</td></tr><tr><td>ignored</td><td>ignored</td><td>Sort ('s')</td><td>Used with special sort attribute set (@attr 7=1, @attr 7=2)</td></tr><tr><td>overruled</td><td>overruled</td><td>special</td><td>Internal record ID register, used whenever
Relation Always Matches (@attr 2=103) is specified</td></tr></tbody></table></div></div><br class="table-break"><p>
If a <span class="emphasis"><em>Structure</em></span> attribute of
<span class="emphasis"><em>Phrase</em></span> is used in conjunction with a
<span class="emphasis"><em>Completeness</em></span> attribute of
<span class="emphasis"><em>Complete (Sub)field</em></span>, the term is matched
against the contents of the phrase (long word) register, if one
exists for the given <span class="emphasis"><em>Use</em></span> attribute.
A phrase register is created for those fields in the
<acronym class="acronym">GRS-1</acronym> <code class="filename">*.abs</code> file that contain a
<code class="literal">p</code>-specifier.
</p><pre class="screen">
Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven
...
bayreuther festspiele (1)
* beethoven bibliography database (1)
benny carter (1)
...
Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography"
...
Number of hits: 0, setno 5
...
Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography database"
...
Number of hits: 1, setno 6
</pre><p>
</p><p>
If <span class="emphasis"><em>Structure</em></span>=<span class="emphasis"><em>Phrase</em></span> is
used in conjunction with <span class="emphasis"><em>Incomplete Field</em></span> (the
default value for <span class="emphasis"><em>Completeness</em></span>), the
search is directed against the normal word registers, but if the term
contains multiple words, the term will only match if all of the words
are found immediately adjacent, and in the given order.
The word search is performed on those fields that are indexed as
type <code class="literal">w</code> in the <acronym class="acronym">GRS-1</acronym> <code class="filename">*.abs</code> file.
</p><pre class="screen">
Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven
...
beefheart (1)
* beethoven (18)
beethovens (7)
...
Z> find @attr 1=Title @attr 4=1 @attr 6=1 beethoven
...
Number of hits: 18, setno 1
...
Z> find @attr 1=Title @attr 4=1 @attr 6=1 "beethoven bibliography"
...
Number of hits: 2, setno 2
...
</pre><p>
</p><p>
If the <span class="emphasis"><em>Structure</em></span> attribute is
<span class="emphasis"><em>Word List</em></span>,
<span class="emphasis"><em>Free-form Text</em></span>, or
<span class="emphasis"><em>Document Text</em></span>, the term is treated as a
natural-language, relevance-ranked query.
This search type uses the word register, i.e. those fields
that are indexed as type <code class="literal">w</code> in the
<acronym class="acronym">GRS-1</acronym> <code class="filename">*.abs</code> file.
</p><p>
If the <span class="emphasis"><em>Structure</em></span> attribute is
<span class="emphasis"><em>Numeric String</em></span>, the term is treated as an integer.
The search is performed on those fields that are indexed
as type <code class="literal">n</code> in the <acronym class="acronym">GRS-1</acronym>
<code class="filename">*.abs</code> file.
</p><p>
If the <span class="emphasis"><em>Structure</em></span> attribute is
<span class="emphasis"><em>URX</em></span>, the term is treated as a URX (URL) entity.
The search is performed on those fields that are indexed as type
<code class="literal">u</code> in the <code class="filename">*.abs</code> file.
</p><p>
If the <span class="emphasis"><em>Structure</em></span> attribute is
<span class="emphasis"><em>Local Number</em></span>, the term is treated as a
native <span class="application">Zebra</span> Record Identifier.
</p><p>
If the <span class="emphasis"><em>Relation</em></span> attribute is
<span class="emphasis"><em>Equals</em></span> (default), the term is matched
in a normal fashion (modulo truncation and processing of
individual words, if required).
If <span class="emphasis"><em>Relation</em></span> is <span class="emphasis"><em>Less Than</em></span>,
<span class="emphasis"><em>Less Than or Equal</em></span>,
<span class="emphasis"><em>Greater Than</em></span>, or <span class="emphasis"><em>Greater Than or
Equal</em></span>, the term is assumed to be numerical, and a
standard regular expression is constructed to match the given
expression.
If <span class="emphasis"><em>Relation</em></span> is <span class="emphasis"><em>Relevance</em></span>,
the standard natural-language query processor is invoked.
</p><p>
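The relations above correspond to Bib-1 <span class="emphasis"><em>Relation</em></span>
attribute (type 2) values in a <acronym class="acronym">PQF</acronym> query. The following
Python sketch builds such query strings. The function name
<code class="literal">pqf_relation</code> is hypothetical, and the numeric access point
<code class="literal">30</code> in the second call is an assumption for illustration;
which access points exist depends on the server configuration.
</p>

```python
# Bib-1 Relation attribute (type 2) values for the relations named above.
RELATIONS = {
    "less than": 1,
    "less than or equal": 2,
    "equals": 3,
    "greater than or equal": 4,
    "greater than": 5,
    "relevance": 102,
}


def pqf_relation(use, relation, term):
    """Return a PQF query string with Use (type 1) and Relation (type 2)."""
    value = RELATIONS[relation.lower()]
    # Quote the term so that a multi-word term stays a single PQF operand.
    # Terms that themselves contain double quotes are not handled here.
    return '@attr 1=%s @attr 2=%d "%s"' % (use, value, term)


print(pqf_relation("Title", "relevance", "beethoven bibliography"))
# Hypothetical numeric access point, used only to show a range relation.
print(pqf_relation(30, "greater than or equal", "1990"))
```

<p>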
For the <span class="emphasis"><em>Truncation</em></span> attribute,
<span class="emphasis"><em>No Truncation</em></span> is the default.
<span class="emphasis"><em>Left Truncation</em></span> is not supported.
<span class="emphasis"><em>Process # in search term</em></span> is supported, as is
<span class="emphasis"><em>Regxp-1</em></span>.
<span class="emphasis"><em>Regxp-2</em></span> enables the fault-tolerant (fuzzy)
search. By default, a single error (deletion, insertion,
replacement) is accepted when terms are matched against the register
contents.
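</p><p>
What "a single error" means can be sketched in Python with the classic
edit distance, in which each deletion, insertion, or replacement costs one.
This only illustrates the matching criterion on literal strings. It is not
how <span class="application">Zebra</span> evaluates fuzzy regular expressions, and the
names <code class="literal">edit_distance</code> and <code class="literal">fuzzy_match</code>
are hypothetical.
</p>

```python
def edit_distance(a, b):
    """Levenshtein distance: deletions, insertions and replacements cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # replace, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term, entry, max_errors=1):
    """True if entry is within max_errors edits of term."""
    return max_errors >= edit_distance(term, entry)


register = ["beefheart", "beethoven", "beethovens", "bethoven"]
print([e for e in register if fuzzy_match("beethoven", e)])
# ['beethoven', 'beethovens', 'bethoven']
```

<p>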
</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="querymodel-regular"></a>3.6. <span class="application">Zebra</span> Regular Expressions in Truncation Attribute (type = 5)</h3></div></div></div><p>
Each term in a query is interpreted as a regular expression if
the truncation value is either <span class="emphasis"><em>Regxp-1 (@attr 5=102)</em></span>
or <span class="emphasis"><em>Regxp-2 (@attr 5=103)</em></span>.
Both query types follow the same syntax, with the following operands:
</p><div class="table"><a name="querymodel-regular-operands-table"></a><p class="title"><b>Table 5.14. Regular Expression Operands</b></p><div class="table-contents"><table class="table" summary="Regular Expression Operands" border="1"><colgroup><col><col></colgroup><tbody><tr><td><code class="literal">x</code></td><td>Matches the character <code class="literal">x</code>.</td></tr><tr><td><code class="literal">.</code></td><td>Matches any character.</td></tr><tr><td><code class="literal">[ .. ]</code></td><td>Matches the set of characters specified;
such as <code class="literal">[abc]</code> or <code class="literal">[a-c]</code>.</td></tr></tbody></table></div></div><br class="table-break"><p>
The above operands can be combined with the following operators:
</p><div class="table"><a name="querymodel-regular-operators-table"></a><p class="title"><b>Table 5.15. Regular Expression Operators</b></p><div class="table-contents"><table class="table" summary="Regular Expression Operators" border="1"><colgroup><col><col></colgroup><tbody><tr><td><code class="literal">x*</code></td><td>Matches <code class="literal">x</code> zero or more times.
Priority: high.</td></tr><tr><td><code class="literal">x+</code></td><td>Matches <code class="literal">x</code> one or more times.
Priority: high.</td></tr><tr><td><code class="literal">x?</code></td><td> Matches <code class="literal">x</code> zero or one time.
Priority: high.</td></tr><tr><td><code class="literal">xy</code></td><td> Matches <code class="literal">x</code>, then <code class="literal">y</code>.
Priority: medium.</td></tr><tr><td><code class="literal">x|y</code></td><td> Matches either <code class="literal">x</code> or <code class="literal">y</code>.
Priority: low.</td></tr><tr><td><code class="literal">( )</code></td><td>The order of evaluation may be changed by using parentheses.</td></tr></tbody></table></div></div><br class="table-break"><p>
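The operator priorities in the table can be tried out with Python's
<code class="literal">re</code> module, which happens to share these operands and
operators. This is only a sketch: Python uses a different regular
expression engine, and requiring the pattern to cover the whole term (via
<code class="literal">re.fullmatch</code>) is an assumption made here for clarity.
The helper name <code class="literal">matches</code> is hypothetical.
</p>

```python
import re


def matches(pattern, term):
    """Assumption: the pattern must match the whole term."""
    return re.fullmatch(pattern, term) is not None


# Repetition binds tighter than concatenation: "ab*" is "a(b*)", not "(ab)*".
print(matches("ab*", "abbb"))    # True
print(matches("ab*", "abab"))    # False
print(matches("(ab)*", "abab"))  # True

# Concatenation binds tighter than alternation: "ab|cd" is "(ab)|(cd)".
print(matches("ab|cd", "cd"))     # True
print(matches("ab|cd", "abd"))    # False
print(matches("a(b|c)d", "abd"))  # True

# Operands: any character and character sets.
print(matches("informat.*", "information"))  # True
print(matches("[a-c]x?", "b"))               # True
```

<p>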
If the first character of the <code class="literal">Regxp-2</code> query
is a plus character (<code class="literal">+</code>) it marks the
beginning of a section with non-standard specifiers.
The next plus character marks the end of the section.
Currently <span class="application">Zebra</span> only supports one specifier, the error tolerance,
which consists of one digit.
</p><p>
Since the plus operator is normally a suffix operator, this addition to
the query syntax does not violate the syntax of standard regular
expressions.
</p><p>
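The leading specifier section described above can be separated from the
rest of a <code class="literal">Regxp-2</code> term as in the following Python sketch.
The function name <code class="literal">split_tolerance</code> is hypothetical, the
default of one error is taken from the fuzzy search default, and returning
a malformed section unchanged is an assumption rather than documented
<span class="application">Zebra</span> behaviour.
</p>

```python
def split_tolerance(query, default=1):
    """Split a Regxp-2 term into (error_tolerance, pattern).

    A leading "+d+" section, with d a single digit, sets the tolerance.
    Anything else is returned unchanged with the default tolerance.
    """
    has_section = (
        len(query) >= 3
        and query[0] == "+"
        and query[1] in "0123456789"
        and query[2] == "+"
    )
    if has_section:
        return int(query[1]), query[3:]
    return default, query


print(split_tolerance("+2+informat.*"))  # (2, 'informat.*')
print(split_tolerance("informat.*"))     # (1, 'informat.*')
```

<p>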
For example, a phrase search with regular expressions in
the title-register is performed like this:
</p><pre class="screen">
Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
</pre><p>
</p><p>
Combinations with other attributes are possible. For example, a
ranked search with a regular expression:
</p><pre class="screen">
Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
</pre><p>
</p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="querymodel-rpn.html">Prev</a></td><td width="20%" align="center"><a accesskey="u" href="querymodel.html">Up</a></td><td width="40%" align="right"><a accesskey="n" href="querymodel-cql-to-pqf.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">2.<acronym class="acronym">RPN</acronym> queries and semantics</td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top">4.Server Side <acronym class="acronym">CQL</acronym> to <acronym class="acronym">PQF</acronym> Query Translation</td></tr></table></div></body></html>