<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>5. Indexing with File Record IDs</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot"><link rel="home" href="index.html" title="Zebra - User's Guide and Reference"><link rel="up" href="administration.html" title="Chapter 6. Administrating Zebra"><link rel="prev" href="simple-indexing.html" title="4. Indexing with no Record IDs (Simple Indexing)"><link rel="next" href="generic-ids.html" title="6. Indexing with General Record IDs"></head><body><link rel="stylesheet" type="text/css" href="common/style1.css"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">5. Indexing with File Record IDs</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="simple-indexing.html">Prev</a></td><th width="60%" align="center">Chapter 6. Administrating <span class="application">Zebra</span></th><td width="20%" align="right"><a accesskey="n" href="generic-ids.html">Next</a></td></tr></table><hr></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="file-ids"></a>5. Indexing with File Record IDs</h2></div></div></div><p>
    If you have a set of files that changes regularly over time (old files
    are deleted, new ones are added, or existing files are modified), you
    can benefit from using the <span class="emphasis"><em>file ID</em></span>
    indexing methodology.
    Examples of this type of database might include an index of WWW
    resources, or a USENET news spool area.
    Briefly speaking, the file ID methodology uses the directory path
    of each individual record as the unique identifier for that record.
    To index a directory with file IDs, you again specify
    the top-level directory after the <code class="literal">update</code> command.
    The command recursively traverses the directories and compares
    each one with whatever has been indexed before in that same directory.
    If a file is new (not in the previous version of the directory) it
    is inserted into the registers; if a file was already indexed and
    it has been modified since the last update, the index is also
    modified; if a file has been removed since the last
    visit, it is deleted from the index.
   </p><p>
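    The comparison described above can be sketched in a few lines of Python.
    This is only an illustration of the idea, not <span class="application">Zebra</span>'s
    actual implementation: the names <code class="literal">snapshot</code> and
    <code class="literal">classify_changes</code> are hypothetical, and the use of
    modification times to detect a changed file is an assumption made for
    the example.
    </p><pre class="programlisting">
import os


def snapshot(root):
    """Map every file path below root to its modification time."""
    seen = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            seen[path] = os.stat(path).st_mtime
    return seen


def classify_changes(previous, current):
    """Compare two snapshots and sort the paths by required action.

    Hypothetical helper: mirrors the insert / modify / delete decision
    described in the text, with the file path acting as the record ID.
    """
    inserted = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"insert": inserted, "modify": modified, "delete": deleted}
    </pre><p>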
    The resulting system is easy to administer. To delete a record, you
    simply delete the corresponding file (say, with the
    <code class="literal">rm</code> command). To add records, you create new
    files (or directories with files). For your changes to take effect
    in the register, you must run <code class="literal">zebraidx update</code> again with
    the same directory root. This mode of operation requires more
    disk space than simpler indexing methods, but it makes it easier for
    you to keep the index in sync with a frequently changing set of data.
    If you combine this system with the <span class="emphasis"><em>safe update</em></span>
    facility (see below), you never have to take your server off-line for
    maintenance or register updating purposes.
   </p><p>
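    As a rough sketch of that workflow, a small Python helper could rerun the
    update after files have been added, changed or removed. The names
    <code class="literal">update_command</code> and <code class="literal">rerun_update</code>
    and the injectable <code class="literal">runner</code> argument are assumptions for
    illustration; only the <code class="literal">zebraidx</code> <code class="literal">update</code>
    command and its <code class="literal">-g</code> group option come from this guide.
    </p><pre class="programlisting">
import subprocess


def update_command(group, root):
    """Build the zebraidx command line that re-indexes root for group."""
    return ["zebraidx", "-g", group, "update", root]


def rerun_update(group, root, runner=subprocess.run):
    """Run the update; runner is injectable so the call can be tested.

    Assumes zebraidx is on PATH and can find its configuration file.
    With the default runner, a non-zero exit status raises
    subprocess.CalledProcessError because check=True is passed.
    """
    return runner(update_command(group, root), check=True)
    </pre><p>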
    To enable indexing with pathname IDs, you must specify
    <code class="literal">file</code> as the value of <code class="literal">recordId</code>
    in the configuration file. In addition, you should set
    <code class="literal">storeKeys</code> to <code class="literal">1</code>, since the <span class="application">Zebra</span>
    indexer must save additional information about the contents of each record
    in order to modify the indexes correctly at a later time.
   </p><p>
    For example, to update records of group <code class="literal">esdd</code>
    located below
    <code class="literal">/data1/records/</code> you should type:
    </p><pre class="screen">
     $ zebraidx -g esdd update /data1/records
    </pre><p>
   </p><p>
    The corresponding configuration file includes:
    </p><pre class="screen">
     esdd.recordId: file
     esdd.recordType: grs.sgml
     esdd.storeKeys: 1
    </pre><p>
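    A small Python check can catch a group that is missing either setting
    before the first indexing run. This is a sketch under assumptions: the
    helper names are hypothetical, the parser handles only plain
    <code class="literal">name: value</code> lines and <code class="literal">#</code> comments,
    and the fallback from a group-prefixed setting to an unprefixed one is
    assumed rather than taken from this section.
    </p><pre class="programlisting">
def parse_settings(text):
    """Parse 'name: value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if sep:
            settings[name.strip()] = value.strip()
    return settings


def lookup(settings, group, name):
    """Prefer the group-specific value, else the unprefixed one (assumed)."""
    return settings.get(group + "." + name, settings.get(name))


def check_file_id_settings(text, group):
    """Return a list of problems for file record ID indexing of group."""
    settings = parse_settings(text)
    problems = []
    if lookup(settings, group, "recordId") != "file":
        problems.append(group + ".recordId should be file")
    if lookup(settings, group, "storeKeys") != "1":
        problems.append(group + ".storeKeys should be 1")
    return problems
    </pre><p>
    An empty list means both settings are in place for the group.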
   </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>You cannot start out with a group of records with simple
     indexing (no record IDs as in the previous section) and then later
     enable file record IDs. <span class="application">Zebra</span> must know, from the first time
     you index the group, that the files should be indexed with file
     record IDs.
    </p></div><p>
    You cannot explicitly delete records when using this method (that is, by
    using the <code class="literal">delete</code> command of <code class="literal">zebraidx</code>). Instead
    you have to delete the files from the file system (or move them to a
    different location)
    and then run <code class="literal">zebraidx</code> with the
    <code class="literal">update</code> command.
   </p></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="simple-indexing.html">Prev</a></td><td width="20%" align="center"><a accesskey="u" href="administration.html">Up</a></td><td width="40%" align="right"><a accesskey="n" href="generic-ids.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">4. Indexing with no Record IDs (Simple Indexing)</td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top">6. Indexing with General Record IDs</td></tr></table></div></body></html>