
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Chapter9.GRS-1 Record Model and Filter Modules</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot"><link rel="home" href="index.html" title="Zebra - User's Guide and Reference"><link rel="up" href="index.html" title="Zebra - User's Guide and Reference"><link rel="prev" href="record-model-alvisxslt-conf.html" title="2.ALVIS Record Model Configuration"><link rel="next" href="grs-internal-representation.html" title="2.GRS-1 Internal Record Representation"></head><body><link rel="stylesheet" type="text/css" href="common/style1.css"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter9.<acronym class="acronym">GRS-1</acronym> Record Model and Filter Modules</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="record-model-alvisxslt-conf.html">Prev</a></td><th width="60%" align="center"></th><td width="20%" align="right"><a accesskey="n" href="grs-internal-representation.html">Next</a></td></tr></table><hr></div><div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="grs"></a>Chapter9.<acronym class="acronym">GRS-1</acronym> Record Model and Filter Modules</h1></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl class="toc"><dt><span class="section"><a href="grs.html#grs-filters">1. <acronym class="acronym">GRS-1</acronym> Record Filters</a></span></dt><dd><dl><dt><span class="section"><a href="grs.html#grs-canonical-format">1.1. <acronym class="acronym">GRS-1</acronym> Canonical Input Format</a></span></dt><dd><dl><dt><span class="section"><a href="grs.html#grs-record-root">1.1.1. Record Root</a></span></dt><dt><span class="section"><a href="grs.html#grs-variants">1.1.2. Variants</a></span></dt></dl></dd><dt><span class="section"><a href="grs.html#grs-regx-tcl">1.2. 
<acronym class="acronym">GRS-1</acronym> REGX And TCL Input Filters</a></span></dt></dl></dd><dt><span class="section"><a href="grs-internal-representation.html">2. <acronym class="acronym">GRS-1</acronym> Internal Record Representation</a></span></dt><dd><dl><dt><span class="section"><a href="grs-internal-representation.html#grs-tagged-elements">2.1. Tagged Elements</a></span></dt><dt><span class="section"><a href="grs-internal-representation.html#grs-variant-details">2.2. Variants</a></span></dt><dt><span class="section"><a href="grs-internal-representation.html#grs-data-elements">2.3. Data Elements</a></span></dt></dl></dd><dt><span class="section"><a href="grs-conf.html">3. <acronym class="acronym">GRS-1</acronym> Record Model Configuration</a></span></dt><dd><dl><dt><span class="section"><a href="grs-conf.html#grs-abstract-syntax">3.1. The Abstract Syntax</a></span></dt><dt><span class="section"><a href="grs-conf.html#grs-configuration-files">3.2. The Configuration Files</a></span></dt><dt><span class="section"><a href="grs-conf.html#abs-file">3.3. The Abstract Syntax (.abs) Files</a></span></dt><dt><span class="section"><a href="grs-conf.html#attset-files">3.4. The Attribute Set (.att) Files</a></span></dt><dt><span class="section"><a href="grs-conf.html#grs-tag-files">3.5. The Tag Set (.tag) Files</a></span></dt><dt><span class="section"><a href="grs-conf.html#grs-var-files">3.6. The Variant Set (.var) Files</a></span></dt><dt><span class="section"><a href="grs-conf.html#grs-est-files">3.7. The Element Set (.est) Files</a></span></dt><dt><span class="section"><a href="grs-conf.html#schema-mapping">3.8. The Schema Mapping (.map) Files</a></span></dt><dt><span class="section"><a href="grs-conf.html#grs-mar-files">3.9. The <acronym class="acronym">MARC</acronym> (ISO2709) Representation (.mar) Files</a></span></dt></dl></dd><dt><span class="section"><a href="grs-exchange-formats.html">4. 
<acronym class="acronym">GRS-1</acronym> Exchange Formats</a></span></dt><dt><span class="section"><a href="grs-extended-marc-indexing.html">5. Extended indexing of <acronym class="acronym">MARC</acronym> records</a></span></dt><dd><dl><dt><span class="section"><a href="grs-extended-marc-indexing.html#formula">5.1. The index-formula</a></span></dt><dt><span class="section"><a href="grs-extended-marc-indexing.html#notation">5.2. Notation of <span class="emphasis"><em>index-formula</em></span> for <span class="application">Zebra</span></a></span></dt><dd><dl><dt><span class="section"><a href="grs-extended-marc-indexing.html#grs-examples">5.2.1. Examples</a></span></dt></dl></dd></dl></dd></dl></div><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
The functionality of this record model has been improved and
replaced by the DOM <acronym class="acronym">XML</acronym> record model. See
<a class="xref" href="record-model-domxml.html" title="Chapter7.DOM XML Record Model and Filter Module">Chapter 7, <i><acronym class="acronym">DOM</acronym> <acronym class="acronym">XML</acronym> Record Model and Filter Module</i></a>.
</p></div><p>
The record model described in this chapter applies to the fundamental,
structured
record type <code class="literal">grs</code>, introduced in
<a class="xref" href="architecture-maincomponents.html#componentmodulesgrs" title="2.5.3.GRS-1 Record Model and Filter Modules">Section 2.5.3, “<acronym class="acronym">GRS-1</acronym> Record Model and Filter Modules”</a>.
</p><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="grs-filters"></a>1.<acronym class="acronym">GRS-1</acronym> Record Filters</h2></div></div></div><p>
Several basic subtypes of the <span class="emphasis"><em>grs</em></span> type are
currently available:
</p><p>
</p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">grs.sgml</code></span></dt><dd><p>
This is the canonical input format
described in <a class="xref" href="grs.html#grs-canonical-format" title="1.1.GRS-1 Canonical Input Format">Section 1.1, “<acronym class="acronym">GRS-1</acronym> Canonical Input Format”</a>. It uses a
simple <acronym class="acronym">SGML</acronym>-like syntax.
</p></dd><dt><span class="term"><code class="literal">grs.marc.</code><em class="replaceable"><code>type</code></em></span></dt><dd><p>
This allows <span class="application">Zebra</span> to read
records in the ISO2709 (<acronym class="acronym">MARC</acronym>) encoding standard.
The last parameter, <em class="replaceable"><code>type</code></em>, names the
<code class="literal">.abs</code> file (see below)
which describes the specific <acronym class="acronym">MARC</acronym> structure of the input record as
well as the indexing rules.
</p><p>The <code class="literal">grs.marc</code> filter uses an internal representation
that is not <acronym class="acronym">XML</acronym> conformant: <acronym class="acronym">MARC</acronym> tags are
presented as elements with the same name, and <acronym class="acronym">XML</acronym> element names
may not start with a digit. This filter is therefore only
suitable for systems returning <acronym class="acronym">GRS-1</acronym> and <acronym class="acronym">MARC</acronym> records. For <acronym class="acronym">XML</acronym>,
use the <code class="literal">grs.marcxml</code> filter instead (see below).
</p><p>
The loadable <code class="literal">grs.marc</code> filter module
is packaged in the GNU/Debian package
<code class="literal">libidzebra2.0-mod-grs-marc</code>.
</p></dd><dt><span class="term"><code class="literal">grs.marcxml.</code><em class="replaceable"><code>type</code></em></span></dt><dd><p>
This allows <span class="application">Zebra</span> to read ISO2709-encoded (<acronym class="acronym">MARC</acronym>) records.
The last parameter, <em class="replaceable"><code>type</code></em>, names the
<code class="literal">.abs</code> file (see below)
which describes the specific <acronym class="acronym">MARC</acronym> structure of the input record as
well as the indexing rules.
</p><p>
The internal representation for <code class="literal">grs.marcxml</code>
is the same as for <a class="ulink" href="https://www.loc.gov/standards/marcxml/" target="_top"><acronym class="acronym">MARCXML</acronym></a>.
It is slightly more complicated to work with than
<code class="literal">grs.marc</code>, but it is <acronym class="acronym">XML</acronym> conformant.
</p><p>
The loadable <code class="literal">grs.marcxml</code> filter module
is also contained in the GNU/Debian package
<code class="literal">libidzebra2.0-mod-grs-marc</code>.
</p></dd><dt><span class="term"><code class="literal">grs.xml</code></span></dt><dd><p>
This filter reads <acronym class="acronym">XML</acronym> records and uses
<a class="ulink" href="http://expat.sourceforge.net/" target="_top">Expat</a> to
parse them and convert them into ID<span class="application">Zebra</span>'s internal
<code class="literal">grs</code> record model.
Only one record per file is supported, because <acronym class="acronym">XML</acronym> does
not allow two documents to follow each other in one file (there is no way
to know when a document is finished).
This filter is only available if <span class="application">Zebra</span> is compiled with Expat support.
</p><p>
The loadable <code class="literal">grs.xml</code> filter module
is packaged in the GNU/Debian package
<code class="literal">libidzebra2.0-mod-grs-xml</code>.
</p></dd><dt><span class="term"><code class="literal">grs.regx.</code><em class="replaceable"><code>filter</code></em></span></dt><dd><p>
This enables a user-supplied regular expression input
filter, described in <a class="xref" href="grs.html#grs-regx-tcl" title="1.2.GRS-1 REGX And TCL Input Filters">Section 1.2, “<acronym class="acronym">GRS-1</acronym> REGX And TCL Input Filters”</a>.
</p><p>
The loadable <code class="literal">grs.regx</code> filter module
is packaged in the GNU/Debian package
<code class="literal">libidzebra2.0-mod-grs-regx</code>.
</p></dd><dt><span class="term"><code class="literal">grs.tcl.</code><em class="replaceable"><code>filter</code></em></span></dt><dd><p>
Similar to <code class="literal">grs.regx</code>, but using Tcl for rules. It is described in
<a class="xref" href="grs.html#grs-regx-tcl" title="1.2.GRS-1 REGX And TCL Input Filters">Section 1.2, “<acronym class="acronym">GRS-1</acronym> REGX And TCL Input Filters”</a>.
</p><p>
The loadable <code class="literal">grs.tcl</code> filter module
is also packaged in the GNU/Debian package
<code class="literal">libidzebra2.0-mod-grs-regx</code>.
</p></dd></dl></div><p>
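The subtype is selected with the <code class="literal">recordType</code> setting in
<code class="literal">zebra.cfg</code>. The fragment below is a minimal sketch rather than a
complete configuration: the directory paths are placeholders, and the
<code class="literal">usmarc</code> argument assumes that a file named
<code class="literal">usmarc.abs</code> can be found on the profile path.
</p><pre class="screen">
# zebra.cfg (illustrative fragment; both paths are assumptions)
profilePath: .:/usr/share/idzebra-2.0/tab
modulePath: /usr/lib/idzebra-2.0/modules

# read ISO2709 records, described by usmarc.abs
recordType: grs.marcxml.usmarc
</pre><p>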
</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="grs-canonical-format"></a>1.1.<acronym class="acronym">GRS-1</acronym> Canonical Input Format</h3></div></div></div><p>
Although input data can take any form, it is sometimes useful to
describe the record processing capabilities of the system in terms of
a single, canonical input format that gives access to the full
spectrum of structure and flexibility in the system. In <span class="application">Zebra</span>, this
canonical format is an "<acronym class="acronym">SGML</acronym>-like" syntax.
</p><p>
To use the canonical format specify <code class="literal">grs.sgml</code> as
the record type.
</p><p>
Consider a record describing an information resource (such a record is
sometimes known as a <span class="emphasis"><em>locator record</em></span>).
It might contain a field describing the distributor of the
information resource, which might in turn be partitioned into
various fields providing details about the distributor, like this:
</p><p>
</p><pre class="screen">
<Distributor>
<Name> USGS/WRD </Name>
<Organization> USGS/WRD </Organization>
<Street-Address>
U.S. GEOLOGICAL SURVEY, 505 MARQUETTE, NW
</Street-Address>
<City> ALBUQUERQUE </City>
<State> NM </State>
<Zip-Code> 87102 </Zip-Code>
<Country> USA </Country>
<Telephone> (505) 766-5560 </Telephone>
</Distributor>
</pre><p>
</p><p>
The keywords surrounded by <...> are
<span class="emphasis"><em>tags</em></span>, while the sections of text
in between are the <span class="emphasis"><em>data elements</em></span>.
A data element is characterized by its location in the tree
that is made up by the nested elements.
Each element is terminated by a closing tag - beginning
with <code class="literal"><</code>/, and containing the same symbolic
tag-name as the corresponding opening tag.
The general closing tag - <code class="literal"></></code> -
terminates the element started by the last opening tag. The
structuring of elements is significant.
The element <span class="emphasis"><em>Telephone</em></span>,
for instance, may be indexed and presented to the client differently,
depending on whether it appears inside the
<span class="emphasis"><em>Distributor</em></span> element, or some other,
structured data element such as a <span class="emphasis"><em>Supplier</em></span> element.
</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="grs-record-root"></a>1.1.1.Record Root</h4></div></div></div><p>
The first tag in a record describes the root node of the tree that
makes up the total record. In the canonical input format, the root tag
should contain the name of the schema that lends context to the
elements of the record
(see <a class="xref" href="grs-internal-representation.html" title="2.GRS-1 Internal Record Representation">Section 2, “<acronym class="acronym">GRS-1</acronym> Internal Record Representation”</a>).
The following is a GILS record that
contains only a single element. Strictly speaking, that makes it an
illegal GILS record, since the GILS profile includes several mandatory
elements. However, <span class="application">Zebra</span> does not validate the contents of a record against
the <acronym class="acronym">Z39.50</acronym> profile; it merely attempts to match up elements
of a local representation with the given schema:
</p><p>
</p><pre class="screen">
<gils>
<title>Zen and the Art of Motorcycle Maintenance</title>
</gils>
</pre><p>
</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="grs-variants"></a>1.1.2.Variants</h4></div></div></div><p>
<span class="application">Zebra</span> allows you to provide individual data elements in a number of
<span class="emphasis"><em>variant forms</em></span>. Examples of variant forms are
textual data elements which might appear in different languages, and
images which may appear in different formats or layouts.
The variant system in <span class="application">Zebra</span> is essentially a representation of
the variant mechanism of <acronym class="acronym">Z39.50</acronym>-1995.
</p><p>
The following is an example of a title element which occurs in two
different languages.
</p><p>
</p><pre class="screen">
<title>
<var lang lang "eng">
Zen and the Art of Motorcycle Maintenance</>
<var lang lang "dan">
Zen og Kunsten at Vedligeholde en Motorcykel</>
</title>
</pre><p>
</p><p>
The syntax of the <span class="emphasis"><em>variant element</em></span> is
<code class="literal"><var class type value></code>.
The available values for the <span class="emphasis"><em>class</em></span> and
<span class="emphasis"><em>type</em></span> fields are given by the variant set
that is associated with the current schema
(see <a class="xref" href="grs-conf.html#grs-var-files" title="3.6.The Variant Set (.var) Files">Section 3.6, “The Variant Set (.var) Files”</a>).
</p><p>
Variant elements are terminated by the general end-tag </>, by
the variant end-tag </var>, by the appearance of another variant
tag with the same <span class="emphasis"><em>class</em></span> and
<span class="emphasis"><em>value</em></span> settings, or by the
appearance of another, normal tag. In other words, the end-tags for
the variants used in the example above could have been omitted.
</p><p>
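To illustrate the last point, the two-language title could be written with the
variant end-tags left out. This sketch relies only on the termination rules just
described:
</p><pre class="screen">
<title>
<var lang lang "eng">
Zen and the Art of Motorcycle Maintenance
<var lang lang "dan">
Zen og Kunsten at Vedligeholde en Motorcykel
</title>
</pre><p>
Under those rules, the second <code class="literal">var</code> tag ends the English
variant, and the closing <code class="literal">title</code> tag ends the Danish one.
</p><p>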
Variant elements can be nested. The element
</p><p>
</p><pre class="screen">
<title>
<var lang lang "eng"><var body iana "text/plain">
Zen and the Art of Motorcycle Maintenance
</title>
</pre><p>
</p><p>
associates two variant components with the variant list for the title
element.
</p><p>
Given the nesting rules described above, we could write
</p><p>
</p><pre class="screen">
<title>
<var body iana "text/plain">
<var lang lang "eng">
Zen and the Art of Motorcycle Maintenance
<var lang lang "dan">
Zen og Kunsten at Vedligeholde en Motorcykel
</title>
</pre><p>
</p><p>
The title element above comes in two variants. Both have the IANA body
type "text/plain", but one is in English, and the other in
Danish. The client, using the element selection mechanism of <acronym class="acronym">Z39.50</acronym>,
can retrieve information about the available variant forms of data
elements, or it can select specific variants based on the requirements
of the end-user.
</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="grs-regx-tcl"></a>1.2.<acronym class="acronym">GRS-1</acronym> REGX And TCL Input Filters</h3></div></div></div><p>
In order to handle general input formats, <span class="application">Zebra</span> allows the
operator to define filters which read individual records in their
native format and produce an internal representation that the system
can work with.
</p><p>
Input filters are ASCII files, generally with the suffix
<code class="literal">.flt</code>.
The system looks for the files in the directories given in the
<span class="emphasis"><em>profilePath</em></span> setting in the
<code class="literal">zebra.cfg</code> file.
The record type for the filter is
<code class="literal">grs.regx.</code><span class="emphasis"><em>filter-filename</em></span>
(fundamental type <code class="literal">grs</code>, file read
type <code class="literal">regx</code>, argument
<span class="emphasis"><em>filter-filename</em></span>).
</p><p>
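As a sketch of how these pieces fit together, assume a filter file with the
hypothetical name <code class="literal">usenet.flt</code>, stored in a directory
<code class="literal">filters</code> that is listed in
<code class="literal">profilePath</code>. The directory names, and the convention that
the record type names the file without its <code class="literal">.flt</code> suffix,
are assumptions to check against your installation:
</p><pre class="screen">
# zebra.cfg (illustrative fragment)
profilePath: filters:/usr/share/idzebra-2.0/tab
recordType: grs.regx.usenet
</pre><p>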
Generally, an input filter consists of a sequence of rules, where each
rule consists of a sequence of expressions, followed by an action. The
expressions are evaluated against the contents of the input record,
and the actions normally contribute to the generation of an internal
representation of the record.
</p><p>
An expression can be any of the following:
</p><p>
</p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">INIT</code></span></dt><dd><p>
The action associated with this expression is evaluated
exactly once in the lifetime of the application, before any records
are read. It can be used in conjunction with an action that
initializes tables or other resources that are used in the processing
of input records.
</p></dd><dt><span class="term"><code class="literal">BEGIN</code></span></dt><dd><p>
Matches the beginning of the record. It can be used to
initialize variables, etc. Typically, the
<span class="emphasis"><em>BEGIN</em></span> rule is also used
to establish the root node of the record.
</p></dd><dt><span class="term"><code class="literal">END</code></span></dt><dd><p>
Matches the end of the record, when all of the contents
of the record have been processed.
</p></dd><dt><span class="term">
<code class="literal">/</code><em class="replaceable"><code>reg</code></em><code class="literal">/</code>
</span></dt><dd><p>
Matches regular expression pattern <em class="replaceable"><code>reg</code></em>
from the input record. The operators supported are the same
as for regular expression queries. Refer to
<a class="xref" href="querymodel-zebra.html#querymodel-regular" title="3.6.Zebra Regular Expressions in Truncation Attribute (type = 5)">Section3.6, “<span class="application">Zebra</span> Regular Expressions in Truncation Attribute (type = 5)”</a>.
</p></dd><dt><span class="term"><code class="literal">BODY</code></span></dt><dd><p>
This keyword may only be used between two patterns.
It matches everything between (not including) those patterns.
</p></dd><dt><span class="term"><code class="literal">FINISH</code></span></dt><dd><p>
The action associated with this expression is evaluated
once, before the application terminates. It can be used to release
system resources - typically ones allocated in the
<span class="emphasis"><em>INIT</em></span> step.
</p></dd></dl></div><p>
</p><p>
An action is surrounded by curly braces ({...}), and
consists of a sequence of statements. Statements may be separated
by newlines or semicolons (;).
Within actions, the strings that matched the expressions
immediately preceding the action can be referred to as
$0, $1, $2, etc.
</p><p>
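For example, in the following rule (a minimal sketch; the
<code class="literal">Subject:</code> header and the <code class="literal">title</code>
tag are illustrative choices), <code class="literal">$0</code> holds the text matched by
<code class="literal">/^Subject:/</code>, <code class="literal">$1</code> the text matched
by <code class="literal">BODY</code>, and <code class="literal">$2</code> the text matched
by <code class="literal">/$/</code>:
</p><pre class="screen">
/^Subject:/ BODY /$/ { data -element title $1 }
</pre><p>
The action uses the <code class="literal">data</code> statement, one of the statements
listed next.
</p><p>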
The available statements are:
</p><p>
</p><div class="variablelist"><dl class="variablelist"><dt><span class="term">begin <em class="replaceable"><code>type [parameter ... ]</code></em></span></dt><dd><p>
Begin a new
data element. The <em class="replaceable"><code>type</code></em> is one of
the following:
</p><div class="variablelist"><dl class="variablelist"><dt><span class="term">record</span></dt><dd><p>
Begin a new record. The following parameter should be the
name of the schema that describes the structure of the record, e.g.,
<code class="literal">gils</code> or <code class="literal">wais</code> (see below).
The <code class="literal">begin record</code> call should precede
any other use of the <em class="replaceable"><code>begin</code></em> statement.
</p></dd><dt><span class="term">element</span></dt><dd><p>
Begin a new tagged element. The parameter is the
name of the tag. If the tag is not matched anywhere in the tagsets
referenced by the current schema, it is treated as a local string
tag.
</p></dd><dt><span class="term">variant</span></dt><dd><p>
Begin a new node in a variant tree. The parameters are
<em class="replaceable"><code>class type value</code></em>.
</p></dd></dl></div><p>
</p></dd><dt><span class="term">data <em class="replaceable"><code>parameter</code></em></span></dt><dd><p>
Create a data element. The concatenated arguments make
up the value of the data element.
The option <code class="literal">-text</code> signals that
the layout (whitespace) of the data should be retained for
transmission.
The option <code class="literal">-element</code>
<em class="replaceable"><code>tag</code></em> wraps the data up in
the <em class="replaceable"><code>tag</code></em>.
The use of the <code class="literal">-element</code> option is equivalent to
preceding the command with a <em class="replaceable"><code>begin
element</code></em> command, and following
it with the <em class="replaceable"><code>end</code></em> command.
</p></dd><dt><span class="term">end <em class="replaceable"><code>[type]</code></em></span></dt><dd><p>
Close a tagged element. If no parameter is given,
the last element on the stack is terminated.
The first parameter, if any, is a type name, similar
to the <em class="replaceable"><code>begin</code></em> statement.
For the <em class="replaceable"><code>element</code></em> type, a tag
name can be provided to terminate a specific tag.
</p></dd><dt><span class="term">unread <em class="replaceable"><code>no</code></em></span></dt><dd><p>
Move the input pointer to the offset of the first character that
matches the rule given by <em class="replaceable"><code>no</code></em>.
The first rule from left to right is numbered zero,
the second rule is numbered 1, and so on.
</p></dd></dl></div><p>
</p><p>
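The equivalence described for the <code class="literal">-element</code> option can be
sketched with two rules that should build the same
<code class="literal">title</code> element. The <code class="literal">Title:</code> line
format is an invented input, used only for illustration:
</p><pre class="screen">
/^Title:/ BODY /$/ { data -element title $1 }

/^Title:/ BODY /$/ {
begin element title
data $1
end element
}
</pre><p>
Only one of the two rules would appear in a real filter. The short form is
usually easier to read.
</p><p>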
The following input filter reads a Usenet news file, producing a
record in the WAIS schema. Note that the body of a news posting is
separated from the list of headers by a blank line (or rather a
sequence of two newline characters).
</p><p>
</p><pre class="screen">
BEGIN { begin record wais }
/^From:/ BODY /$/ { data -element name $1 }
/^Subject:/ BODY /$/ { data -element title $1 }
/^Date:/ BODY /$/ { data -element lastModified $1 }
/\n\n/ BODY END {
begin element bodyOfDisplay
begin variant body iana "text/plain"
data -text $1
end record
}
</pre><p>
</p><p>
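For illustration, a posting of the following shape would be matched by the rules
above. The header values and body text are invented. The three header lines are
picked up by the first three rules, and everything after the first blank line is
handed to the last rule as <code class="literal">$1</code>:
</p><pre class="screen">
From: jane@example.org
Subject: Indexing news postings
Date: Mon, 1 Jan 2001 12:00:00 GMT

This is the body of the posting.
It may span several lines.
</pre><p>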
If <span class="application">Zebra</span> is compiled with support for Tcl enabled, the statements
described above are supplemented with a complete
scripting environment, including control structures (conditional
expressions and loop constructs), and powerful string manipulation
mechanisms for modifying the elements of a record.
</p></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="record-model-alvisxslt-conf.html">Prev</a></td><td width="20%" align="center"></td><td width="40%" align="right"><a accesskey="n" href="grs-internal-representation.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">2.ALVIS Record Model Configuration</td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top">2.<acronym class="acronym">GRS-1</acronym> Internal Record Representation</td></tr></table></div></body></html>