File: yaz-icu.html

package info (click to toggle)
yaz 3.0.34-2
  • links: PTS, VCS
  • area: main
  • in suites: lenny
  • size: 13,404 kB
  • ctags: 12,108
  • sloc: xml: 116,075; ansic: 52,205; sh: 9,746; tcl: 2,043; makefile: 1,141; yacc: 347
file content (79 lines) | stat: -rw-r--r-- 7,145 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>yaz-icu</title><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"><link rel="start" href="index.html" title="YAZ User's Guide and Reference"><link rel="up" href="reference.html" title="Reference"><link rel="prev" href="yaz-illclient.html" title="yaz-illclient"><link rel="next" href="list-oids.html" title="Appendix�A.�List of Object Identifiers"></head><body><link rel="stylesheet" type="text/css" href="common/style1.css"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">yaz-icu</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="yaz-illclient.html">Prev</a>�</td><th width="60%" align="center">Reference</th><td width="20%" align="right">�<a accesskey="n" href="list-oids.html">Next</a></td></tr></table><hr></div><div class="refentry" lang="en"><a name="yaz-icu"></a><div class="titlepage"></div><div class="refnamediv"><h2>Name</h2><p>yaz-icu &#8212; YAZ ICU utility</p></div><div class="refsynopsisdiv"><h2>Synopsis</h2><div class="cmdsynopsis"><p><code class="command">yaz-icu</code>  [commands...] [-c <em class="replaceable"><code>config</code></em>] [-p <em class="replaceable"><code>opt</code></em>] [-x]</p></div></div><div class="refsect1" lang="en"><a name="id2606557"></a><h2>DESCRIPTION</h2><p>
   <span class="command"><strong>yaz-icu</strong></span> is utility which demonstrates 
   the ICU chain module of yaz. (<code class="filename">yaz/icu.h</code>).
  </p></div><div class="refsect1" lang="en"><a name="id2606578"></a><h2>OPTIONS</h2><div class="variablelist"><dl><dt><span class="term">-c <em class="replaceable"><code>config</code></em></span></dt><dd><p>
      Specifies the file containing ICU chain configuration
      which is XML based.
     </p></dd><dt><span class="term">-p <em class="replaceable"><code>type</code></em></span></dt><dd><p>
      Specifies extra information to be printed about the ICU system.
      If <em class="replaceable"><code>type</code></em> is <code class="literal">c</code>
      then ICU converters are printed. 
      If <em class="replaceable"><code>type</code></em> is <code class="literal">l</code>
      available locales are printed.
      If <em class="replaceable"><code>type</code></em> is <code class="literal">t</code>
      available transliterators are printed.
     </p></dd><dt><span class="term">-x <em class="replaceable"><code>config</code></em></span></dt><dd><p>
      Specifies that output should be XML based rather than
      "text" based.
     </p></dd></dl></div></div><div class="refsect1" lang="en"><a name="id2606662"></a><h2>ICU chain configuration</h2><p>
   The ICU chain configuration speicifies one or more rules to convert
   text data into tokens. The configuration format is XML based.
  </p><p>
   The toplevel element must be named <code class="literal">icu_chain</code>.
   The <code class="literal">icu_chain</code> element has one required attribute
   <code class="literal">locale</code> which specifies the ICU locale to be used
   in the conversion steps.
  </p><p>
   The <code class="literal">icu_chain</code> element must include elements where
   each element specifies a conversion step. The conversion is performed
   in the order in which the conversion steps are specified.
   Each conversion element takes one attribute: <code class="literal">rule</code>
   which serves as argument to the conversion step.
  </p><p>
   The following conversion elements are available:
   
   </p><div class="variablelist"><dl><dt><span class="term">casemap</span></dt><dd><p>
       Converts case and rule specifies how:

       </p><div class="variablelist"><dl><dt><span class="term">l</span></dt><dd><p>Lowercase using ICU function u_strToLower. </p></dd><dt><span class="term">u</span></dt><dd><p>Upper case using ICU function u_strToUpper.</p></dd><dt><span class="term">t</span></dt><dd><p>To title using UCU function u_strToTitle.</p></dd><dt><span class="term">f</span></dt><dd><p>Fold case using ICU function u_strFoldCase.</p></dd></dl></div><p>
      </p></dd><dt><span class="term">display</span></dt><dd><p>
       This is a meta step which specifies that a term/token is to
       be displayed. This term is retrieved in an application
       using function icu_chain_token_display (<code class="filename">yaz/icu.h</code>).
      </p></dd><dt><span class="term">transform</span></dt><dd><p>
       Specifies an ICU transform rule. The rule attribute is the
       custom transformation rule to be used. This is a text based format
       which is offered by the ICU transform system. See
       <a class="ulink" href="http://www.icu-project.org/userguide/Transform.html" target="_top">ICU Transforms</a> for
       more information.
      </p></dd><dt><span class="term">tokenize</span></dt><dd><p>
       Breaks / tokenizes a string into components using
       ICU functions ubrk_open, ubrk_setText, .. . The rule is
       one of:
       </p><div class="variablelist"><dl><dt><span class="term">l</span></dt><dd><p>Line. ICU: UBRK_LINE.</p></dd><dt><span class="term">s</span></dt><dd><p>Sentence. ICU: UBRK_SENTENCE.</p></dd><dt><span class="term">w</span></dt><dd><p>Word. ICU: UBRK_WORD.</p></dd><dt><span class="term">c</span></dt><dd><p>Character. ICU: UBRK_CHARACTER.</p></dd><dt><span class="term">t</span></dt><dd><p>Title. ICU: UBRK_TITLE.</p></dd></dl></div><p>
      </p></dd></dl></div><p>
   
  </p></div><div class="refsect1" lang="en"><a name="id2606921"></a><h2>EXAMPLES</h2><p>
   The following command analyzes text in file <code class="filename">text</code>
   using ICU chain configuration <code class="filename">chain.xml</code>:
   </p><pre class="screen">
    cat text | yaz-icu -c chain.xml
   </pre><p>
   The chain.xml might look as follows:
    </p><pre class="screen">
&lt;icu_chain locale="en"&gt;
  &lt;transform rule="[:Control:] Any-Remove"/&gt;
  &lt;tokenize rule="w"/&gt;
  &lt;transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/&gt;
  &lt;display/&gt;
  &lt;casemap rule="l"/&gt;
&lt;/icu_chain&gt;

   </pre><p>
  </p></div><div class="refsect1" lang="en"><a name="id2606962"></a><h2>SEE ALSO</h2><p>
   <span class="citerefentry"><span class="refentrytitle">yaz</span>(7)</span>
  </p><p>
   <a class="ulink" href="http://www.icu-project.org/" target="_top">ICU Home</a>
  </p><p>
   <a class="ulink" href="http://www.icu-project.org/userguide/Transform.html" target="_top">ICU Transforms</a>
  </p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="yaz-illclient.html">Prev</a>�</td><td width="20%" align="center"><a accesskey="u" href="reference.html">Up</a></td><td width="40%" align="right">�<a accesskey="n" href="list-oids.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">yaz-illclient�</td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top">�Appendix�A.�List of Object Identifiers</td></tr></table></div></body></html>