.. -*- rst -*-

.. groonga-command
.. database: normalisers

.. _normalizer-auto:

``NormalizerAuto``
==================

Summary
-------

Normally you should use the ``NormalizerAuto`` normalizer.
``NormalizerAuto`` was the normalizer used by Groonga 2.0.9 and
earlier. The ``KEY_NORMALIZE`` flag in ``table_create`` on Groonga
2.0.9 or earlier is equivalent to the ``--normalizer NormalizerAuto``
option in ``table_create`` on Groonga 2.1.0 or later.

``NormalizerAuto`` supports all encodings. It uses Unicode NFKC
(Normalization Form Compatibility Composition) for UTF-8 encoded
text. For other encodings, it uses its own encoding-specific
normalization. The results of those encoding-specific normalizations
are similar to NFKC.
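
What NFKC does to UTF-8 text can be sketched outside Groonga with
Python's standard ``unicodedata`` module. This only illustrates
Unicode NFKC itself, not Groonga's implementation; the helper name
``nfkc`` is hypothetical, and the output of ``NormalizerAuto`` may
differ in details that this sketch doesn't cover:

.. code-block:: python

   import unicodedata


   def nfkc(text: str) -> str:
       """Return the Unicode NFKC form of text (hypothetical helper)."""
       return unicodedata.normalize("NFKC", text)


   # Full-width Latin letters fold to their ASCII counterparts.
   print(nfkc("\uff21\uff22\uff23"))  # U+FF21..U+FF23 -> "ABC"

   # A compatibility ligature is replaced by its component letters.
   print(nfkc("\ufb01"))  # U+FB01 LATIN SMALL LIGATURE FI -> "fi"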

Syntax
------

``NormalizerAuto`` has no parameters::

  NormalizerAuto

Usage
-----

``NormalizerAuto`` normalizes a half-width katakana letter (such as
U+FF76 HALFWIDTH KATAKANA LETTER KA) followed by the half-width
katakana voiced sound mark (U+FF9E HALFWIDTH KATAKANA VOICED SOUND
MARK) to a full-width katakana letter with the voiced sound mark
(U+30AC KATAKANA LETTER GA). The former is two characters but the
latter is one character.
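
The two-characters-to-one behavior described above matches plain
Unicode NFKC, so it can be checked with Python's standard
``unicodedata`` module. This is a minimal sketch that assumes UTF-8
text and does not call Groonga; the variable names are illustrative:

.. code-block:: python

   import unicodedata

   # U+FF76 HALFWIDTH KATAKANA LETTER KA followed by
   # U+FF9E HALFWIDTH KATAKANA VOICED SOUND MARK.
   half_width = "\uff76\uff9e"

   # NFKC maps both to their full-width forms and composes them.
   normalized = unicodedata.normalize("NFKC", half_width)

   print(len(half_width))         # 2
   print(len(normalized))         # 1
   print(normalized == "\u30ac")  # True: U+30AC KATAKANA LETTER GA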

Here is an example that uses the ``NormalizerAuto`` normalizer:

.. groonga-command
.. include:: ../../example/reference/normalizers/normalizer-auto.log
.. table_create NormalLexicon TABLE_HASH_KEY ShortText --normalizer NormalizerAuto

See also
--------

* :doc:`../commands/normalize`