.. -*- rst -*-
.. groonga-command
.. database: normalisers
.. _normalizer-table:
``NormalizerTable``
===================
Summary
-------
.. versionadded:: 11.0.4
``NormalizerTable`` normalizes text according to a user-defined normalization table. A user-defined normalization table is an ordinary table that must satisfy certain conditions, which are described later.
.. note::
   The normalized text depends on the contents of the user-defined
   normalization table. If you use this normalizer for a
   lexicon, you need to re-index whenever you change the user-defined
   normalization table.
Syntax
------
There are required and optional parameters.
Required parameters::
NormalizerTable("normalized", "UserDefinedTable.normalized_column")
Optional parameters::
NormalizerTable("normalized", "UserDefinedTable.normalized_column",
"target", "target_column")
NormalizerTable("normalized", "UserDefinedTable.normalized_column",
"unicode_version", "13.0.0")
Usage
-----
.. _normalizer-table-simple-usage:
Simple usage
^^^^^^^^^^^^
Here is an example of ``NormalizerTable``.
``NormalizerTable`` normalizes text according to a user-defined normalization table. The user-defined normalization table used here satisfies the following conditions:
* Table type must be ``TABLE_PAT_KEY``.
* Table key type must be ``ShortText``.
* Table must have at least one ``ShortText`` column.
Here are schema and data for this example:
.. groonga-command
.. include:: ../../example/reference/normalizers/normalizer-table-simple-usage-prepare.log
.. table_create Normalizations TABLE_PAT_KEY ShortText
.. column_create Normalizations normalized COLUMN_SCALAR ShortText
.. load --table Normalizations
.. [
.. {"_key": "a", "normalized": "<A>"},
.. {"_key": "ac", "normalized": "<AC>"}
.. ]
With this user-defined normalization table, ``a`` is normalized to ``<A>`` and ``ac`` is normalized to ``<AC>``. For example:
* ``Groonga`` -> ``Groong<A>``
* ``hack`` -> ``h<AC>k``
Here are examples of ``NormalizerTable`` with the user defined normalization table:
.. groonga-command
.. include:: ../../example/reference/normalizers/normalizer-table-simple-usage-output.log
.. normalize 'NormalizerTable("normalized", "Normalizations.normalized")' "Groonga"
.. normalize 'NormalizerTable("normalized", "Normalizations.normalized")' "hack"
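
The replacement rule shown above can be sketched in Python. This is only an illustration, not Groonga's implementation: it assumes that the longest matching key wins at each position, which is what the ``hack`` -> ``h<AC>k`` example suggests. The ``normalize`` function and the ``normalizations`` dictionary are hypothetical names introduced for this sketch.

.. code-block:: python

   def normalize(text, table):
       """Replace the longest matching key at each position."""
       # Try longer keys first so that "ac" wins over "a".
       keys = sorted(table, key=len, reverse=True)
       result = []
       i = 0
       while i < len(text):
           for key in keys:
               if key and text.startswith(key, i):
                   result.append(table[key])
                   i += len(key)
                   break
           else:
               # No key matches here: keep the character unchanged.
               result.append(text[i])
               i += 1
       return "".join(result)


   # Same records as the Normalizations table: _key -> normalized.
   normalizations = {"a": "<A>", "ac": "<AC>"}

   print(normalize("Groonga", normalizations))  # Groong<A>
   print(normalize("hack", normalizations))     # h<AC>k

Trying longer keys first is what keeps ``a`` from shadowing ``ac`` in this sketch.
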
.. _normalizer-table-usage-unicode-version:
Unicode version
^^^^^^^^^^^^^^^
Some internal processing, such as tokenization and highlighting, uses character types. ``NormalizerTable`` provides character types based on Unicode. You can specify the Unicode version to use with the :ref:`normalizer-table-unicode-version` option.
Here is an example to use Unicode 13.0.0:
.. groonga-command
.. include:: ../../example/reference/normalizers/normalizer-table-simple-usage-unicode-version.log
.. normalize 'NormalizerTable("normalized", "Normalizations.normalized", "unicode_version", "13.0.0")' "Groonga" WITH_TYPES
The default Unicode version is 5.0.0.
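
Why the Unicode version matters can be illustrated outside Groonga. The following Python sketch assumes that Unicode general categories are a reasonable stand-in for character types. Groonga's own character types are not the same thing, so treat this as an analogy only. It compares how an old and a recent Unicode database classify a character that was added in Unicode 6.0.

.. code-block:: python

   import unicodedata

   # U+20B9 INDIAN RUPEE SIGN was added in Unicode 6.0.
   rupee = "\u20b9"

   # Python bundles a frozen Unicode 3.2.0 database next to its current one.
   old_category = unicodedata.ucd_3_2_0.category(rupee)
   new_category = unicodedata.category(rupee)

   print(unicodedata.ucd_3_2_0.unidata_version, old_category)  # 3.2.0 Cn
   print(unicodedata.unidata_version, new_category)

The old database reports the character as unassigned (``Cn``), while a current database reports a currency symbol (``Sc``). Any processing that branches on the character type can therefore behave differently depending on the version in use.
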
.. _normalizer-table-advanced-usage:
Advanced usage
^^^^^^^^^^^^^^
You can put normalization target strings in a column instead of ``_key``. In this case, you need to create an index column for that column, and it must satisfy the following conditions:
* Lexicon type of the index column must be ``TABLE_PAT_KEY``.
* Lexicon key type of the index column must be ``ShortText``.
* Lexicon of the index column must not have a tokenizer.
With this approach, you can use any table type, such as ``TABLE_NO_KEY``. This is useful when you can't control the table type. For example, this is the only approach available to PGroonga users.
Here are schema and data for this example:
.. groonga-command
.. include:: ../../example/reference/normalizers/normalizer-table-advanced-usage-prepare.log
.. table_create ColumnNormalizations TABLE_NO_KEY
.. column_create ColumnNormalizations target_column COLUMN_SCALAR ShortText
.. column_create ColumnNormalizations normalized COLUMN_SCALAR ShortText
..
.. table_create Targets TABLE_PAT_KEY ShortText
.. column_create Targets column_normalizations_target_column \
.. COLUMN_INDEX ColumnNormalizations target_column
..
.. load --table ColumnNormalizations
.. [
.. {"target_column": "a", "normalized": "<A>"},
.. {"target_column": "ac", "normalized": "<AC>"}
.. ]
You need to use the :ref:`normalizer-table-target` option to use this user-defined normalization table. The above schema names the column ``target_column`` for the sake of explanation. Normally, the ``_column`` suffix would be redundant, but it is added here to make it easy to distinguish the parameter name from the parameter value.
Here are examples of ``NormalizerTable`` with the user defined normalization table:
.. groonga-command
.. include:: ../../example/reference/normalizers/normalizer-table-advanced-usage-output.log
.. normalize 'NormalizerTable("normalized", "ColumnNormalizations.normalized", "target", "target_column")' "Groonga"
.. normalize 'NormalizerTable("normalized", "ColumnNormalizations.normalized", "target", "target_column")' "hack"
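
The role of the index column can be sketched in Python. This is a simplified model, not Groonga's implementation: a keyless table can only be addressed by record position, so a separate lookup structure maps each target text back to its record. The names ``rows``, ``index`` and ``normalize`` are hypothetical, and the sketch assumes longest-match replacement as in the simple usage.

.. code-block:: python

   # Records of a keyless table: addressable by position only.
   rows = [
       {"target_column": "a", "normalized": "<A>"},
       {"target_column": "ac", "normalized": "<AC>"},
   ]

   # Plays the role of the index column: target text -> record position.
   # (The list position stands in for the record ID.)
   index = {row["target_column"]: row_id for row_id, row in enumerate(rows)}


   def normalize(text):
       result = []
       i = 0
       while i < len(text):
           # Longest indexed target that starts at position i, if any.
           match = max(
               (key for key in index if text.startswith(key, i)),
               key=len,
               default=None,
           )
           if match is None:
               # Nothing to replace here: keep the character unchanged.
               result.append(text[i])
               i += 1
           else:
               result.append(rows[index[match]]["normalized"])
               i += len(match)
       return "".join(result)


   print(normalize("Groonga"))  # Groong<A>
   print(normalize("hack"))     # h<AC>k

Without ``index``, every lookup would have to scan all records, which is why the key-based lookup is moved into a separate structure here.
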
Parameters
----------
Required parameter
^^^^^^^^^^^^^^^^^^
.. _normalizer-table-normalized:
``normalized``
""""""""""""""
This option specifies a column that has normalized texts. Normalization target texts are the texts in the corresponding ``_key`` or in the column specified by :ref:`normalizer-table-target`.
Value type of the column specified for this option must be one of ``ShortText``, ``Text`` and ``LongText``.
If you don't use :ref:`normalizer-table-target`, the table that owns the column specified for this option must satisfy the following conditions:
* Table type is ``TABLE_PAT_KEY``
* Table key type is ``ShortText``
See :ref:`normalizer-table-simple-usage` for usage of this case.
Optional parameters
^^^^^^^^^^^^^^^^^^^
.. _normalizer-table-target:
``target``
""""""""""
This option specifies a column that has normalization target texts.
Value type of the column specified for this option must be one of ``ShortText``, ``Text`` and ``LongText``.
You must create an index column for the column specified for this option. The index column and its lexicon must satisfy the following conditions:
* Index column can be a single column index or a multi column index.
* Lexicon type of the index column must be ``TABLE_PAT_KEY``.
* Lexicon key type of the index column must be ``ShortText``.
* Lexicon of the index column must not have a tokenizer.
See :ref:`normalizer-table-advanced-usage` for usage of this case.
.. _normalizer-table-unicode-version:
``unicode_version``
"""""""""""""""""""
This option specifies the Unicode version used to determine character types.
The default Unicode version is 5.0.0.
See :ref:`normalizer-table-usage-unicode-version` for usage.
See also
--------
* :doc:`../commands/normalize`