File: table_tokenize.rst

.. -*- rst -*-

.. highlightlang:: none

.. groonga-command
.. database: commands_table_tokenize

``table_tokenize``
==================

Summary
-------

The ``table_tokenize`` command tokenizes text with the specified table's tokenizer.

Syntax
------

This command takes several parameters.

``table`` and ``string`` are required parameters. Others are
optional::

  table_tokenize table
                 string
                 [flags=NONE]
                 [mode=GET]
                 [index_column=null]
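
For example, a hypothetical invocation that passes the optional parameters
explicitly by name (using the ``Terms`` lexicon table from the usage example
below; ``--flags NONE`` and ``--mode GET`` are the default values) could look
like::

  table_tokenize Terms "Hello and Good-bye" --flags NONE --mode GET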

Usage
-----

Here is a simple example.

.. groonga-command
.. include:: ../../example/reference/commands/table_tokenize/simple_example.log
.. plugin_register token_filters/stop_word
.. table_create Terms TABLE_PAT_KEY ShortText   --default_tokenizer TokenBigram   --normalizer NormalizerAuto   --token_filters TokenFilterStopWord
.. column_create Terms is_stop_word COLUMN_SCALAR Bool
.. load --table Terms
.. [
.. {"_key": "and", "is_stop_word": true}
.. ]
.. table_tokenize Terms "Hello and Good-bye" --mode GET

The ``Terms`` table has the ``TokenBigram`` tokenizer, the ``NormalizerAuto``
normalizer and the ``TokenFilterStopWord`` token filter. The command returns
the tokens generated by tokenizing ``"Hello and Good-bye"`` with the
``TokenBigram`` tokenizer. The text is normalized by the ``NormalizerAuto``
normalizer, and the ``and`` token is removed by the ``TokenFilterStopWord``
token filter.

Parameters
----------

This section describes all parameters. Parameters are categorized.

Required parameters
^^^^^^^^^^^^^^^^^^^

There are required parameters, ``table`` and ``string``.

``table``
"""""""""

Specifies the lexicon table. The ``table_tokenize`` command uses the
tokenizer, the normalizer and the token filters that are set to the
lexicon table.

``string``
""""""""""

Specifies any string which you want to tokenize.

See the :ref:`tokenize-string` option in :doc:`/reference/commands/tokenize` for details.

Optional parameters
^^^^^^^^^^^^^^^^^^^

There are optional parameters.

``flags``
"""""""""

Specifies tokenization customization options. You can specify
multiple options separated by "``|``".

The default value is ``NONE``.

See the :ref:`tokenize-flags` option in :doc:`/reference/commands/tokenize` for details.
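
For example, ``NONE`` means "no customization", while other values listed in
:ref:`tokenize-flags`, such as ``ENABLE_TOKENIZED_DELIMITER``, change how the
string is split. The following sketch reuses the ``Terms`` table from the
usage example::

  table_tokenize Terms "Hello and Good-bye" --flags ENABLE_TOKENIZED_DELIMITER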

``mode``
""""""""

Specifies a tokenize mode.

The default value is ``GET``.

See the :ref:`tokenize-mode` option in :doc:`/reference/commands/tokenize` for details.
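
For example, ``GET`` mode only looks up tokens, while ``ADD`` mode also
registers the generated tokens into the lexicon table (see
:ref:`tokenize-mode`). A minimal sketch reusing the ``Terms`` table from the
usage example::

  table_tokenize Terms "Hello and Good-bye" --mode ADD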

``index_column``
""""""""""""""""

Specifies an index column.

When it is specified, the return value also includes ``estimated_size``
taken from that index.

The ``estimated_size`` is useful for checking the estimated frequency of
tokens.
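
Here is a minimal sketch, assuming a hypothetical ``Memos`` table whose
``content`` column is indexed by a hypothetical ``Terms.memos_content``
index column; passing ``--index_column`` makes the command report
``estimated_size`` based on that index::

  table_create Memos TABLE_NO_KEY
  column_create Memos content COLUMN_SCALAR ShortText
  column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
  load --table Memos
  [
  {"content": "Hello Groonga"},
  {"content": "Good-bye Groonga"}
  ]
  table_tokenize Terms "Groonga" --index_column memos_content --mode GET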

Return value
------------

The ``table_tokenize`` command returns the generated tokens.

See the :ref:`tokenize-return-value` section in :doc:`/reference/commands/tokenize` for details.

See also
--------

* :doc:`/reference/tokenizers`
* :doc:`/reference/commands/tokenize`