.. -*- rst -*-

.. highlightlang:: none

.. groonga-command
.. database: commands_table_tokenize

``table_tokenize``
==================

Summary
-------

The ``table_tokenize`` command tokenizes text with the tokenizer of the specified table.

Syntax
------

This command takes many parameters.

``table`` and ``string`` are required parameters. Others are
optional::

  table_tokenize table
                 string
                 [flags=NONE]
                 [mode=GET]
                 [index_column=null]

Usage
-----

Here is a simple example.

.. groonga-command
.. include:: ../../example/reference/commands/table_tokenize/simple_example.log
.. plugin_register token_filters/stop_word
.. table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto --token_filters TokenFilterStopWord
.. column_create Terms is_stop_word COLUMN_SCALAR Bool
.. load --table Terms
.. [
.. {"_key": "and", "is_stop_word": true}
.. ]
.. table_tokenize Terms "Hello and Good-bye" --mode GET

The ``Terms`` table has the ``TokenBigram`` tokenizer, the
``NormalizerAuto`` normalizer and the ``TokenFilterStopWord`` token
filter. The command returns the tokens that are generated by tokenizing
``"Hello and Good-bye"`` with the ``TokenBigram`` tokenizer. The text is
normalized by the ``NormalizerAuto`` normalizer, and the ``and`` token
is removed by the ``TokenFilterStopWord`` token filter.

Parameters
----------

This section describes all parameters. Parameters are categorized.

Required parameters
^^^^^^^^^^^^^^^^^^^

There are two required parameters, ``table`` and ``string``.

``table``
"""""""""

Specifies the lexicon table. The ``table_tokenize`` command uses the
tokenizer, the normalizer and the token filters that are set in the
lexicon table.

``string``
""""""""""

Specifies the string that you want to tokenize.

See the :ref:`tokenize-string` option in :doc:`/reference/commands/tokenize` for details.

Optional parameters
^^^^^^^^^^^^^^^^^^^

There are several optional parameters.

``flags``
"""""""""

Specifies tokenization customization options. You can specify
multiple options separated by "``|``".

The default value is ``NONE``.

See the :ref:`tokenize-flags` option in :doc:`/reference/commands/tokenize` for details.
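
For example, assuming the ``Terms`` table defined in the usage example
above, a flag can be passed explicitly. ``ENABLE_TOKENIZED_DELIMITER``
is one of the flags documented for :doc:`/reference/commands/tokenize`;
this sketch only illustrates the syntax::

  table_tokenize Terms "Hello and Good-bye" --flags ENABLE_TOKENIZED_DELIMITER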

``mode``
""""""""

Specifies a tokenize mode.

The default value is ``GET``.

See the :ref:`tokenize-mode` option in :doc:`/reference/commands/tokenize` for details.
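
For example, assuming the ``Terms`` table from the usage example above,
the following tokenizes in ``ADD`` mode, which registers unknown tokens
into the lexicon, while ``GET`` mode only tokenizes against the existing
lexicon (see :doc:`/reference/commands/tokenize` for the exact
semantics)::

  table_tokenize Terms "Hello and Good-bye" --mode ADD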

``index_column``
""""""""""""""""

Specifies an index column. If it is specified, the return value
includes the ``estimated_size`` of each token in that index.
``estimated_size`` is useful for checking the estimated frequency of
tokens.
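
For example, the following sketch assumes a hypothetical ``Memos`` table
whose ``content`` column is indexed by a ``memos_content`` index column
on ``Terms``; passing ``--index_column`` then adds ``estimated_size``
for each token to the return value::

  table_create Memos TABLE_NO_KEY
  column_create Memos content COLUMN_SCALAR ShortText
  column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
  load --table Memos
  [
  {"content": "Hello and Good-bye"}
  ]
  table_tokenize Terms "Hello" --index_column memos_content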

Return value
------------

The ``table_tokenize`` command returns tokenized tokens.

See the :ref:`tokenize-return-value` section in :doc:`/reference/commands/tokenize` for details.
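
As a rough sketch of the shape (not actual output; the header is
abbreviated and exact keys may vary by Groonga version), each token has
at least ``value`` and ``position``, and ``estimated_size`` is added
when ``index_column`` is specified::

  [
    HEADER,
    [
      {"value": "he", "position": 0},
      {"value": "el", "position": 1},
      ...
    ]
  ]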

See also
--------

* :doc:`/reference/tokenizers`
* :doc:`/reference/commands/tokenize`