File: table_tokenize.rst

.. -*- rst -*-

.. highlightlang:: none

.. groonga-command
.. database: commands_table_tokenize

``table_tokenize``
==================

Summary
-------

The ``table_tokenize`` command tokenizes text with the specified table's tokenizer.

Syntax
------

This command takes several parameters.

``table`` and ``string`` are required parameters. Others are
optional::

  table_tokenize table
                 string
                 [flags=NONE]
                 [mode=GET]
                 [index_column=null]
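
For example, a hypothetical invocation that passes the optional parameters
explicitly by name (using the ``Terms`` lexicon table from the usage example
below; ``--flags NONE`` and ``--mode GET`` are the default values) could look
like::

  table_tokenize Terms "Hello and Good-bye" --flags NONE --mode GET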

Usage
-----

Here is a simple example.

.. groonga-command
.. include:: ../../example/reference/commands/table_tokenize/simple_example.log
.. plugin_register token_filters/stop_word
.. table_create Terms TABLE_PAT_KEY ShortText   --default_tokenizer TokenBigram   --normalizer NormalizerAuto   --token_filters TokenFilterStopWord
.. column_create Terms is_stop_word COLUMN_SCALAR Bool
.. load --table Terms
.. [
.. {"_key": "and", "is_stop_word": true}
.. ]
.. table_tokenize Terms "Hello and Good-bye" --mode GET

The ``Terms`` table has the ``TokenBigram`` tokenizer, the ``NormalizerAuto``
normalizer and the ``TokenFilterStopWord`` token filter. The command returns
the tokens generated by tokenizing ``"Hello and Good-bye"`` with the
``TokenBigram`` tokenizer. The text is normalized by the ``NormalizerAuto``
normalizer, and the ``and`` token is removed by the ``TokenFilterStopWord``
token filter.

Parameters
----------

This section describes all parameters. Parameters are categorized.

Required parameters
^^^^^^^^^^^^^^^^^^^

There are required parameters, ``table`` and ``string``.

``table``
"""""""""

Specifies the lexicon table. The ``table_tokenize`` command uses the
tokenizer, the normalizer and the token filters that are set to the
lexicon table.

``string``
""""""""""

Specifies any string which you want to tokenize.

See the :ref:`tokenize-string` option in :doc:`/reference/commands/tokenize` for details.

Optional parameters
^^^^^^^^^^^^^^^^^^^

There are optional parameters.

``flags``
"""""""""

Specifies tokenization customization options. You can specify
multiple options separated by "``|``".

The default value is ``NONE``.

See the :ref:`tokenize-flags` option in :doc:`/reference/commands/tokenize` for details.
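
For example, ``NONE`` means "no customization", while other values listed in
:ref:`tokenize-flags`, such as ``ENABLE_TOKENIZED_DELIMITER``, change how the
string is split. The following sketch reuses the ``Terms`` table from the
usage example::

  table_tokenize Terms "Hello and Good-bye" --flags ENABLE_TOKENIZED_DELIMITER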

``mode``
""""""""

Specifies a tokenize mode.

The default value is ``GET``.

See the :ref:`tokenize-mode` option in :doc:`/reference/commands/tokenize` for details.
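
For example, ``GET`` mode only looks up tokens, while ``ADD`` mode also
registers the generated tokens into the lexicon table (see
:ref:`tokenize-mode`). A minimal sketch reusing the ``Terms`` table from the
usage example::

  table_tokenize Terms "Hello and Good-bye" --mode ADD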

``index_column``
""""""""""""""""

Specifies an index column.

When it is specified, the return value also includes ``estimated_size``
taken from that index.

The ``estimated_size`` is useful for checking the estimated frequency of
tokens.
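
Here is a minimal sketch, assuming a hypothetical ``Memos`` table whose
``content`` column is indexed by a hypothetical ``Terms.memos_content``
index column; passing ``--index_column`` makes the command report
``estimated_size`` based on that index::

  table_create Memos TABLE_NO_KEY
  column_create Memos content COLUMN_SCALAR ShortText
  column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
  load --table Memos
  [
  {"content": "Hello Groonga"},
  {"content": "Good-bye Groonga"}
  ]
  table_tokenize Terms "Groonga" --index_column memos_content --mode GET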

Return value
------------

The ``table_tokenize`` command returns the generated tokens.

See the :ref:`tokenize-return-value` section in :doc:`/reference/commands/tokenize` for details.

See also
--------

* :doc:`/reference/tokenizers`
* :doc:`/reference/commands/tokenize`