.. -*- rst -*-

.. highlightlang:: none

.. groonga-command
.. database: tokenizers

.. _token-bigram-ignore-blank-split-symbol:

``TokenBigramIgnoreBlankSplitSymbol``
======================================

Summary
-------

``TokenBigramIgnoreBlankSplitSymbol`` is similar to
:ref:`token-bigram`. The differences between them are as follows:

* Blank handling
* Symbol handling

Syntax
------

``TokenBigramIgnoreBlankSplitSymbol`` has no parameters::

  TokenBigramIgnoreBlankSplitSymbol
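
For example, you can specify ``TokenBigramIgnoreBlankSplitSymbol`` as the
default tokenizer of a lexicon table for full-text search. This is only a
minimal sketch; the ``Terms`` table name is an arbitrary example::

  table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigramIgnoreBlankSplitSymbol --normalizer NormalizerAuto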

Usage
-----

``TokenBigramIgnoreBlankSplitSymbol`` ignores white-spaces in
continuous symbols and non-ASCII characters.
``TokenBigramIgnoreBlankSplitSymbol`` tokenizes symbols by the bigram
tokenize method.

You can see the difference between them with the ``日 本 語 ! ! !``
text because it has both symbols and non-ASCII characters.

Here is the result from :ref:`token-bigram`:

.. groonga-command
.. include:: ../../example/reference/tokenizers/token-bigram-with-white-spaces-and-symbol.log
.. tokenize TokenBigram "日 本 語 ! ! !" NormalizerAuto

Here is the result from ``TokenBigramIgnoreBlankSplitSymbol``:

.. groonga-command
.. include:: ../../example/reference/tokenizers/token-bigram-ignore-blank-split-symbol-with-white-spaces-and-symbol.log
.. tokenize TokenBigramIgnoreBlankSplitSymbol "日 本 語 ! ! !" NormalizerAuto