[[analysis-tokenizers]]
== Tokenizers

Tokenizers are used to break a string down into a stream of terms
or tokens. A simple tokenizer might split the string up into terms 
wherever it encounters whitespace or punctuation.

Elasticsearch has a number of built-in tokenizers which can be
used to build <<analysis-custom-analyzer,custom analyzers>>, as shown
in the sketch below.

include::tokenizers/standard-tokenizer.asciidoc[]

include::tokenizers/edgengram-tokenizer.asciidoc[]

include::tokenizers/keyword-tokenizer.asciidoc[]

include::tokenizers/letter-tokenizer.asciidoc[]

include::tokenizers/lowercase-tokenizer.asciidoc[]

include::tokenizers/ngram-tokenizer.asciidoc[]

include::tokenizers/whitespace-tokenizer.asciidoc[]

include::tokenizers/pattern-tokenizer.asciidoc[]

include::tokenizers/uaxurlemail-tokenizer.asciidoc[]

include::tokenizers/pathhierarchy-tokenizer.asciidoc[]