File: letter-tokenizer.asciidoc

[[analysis-letter-tokenizer]]
=== Letter Tokenizer

A tokenizer of type `letter` that divides text at non-letter characters;
that is, it emits tokens that are maximal runs of adjacent letters. This
works reasonably well for most European languages, but poorly for some
Asian languages, where words are not separated by spaces.
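The "maximal runs of adjacent letters" rule can be approximated with a
regular expression. The sketch below is only an illustration of the
behavior, not the underlying Lucene `LetterTokenizer` implementation:

```python
import re

def letter_tokenize(text):
    # Emit maximal runs of letters; digits, punctuation, and whitespace
    # all act as token separators. Illustrative approximation only.
    return re.findall(r"[^\W\d_]+", text, re.UNICODE)

print(letter_tokenize("You're welcome, 100%!"))
# Note how the apostrophe splits "You're" into two tokens.
```

This also shows a practical consequence: because an apostrophe is not a
letter, a word like `You're` is split into the tokens `You` and `re`.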