1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123
|
[[analysis-synonym-tokenfilter]]
=== Synonym Token Filter
The `synonym` token filter allows to easily handle synonyms during the
analysis process. Synonyms are configured using a configuration file.
Here is an example:
[source,js]
--------------------------------------------------
{
"index" : {
"analysis" : {
"analyzer" : {
"synonym" : {
"tokenizer" : "whitespace",
"filter" : ["synonym"]
}
},
"filter" : {
"synonym" : {
"type" : "synonym",
"synonyms_path" : "analysis/synonym.txt"
}
}
}
}
}
--------------------------------------------------
The above configures a `synonym` filter, with a path of
`analysis/synonym.txt` (relative to the `config` location). The
`synonym` analyzer is then configured with the filter. Additional
settings are: `ignore_case` (defaults to `false`), and `expand`
(defaults to `true`).
The `tokenizer` parameter controls the tokenizers that will be used to
tokenize the synonym, and defaults to the `whitespace` tokenizer.
Two synonym formats are supported: Solr, WordNet.
[float]
==== Solr synonyms
The following is a sample format of the file:
[source,js]
--------------------------------------------------
# blank lines and lines starting with pound are comments.
#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS. These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit
#Equivalent synonyms may be separated with commas and give
#no explicit mapping. In this case the mapping behavior will
#be taken from the expand parameter in the schema. This allows
#the same synonym file to be used in different synonym handling strategies.
#Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod
#multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
#is equivalent to
foo => foo bar, baz
--------------------------------------------------
You can also define synonyms for the filter directly in the
configuration file (note use of `synonyms` instead of `synonyms_path`):
[source,js]
--------------------------------------------------
{
"filter" : {
"synonym" : {
"type" : "synonym",
"synonyms" : [
"i-pod, i pod => ipod",
"universe, cosmos"
]
}
}
}
--------------------------------------------------
However, it is recommended to define large synonyms set in a file using
`synonyms_path`.
[float]
==== WordNet synonyms
Synonyms based on http://wordnet.princeton.edu/[WordNet] format can be
declared using `format`:
[source,js]
--------------------------------------------------
{
"filter" : {
"synonym" : {
"type" : "synonym",
"format" : "wordnet",
"synonyms" : [
"s(100000001,1,'abstain',v,1,0).",
"s(100000001,2,'refrain',v,1,0).",
"s(100000001,3,'desist',v,1,0)."
]
}
}
}
--------------------------------------------------
Using `synonyms_path` to define WordNet synonyms in a file is supported
as well.
|