1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227
|
[[query-dsl-terms-filter]]
=== Terms Filter
Filters documents that have fields that match any of the provided terms
(*not analyzed*). For example:
[source,js]
--------------------------------------------------
{
"constant_score" : {
"filter" : {
"terms" : { "user" : ["kimchy", "elasticsearch"]}
}
}
}
--------------------------------------------------
The `terms` filter is also aliased with `in` as the filter name for
simpler usage.
[float]
==== Execution Mode
The way terms filter executes is by iterating over the terms provided
and finding matches docs (loading into a bitset) and caching it.
Sometimes, we want a different execution model that can still be
achieved by building more complex queries in the DSL, but we can support
them in the more compact model that terms filter provides.
The `execution` option now has the following options :
[horizontal]
`plain`::
The default. Works as today. Iterates over all the terms,
building a bit set matching it, and filtering. The total filter is
cached.
`fielddata`::
Generates a terms filters that uses the fielddata cache to
compare terms. This execution mode is great to use when filtering
on a field that is already loaded into the fielddata cache from
faceting, sorting, or index warmers. When filtering on
a large number of terms, this execution can be considerably faster
than the other modes. The total filter is not cached unless
explicitly configured to do so.
`bool`::
Generates a term filter (which is cached) for each term, and
wraps those in a bool filter. The bool filter itself is not cached as it
can operate very quickly on the cached term filters.
`and`::
Generates a term filter (which is cached) for each term, and
wraps those in an and filter. The and filter itself is not cached.
`or`::
Generates a term filter (which is cached) for each term, and
wraps those in an or filter. The or filter itself is not cached.
Generally, the `bool` execution mode should be preferred.
If you don't want the generated individual term queries to be cached,
you can use: `bool_nocache`, `and_nocache` or `or_nocache` instead, but
be aware that this will affect performance.
The "total" terms filter caching can still be explicitly controlled
using the `_cache` option. Note the default value for it depends on the
execution value.
For example:
[source,js]
--------------------------------------------------
{
"constant_score" : {
"filter" : {
"terms" : {
"user" : ["kimchy", "elasticsearch"],
"execution" : "bool",
"_cache": true
}
}
}
}
--------------------------------------------------
[float]
==== Caching
The result of the filter is automatically cached by default. The
`_cache` can be set to `false` to turn it off.
[float]
==== Terms lookup mechanism
When it's needed to specify a `terms` filter with a lot of terms it can
be beneficial to fetch those term values from a document in an index. A
concrete example would be to filter tweets tweeted by your followers.
Potentially the amount of user ids specified in the terms filter can be
a lot. In this scenario it makes sense to use the terms filter's terms
lookup mechanism.
The terms lookup mechanism supports the following options:
[horizontal]
`index`::
The index to fetch the term values from. Defaults to the
current index.
`type`::
The type to fetch the term values from.
`id`::
The id of the document to fetch the term values from.
`path`::
The field specified as path to fetch the actual values for the
`terms` filter.
`routing`::
A custom routing value to be used when retrieving the
external terms doc.
`cache`::
Whether to cache the filter built from the retrieved document
(`true` - default) or whether to fetch and rebuild the filter on every
request (`false`). See "<<query-dsl-terms-filter-lookup-caching,Terms lookup caching>>" below
The values for the `terms` filter will be fetched from a field in a
document with the specified id in the specified type and index.
Internally a get request is executed to fetch the values from the
specified path. At the moment for this feature to work the `_source`
needs to be stored.
Also, consider using an index with a single shard and fully replicated
across all nodes if the "reference" terms data is not large. The lookup
terms filter will prefer to execute the get request on a local node if
possible, reducing the need for networking.
["float",id="query-dsl-terms-filter-lookup-caching"]
==== Terms lookup caching
There is an additional cache involved, which caches the lookup of the
lookup document to the actual terms. This lookup cache is a LRU cache.
This cache has the following options:
`indices.cache.filter.terms.size`::
The size of the lookup cache. The default is `10mb`.
`indices.cache.filter.terms.expire_after_access`::
The time after the last read an entry should expire. Disabled by default.
`indices.cache.filter.terms.expire_after_write`:
The time after the last write an entry should expire. Disabled by default.
All options for the lookup of the documents cache can only be configured
via the `elasticsearch.yml` file.
When using the terms lookup the `execution` option isn't taken into
account and behaves as if the execution mode was set to `plain`.
[float]
==== Terms lookup twitter example
[source,js]
--------------------------------------------------
# index the information for user with id 2, specifically, its followers
curl -XPUT localhost:9200/users/user/2 -d '{
"followers" : ["1", "3"]
}'
# index a tweet, from user with id 2
curl -XPUT localhost:9200/tweets/tweet/1 -d '{
"user" : "2"
}'
# search on all the tweets that match the followers of user 2
curl -XGET localhost:9200/tweets/_search -d '{
"query" : {
"filtered" : {
"filter" : {
"terms" : {
"user" : {
"index" : "users",
"type" : "user",
"id" : "2",
"path" : "followers"
},
"_cache_key" : "user_2_friends"
}
}
}
}
}'
--------------------------------------------------
The above is highly optimized, both in a sense that the list of
followers will not be fetched if the filter is already cached in the
filter cache, and with internal LRU cache for fetching external values
for the terms filter. Also, the entry in the filter cache will not hold
`all` the terms reducing the memory required for it.
`_cache_key` is recommended to be set, so its simple to clear the cache
associated with it using the clear cache API. For example:
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/tweets/_cache/clear?filter_keys=user_2_friends'
--------------------------------------------------
The structure of the external terms document can also include array of
inner objects, for example:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/users/user/2 -d '{
"followers" : [
{
"id" : "1"
},
{
"id" : "2"
}
]
}'
--------------------------------------------------
In which case, the lookup path will be `followers.id`.
|