[[faq]]
= Frequently Asked Questions
[partintro]
--
This section will be updated as more frequently asked questions arise.
* <<faq_doc_error,How can I report an error in the documentation?>>
* <<faq_partial_delete,Can I delete only certain data from within indices?>>
* <<faq_strange_chars,Can Curator handle index names with strange characters?>>
--
[[faq_doc_error]]
== Q: How can I report an error in the documentation?
=== A: Use the "Edit" link on any page
See <<site-corrections,Site Corrections>>.
[[faq_partial_delete]]
== Q: Can I delete only certain data from within indices?
=== A: It's complicated
[float]
TL;DR: No. Curator can only delete entire indices.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[float]
Full answer:
^^^^^^^^^^^^
As a thought exercise, think of Elasticsearch indices as being like databases,
or tablespaces within a database. If you had hundreds of millions of rows to
delete from your database, would you run a separate
`DELETE from TABLE where date<YYYY.MM.dd` to assemble hundreds of millions of
individual delete operations every day, or would you partition your tables in a
way that you could simply run `DROP table TABLENAME.YYYY.MM.dd`? The strain on
your database would be astronomical on the former and next to nothing on the
latter. Elasticsearch works much the same way. While Elasticsearch _can_
technically do both methods, for use-cases with time-series data (like logging),
we recommend dropping entire indices vs. the extremely I/O expensive search and
delete method. Curator was created to help fill that need.
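As a rough illustration of why date-partitioned indices make cleanup cheap, the sketch below picks whole indices to drop purely from the date in their names. It is not how Curator is implemented; the helper `indices_older_than`, the sample names, and the 7-day retention are assumptions made for the example.

```python
from datetime import date, datetime, timedelta


def indices_older_than(names, days, today):
    """Return index names whose trailing YYYY.MM.dd date is more than `days` old."""
    cutoff = today - timedelta(days=days)
    expired = []
    for name in names:
        # Assumption: time-series indices end in "-YYYY.MM.dd".
        # Names that do not parse as a date are skipped, not deleted.
        try:
            stamp = datetime.strptime(name.rsplit("-", 1)[-1], "%Y.%m.%d").date()
        except ValueError:
            continue
        if stamp < cutoff:
            expired.append(name)
    return expired


# Hypothetical index names, for illustration only.
names = ["syslog-2014.05.05", "syslog-2014.05.06", "syslog-2014.06.01", "kibana-int"]
print(indices_older_than(names, days=7, today=date(2014, 6, 2)))
# prints ['syslog-2014.05.05', 'syslog-2014.05.06']
```

Each name returned corresponds to a single index-level delete, the analogue of `DROP table TABLENAME.YYYY.MM.dd` above, with no per-document work.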
While you can store different types of data in separate indices (e.g.
syslog-2014.05.05, apache-2015.05.06), this gets very expensive, very quickly, in
a totally different way. Each shard in Elasticsearch is a Lucene index, and each
Lucene index requires a portion of the heap to exist and be kept current. If you
have 3 daily indices with 5 primary shards each, you have gone from 5 shards per
day to 15, which reduces the heap available for managing each shard by a factor
of 3, and that is before counting any additional indices per day. The ways to
mitigate this (if you pursue this route) include using massive daily indexing
boxes together with shard allocation/routing to move indices to specific members
of the cluster where they can have less effect; keeping fewer days of
information; and having more nodes in your cluster.
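The shard arithmetic above can be sketched in a few lines. The function name `primary_shards_retained` and the 30-day retention period are assumptions chosen for illustration; replica shards are deliberately left out.

```python
def primary_shards_retained(indices_per_day, primaries_per_index, days_retained):
    """Count the primary shards the cluster has to keep open (replicas excluded)."""
    return indices_per_day * primaries_per_index * days_retained


# One daily index with 5 primaries, kept for a hypothetical 30 days.
single = primary_shards_retained(1, 5, 30)
# Three daily indices (one per data type) with 5 primaries each.
split = primary_shards_retained(3, 5, 30)

print(single, split, split // single)  # prints 150 450 3
```

The factor of 3 holds regardless of the retention period, since every day contributes three times as many shards.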
[float]
Conclusion:
^^^^^^^^^^^
While it may be desirable to have different life-cycles for your data, sometimes
it's just easier and cheaper to store everything as long as the longest
life-cycle you wish to maintain.
[float]
Post-script:
^^^^^^^^^^^^
Even though it is neither recommended footnote:[There are reasons Elasticsearch does not recommend this, particularly for time-series data. For more information read http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html and watch what happens to your segments when you delete data.],
nor a best practice, it is still possible to perform these search and delete
operations yourself, using the {ref}/docs-delete-by-query.html[Delete-by-Query
API]. Curator will not be modified to perform operations such as these, however.
Curator is meant to manage at the index level, rather than the data level.
'''''
[[faq_strange_chars]]
== Q: Can Curator handle index names with strange characters?
=== A: Yes!
This problem can be resolved by using the
<<filtertype_pattern,pattern filtertype>> with <<fe_kind,kind>> set to `regex`,
and <<fe_value,value>> set to the needed regular expression.
[float]
The Problem:
^^^^^^^^^^^^
Illegal characters make it hard to delete indices.
------------------
% curl logs.example.com:9200/_cat/indices
red }?ebc-2015.04.08.03
sip-request{ 5 1 0 0 632b 316b
red }?ebc-2015.04.08.03
sip-response 5 1 0 0 474b 237b
red ?ebc-2015.04.08.02
sip-request{ 5 1 0 0 474b 316b
red
eb 5 1 0 0 632b 316b
red ?e 5 1 0 0 632b 316b
------------------
The output suggests that the index names contain tab characters and possibly
newline characters. This makes it hard to use the HTTP API to delete the indices.
Dumping all the index settings out:
[source,sh]
-------
curl -XGET 'localhost:9200/*/_settings?pretty'
-------
...reveals the index names as the first key in the resulting JSON. In this
case, the names were very atypical:
-------
}\b?\u0011ebc-2015.04.08.02\u000Bsip-request{
}\u0006?\u0011ebc-2015.04.08.03\u000Bsip-request{
}\u0003?\u0011ebc-2015.04.08.03\fsip-response
...
-------
Curator lets you use regular expressions to select indices to perform actions
on.
WARNING: Before attempting an action, see what will be affected by using the
`--dry-run` flag first.
To delete the first three from the above example, use `'.*sip.*'` as your
regular expression.
NOTE: In an <<actionfile,actionfile>>, regular expressions and strftime date
strings _must_ be encapsulated in single-quotes.
The next one is trickier. The real name of the index was `\n\u0011eb`. The
regular expression `.*b$` did not work, but `'\n.*'` did.
The last index can be deleted with a regular expression of `'.*e$'`.
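The expressions above can be tried out with Python's `re` module before putting them in an actionfile. This is only a sketch: the helper `matching` is hypothetical, and it assumes the pattern is applied from the start of the index name (as `re.match` does), which is an assumption about matching semantics and not a statement about Curator's internals. The last index is left out because its real name is not shown above.

```python
import re

# Index names as reported by the _settings call above, plus the "\n\u0011eb" index.
names = [
    "}\b?\u0011ebc-2015.04.08.02\u000Bsip-request{",
    "}\u0006?\u0011ebc-2015.04.08.03\u000Bsip-request{",
    "}\u0003?\u0011ebc-2015.04.08.03\fsip-response",
    "\n\u0011eb",
]


def matching(pattern, candidates):
    """Return the candidates that the pattern matches from the start of the name."""
    compiled = re.compile(pattern)
    return [name for name in candidates if compiled.match(name)]


for pattern in (r".*sip.*", r".*b$", r"\n.*"):
    # repr() makes the control characters visible instead of printing them raw.
    print(pattern, [repr(name) for name in matching(pattern, names)])
```

Under these assumptions, `.*b$` selects nothing because `.` does not match a newline by default, so the pattern can never get past the leading `\n`, whereas `\n.*` matches it explicitly.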
The resulting <<actionfile,actionfile>> might look like this:
[source,yaml]
--------
actions:
  1:
    description: Delete indices with strange characters that match regex '.*sip.*'
    action: delete_indices
    options:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: regex
      value: '.*sip.*'
  2:
    description: Delete indices with strange characters that match regex '\n.*'
    action: delete_indices
    options:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: regex
      value: '\n.*'
  3:
    description: Delete indices with strange characters that match regex '.*e$'
    action: delete_indices
    options:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: regex
      value: '.*e$'
--------