File: faq.asciidoc

package info (click to toggle)
elasticsearch-curator 8.0.21-1
  • links: PTS, VCS
  • area: main
  • in suites: sid, trixie
  • size: 2,716 kB
  • sloc: python: 17,838; makefile: 159; sh: 156
file content (184 lines) | stat: -rw-r--r-- 6,255 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
[[faq]]
= Frequently Asked Questions

[partintro]
--
This section will be updated as more frequently asked questions arise

* <<faq_doc_error,How can I report an error in the documentation?>>
* <<faq_partial_delete,Can I delete only certain data from within indices?>>
* <<faq_strange_chars,Can Curator handle index names with strange characters?>>
--

[[faq_doc_error]]
== Q: How can I report an error in the documentation?

=== A: Use the "Edit" link on any page

See <<site-corrections,Site Corrections>>.

[[faq_partial_delete]]
== Q: Can I delete only certain data from within indices?

=== A: It's complicated

[float]
TL;DR: No. Curator can only delete entire indices.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[float]
Full answer:
^^^^^^^^^^^^

As a thought exercise, think of Elasticsearch indices as being like databases,
or tablespaces within a database. If you had hundreds of millions of rows to
delete from your database, would you run a separate
`DELETE from TABLE where date<YYYY.MM.dd` to assemble hundreds of millions of
individual delete operations every day, or would you partition your tables in a
way that you could simply run `DROP table TABLENAME.YYYY.MM.dd`? The strain on
your database would be astronomical on the former and next to nothing on the
latter. Elasticsearch works much the same way. While Elasticsearch _can_
technically do both methods, for use-cases with time-series data (like logging),
we recommend dropping entire indices vs. the extremely I/O expensive search and
delete method. Curator was created to help fill that need.

While you can store different types within different indices (e.g.
syslog-2014.05.05, apache-2015.05.06), this gets very expensive, very quickly in
a totally different way. Each shard in Elasticsearch is a Lucene index. Each
index requires a portion of the heap to exist and be kept current. If you have 3
daily indices with 5 primary shards each, you suddenly have reduced the
available heap space for shard management by a factor of 3, having gone from 5
shards to 15, __per index,__ not counting multiple indexes per day. The ways to
mitigate this (if you pursue this route) include massive daily indexing boxes
and using shard allocation/routing to move indices to specific members of the
cluster where they can have less effect; keeping fewer days of information;
having more nodes in your cluster, and so forth.

[float]
Conclusion:
^^^^^^^^^^^

While it may be desirable to have different life-cycles for your data, sometimes
it's just easier and cheaper to store everything as long as the longest
life-cycle you wish to maintain.

[float]
Post-script:
^^^^^^^^^^^^

Even though it is neither recommended footnote:[There are reasons Elasticsearch does not recommend this, particularly for time-series data. For more information read http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html and watch what happens to your segments when you delete data.],
nor best practices, it is still possible to perform these search & delete
operations yourself, using the {ref}/docs-delete-by-query.html[Delete-by-Query
API]. Curator will not be modified to perform operations such as these, however.
Curator is meant to manage at the index level, rather than the data level.

'''''

[[faq_strange_chars]]
== Q: Can Curator handle index names with strange characters?

=== A: Yes!

This problem can be resolved by using the
<<filtertype_pattern,pattern filtertype>> with <<fe_kind,kind>> set to `regex`,
and <<fe_value,value>> set to the needed regular expression.

[float]
The Problem:
^^^^^^^^^^^^

Illegal characters make it hard to delete indices.

------------------
% curl logs.example.com:9200/_cat/indices
red    }?ebc-2015.04.08.03
                          sip-request{ 5 1         0  0     632b     316b
red    }?ebc-2015.04.08.03
                          sip-response 5 1         0  0     474b     237b
red    ?ebc-2015.04.08.02
                         sip-request{ 5 1         0  0     474b     316b
red
eb                               5 1         0  0     632b     316b
red    ?e                                5 1         0  0     632b     316b
------------------

&nbsp;

You can see it looks like there are some tab characters and maybe newline
characters. This makes it hard to use the HTTP API to delete the indices.

Dumping all the index settings out:

[source,sh]
-------
curl -XGET localhost:9200/*/_settings?pretty
-------

&nbsp;

...reveals the index names as the first key in the resulting JSON.  In this
case, the names were very atypical:

-------
}\b?\u0011ebc-2015.04.08.02\u000Bsip-request{
}\u0006?\u0011ebc-2015.04.08.03\u000Bsip-request{
}\u0003?\u0011ebc-2015.04.08.03\fsip-response
...
-------

&nbsp;

Curator lets you use regular expressions to select indices to perform actions
on.

WARNING: Before attempting an action, see what will be affected by using the
`--dry-run` flag first.

To delete the first three from the above example, use `'.*sip.*'` as your
regular expression.

NOTE: In an <<actionfile,actionfile>>, regular expressions and strftime date
strings _must_ be encapsulated in single-quotes.

The next one is trickier. The real name of the index was `\n\u0011eb`. The
regular expression `.*b$` did not work, but `'\n.*'` did.

The last index can be deleted with a regular expression of `'.*e$'`.

The resulting <<actionfile,actionfile>> might look like this:

[source,yaml]
--------
actions:
  1:
    description: Delete indices with strange characters that match regex '.*sip.*'
    action: delete_indices
    options:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: regex
      value: '.*sip.*'
  2:
    description: Delete indices with strange characters that match regex '\n.*'
    action: delete_indices
    options:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: regex
      value: '\n.*'
  3:
    description: Delete indices with strange characters that match regex '.*e$'
    action: delete_indices
    options:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: regex
      value: '.*e$'
--------