File: flapping.html

package info (click to toggle)
icinga 1.14.2%2Bds-3
  • links: PTS, VCS
  • area: main
  • in suites: buster
  • size: 49,264 kB
  • sloc: ansic: 108,564; sql: 9,656; sh: 4,945; perl: 3,439; makefile: 1,213; php: 581; xml: 104
file content (388 lines) | stat: -rw-r--r-- 19,004 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>7.8. Detection and Handling of State Flapping</title>
<link rel="stylesheet" href="../stylesheets/icinga-docs.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.75.1">
<meta name="keywords" content="Supervision, Icinga, Nagios, Linux">
<link rel="home" href="index.html" title="Icinga Version 1.14 Documentation">
<link rel="up" href="ch07.html" title="Chapter 7. Advanced Topics">
<link rel="prev" href="redundancy.html" title="7.7. Redundant and Failover Network Monitoring">
<link rel="next" href="escalations.html" title="7.9. Notification Escalations">
<script src="../js/jquery-min.js" type="text/javascript"></script><script src="../js/icinga-docs.js" type="text/javascript"></script>
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<CENTER><IMG src="../images/logofullsize.png" border="0" alt="Icinga" title="Icinga"></CENTER>
<div class="navheader">
<table width="100%" summary="Navigation header">
<tr><th colspan="3" align="center">7.8. Detection and Handling of State Flapping</th></tr>
<tr>
<td width="20%" align="left">
<a accesskey="p" href="redundancy.html">Prev</a> </td>
<th width="60%" align="center">Chapter 7. Advanced Topics</th>
<td width="20%" align="right"> <a accesskey="n" href="escalations.html">Next</a>
</td>
</tr>
</table>
<hr>
</div>
<div class="section" title="7.8. Detection and Handling of State Flapping">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="flapping"></a>7.8. <a name="state_flapping"></a>Detection and Handling of State Flapping</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section">7.8.1. <a href="flapping.html#introduction">Introduction</a></span></dt>
<dt><span class="section">7.8.2. <a href="flapping.html#howitworks">How Flap Detection Works</a></span></dt>
<dt><span class="section">7.8.3. <a href="flapping.html#example">Example</a></span></dt>
<dt><span class="section">7.8.4. <a href="flapping.html#detectionservices">Flap Detection for Services</a></span></dt>
<dt><span class="section">7.8.5. <a href="flapping.html#detectionhosts">Flap Detection for Hosts</a></span></dt>
<dt><span class="section">7.8.6. <a href="flapping.html#thresholds">Flap Detection Thresholds</a></span></dt>
<dt><span class="section">7.8.7. <a href="flapping.html#statesused">States Used For Flap Detection</a></span></dt>
<dt><span class="section">7.8.8. <a href="flapping.html#handling">Flap Handling</a></span></dt>
<dt><span class="section">7.8.9. <a href="flapping.html#enable">Enabling Flap Detection</a></span></dt>
</dl></div>
  

  <div class="section" title="7.8.1. Introduction">
<div class="titlepage"><div><div><h3 class="title">
<a name="introduction"></a>7.8.1. Introduction</h3></div></div></div>
    

    <p>Icinga supports optional detection of hosts and services that are "flapping". Flapping occurs when a service or host
    changes state too frequently, resulting in a storm of problem and recovery notifications. Flapping can be indicative of configuration
    problems (i.e. thresholds set too low), troublesome services, or real network problems.</p>
  </div>

  <div class="section" title="7.8.2. How Flap Detection Works">
<div class="titlepage"><div><div><h3 class="title">
<a name="howitworks"></a>7.8.2. How Flap Detection Works</h3></div></div></div>
    

    <p>Before we get into this, it is time to say that flapping detection has been a little difficult to implement. How exactly does one
    determine what "too frequently" means in regards to state changes for a particular host or service? When Ethan Galstad first started
    thinking about implementing flap detection he tried to find some information on how flapping could/should be detected. He couldn't find
    any information about what others were using (where they using any?), so he decided to settle with what seemed to him to be a reasonable
    solution...</p>

    <p>Whenever Icinga checks the status of a host or service, it will check to see if it has started or stopped flapping. It does
    this by:</p>

    <div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
        <p>Storing the results of the last 21 checks of the host or service [1]</p>
      </li>
<li class="listitem">
        <p>Analyzing the historical check results and determine where state changes/transitions occur</p>
      </li>
<li class="listitem">
        <p>Using the state transitions to determine a percent state change value (a measure of change) for the host or service</p>
      </li>
<li class="listitem">
        <p>Comparing the percent state change value against low and high flapping thresholds</p>
      </li>
</ul></div>

    <p>A host or service is determined to have <span class="emphasis"><em>started</em></span> flapping when its percent state change first exceeds a
    <span class="emphasis"><em>high</em></span> flapping threshold.</p>

    <p>A host or service is determined to have <span class="emphasis"><em>stopped</em></span> flapping when its percent state goes below a
    <span class="emphasis"><em>low</em></span> flapping threshold (assuming that is was previously flapping).</p>

    <p>[1] Results for services in a SOFT state are not stored unless when they return to an OK state.</p>
  </div>

  <div class="section" title="7.8.3. Example">
<div class="titlepage"><div><div><h3 class="title">
<a name="example"></a>7.8.3. Example</h3></div></div></div>
    

    <p>Let's describe in more detail how flap detection works with services...</p>

    <p>The image below shows a chronological history of service states from the most recent 21 service checks. OK states are shown in
    green, WARNING states in yellow, CRITICAL states in red, and UNKNOWN states in orange.</p>

    <p><span class="inlinemediaobject"><img src="../images/statetransitions.png"></span></p>

    <p>The historical service check results are examined to determine where state changes/transitions occur. State changes occur when an
    archived state is different from the archived state that immediately precedes it chronologically. Since we keep the results of the last
    21 service checks in the array, there is a possibility of having at most 20 state changes. In this example there are 7 state changes,
    indicated by arrows in the image above.</p>

    <p>The flap detection logic uses the state changes to determine an overall percent state change for the service. This is a measure of
    volatility/change for the service. Services that never change state will have a 0% state change value, while services that change state
    each time they're checked will have 100% state change. Most services will have a percent state change somewhere in between.</p>

    <p>When calculating the percent state change for the service, the flap detection algorithm will give more weight to new state changes
    compare to older ones. Specfically, the flap detection routines are currently designed to make the newest possible state change carry
    50% more weight than the oldest possible state change. The image below shows how recent state changes are given more weight than older
    state changes when calculating the overall or total percent state change for a particular service.</p>

    <p><span class="inlinemediaobject"><img src="../images/statetransitions2.png"></span></p>

    <p>Using the images above, lets do a calculation of percent state change for the service. You will notice that there are a total of 7
    state changes (at t<sub>3</sub>, t<sub>4</sub>, t<sub>5</sub>, t<sub>9</sub>,
    t<sub>12</sub>, t<sub>16</sub>, and t<sub>19</sub>). Without any weighting of the state changes over
    time, this would give us a total state change of 35%:</p>

    <p>(7 observed state changes / possible 20 state changes) * 100 = 35 %</p>

    <p>Since the flap detection logic will give newer state changes a higher rate than older state changes, the actual calculated percent
    state change will be slightly less than 35% in this example. Let's say that the weighted percent of state change turned out to be
    31%...</p>

    <p>The calculated percent state change for the service (31%) will then be compared against flapping thresholds to see what should
    happen:</p>

    <div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
        <p>If the service was <span class="emphasis"><em>not</em></span> previously flapping and 31% is <span class="emphasis"><em>equal to or greater than</em></span> the
        high flap threshold, Icinga considers the service to have just started flapping.</p>
      </li>
<li class="listitem">
        <p>If the service <span class="emphasis"><em>was</em></span> previously flapping and 31% is <span class="emphasis"><em>less than</em></span> the low flap threshold,
        Icinga considers the service to have just stopped flapping.</p>
      </li>
</ul></div>

    <p>If neither of those two conditions are met, the flap detection logic won't do anything else with the service, since it is either
    not currently flapping or it is still flapping.</p>
  </div>

  <div class="section" title="7.8.4. Flap Detection for Services">
<div class="titlepage"><div><div><h3 class="title">
<a name="detectionservices"></a>7.8.4. Flap Detection for Services</h3></div></div></div>
    

    <p>Icinga checks to see if a service is flapping whenever the service is checked (either actively or passively).</p>

    <p>The flap detection logic for services works as described in the example above.</p>
  </div>

  <div class="section" title="7.8.5. Flap Detection for Hosts">
<div class="titlepage"><div><div><h3 class="title">
<a name="detectionhosts"></a>7.8.5. Flap Detection for Hosts</h3></div></div></div>
    

    <p>Host flap detection works in a similiar manner to service flap detection, with one important difference: Icinga will
    attempt to check to see if a host is flapping whenever:</p>

    <div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
        <p>The host is checked (actively or passively)</p>
      </li>
<li class="listitem">
        <p>Sometimes when a service associated with that host is checked. More specifically, when at least <span class="emphasis"><em>x</em></span> amount
        of time has passed since the flap detection was last performed, where <span class="emphasis"><em>x</em></span> is equal to the average check interval
        of all services associated with the host.</p>
      </li>
</ul></div>

    <p>Why is this done? With services we know that the minimum amount of time between consecutive flap detection routines is going to be
    equal to the service check interval. However, you might not be monitoring hosts on a regular basis, so there might not be a host check
    interval that can be used in the flap detection logic. Also, it makes sense that checking a service should count towards the detection
    of host flapping. Services are attributes of or things associated with host after all... At any rate, that's the best method I could
    come up with for determining how often flap detection could be performed on a host, so there you have it.</p>
  </div>

  <div class="section" title="7.8.6. Flap Detection Thresholds">
<div class="titlepage"><div><div><h3 class="title">
<a name="thresholds"></a>7.8.6. Flap Detection Thresholds</h3></div></div></div>
    

    <p>Icinga uses several variables to determine the percent state change thresholds is uses for flap detection. For both hosts
    and services, there are <span class="emphasis"><em>global</em></span> high and low thresholds and <span class="emphasis"><em>host-</em></span> or
    <span class="emphasis"><em>service-specific</em></span> thresholds that you can configure. Icinga will use the global thresholds for flap detection
    if you to not specify host- or service- specific thresholds.</p>

    <p>The table below shows the global and host- or service-specific variables that control the various thresholds used in flap
    detection.</p>

    <div class="informaltable">
      <table border="1">
<colgroup>
<col>
<col>
<col>
</colgroup>
<tbody>
<tr>
<td><p> <span class="bold"><strong>Object Type</strong></span> </p></td>
<td><p> <span class="bold"><strong>Global Variables</strong></span> </p></td>
<td><p> <span class="bold"><strong>Object-Specific Variables</strong></span> </p></td>
</tr>
<tr>
<td><p>Host</p></td>
<td>
<p>
                 

                <a class="link" href="configmain.html#configmain-low_host_flap_threshold">low_host_flap_threshold</a>

                 
              </p> <p>
                 

                <a class="link" href="configmain.html#configmain-high_host_flap_threshold">high_host_flap_threshold</a>

                 
              </p>
</td>
<td>
<p>
                 

                <a class="link" href="objectdefinitions.html#objectdefinitions-host">low_flap_threshold</a>

                 
              </p> <p>
                 

                <a class="link" href="objectdefinitions.html#objectdefinitions-host">high_flap_threshold</a>

                 
              </p>
</td>
</tr>
<tr>
<td><p>Service</p></td>
<td>
<p>
                 

                <a class="link" href="configmain.html#configmain-low_service_flap_threshold">low_service_flap_threshold</a>

                 
              </p> <p>
                 

                <a class="link" href="configmain.html#configmain-high_service_flap_threshold">high_service_flap_threshold</a>

                 
              </p>
</td>
<td>
<p>
                 

                <a class="link" href="objectdefinitions.html#objectdefinitions-service">low_flap_threshold</a>

                 
              </p> <p>
                 

                <a class="link" href="objectdefinitions.html#objectdefinitions-service">high_flap_threshold</a>

                 
              </p>
</td>
</tr>
</tbody>
</table>
    </div>
  </div>

  <div class="section" title="7.8.7. States Used For Flap Detection">
<div class="titlepage"><div><div><h3 class="title">
<a name="statesused"></a>7.8.7. States Used For Flap Detection</h3></div></div></div>
    

    <p>Normally Icinga will track the results of the last 21 checks of a host or service, regardless of the check result
    (host/service state), for use in the flap detection logic.</p>

    <div class="tip" title="Tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top">
      <p>You can exclude certain host or service states from use in flap detection logic by using the
      <span class="emphasis"><em>flap_detection_options</em></span> directive in your host or service definitions. This directive allows you to specify what
      host or service states (i.e. "UP, "DOWN", "OK, "CRITICAL") you want to use for flap detection. If you don't use this directive, all
      host or service states are used in flap detection.</p>
    </td></tr>
</table></div>
  </div>

  <div class="section" title="7.8.8. Flap Handling">
<div class="titlepage"><div><div><h3 class="title">
<a name="handling"></a>7.8.8. Flap Handling</h3></div></div></div>
    

    <p>When a service or host is first detected as flapping, Icinga will:</p>

    <div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem">
        <p>Log a message indicating that the service or host is flapping.</p>
      </li>
<li class="listitem">
        <p>Add a non-persistent comment to the host or service indicating that it is flapping.</p>
      </li>
<li class="listitem">
        <p>Send a "flapping start" notification for the host or service to appropriate contacts.</p>
      </li>
<li class="listitem">
        <p>Suppress other notifications for the service or host (this is one of the filters in the <a class="link" href="notifications.html" title="5.11. Notifications">notification logic</a>).</p>
      </li>
</ol></div>

    <p>When a service or host stops flapping, Icinga will:</p>

    <div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem">
        <p>Log a message indicating that the service or host has stopped flapping.</p>
      </li>
<li class="listitem">
        <p>Delete the comment that was originally added to the service or host when it started flapping.</p>
      </li>
<li class="listitem">
        <p>Send a "flapping stop" notification for the host or service to appropriate contacts.</p>
      </li>
<li class="listitem">
        <p>Remove the block on notifications for the service or host (notifications will still be bound to the normal <a class="link" href="notifications.html" title="5.11. Notifications">notification logic</a>).</p>
      </li>
</ol></div>
  </div>

  <div class="section" title="7.8.9. Enabling Flap Detection">
<div class="titlepage"><div><div><h3 class="title">
<a name="enable"></a>7.8.9. Enabling Flap Detection</h3></div></div></div>
    

    <p>In order to enable the flap detection features in Icinga, you'll need to:</p>

    <div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
        <p>Set <a class="link" href="configmain.html#configmain-enable_flap_detection">enable_flap_detection</a> directive is set to 1.</p>
      </li>
<li class="listitem">
        <p>Set the <span class="emphasis"><em>flap_detection_enabled</em></span> directive in your host and service definitions is set to 1.</p>
      </li>
</ul></div>

    <p>If you want to disable flap detection on a global basis, set the <a class="link" href="configmain.html#configmain-enable_flap_detection">enable_flap_detection</a> directive to 0.</p>

    <p>If you would like to disable flap detection for just a few hosts or services, use the <span class="emphasis"><em>flap_detection_enabled</em></span>
    directive in the host and/or service definitions to do so.</p>

    <a class="indexterm" name="idm140381624596992"></a>
  </div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="redundancy.html">Prev</a> </td>
<td width="20%" align="center"><a accesskey="u" href="ch07.html">Up</a></td>
<td width="40%" align="right"> <a accesskey="n" href="escalations.html">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">7.7. Redundant and Failover Network Monitoring </td>
<td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td>
<td width="40%" align="right" valign="top"> 7.9. Notification Escalations</td>
</tr>
</table>
</div>
<P class="copyright">© 1999-2009 Ethan Galstad, 2009-2017 Icinga Development Team, https://www.icinga.com</P>
</body>
</html>