File: distributed.html

package info (click to toggle)
icinga 1.0.2-2%2Bsqueeze1
  • links: PTS, VCS
  • area: main
  • in suites: squeeze
  • size: 33,952 kB
  • ctags: 13,294
  • sloc: xml: 154,821; ansic: 99,198; sh: 14,585; sql: 5,852; php: 5,126; perl: 2,838; makefile: 1,268
file content (382 lines) | stat: -rw-r--r-- 25,032 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Distributed Monitoring</title>
<meta name="generator" content="DocBook XSL Stylesheets V1.75.1">
<meta name="keywords" content="Supervision, Icinga, Nagios, Linux">
<link rel="home" href="index.html" title="Icinga Version 1.0.2 Documentation">
<link rel="up" href="ch06.html" title="Chapter 6. Advanced Topics">
<link rel="prev" href="freshness.html" title="Service and Host Freshness Checks">
<link rel="next" href="redundancy.html" title="Redundant and Failover Network Monitoring">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<CENTER><IMG src="../images/logofullsize.png" border="0" alt="Icinga" title="Icinga"></CENTER>
<div class="navheader">
<table width="100%" summary="Navigation header">
<tr><th colspan="3" align="center">Distributed Monitoring</th></tr>
<tr>
<td width="20%" align="left">
<a accesskey="p" href="freshness.html">Prev</a> </td>
<th width="60%" align="center">Chapter 6. Advanced Topics</th>
<td width="20%" align="right"> <a accesskey="n" href="redundancy.html">Next</a>
</td>
</tr>
</table>
<hr>
</div>
<div class="section" title="Distributed Monitoring">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="distributed"></a><a name="distributed_monitoring"></a>Distributed Monitoring</h2></div></div></div>
  

  <p><span class="bold"><strong>Introduction</strong></span></p>

  <p>Icinga can be configured to support distributed monitoring of network services and resources. We'll try to briefly
  explan how this can be accomplished...</p>

  <p><span class="bold"><strong>Goals</strong></span></p>

  <p>The goal in the distributed monitoring environment that we will describe is to offload the overhead (CPU usage, etc.) of
  performing service checks from a "central" server onto one or more "distributed" servers. Most small to medium sized shops will
  not have a real need for setting up such an environment. However, when you want to start monitoring hundreds or even thousands
  of <span class="emphasis"><em>hosts</em></span> (and several times that many services) using Icinga, this becomes quite important.</p>

  <p><span class="bold"><strong>Reference Diagram</strong></span></p>

  <p>The diagram below should help give you a general idea of how distributed monitoring works with Icinga. We'll be
  referring to the items shown in the diagram as we explain things...</p>

  <p><span class="inlinemediaobject"><img src="../images/distributed.png"></span></p>

  <p><span class="bold"><strong>Central Server vs. Distributed Servers</strong></span></p>

  <p>When setting up a distributed monitoring environment with Icinga, there are differences in the way the central and
  distributed servers are configured. We'll show you how to configure both types of servers and explain what effects the changes
  being made have on the overall monitoring. For starters, lets describe the purpose of the different types of servers...</p>

  <p>The function of a <span class="emphasis"><em>distributed server</em></span> is to actively perform checks all the services you define for a
  "cluster" of hosts. We use the term "cluster" loosely - it basically just mean an arbitrary group of hosts on your network.
  Depending on your network layout, you may have several cluters at one physical location, or each cluster may be separated by a
  WAN, its own firewall, etc. The important thing to remember to that for each cluster of hosts (however you define that), there
  is one distributed server that runs Icinga and monitors the services on the hosts in the cluster. A distributed server is
  usually a bare-bones installation of Icinga. It doesn't have to have the web interface installed, send out notifications,
  run event handler scripts, or do anything other than execute service checks if you don't want it to. More detailed information
  on configuring a distributed server comes later...</p>

  <p>The purpose of the <span class="emphasis"><em>central server</em></span> is to simply listen for service check results from one or more
  distributed servers. Even though services are occassionally actively checked from the central server, the active checks are only
  performed in dire circumstances, so lets just say that the central server only accepts passive check for now. Since the central
  server is obtaining <a class="link" href="passivechecks.html" title="Passive Checks">passive service check</a> results from one or more distributed servers, it
  serves as the focal point for all monitoring logic (i.e. it sends out notifications, runs event handler scripts, determines host
  states, has the web interface installed, etc).</p>

  <p><span class="bold"><strong>Obtaining Service Check Information From Distributed Monitors</strong></span></p>

  <p>Okay, before we go jumping into configuration detail we need to know how to send the service check results from the
  distributed servers to the central server. We've already discussed how to submit passive check results to Icinga from
  same host that Icinga is running on (as described in the documentation on <a class="link" href="passivechecks.html" title="Passive Checks">passive
  checks</a>), but we haven't given any info on how to submit passive check results from other hosts.</p>

  <p>In order to facilitate the submission of passive check results to a remote host, the <a class="link" href="addons.html#addons-nsca">nsca
  addon</a> was written. The addon consists of two pieces. The first is a client program (send_nsca) which is run from a remote
  host and is used to send the service check results to another server. The second piece is the nsca daemon (nsca) which either
  runs as a standalone daemon or under inetd and listens for connections from client programs. Upon receiving service check
  information from a client, the daemon will sumbit the check information to Icinga (on the central server) by inserting a
  <span class="emphasis"><em>PROCESS_SVC_CHECK_RESULT</em></span> command into the <a class="link" href="configmain.html#configmain-command_file">external command
  file</a>, along with the check results. The next time Icinga checks for <a class="link" href="extcommands.html" title="External Commands">external
  commands</a>, it will find the passive service check information that was sent from the distributed server and process it.
  Easy, huh?</p>

  <p><span class="bold"><strong>Distributed Server Configuration</strong></span></p>

  <p>So how exactly is Icinga configured on a distributed server? Basically, its just a bare-bones installation. You
  don't need to install the web interface or have notifications sent out from the server, as this will all be handled by the
  central server.</p>

  <p>Key configuration changes:</p>

  <div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
      <p>Only those services and hosts which are being monitored directly by the distributed server are defined in the <a class="link" href="configobject.html" title="Object Configuration Overview">object configuration file</a>.</p>
    </li>
<li class="listitem">
      <p>The distributed server has its <a class="link" href="configmain.html#configmain-enable_notifications">enable_notifications</a> directive
      set to 0. This will prevent any notifications from being sent out by the server.</p>
    </li>
<li class="listitem">
      <p>The distributed server is configured to <a class="link" href="configmain.html#configmain-obsess_over_services">obsess over
      services</a>.</p>
    </li>
<li class="listitem">
      <p>The distributed server has an <a class="link" href="configmain.html#configmain-ocsp_command">ocsp command</a> defined (as described
      below).</p>
    </li>
</ul></div>

  <p>In order to make everything come together and work properly, we want the distributed server to report the results of
  <span class="emphasis"><em>all</em></span> service checks to Icinga. We could use <a class="link" href="eventhandlers.html" title="Event Handlers">event handlers</a> to
  report <span class="emphasis"><em>changes</em></span> in the state of a service, but that just doesn't cut it. In order to force the distributed
  server to report all service check results, you must enabled the <a class="link" href="configmain.html#configmain-obsess_over_services">obsess_over_services</a> option in the main configuration file and provide a <a class="link" href="configmain.html#configmain-ocsp_command">ocsp_command</a> to be run after every service check. We will use the ocsp command to send
  the results of all service checks to the central server, making use of the send_nsca client and nsca daemon (as described above)
  to handle the tranmission.</p>

  <p>In order to accomplish this, you'll need to define an ocsp command like this:</p>

  <p><span class="bold"><strong>ocsp_command=submit_check_result</strong></span></p>

  <p>The command definition for the <span class="emphasis"><em>submit_check_result</em></span> command looks something like this:</p>

  <pre class="screen"> define command{
      command_name    submit_check_result 
      command_line    /usr/local/icinga/libexec/eventhandlers/submit_check_result $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$SERVICEOUTPUT$'
 } </pre>

  <p>The <span class="emphasis"><em>submit_check_result</em></span> shell scripts looks something like this (replace
  <span class="emphasis"><em>central_server</em></span> with the IP address of the central server):</p>

  <pre class="screen">#!/bin/sh

# Arguments:
#  $1 = host_name (Short name of host that the service is
#       associated with)
#  $2 = svc_description (Description of the service)
#  $3 = state_string (A string representing the status of
#       the given service - "OK", "WARNING", "CRITICAL"
#       or "UNKNOWN")
#  $4 = plugin_output (A text string that should be used
#       as the plugin output for the service checks)
#

# Convert the state string to the corresponding return code
return_code=-1

case "$3" in
    OK)
        return_code=0
        ;;
    WARNING)
        return_code=1
        ;;
    CRITICAL)
        return_code=2
        ;;
    UNKNOWN)
        return_code=-1
        ;;
esac

# pipe the service check info into the send_nsca program, which
# in turn transmits the data to the nsca daemon on the central
# monitoring server

/bin/printf "%s\t%s\t%s\t%s\n" "$1" "$2" "$return_code" "$4" | /usr/local/icinga/bin/send_nsca -H <span class="emphasis"><em> central_server</em></span> -c /usr/local/icinga/etc/send_nsca.cfg</pre>

  <p>The script above assumes that you have the send_nsca program and it configuration file (send_nsca.cfg) located in the
  <span class="emphasis"><em>/usr/local/icinga/bin/</em></span> and <span class="emphasis"><em>/usr/local/icinga/etc/</em></span> directories, respectively.</p>

  <p>That's it! We've sucessfully configured a remote host running Icinga to act as a distributed monitoring server.
  Let's go over exactly what happens with the distributed server and how it sends service check results to Icinga (the
  steps outlined below correspond to the numbers in the reference diagram above):</p>

  <div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem">
      <p>After the distributed server finishes executing a service check, it executes the command you defined by the <a class="link" href="configmain.html#configmain-ocsp_command">ocsp_command</a> variable. In our example, this is the
      <span class="emphasis"><em>/usr/local/icinga/libexec/eventhandlers/submit_check_result</em></span> script. Note that the definition for the
      <span class="emphasis"><em>submit_check_result</em></span> command passed four pieces of information to the script: the name of the host the
      service is associated with, the service description, the return code from the service check, and the plugin output from the
      service check.</p>
    </li>
<li class="listitem">
      <p>The <span class="emphasis"><em>submit_check_result</em></span> script pipes the service check information (host name, description,
      return code, and output) to the <span class="emphasis"><em>send_nsca</em></span> client program.</p>
    </li>
<li class="listitem">
      <p>The <span class="emphasis"><em>send_nsca</em></span> program transmits the service check information to the <span class="emphasis"><em>nsca</em></span>
      daemon on the central monitoring server.</p>
    </li>
<li class="listitem">
      <p>The <span class="emphasis"><em>nsca</em></span> daemon on the central server takes the service check information and writes it to the
      external command file for later pickup by Icinga.</p>
    </li>
<li class="listitem">
      <p>The Icinga process on the central server reads the external command file and processes the passive service
      check information that originated from the distributed monitoring server.</p>
    </li>
</ol></div>

  <p><span class="bold"><strong>Central Server Configuration</strong></span></p>

  <p>We've looked at how distributed monitoring servers should be configured, so let's turn to the central server. For all
  intensive purposes, the central is configured as you would normally configure a standalone server. It is setup as
  follows:</p>

  <div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
      <p>The central server has the web interface installed (optional, but recommended)</p>
    </li>
<li class="listitem">
      <p>The central server has its <a class="link" href="configmain.html#configmain-enable_notifications">enable_notifications</a> directive set
      to 1. This will enable notifications. (optional, but recommended)</p>
    </li>
<li class="listitem">
      <p>The central server has <a class="link" href="configmain.html#configmain-execute_service_checks">active service checks</a> disabled
      (optional, but recommended - see notes below)</p>
    </li>
<li class="listitem">
      <p>The central server has <a class="link" href="configmain.html#configmain-check_external_commands">external command checks</a> enabled
      (required)</p>
    </li>
<li class="listitem">
      <p>The central server has <a class="link" href="configmain.html#configmain-accept_passive_service_checks">passive service checks</a> enabled
      (required)</p>
    </li>
</ul></div>

  <p>There are three other very important things that you need to keep in mind when configuring the central server:</p>

  <div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
      <p>The central server must have service definitions for <span class="emphasis"><em>all services</em></span> that are being monitored by all
      the distributed servers. Icinga will ignore passive check results if they do not correspond to a service that has
      been defined.</p>
    </li>
<li class="listitem">
      <p>If you're only using the central server to process services whose results are going to be provided by distributed
      hosts, you can simply disable all active service checks on a program-wide basis by setting the <a class="link" href="configmain.html#configmain-execute_service_checks">execute_service_checks</a> directive to 0. If you're using the central server
      to actively monitor a few services on its own (without the aid of distributed servers), the
      <span class="emphasis"><em>enable_active_checks</em></span> option of the defintions for service being monitored by distributed servers should
      be set to 0. This will prevent Icinga from actively checking those services.</p>
    </li>
</ul></div>

  <p>It is important that you either disable all service checks on a program-wide basis or disable the
  <span class="emphasis"><em>enable_active_checks</em></span> option in the definitions for each service that is monitored by a distributed server.
  This will ensure that active service checks are never executed under normal circumstances. The services will keep getting
  rescheduled at their normal check intervals (3 minutes, 5 minutes, etc...), but the won't actually be executed. This
  rescheduling loop will just continue all the while Icinga is running. We'll explain why this is done in a bit...</p>

  <p>That's it! Easy, huh?</p>

  <p><span class="bold"><strong>Problems With Passive Checks</strong></span></p>

  <p>For all intensive purposes we can say that the central server is relying solely on passive checks for monitoring. The main
  problem with relying completely on passive checks for monitoring is the fact that Icinga must rely on something else to
  provide the monitoring data. What if the remote host that is sending in passive check results goes down or becomes unreachable?
  If Icinga isn't actively checking the services on the host, how will it know that there is a problem?</p>

  <p>Fortunately, there is a way we can handle these types of problems...</p>

  <p><span class="bold"><strong>Freshness Checking</strong></span></p>

  <p>Icinga supports a feature that does "freshness" checking on the results of service checks. More information
  freshness checking can be found <a class="link" href="freshness.html" title="Service and Host Freshness Checks">here</a>. This features gives some protection against situations
  where remote hosts may stop sending passive service checks into the central monitoring server. The purpose of "freshness"
  checking is to ensure that service checks are either being provided passively by distributed servers on a regular basis or
  performed actively by the central server if the need arises. If the service check results provided by the distributed servers
  get "stale", Icinga can be configured to force active checks of the service from the central monitoring host.</p>

  <p>So how do you do this? On the central monitoring server you need to configure services that are being monitoring by
  distributed servers as follows...</p>

  <div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
      <p>The <span class="emphasis"><em>check_freshness</em></span> option in the service definitions should be set to 1. This enables
      "freshness" checking for the services.</p>
    </li>
<li class="listitem">
      <p>The <span class="emphasis"><em>freshness_threshold</em></span> option in the service definitions should be set to a value (in seconds)
      which reflects how "fresh" the results for the services (provided by the distributed servers) should be.</p>
    </li>
<li class="listitem">
      <p>The <span class="emphasis"><em>check_command</em></span> option in the service definitions should reflect valid commands that can be
      used to actively check the service from the central monitoring server.</p>
    </li>
</ul></div>

  <p>Icinga periodically checks the "freshness" of the results for all services that have freshness checking enabled.
  The <span class="emphasis"><em>freshness_threshold</em></span> option in each service definition is used to determine how "fresh" the results for
  each service should be. For example, if you set this value to 300 for one of your services, Icinga will consider the
  service results to be "stale" if they're older than 5 minutes (300 seconds). If you do not specify a value for the
  <span class="emphasis"><em>freshness_threshold</em></span> option, Icinga will automatically calculate a "freshness" threshold by looking
  at either the <span class="emphasis"><em>normal_check_interval</em></span> or <span class="emphasis"><em>retry_check_interval</em></span> options (depending on what
  <a class="link" href="statetypes.html" title="State Types">type of state</a> the service is in). If the service results are found to be "stale",
  Icinga will run the service check command specified by the <span class="emphasis"><em>check_command</em></span> option in the service
  definition, thereby actively checking the service.</p>

  <p>Remember that you have to specify a <span class="emphasis"><em>check_command</em></span> option in the service definitions that can be used
  to actively check the status of the service from the central monitoring server. Under normal circumstances, this check command
  is never executed (because active checks were disabled on a program-wide basis or for the specific services). When freshness
  checking is enabled, Icinga will run this command to actively check the status of the service <span class="emphasis"><em>even if active
  checks are disabled on a program-wide or service-specific basis</em></span>.</p>

  <p>If you are unable to define commands to actively check a service from the central monitoring host (or if turns out to be a
  major pain), you could simply define all your services with the <span class="emphasis"><em>check_command</em></span> option set to run a dummy
  script that returns a critical status. Here's an example... Let's assume you define a command called 'service-is-stale' and use
  that command name in the <span class="emphasis"><em>check_command</em></span> option of your services. Here's what the definition would look
  like...</p>

  <pre class="screen"> define command{ 
     command_name service-is-stale 
     command_line /usr/local/icinga/libexec/check_dummy 2 "CRITICAL: Service results are stale"
 }</pre>

  <p>When Icinga detects that the service results are stale and runs the <span class="bold"><strong>service-is-stale</strong></span> command, the <span class="emphasis"><em>check_dummy</em></span> plugin is executed and the service will go
  into a critical state. This would likely cause notifications to be sent out, so you'll know that there's a problem.</p>

  <p><span class="bold"><strong>Performing Host Checks</strong></span></p>

  <p>At this point you know how to obtain service check results passivly from distributed servers. This means that the central
  server is not actively checking services on its own. But what about host checks? You still need to do them, so how?</p>

  <p>Since host checks usually compromise a small part of monitoring activity (they aren't done unless absolutely necessary),
  we'd recommend that you perform host checks actively from the central server. That means that you define host checks on the
  central server the same way that you do on the distributed servers (and the same way you would in a normal, non-distributed
  setup).</p>

  <p>Passive host checks are available (read <a class="link" href="passivechecks.html" title="Passive Checks">here</a>), so you could use them in your
  distributed monitoring setup, but they suffer from a few problems. The biggest problem is that Icinga does not translate
  passive host check problem states (DOWN and UNREACHABLE) when they are processed. This means that if your monitoring servers
  have a different parent/child host structure (and they will, if you monitoring servers are in different locations), the central
  monitoring server will have an inaccurate view of host states.</p>

  <p>If you do want to send passive host checks to a central server in your distributed monitoring setup, make sure:</p>

  <div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
      <p>The central server has <a class="link" href="configmain.html#configmain-accept_passive_host_checks">passive host checks</a> enabled
      (required)</p>
    </li>
<li class="listitem">
      <p>The distributed server is configured to <a class="link" href="configmain.html#configmain-obsess_over_hosts">obsess over hosts</a>.</p>
    </li>
<li class="listitem">
      <p>The distributed server has an <a class="link" href="configmain.html#configmain-ochp_command">ochp command</a> defined.</p>
    </li>
</ul></div>

  <p>The ochp command, which is used for processing host check results, works in a similiar manner to the ocsp command, which
  is used for processing service check results (see documentation above). In order to make sure passive host check results are up
  to date, you'll want to enable <a class="link" href="freshness.html" title="Service and Host Freshness Checks">freshness checking</a> for hosts (similiar to what is described
  above for services).</p>
  <a class="indexterm" name="id1992957"></a>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="freshness.html">Prev</a> </td>
<td width="20%" align="center"><a accesskey="u" href="ch06.html">Up</a></td>
<td width="40%" align="right"> <a accesskey="n" href="redundancy.html">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Service and Host Freshness Checks </td>
<td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td>
<td width="40%" align="right" valign="top"> Redundant and Failover Network Monitoring</td>
</tr>
</table>
</div>
<P class="copyright">© 2009-2010 Icinga Development Team, http://www.icinga.org</P>
</body>
</html>