<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<html>
<head>
<title>Service and Host Result Freshness Checks</title>

<STYLE type="text/css">
<!--
        .Default { font-family: verdana,arial,serif; font-size: 8pt; }
        .PageTitle { font-family: verdana,arial,serif; font-size: 12pt; font-weight: bold; }
-->      
</STYLE>

</head>

<body bgcolor="#FFFFFF" text="black" class="Default">

<p>
<div align="center">
<h2 class="PageTitle">Service and Host Result Freshness Checks</h2>
</div>
</p>

<hr>

<p>
<strong><u>Introduction</u></strong>
</p>

<p>
Nagios supports a feature that does "freshness" checking on the results of host and service checks.  This feature is useful when you want to ensure that <a href="passivechecks.html">passive checks</a> are being received as frequently as you want.  Although freshness checking can be used in a number of situations, it is primarily useful when attempting to configure a <a href="distributed.html">distributed monitoring environment</a>.
</p>

<p>
The purpose of "freshness" checking is to ensure that host and service checks are being provided passively by external applications on a regular basis.  If the results of a particular host or service check (for which freshness checking has been enabled) are determined to be "stale", Nagios will force an active check of that host or service.
</p>

<p>
<strong><u>Host vs. Service Freshness Checking</u></strong>
</p>

<p>
The documentation below discusses service freshness checking.  Host freshness checking (which is not documented separately) works in a similar way to service freshness checking - except, of course, that it's for hosts instead of services.  If you need to configure host freshness checking, adjust the directions given below appropriately.
</p>

<p>
<strong><u>Configuring Service Freshness Checking</u></strong>
</p>

<p>
Before you configure a per-service freshness threshold, you must enable freshness checking using the <a href="configmain.html#check_service_freshness">check_service_freshness</a> and <a href="configmain.html#service_freshness_check_interval">service_freshness_check_interval</a> directives in the main config file.  If you were configuring host freshness checking, you would use the <a href="configmain.html#check_host_freshness">check_host_freshness</a> and <a href="configmain.html#host_freshness_check_interval">host_freshness_check_interval</a> directives.
</p>
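
<p>
As a rough illustration, the relevant lines in the main config file might look like the following.  The interval values (in seconds) are examples only, not recommendations, and host freshness checking is shown disabled simply to illustrate both settings.
</p>

<p>
<strong>
<font color="red">
<pre>
# illustrative values only - adjust to suit your setup
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=0
host_freshness_check_interval=60
</pre>
</font>
</strong>
</p>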

<p>
So how do you go about enabling freshness checking for a particular service?  You need to configure <a href="xodtemplate.html#service">service definitions</a> as follows.
</p>

<p>
<ul>
<li>The <b>check_freshness</b> option in the service definition should be set to 1.  This enables "freshness" checking for the service.
<li>The <b>freshness_threshold</b> option in the service definition should be set to a value (in seconds) which reflects how "fresh" the results for the service should be.
<li>The <b>check_command</b> option in the service definition should reflect a valid command that should be used to actively check the service when it is detected as being "stale".
<li>The <b>normal_check_interval</b> option in the service definition needs to be greater than zero (0) if the <b>freshness_threshold</b> option is set to zero (0).
<li>The <b>check_period</b> option in the service definition needs to be set to a valid timeperiod.  The times allowed by the specified timeperiod determine when freshness checks can be performed for the service.
</ul>
</p>
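
<p>
If you are adapting these directions for a host, the analogous definition might look something like the sketch below.  The host name, threshold value, and command name are made up for illustration, some required options are omitted, and host freshness checking must also be enabled in the main config file...
</p>

<p>
<strong>
<font color="red">
<pre>
define host{
	host_name		remote-host		; hypothetical host name
	check_freshness		1			; enables freshness checking for this host
	freshness_threshold	600			; illustrative value, in seconds
	check_command		no-host-report		; hypothetical command, run only if the host results are "stale"
	...other options...
	}
</pre>
</font>
</strong>
</p>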

<p>
<strong><u>How The Freshness Threshold Works</u></strong>
</p>

<p>
Nagios periodically checks the "freshness" of the results for all services that have freshness checking enabled.  The <i>freshness_threshold</i> option in each service definition is used to determine how "fresh" the results for each service should be.  For example, if you set the <i>freshness_threshold</i> option to 60 for one of your services, Nagios will consider that service to be "stale" if its results are older than 60 seconds (1 minute).  If you do not specify a value for the <i>freshness_threshold</i> option (or you set it to zero), Nagios will automatically calculate a "freshness" threshold to use by looking at either the <i>normal_check_interval</i> or <i>retry_check_interval</i> options (depending on what <a href="statetypes.html">type of state</a> the service is currently in).
</p>
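
<p>
The comparison described above can be sketched in a few lines of shell.  This is only an illustration of the idea, not how Nagios implements it: the <i>is_stale</i> function and its arguments are hypothetical, and it covers only the case where an explicit threshold is given (not the automatically calculated one).
</p>

<p>
<dir>
<table border=1>
<tr>
<td>
<pre>
#!/bin/sh

# Hypothetical helper (not part of Nagios).
# $1 = time of the last result (seconds since the epoch)
# $2 = freshness threshold (seconds)
# $3 = current time (seconds since the epoch)
# Succeeds if the result is older than the threshold.
is_stale() {
    age=$(( $3 - $1 ))
    [ "$age" -gt "$2" ]
}

# With a threshold of 60, a result that is 61 seconds old is stale...
if is_stale 1000 60 1061; then echo "stale"; else echo "fresh"; fi   # prints: stale

# ...but one that is exactly 60 seconds old is not.
if is_stale 1000 60 1060; then echo "stale"; else echo "fresh"; fi   # prints: fresh
</pre>
</td>
</tr>
</table>
</dir>
</p>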

<p>
<strong><u>What Happens When A Service Check Result Becomes "Stale"</u></strong>
</p>

<p>
If the check results of a service are found to be "stale" (as described above), Nagios will force an active check of the service by executing the command specified by the <i>check_command</i> option in the service definition.  It is important to note that an active service check which is being forced because the service was detected as being "stale" gets executed <i>even if active service checks are disabled on a program-wide or service-specific basis</i>.
</p>

<p>
<strong><u>Working With Passive-Only Checks</u></strong>
</p>

<p>
As I mentioned earlier, freshness checking is of most use when you are dealing with services that get their results from <a href="passivechecks.html">passive checks</a>.  More often than not (as is the case with <a href="distributed.html">distributed monitoring setups</a>), these services get <i>all</i> of their results from passive checks - no results are obtained from active checks.
</p>

<p>
An example of a passive-only service might be one that reports the status of your nightly backup jobs.  Perhaps you have an external script that submits the results of the backup job to Nagios once the backup is completed.  In this case, all of the checks/results for the service are provided by an external application using passive checks.  In order to ensure that the status of the backup job gets reported every day, you may want to enable freshness checking for the service.  If the external script doesn't submit the results of the backup job, you can have Nagios fake a critical result by doing something like this...
</p>

<p>
Here's what the definition for the service might look like (some required options are omitted)...
</p>

<p>
<strong>
<font color="red">
<pre>
define service{
	host_name		backup-server
	service_description	ArcServe Backup Job
	active_checks_enabled	0			; active checks are NOT enabled
	passive_checks_enabled	1			; passive checks are enabled (this is how results are reported)
	check_freshness		1
	freshness_threshold	93600			; 26 hour threshold, since backups may not always finish at the same time
	check_command		no-backup-report	; this command is run only if the service results are "stale"
	...other options...
	}
</pre>
</font>
</strong>
</p>

<p>
Notice that active checks are disabled for the service.  This is because the results for the service are provided only by an external application using passive checks.  Freshness checking is enabled and the freshness threshold has been set to 26 hours.  This is a bit longer than 24 hours because backup jobs sometimes run late from day to day (depending on how much data there is to back up, how much network traffic is present, etc.).  The <i>no-backup-report</i> command is executed only if the results of the service are determined to be "stale".  The definition of the <i>no-backup-report</i> command might look like this...
</p>

<p>
<strong>
<font color="red">
<pre>
define command{
	command_name	no-backup-report
	command_line	/usr/local/nagios/libexec/nobackupreport.sh
	}
</pre>
</font>
</strong>
</p>

<p>
The <b>nobackupreport.sh</b> script in your <i>/usr/local/nagios/libexec</i> directory might look something like this:
</p>

<p>
<dir>
<table border=1>
<tr>
<td>
<pre>
#!/bin/sh

/bin/echo "CRITICAL: Results of backup job were not reported!"

exit 2
</pre>
</td>
</tr>
</table>
</dir>
</p>

<p>
If Nagios detects that the service results are stale, it will run the <b>no-backup-report</b> command as an active service check (even though active checks are disabled for this specific service - remember that this is a special case).  This causes the <i>/usr/local/nagios/libexec/nobackupreport.sh</i> script to be executed, which returns a critical state.  The service goes into a critical state (if it isn't already there) and someone will probably get notified of the problem.
</p>
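
<p>
For completeness, here is a sketch of how the external backup script might submit its passive result, assuming it does so by writing a PROCESS_SERVICE_CHECK_RESULT line to the external command file.  The <i>submit_backup_result</i> function is hypothetical, the command file path is an assumption (use whatever your main config file specifies), and the script relies on a <i>date</i> that supports <i>+%s</i>.
</p>

<p>
<dir>
<table border=1>
<tr>
<td>
<pre>
#!/bin/sh

# Assumed location of the external command file; adjust to match your setup.
CMDFILE="${CMDFILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# Hypothetical helper: submit_backup_result RETURN_CODE "plugin output"
# Appends one passive check result for the backup service to the command file.
submit_backup_result() {
    now=$(date +%s)
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$now" "backup-server" "ArcServe Backup Job" "$1" "$2" >> "$CMDFILE"
}

# Example call once the backup has finished successfully:
#   submit_backup_result 0 "OK: backup completed"
</pre>
</td>
</tr>
</table>
</dir>
</p>

<p>
As long as a result like this arrives within the freshness threshold, the <b>no-backup-report</b> command is never run.
</p>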

<hr>

</body>
</html>