<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Service and Host Result Freshness Checks</title>
<STYLE type="text/css">
<!--
.Default { font-family: verdana,arial,serif; font-size: 8pt; }
.PageTitle { font-family: verdana,arial,serif; font-size: 12pt; font-weight: bold; }
-->
</STYLE>
</head>
<body bgcolor="#FFFFFF" text="black" class="Default">
<p>
<div align="center">
<h2 class="PageTitle">Service and Host Result Freshness Checks</h2>
</div>
</p>
<hr>
<p>
<strong><u>Introduction</u></strong>
</p>
<p>
Nagios supports a feature that does "freshness" checking on the results of host and service checks. This feature is useful when you want to ensure that <a href="passivechecks.html">passive checks</a> are being received as frequently as you want. Although freshness checking can be used in a number of situations, it is primarily useful when attempting to configure a <a href="distributed.html">distributed monitoring environment</a>.
</p>
<p>
The purpose of "freshness" checking is to ensure that host and service checks are being provided passively by external applications on a regular basis. If the results of a particular host or service check (for which freshness checking has been enabled) are determined to be "stale", Nagios will force an active check of that host or service.
</p>
<p>
<strong><u>Host vs. Service Freshness Checking</u></strong>
</p>
<p>
The documentation below discusses service freshness checking. Host freshness checking (which is not documented separately) works in a similar way to service freshness checking - except, of course, that it's for hosts instead of services. If you need to configure host freshness checking, adjust the directions given below appropriately.
</p>
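<p>
As a rough sketch of that adjustment, a host definition with freshness checking enabled might look something like this. The host name, the threshold value, and the <i>no-host-report</i> command name are hypothetical and chosen only for illustration (some required options are omitted)...
</p>
<p>
<strong>
<font color="red">
<pre>
define host{
host_name remote-host ; hypothetical host name
passive_checks_enabled 1 ; results are expected to arrive passively
check_freshness 1 ; enable freshness checking for this host
freshness_threshold 900 ; illustrative value in seconds (15 minutes)
check_command no-host-report ; hypothetical command, run only if the host results are "stale"
...other options...
}
</pre>
</font>
</strong>
</p>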
<p>
<strong><u>Configuring Service Freshness Checking</u></strong>
</p>
<p>
Before you configure a per-service freshness threshold, you must enable freshness checking using the <a href="configmain.html#check_service_freshness">check_service_freshness</a> and <a href="configmain.html#service_freshness_check_interval">service_freshness_check_interval</a> directives in the main config file. If you were configuring host freshness checking, you would use the <a href="configmain.html#check_host_freshness">check_host_freshness</a> and <a href="configmain.html#host_freshness_check_interval">host_freshness_check_interval</a> directives.
</p>
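<p>
A minimal sketch of the corresponding entries in the main config file is shown here. The interval values (in seconds) are illustrative assumptions rather than recommendations, so pick values that suit your environment...
</p>
<p>
<strong>
<font color="red">
<pre>
# Enable freshness checking of service results and
# look for stale service results every 60 seconds
check_service_freshness=1
service_freshness_check_interval=60

# The equivalent directives for host freshness checking
check_host_freshness=1
host_freshness_check_interval=60
</pre>
</font>
</strong>
</p>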
<p>
So how do you go about enabling freshness checking for a particular service? You need to configure <a href="xodtemplate.html#service">service definitions</a> as follows.
</p>
<p>
<ul>
<li>The <b>check_freshness</b> option in the service definition should be set to 1. This enables "freshness" checking for the service.
<li>The <b>freshness_threshold</b> option in the service definition should be set to a value (in seconds) which reflects how "fresh" the results for the service should be.
<li>The <b>check_command</b> option in the service definition should reflect a valid command that should be used to actively check the service when it is detected as being "stale".
<li>The <b>normal_check_interval</b> option in the service definition needs to be greater than zero (0) if the <b>freshness_threshold</b> option is set to zero (0).
<li>The <b>check_period</b> option in the service definition needs to be set to a valid timeperiod. The times allowed by the specified timeperiod determine when freshness checks can be performed for the service.
</ul>
</p>
<p>
<strong><u>How The Freshness Threshold Works</u></strong>
</p>
<p>
Nagios periodically checks the "freshness" of the results for all services that have freshness checking enabled. The <i>freshness_threshold</i> option in each service definition is used to determine how "fresh" the results for each service should be. For example, if you set the <i>freshness_threshold</i> option to 60 for one of your services, Nagios will consider that service to be "stale" if its results are older than 60 seconds (1 minute). If you do not specify a value for the <i>freshness_threshold</i> option (or you set it to zero), Nagios will automatically calculate a "freshness" threshold to use by looking at either the <i>normal_check_interval</i> or <i>retry_check_interval</i> options (depending on what <a href="statetypes.html">type of state</a> the service is currently in).
</p>
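<p>
To illustrate the automatic case, here is a sketch of a service that leaves the threshold calculation to Nagios. The host name, service description, interval values, and the <i>no-passive-report</i> command are hypothetical, and the <i>24x7</i> timeperiod is assumed to be defined elsewhere in your configuration (some required options are omitted)...
</p>
<p>
<strong>
<font color="red">
<pre>
define service{
host_name remote-host ; hypothetical host name
service_description Passive Disk Check ; hypothetical service
check_freshness 1
freshness_threshold 0 ; zero (or omitted) lets Nagios calculate the threshold
normal_check_interval 10 ; must be greater than zero when the threshold is zero
retry_check_interval 2
check_period 24x7 ; assumes a timeperiod with this name exists
check_command no-passive-report ; hypothetical command, run only if the results are "stale"
...other options...
}
</pre>
</font>
</strong>
</p>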
<p>
<strong><u>What Happens When A Service Check Result Becomes "Stale"</u></strong>
</p>
<p>
If the check results of a service are found to be "stale" (as described above), Nagios will force an active check of the service by executing the command specified by the <i>check_command</i> option in the service definition. It is important to note that an active service check which is being forced because the service was detected as being "stale" gets executed <i>even if active service checks are disabled on a program-wide or service-specific basis</i>.
</p>
<p>
<strong><u>Working With Passive-Only Checks</u></strong>
</p>
<p>
As I mentioned earlier, freshness checking is of most use when you are dealing with services that get their results from <a href="passivechecks.html">passive checks</a>. More often than not (as is the case with <a href="distributed.html">distributed monitoring setups</a>), these services may be getting <i>all</i> of their results from passive checks - no results are obtained from active checks.
</p>
<p>
An example of a passive-only service might be one that reports the status of your nightly backup jobs. Perhaps you have an external script that submits the results of the backup job to Nagios once the backup is completed. In this case, all of the checks/results for the service are provided by an external application using passive checks. In order to ensure that the status of the backup job gets reported every day, you may want to enable freshness checking for the service. If the external script doesn't submit the results of the backup job, you can have Nagios fake a critical result by doing something like this...
</p>
<p>
Here's what the definition for the service might look like (some required options are omitted)...
</p>
<p>
<strong>
<font color="red">
<pre>
define service{
host_name backup-server
service_description ArcServe Backup Job
active_checks_enabled 0 ; active checks are NOT enabled
passive_checks_enabled 1 ; passive checks are enabled (this is how results are reported)
check_freshness 1
freshness_threshold 93600 ; 26 hour threshold, since backups may not always finish at the same time
check_command no-backup-report ; this command is run only if the service results are "stale"
...other options...
}
</pre>
</font>
</strong>
</p>
<p>
Notice that active checks are disabled for the service. This is because the results for the service are provided only by an external application using passive checks. Freshness checking is enabled and the freshness threshold has been set to 26 hours. This is a bit longer than 24 hours because backup jobs sometimes run late from day to day (depending on how much data there is to back up, how much network traffic is present, etc.). The <i>no-backup-report</i> command is executed only if the results of the service are determined to be "stale". The definition of the <i>no-backup-report</i> command might look like this...
</p>
<p>
<strong>
<font color="red">
<pre>
define command{
command_name no-backup-report
command_line /usr/local/nagios/libexec/nobackupreport.sh
}
</pre>
</font>
</strong>
</p>
<p>
The <b>nobackupreport.sh</b> script in your <i>/usr/local/nagios/libexec</i> directory might look something like this:
</p>
<p>
<dir>
<table border=1>
<tr>
<td>
<pre>
#!/bin/sh
/bin/echo "CRITICAL: Results of backup job were not reported!"
exit 2
</pre>
</td>
</tr>
</table>
</dir>
</p>
<p>
If Nagios detects that the service results are stale, it will run the <b>no-backup-report</b> command as an active service check (even though active checks are disabled for this specific service - remember that this is a special case). This causes the <i>/usr/local/nagios/libexec/nobackupreport.sh</i> script to be executed, which returns a critical state. The service goes into a critical state (if it isn't already there) and someone will probably get notified of the problem.
</p>
<hr>
</body>
</html>