<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<html>
<head>
<title>Event Handlers</title>

<STYLE type="text/css">
<!--
        .Default { font-family: verdana,arial,serif; font-size: 8pt; }
        .PageTitle { font-family: verdana,arial,serif; font-size: 12pt; font-weight: bold; }
-->      
</STYLE>

</head>

<body bgcolor="#FFFFFF" text="black" class="Default">

<p>
<div align="center">
<h2 class="PageTitle">Event Handlers</h2>
</div>
</p>

<hr>

<p>
<strong><u>Introduction</u></strong>
</p>
<p>
Event handlers are optional commands that are executed whenever a host or service state change occurs.  An obvious use for event handlers (especially with services) is the ability for Nagios to proactively fix problems before anyone is notified.  Another potential use for event handlers might be to log service or host events to an external database.
</p>

<p>
<strong><u>Event Handler Types</u></strong>
</p>
<p>
There are two main types of event handlers that can be defined - service event handlers and host event handlers.  Event handler commands are (optionally) defined in each host and service definition.  Because these event handlers are only associated with particular services or hosts, I will call these "local" event handlers.  If a local event handler has been defined for a service or host, it will be executed when that host or service changes state.
</p>
<p>
You may also specify global event handlers that should be run for <i>every</i> host or service state change by using the <a href="configmain.html#global_host_event_handler">global_host_event_handler</a> and <a href="configmain.html#global_service_event_handler">global_service_event_handler</a> options in your main configuration file.  Global event handlers are run immediately <i>prior</i> to running a local service or host event handler.
</p>
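<p>
As a small illustration, the two options might be set in the main configuration file as shown below.  The command names <b>log-host-event</b> and <b>log-service-event</b> are hypothetical; each would need a matching command definition of your own.
</p>

```text
# Main configuration file (sketch; the command names are hypothetical)
global_host_event_handler=log-host-event
global_service_event_handler=log-service-event
```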

<p>
<strong><u>When Are Event Handler Commands Executed?</u></strong>
</p>
<p>
Service and host event handler commands are executed when a service or host:
</p>
<p>
<ul>
<li>is in a "soft" error state
<li>initially goes into a "hard" error state
<li>recovers from a "soft" or "hard" error state
</ul>
</p>
<p>
What are "soft" and "hard" states, you ask?  They are described <a href="statetypes.html">here</a>.
</p>

<p>
<strong><u>Event Handler Execution Order</u></strong>
</p>
<p>
Global event handlers are executed before any local event handlers that you have configured for specific hosts or services.  
</p>

<p>
<strong><u>Writing Event Handler Commands</u></strong>
</p>
<p>
In most cases, event handler commands will be shell or perl scripts.  At a minimum, the scripts should take
the following <a href="macros.html">macros</a> as arguments:
</p>
<p>
Service event handler macros: <b>$SERVICESTATE$</b>, <b>$SERVICESTATETYPE$</b>, <b>$SERVICEATTEMPT$</b><br>
Host event handler macros: <b>$HOSTSTATE$</b>, <b>$HOSTSTATETYPE$</b>, <b>$HOSTATTEMPT$</b>
</p>
<p>
The scripts should examine the values of the arguments passed in and take any necessary action based upon those values.  The best way to understand how event handlers should work is to see an example.  Lucky for you, one is provided <a href="#example">below</a>.  There are also some sample event handler scripts included in the <b>eventhandlers/</b> subdirectory of the Nagios distribution.  Some of these sample scripts demonstrate the use of <a href="extcommands.html">external commands</a> to implement <a href="redundancy.html">redundant monitoring hosts</a>.
</p>
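<p>
To make the argument handling concrete, here is a minimal skeleton, offered only as a sketch.  It assumes the three service macros listed above are passed as the first, second and third arguments.  The helper name <b>describe_event</b> is made up for illustration and is not part of Nagios; a real handler would replace the printf call with whatever action it needs to take.
</p>

```shell
#!/bin/sh
# Minimal skeleton for a service event handler (illustrative sketch only).
# Assumed argument order:
#   $1 = $SERVICESTATE$   $2 = $SERVICESTATETYPE$   $3 = $SERVICEATTEMPT$

# describe_event is a hypothetical helper name, not something Nagios provides.
describe_event() {
	state="$1"
	state_type="$2"
	attempt="$3"

	case "$state" in
	OK|WARNING|UNKNOWN|CRITICAL)
		# A known service state: report what was passed in.
		printf 'state=%s type=%s attempt=%s\n' "$state" "$state_type" "$attempt"
		;;
	*)
		# Anything else means the command line was probably misconfigured.
		printf 'unexpected state: %s\n' "$state"
		return 1
		;;
	esac
}

# Example call using values Nagios might substitute for the macros.
describe_event CRITICAL SOFT 2
```

<p>
Rejecting unknown state values early makes a misconfigured command line (for example, macros passed in the wrong order) easier to spot while testing.
</p>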

<p>
<strong><u>Permissions For Event Handler Commands</u></strong>
</p>
<p>
Any event handler commands you configure will execute with the same permissions as the user under which 
Nagios is running on your machine.  This presents a problem with scripts that attempt to restart system 
services, as root privileges are generally required to do these sorts of tasks.  
</p>
<p>
Ideally you should evaluate the types of event handlers you will be implementing and grant just enough permissions
to the Nagios user for executing the necessary system commands.  You might want to try using <a href="http://www.courtesan.com/sudo/sudo.html">sudo</a> to accomplish this.  Implementation of this is your job, so read the docs and decide if it's what you need.
</p>
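<p>
As a rough sketch of the sudo approach: assuming Nagios runs as a user named <b>nagios</b> (an assumption; check your own installation) and that the init script lives at the path used in the example on this page, a sudoers entry and the matching call from an event handler might look like the following.  Treat it as a starting point and adapt the user name and path to your system.
</p>

```shell
# Entry in the sudoers file (illustrative; the user name "nagios" is an assumption):
#
#   nagios ALL = NOPASSWD: /etc/rc.d/init.d/httpd restart
#
# With an entry like that in place, the event handler script could call:
sudo /etc/rc.d/init.d/httpd restart
```

<p>
Listing the exact command, including its argument, keeps the grant narrow: the Nagios user can restart the web server but cannot run anything else as root.
</p>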

<p>
<strong><u>Debugging Event Handler Commands</u></strong>
</p>
<p>
When you are debugging event handler commands, I would highly recommend that you enable logging of 
<a href="configmain.html#log_service_retries">service retries</a>, <a href="configmain.html#log_host_retries">host retries</a>, and <a href="configmain.html#log_event_handlers">event handler commands</a>.  All of these logging options are configured in the <a href="configmain.html">main configuration file</a>.  Enabling logging for these options will allow you to see exactly when and why event handler commands are being executed.
</p>
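<p>
For reference, a sketch of the three options enabled in the main configuration file (a value of 1 turns each option on):
</p>

```text
# Main configuration file: logging that is useful while debugging event handlers
log_service_retries=1
log_host_retries=1
log_event_handlers=1
```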
<p>
When you're done debugging your event handler commands you'll probably want to disable logging of service and host
retries.  They can fill up your log file fast, but if you have enabled <a href="configmain.html#log_rotation_method">log rotation</a> you might not care.
</p>

<a name="example"></a>
<p>
<strong><u>Service Event Handler Example</u></strong>
</p>
<p>
The example below assumes that you are monitoring the HTTP server on the local machine and have specified <b>restart-httpd</b> as the event handler command for the HTTP service definition.  Also, I will be assuming that you have set the &lt;max_check_attempts&gt; option for the service to a value of 4 or greater (i.e. the service is checked 4 times before it is considered to have a real problem).  An example service definition (with only the fields we discuss) might look like this:
</p>

<p>
<font color="red"><strong>
<pre>
define service{
	host_name			somehost
	service_description		HTTP
	max_check_attempts		4
	event_handler			restart-httpd
	<i>...other service variables...</i>
	}
</pre>
</strong></font>
</p>


<p>
Once the service has been defined with an event handler, we must define that event handler as a command.  Notice the macros in the command line that I am passing to the event handler - these are important!
</p>

<p>
<font color="red"><strong>
<pre>
define command{
	command_name	restart-httpd
	command_line	/usr/local/nagios/libexec/eventhandlers/restart-httpd  $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
	}
</pre>
</strong></font>
</p>

<p>
Now, let's actually write the event handler script (this is the <b>/usr/local/nagios/libexec/eventhandlers/restart-httpd</b> file).
</p>
<p>
<table border=1>
<tr>
<td>
<font size=-1>
<pre>
#!/bin/sh
#
# Event handler script for restarting the web server on the local machine
#
# Note: This script will only restart the web server if the service is
#       retried 3 times (in a "soft" state) or if the web service somehow
#       manages to fall into a "hard" error state.
#


# What state is the HTTP service in?
case "$1" in
OK)
	# The service just came back up, so don't do anything...
	;;
WARNING)
	# We don't really care about warning states, since the service is probably still running...
	;;
UNKNOWN)
	# We don't know what might be causing an unknown error, so don't do anything...
	;;
CRITICAL)
	# Aha!  The HTTP service appears to have a problem - perhaps we should restart the server...

	# Is this a "soft" or a "hard" state?
	case "$2" in
		
	# We're in a "soft" state, meaning that Nagios is in the middle of retrying the
	# check before it turns into a "hard" state and contacts get notified...
	SOFT)
			
		# What check attempt are we on?  We don't want to restart the web server on the first
		# check, because it may just be a fluke!
		case "$3" in
				
		# Wait until the check has been tried 3 times before restarting the web server.
		# If the check fails on the 4th time (after we restart the web server), the state
		# type will turn to "hard" and contacts will be notified of the problem.
		# Hopefully this will restart the web server successfully, so the 4th check will
		# result in a "soft" recovery.  If that happens no one gets notified because we
		# fixed the problem!
		3)
			echo -n "Restarting HTTP service (3rd soft critical state)..."
			# Call the init script to restart the HTTPD server
			/etc/rc.d/init.d/httpd restart
			;;
		esac
		;;
				
	# The HTTP service somehow managed to turn into a hard error without getting fixed.
	# It should have been restarted by the code above, but for some reason it didn't.
	# Let's give it one last try, shall we?  
	# Note: Contacts have already been notified of a problem with the service at this
	# point (unless you disabled notifications for this service)
	HARD)
		echo -n "Restarting HTTP service..."
		# Call the init script to restart the HTTPD server
		/etc/rc.d/init.d/httpd restart
		;;
	esac
	;;
esac
exit 0
</pre>
</font>
</td>
</tr>
</table>
</p>

<p>
The sample script provided above will attempt to restart the web server on the local machine in two situations: when the HTTP service is being retried for the 3rd time (in a "soft" error state) and when the service falls into a 
"hard" state.  The "hard" state situation shouldn't normally occur, since the script should restart the service 
while it's still in a "soft" state (i.e. the 3rd check retry), but it's left as a fallback anyway.
</p>
<p>
It should be noted that the service event handler will only be executed the first time that the service falls 
into a "hard" state.  This prevents Nagios from continuously executing the script to restart the web
server while it remains in a "hard" state.
</p>
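<p>
The decision logic of the sample script can also be condensed into a single function, which makes it easy to try out from a shell without touching the web server.  This is only a sketch: <b>should_restart</b> is a hypothetical helper name, and the attempt number 3 is hard-coded to match the example, which assumes max_check_attempts is 4.
</p>

```shell
#!/bin/sh
# Sketch of the restart decision used by the sample event handler.
# should_restart is a hypothetical helper name, not part of Nagios.
# Arguments: $1 = service state, $2 = state type, $3 = check attempt
# Returns 0 when the web server should be restarted, 1 otherwise.
should_restart() {
	# Only CRITICAL states are acted upon.
	if [ "$1" != "CRITICAL" ]; then
		return 1
	fi

	case "$2" in
	SOFT)
		# Restart only on the 3rd soft attempt; the test's exit
		# status becomes the function's return value.
		[ "$3" = "3" ]
		;;
	HARD)
		# Fallback: the soft-state restart did not fix the problem.
		return 0
		;;
	*)
		return 1
		;;
	esac
}

# Walk through a few state combinations and print the decision for each.
for args in "CRITICAL SOFT 1" "CRITICAL SOFT 3" "CRITICAL HARD 4" "OK HARD 1"; do
	# Word splitting of $args into three arguments is intentional here.
	if should_restart $args; then
		echo "$args: restart"
	else
		echo "$args: no action"
	fi
done
```

<p>
Run as written, the loop reports "restart" for the 3rd soft critical attempt and for the hard critical state, and "no action" for the other two combinations.
</p>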


<hr>

</body>
</html>