File: README.monitors

$Id: README.monitors 1.10 Mon, 13 Aug 2001 07:12:28 -0400 trockij $

The following monitors are provided with the distribution, to get
you started. It's simple to add your own monitors. See the man page
for "mon" to learn how.

fping.monitor
-------------
    This pings a list of hosts efficiently using the fping program,
    from the Satan distribution. fping.monitor is just a simple
    shell wrapper for fping, and is normally invoked with just the
    list of hosts to ping. Here's a trick: say you don't want to
    trigger an alert until a machine has been unpingable for some
    number of minutes. Give fping.monitor the arguments "-r 3 -t 240000".

    arguments:

	-a		only report failure if all hosts fail
	-r num		retry "num" times for each host before
			reporting failure
	-t num		set timeout between retries to "num"
			milliseconds
	-s num		consider hosts whose response time exceeds
			"num" milliseconds as failures
	-T		for each failed host (no response only),
			traceroute to the system and report the
			output

ping.monitor
------------
    Similar to fping.monitor, but uses the system's ping program.
    This serializes the pings, which is normally bad to do. This
    is simply an alternative to fping.monitor, if you can't get
    fping to compile. I've only tested it with Linux and Solaris.

freespace.monitor
-----------------
    This will monitor disk space usage of a particular NFS server.
    Arguments are supplied as "path:kBfree [path:kBfree...]". If
    free space dips below kBfree, then it returns a failure condition,
    and the output is how much space is left on that server.

    If you use this monitor, use separate mounts for the volumes that
    you want to test, mounting them with the "-o ro,intr,soft" options,
    so things don't hang too badly if the server is down.

    You should end the monitor line with the ";;" directive, which keeps
    mon from appending the host list, because freespace.monitor doesn't
    take a list of hosts. Here's an example:

watch nfsservers
        service fping
            interval 5m
            monitor fping.monitor
            alert mail.alert mis-alert@company.com
            alert netpage.alert mis-pager
            alertevery 60m
        service freespace
            interval 10m
            monitor freespace.monitor /server1:5000000 /server2:5000000 ;;
            alert mail.alert mis-alert@company.com
            alert netpage.alert mis-pager
            alertevery 60m
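
    As a rough illustration of the "path:kBfree" argument format
    described above, here is a minimal shell sketch. It is not the
    code of freespace.monitor; the function names (avail_kb,
    check_spec) and the "path:avail" output format are assumptions
    made for this example.

```shell
#!/bin/sh
# Hypothetical sketch; not taken from freespace.monitor itself.

# Print the available kB on the filesystem holding $1,
# using the POSIX "df -P -k" output format (4th column).
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# check_spec "path:kBfree" [avail]
# Prints "path:avail" and returns 1 if avail is below kBfree,
# otherwise prints nothing and returns 0. The optional second
# argument replaces the df lookup, which is handy for dry runs.
check_spec() {
    path=${1%:*}        # text before the last colon
    want=${1##*:}       # text after the last colon
    avail=${2:-$(avail_kb "$path")}
    if [ "$avail" -lt "$want" ]; then
        echo "$path:$avail"
        return 1
    fi
    return 0
}
```

    For instance, "check_spec /server1:5000000" would return 1 when
    fewer than 5000000 kB are available under /server1.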


tcp.monitor
-----------
    Useful to see if it's possible to connect to a particular port
    on a particular server. This is over-simplified, and does not yet
    support parsing of the output from these services. Options are
    "-p port" to tell which port to check, and a list of hosts.


http_t.monitor
--------------
    This monitor, contributed by Jon Meek (meekj@pt.cyanamid.com), will
    use HTTP to connect to a server, get a page, and log the transfer
    speed of the transaction. It uses the Time::HiRes Perl module,
    available from CPAN. It can also register a failure if the transfer
    doesn't complete within a certain number of seconds. See the
    source code for an explanation of the arguments.

http_tp.monitor
---------------
    Measures and logs HTTP file transfer speed through a proxy.
    See the comments in the source code for instructions.
    Requires Time::HiRes, LWP::UserAgent, and HTTP::Request.

dns.monitor
-----------
    dns.monitor will make several DNS queries to verify that a server is
    providing the correct information for a zone. The zone argument is
    the zone to check. There can be multiple zone arguments. The master
    argument is the master server for the zone.  It will be queried for
    the base information.  Then each server will be queried to verify
    that it has the correct answers.  It is assumed that each server is
    supposed to be authoritative for the zone.

ftp.monitor
-----------
    Connects to an FTP server, waits for an acceptable prompt, and logs
    out.

hpnp.monitor
------------
    Uses SNMP to monitor HP JetDirect-equipped printers. Reports
    failures as told by the various objects in HP's MIB, and returns
    the message that is showing on the printer's LCD ("LOW TONER",
    "LOAD LETTER", etc.).

http.monitor
------------
    Connects to an http server, retrieves a URL, and returns true
    if everything is OK.

imap.monitor
------------
    Connects to an IMAP server, checks for a sane response, and
    then logs out.

ldap.monitor
------------
    This script will search an LDAP server for objects that match the
    -filter option, starting at the DN given by the -basedn option. Each
    DN found must contain the attribute given by the -attribute option
    and the attribute's value must match the value given by the -value
    option.  Servers are given on the command line. At least one server
    must be specified.

netappfree.monitor
------------------
    Uses SNMP to get free disk space from a Network Appliance filer.
    Exits with a value of 1 if free space on any host drops below
    the supplied parameter, or with a value of 2 if there is a
    "soft" error (SNMP library error, or could not get a response
    from the server).

    This requires the UCD SNMP library and G.S. Marzot's Perl SNMP
    module.

    Supply a configuration file with "--config file" option (see
    etc/netappfree.cf for an example), or "--list" for a listing
    of filesystems which are on your filers. Use --list for help in
    building a configuration file.
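
    When running netappfree.monitor by hand, the exit values listed
    above can be turned into readable labels with something like the
    following shell sketch. The function name describe_status is an
    assumption for this example, and 0 is taken to mean success, as
    is usual for monitors.

```shell
#!/bin/sh
# Hypothetical helper; not part of the mon distribution.

# describe_status RC
# Map an exit value, as documented for netappfree.monitor,
# to a short human-readable label.
describe_status() {
    case "$1" in
        0) echo "ok" ;;
        1) echo "failure: free space below limit" ;;
        2) echo "soft error: SNMP problem or no response" ;;
        *) echo "unexpected exit value $1" ;;
    esac
}

# Example manual run (the config path is a placeholder):
#   ./netappfree.monitor --config /path/to/netappfree.cf
#   describe_status $?
```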

nntp.monitor
------------
    Tries to connect to an NNTP server, and waits for the right output.

ping.monitor
------------
    Returns a list of hosts which are not reachable via ICMP echo. Uses
    the system's default ping, rather than fping.

pop3.monitor
------------
    Connects to a POP3 server, waits for the OK prompt, then logs out.

process.monitor
---------------
    Monitors processes through the UCD SNMP agent.

    Arguments are:  [-c community] host [host ...]

    This script will exit with value 1 if host:community has
    processErrorFlag set.  The summary output line will be the host names
    that failed and the name of the process.  The detail lines are what
    UCD snmp returns for an ErrorMsg.  ('Too (many|few) (name) running (#
    = x)').  If there is an SNMP error (either a problem with the SNMP
    libraries, or a problem communicating via SNMP with the destination
    host), this script will exit with a warning value of 2.

    There probably should be a better way to specify a given process to
    watch instead of everything-ucd-snmp-is-watching.

reboot.monitor
--------------
    Polls the SNMP agent on hosts, and triggers a failure when a reboot
    is detected.

smtp.monitor
------------
    Connects to an SMTP server, waits for a prompt, and then logs out.

telnet.monitor
--------------
    Use tcp_scan to try to connect to the telnet port on a bunch of hosts,
    and look for a "login" prompt.

msql-mysql.monitor, rpc.monitor
-------------------------------
    See the separate README for these monitors.

readdir.monitor
---------------
    From: gilles LamiraL <lamiral@mail.dotcom.fr>
    To: "mon@linux.kernel.org" <mon@linux.kernel.org>
    Subject: readdir monitor

    Hello,

    I wrote a monitor that reads several directories and tells
    if the number of files in each directory exceeds a given number.
    Possible uses are testing /var/spool/mqueue or /var/spool/lp/
    It is a local monitor. No SNMP here. I think it can be easily
    called from an SNMP agent.

    1) The allowed number can be specified for each directory.

    2) You can add a regex filter to match the file names.
       Only one regex is allowed for all directories.
       Tell me if you want one per directory.

    3) The return status is interesting. It gives the exceeded values
       on a base-2 logarithmic scale.

    For example: 
    You want to check if /var/spool/mqueue contains less than 100 messages

    $ ls /var/spool/mqueue | wc -l
    479
    $ ./my-readdir.monitor /var/spool/mqueue:100
    /var/spool/mqueue:479
    $ echo $?
    3

      1 means more than 100 messages
      2 means more than 200 messages
      3 means more than 400 messages
      4 means more than 800 messages

    ...
    255 means more than 2.89 * 10^78 messages (289 + 76 zeros !)
       
    Nice ?
    See more examples in the script itself.
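
    The doubling scheme in the table above (status n means more than
    limit * 2^(n-1) files) can be sketched in shell as follows. This
    is reconstructed from the table only, not from the readdir monitor
    source; the function name readdir_status and the behavior exactly
    at the limit are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the base-2 exit status described above.

# readdir_status COUNT LIMIT
# Prints 0 if COUNT does not exceed LIMIT; otherwise prints the
# largest n (capped at 255) such that COUNT > LIMIT * 2^(n-1).
readdir_status() {
    count=$1
    limit=$2
    status=0
    while [ "$status" -lt 255 ] && [ "$count" -gt "$limit" ]; do
        status=$((status + 1))
        limit=$((limit * 2))    # next threshold is twice the last
    done
    echo "$status"
}
```

    With the numbers from the example, "readdir_status 479 100"
    prints 3, since 479 is more than 400 but not more than 800.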


up_rtt.monitor
--------------
    mon monitor to check for circuit up and measure RTT.
    Jon Meek - 09-May-1998.  Requires Perl Modules "Time::HiRes" and
    "Statistics::Descriptive".


dialin.monitor
--------------
    Dials in to a modem and fails if a carrier and a prompt are not
    detected.  Useful for telling if your modem pool is down or if some
    flaky modem has quit answering the phone.

    dialin.monitor requires the Perl Expect module, available from
    CPAN.

    This program performs UUCP-style locking, and needs to run setgid
    uucp to accomplish this. Provided is dialin.monitor.wrap.c, a simple
    little C program which is installed as setgid uucp and directly
    executes the actual dialin.monitor Perl script. This is required
    because some systems (e.g. Linux) do not allow setuid/setgid scripts.

    To build, edit the Makefile in mon.d, and adjust monpath to your
    environment. Then do:

    make && make install

    dialin.monitor accepts several arguments. The only required argument
    is "-n", which specifies the phone number to dial.

    -n number	dial in to "number"
    -t secs	timeout to wait for "CONNECT" from modem (60)
    -l lockdir	directory to use for UUCP-style locking ("/var/lock")
    -D device	serial device to use ("/dev/modem")


foundry-chassis.monitor
-----------------------
    Reports the power supply and fan status of Foundry chassis-based
    switches, like the BigIron and the FastIron. This uses the
    "FOUNDRY-SN-AGENT-MIB" and "FOUNDRY-SN-ROOT-MIB". Foundry annoyingly
    ships their MIBs in one giant file. What I do is separate them into
    distinct files so that the UCD tools don't need to parse the single
    giant file.

    I've tested this with staged failures of PSUs and it works fine.
    It also caught a real (non-staged) failure once.

    Arguments are:

    -c community     SNMP community to use


silkworm.monitor
----------------
    Reports port, fan, power supply, and temperature failures in Brocade
    SilkWorm FCAL switches. It requires Brocade's "SW-MIB" MIB.

    Sensor failures are explicitly reported by the agent, read by this
    monitor, and reported to mon. This monitor identifies port problems
    by paying attention to only those ports whose administrative status is
    "online", yet the actual operational status is not "online".

    This monitor has not yet been tested in the case of an actual (or
    staged) failure. That doesn't mean it doesn't work--it's just that
    it hasn't been tested :)

    Arguments are:

    -c community     SNMP community to use
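
    The port check described above for silkworm.monitor (report only
    ports that are administratively "online" but not operationally
    "online") amounts to a simple filter. The shell sketch below
    shows the idea on a made-up input format of
    "port admin_status oper_status" lines; the real monitor reads
    these values over SNMP, and the function name port_problems is
    an assumption for this example.

```shell
#!/bin/sh
# Hypothetical sketch; the input format is invented for illustration.

# port_problems
# Reads "port admin_status oper_status" lines on stdin and prints
# the ports whose administrative status is "online" while their
# operational status is anything else.
port_problems() {
    awk '$2 == "online" && $3 != "online" { print $1 }'
}

# Example:
#   printf '1 online online\n2 online offline\n' | port_problems
```

    Ports that are administratively offline are ignored, so a port
    that was deliberately disabled does not raise an alert.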


cpqhealth.monitor
-----------------
    Report fan, PSU, and temperature failures from systems running the
    Compaq "Insight Manager". It requires the "CPQHLTH-MIB" MIB,
    and the UCD SNMP libs w/the Perl module.
    
    We've had this running for a little while now, and have tested it
    with both "staged" failures and actual failures, and it seems to work
    rather well. The Insight agent is a bit quirky, though. I've seen it
    report that both PSUs are installed and running without error,
    yet say it is not in a redundant configuration.

    Arguments are:

    -c community     SNMP community to use


mon.monitor
-----------
    Report the running status of a mon server.

    Arguments are:

    -p port		port to use, defaults to 2583
    -t timeout		timeout in seconds, defaults to 30
    -u username		username (optional)
    -p password		password (optional)


traceroute.monitor
------------------
	Monitor routes from monitor machine to a remote system
	using traceroute. Alarm and log when changes are detected.
	See embedded POD documentation for details.


smtp3.monitor
-------------
	SMTP monitor which performs logging of connect times.
	See embedded POD documentation for details.


http_tpp.monitor
----------------
	Parallel query http server monitor for mon. Logs timing
	and size results, can use a proxy server, and can
	incorporate a "Smart Alarm" function via a user supplied
	Perl module.  See embedded POD documentation for details.


file_change.monitor
-------------------
	file_change.monitor will watch specified files in a
	directory and trigger an alert when any monitored file
	changes, or is missing. File changes can optionally be
	logged using RCS.
	See embedded POD documentation for details.


na_quota.monitor
----------------
	Reports quota limits on Network Appliance filers.  See the comments
	in the file for details.