File: TODO

$Id: TODO,v 1.18 1998/01/13 05:51:39 trockij Exp $

-make the "term" and "reset" commands work only from the host that
 mon is running on.
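
One way to restrict such commands is to check the peer address of the client connection before honoring them. This is a minimal sketch, not mon's code: the function name `is_local_peer` and the `local_ips` parameter are hypothetical, and it assumes the server already knows the client's IP address as a string.

```python
import ipaddress

def is_local_peer(peer_ip, local_ips=()):
    """Return True if peer_ip is loopback or one of this host's own addresses.

    local_ips is a hypothetical, caller-supplied list of the addresses
    configured on the machine running the server.
    """
    addr = ipaddress.ip_address(peer_ip)
    return addr.is_loopback or peer_ip in local_ips

def allow_command(command, peer_ip, local_ips=()):
    """Permit "term" and "reset" only for local peers; allow everything else."""
    if command in ("term", "reset"):
        return is_local_peer(peer_ip, local_ips)
    return True
```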

-when a watch/host/service is disabled, save the state to disk.
 When starting mon, read the current disabled state from that file.
 Possibly make this a command-line option.
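
A rough sketch of saving and restoring the disabled state follows. The JSON file format, the function names, and the "kind:name" keys are all assumptions made for illustration. The write goes through a temporary file so that a crash cannot leave a half-written state file behind.

```python
import json
import os
import tempfile

def save_disabled(path, disabled):
    """Atomically write the set of disabled items to path as a JSON list."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(sorted(disabled), fh)
    os.replace(tmp, path)  # rename over the old file in one step

def load_disabled(path):
    """Read the disabled set at startup; a missing file means nothing is disabled."""
    try:
        with open(path) as fh:
            return set(json.load(fh))
    except FileNotFoundError:
        return set()
```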

-revamp host disabling. 1) store disabled hosts in a table with a
 timeout on each, so that they re-enable themselves automatically and
 people don't have to remember to re-enable them manually. 2) don't do
 the disabling by "commenting" them out of the host groups.
 We still want them to be tested for failure, but just disable
 alerts that have to do with the disabled hosts.
 When a host is commented out, accept a "reason" field that
 is later accessible so that you can tell why someone disabled
 the host.
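
The table described in this item could look something like the sketch below. The class and method names are hypothetical. Hosts stay in the test rotation; only the alert decision consults the table, and an expired entry removes itself the next time it is checked.

```python
import time

class DisableTable:
    """Hypothetical table of disabled hosts: host -> (expiry time, reason)."""

    def __init__(self):
        self._entries = {}

    def disable(self, host, seconds, reason, now=None):
        now = time.time() if now is None else now
        self._entries[host] = (now + seconds, reason)

    def is_disabled(self, host, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(host)
        if entry is None:
            return False
        if now >= entry[0]:
            del self._entries[host]  # timeout passed: re-enable automatically
            return False
        return True

    def reason(self, host):
        entry = self._entries.get(host)
        return entry[1] if entry else None

def should_alert(table, host, test_failed, now=None):
    """The test always runs; only the alert is suppressed for disabled hosts."""
    return test_failed and not table.is_disabled(host, now)
```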

-Support general asynchronous traps from any other programs

-allow checking a service at a particular time of day, maybe using
 inPeriod.

-add a variable for each period that says whether to ignore
 prior output when deciding to send an alert

-add an option to send an alert when a service makes a state transition
 from failure to success
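
Detecting the failure-to-success transition only needs the previous result for each service. The sketch below uses a hypothetical `ServiceState` class with an `alert_on_recovery` option standing in for the proposed config option.

```python
class ServiceState:
    """Tracks the last monitor result for one service (illustrative only)."""

    def __init__(self, alert_on_recovery=False):
        self.alert_on_recovery = alert_on_recovery
        self.last_ok = None  # unknown until the first result arrives

    def update(self, ok):
        """Record a result and return "failure", "recovery", or None."""
        previous = self.last_ok
        self.last_ok = ok
        if not ok:
            return "failure"
        if previous is False and self.alert_on_recovery:
            return "recovery"
        return None
```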

-service templates in the config file would be nice. Maybe this can
 be done with M4 macros or something. Maybe it doesn't belong in
 mon itself.

-add a "time first noticed" to each service, which is the time
 of the first failure. When a monitor succeeds, reset "time first
 noticed" to zero. Add a new argument to alert programs passing this
 variable.
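
A small sketch of the "time first noticed" bookkeeping follows. The class name and the `-t` flag passed to the alert program are invented for illustration; the item above does not say how the value would be passed.

```python
class FailureClock:
    """Holds the time of the first failure in the current run of failures."""

    def __init__(self):
        self.first_noticed = 0  # zero means "not currently failing"

    def record(self, ok, now):
        if ok:
            self.first_noticed = 0        # success resets the clock
        elif self.first_noticed == 0:
            self.first_noticed = now      # first failure of this run
        return self.first_noticed

def alert_argv(program, first_noticed):
    """Build an alert command line; the "-t" flag is a hypothetical name."""
    return [program, "-t", str(int(first_noticed))]
```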

-implement some sort of "worsening-condition" rule on a per-service
 basis, triggering different alerts when things are really bad.
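
One possible reading of a "worsening-condition" rule is a list of thresholds on the number of consecutive failures, each mapped to a different alert. That interpretation, and the names below, are assumptions; the rule could equally be based on elapsed time.

```python
def pick_alert(consecutive_failures, rules):
    """Return the alert for the highest threshold reached, or None.

    rules is a list of (threshold, alert_name) pairs, in any order.
    """
    chosen = None
    for threshold, alert in sorted(rules):
        if consecutive_failures >= threshold:
            chosen = alert
    return chosen
```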

-make Wietse's tcp_scan.c multiplex hosts in addition to ports

-add authentication to mon server

-make a name for this silly program other than mon (well, mon *is*
 easy to type!)

-maybe make a command that will disable an alert for a certain amount
 of time (maybe implement this as an at(1) job??)

-make it possible to disable just one of multiple alarms in a service

-handle client timeouts by adding a command I/O sub that calls select(2)
 on the client's filehandle, and closes the connection when it times out
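
The select-based timeout can be sketched as below. This is an illustration of the technique rather than mon's I/O code; `read_command` is a hypothetical name, and the 4096-byte read size is arbitrary.

```python
import select
import socket

def read_command(conn, timeout):
    """Wait up to timeout seconds for client data.

    Returns the bytes read, or None after closing the connection
    if the client sent nothing in time.
    """
    ready, _, _ = select.select([conn], [], [], timeout)
    if not ready:
        conn.close()  # client timed out
        return None
    return conn.recv(4096)
```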

-Add better grammar checking to the config file. Don't allow watch
 records for groups that haven't been defined, etc.
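
A check for undefined groups might look like the sketch below. It assumes a simplified config in which a line starting with `hostgroup NAME` defines a group and a line starting with `watch NAME` opens a watch record; the real grammar has more to it. Two passes are used so that a watch may appear before its group definition.

```python
def undefined_watches(lines):
    """Return (line_number, group) for each watch naming an undefined group."""
    parsed = []
    for number, line in enumerate(lines, start=1):
        words = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(words) >= 2:
            parsed.append((number, words[0], words[1]))
    # Pass 1: collect every defined group name.
    groups = {name for _, keyword, name in parsed if keyword == "hostgroup"}
    # Pass 2: report watches whose group was never defined.
    return [(number, name) for number, keyword, name in parsed
            if keyword == "watch" and name not in groups]
```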

-Do something to control rampant alerts caused by a lot of
 inter-dependent tests failing. E.g., you're testing if a hub is down,
 and another test depends on a machine plugged into that hub. The hub
 burns up or some goon trips over the power cord, and you get pages
 because the hub is down, and because you can't get to the service
 from
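
One way to tame such alert storms is to record which test depends on which, and suppress an alert when something it depends on has also failed. The sketch below assumes each test has at most one parent dependency; the `depends_on` mapping and the function name are hypothetical.

```python
def alerts_to_send(failed, depends_on):
    """Return the failed tests that are not explained by a failed dependency.

    failed is a set of test names; depends_on maps a test to its parent.
    """
    def blocked(name, seen=()):
        parent = depends_on.get(name)
        if parent is None or parent in seen:  # no parent, or a dependency loop
            return False
        return parent in failed or blocked(parent, seen + (name,))

    return {name for name in failed if not blocked(name)}
```

With this, a dead hub produces one alert for the hub rather than one for every machine behind it.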

-change $watch{$group}[$service] to be $watch{$group}{$service}, because
 services are now required to have a tag

-Handle alert processes like monitor processes, meaning only allow
 one running at a time
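
Limiting alerts to one running process each can be done by remembering the child process per alert and refusing to start another while it is still alive. This is a sketch with hypothetical names, not a description of how mon handles its monitor processes.

```python
import subprocess

class AlertRunner:
    """Starts alert programs, allowing at most one live process per key."""

    def __init__(self):
        self._running = {}  # key -> subprocess.Popen

    def start(self, key, argv):
        """Start argv unless the previous alert for key is still running."""
        proc = self._running.get(key)
        if proc is not None and proc.poll() is None:
            return False  # still running: skip this alert
        self._running[key] = subprocess.Popen(argv)
        return True

    def wait(self, key):
        """Block until the alert for key, if any, has exited."""
        proc = self._running.get(key)
        return proc.wait() if proc is not None else None
```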

-What happens to stderr from monitors and alerts???

-Enable definitions of composite groups, like "building2=@group1 @group2"
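
Expanding a composite group is a recursive lookup over `@name` references. The sketch below assumes groups are held as a mapping from name to a list of members, where a member starting with `@` refers to another group; it tracks the current path so that a group referring to itself is reported instead of looping forever.

```python
def expand_group(name, groups, _path=()):
    """Return the hosts in a group, following @group references.

    Hosts keep their first-seen order and duplicates are dropped.
    """
    if name in _path:
        raise ValueError("group cycle: " + " -> ".join(_path + (name,)))
    hosts = []
    for member in groups[name]:
        if member.startswith("@"):
            expanded = expand_group(member[1:], groups, _path + (name,))
        else:
            expanded = [member]
        for host in expanded:
            if host not in hosts:
                hosts.append(host)
    return hosts
```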