File: TODO

xymon 4.3.30-5

Bugfixes
--------
* From: <Stewart_Larsen@doh.state.fl.us>
  Date: Mon, 10 Jan 2005 15:06:36 -0500
  Subject: {bb} Bbgen depends not working for conn tests
    10.0.0.1     host1.domain.com #   depends=(conn:host2.domain.com/conn)
    10.0.0.2     host2.domain.com #
  Both hosts have red connectivity.  My understanding is that since host2
  can't be pinged, host1's conn test should be clear, not red. Is this
  right?

  Analysis: "depends" is not evaluated for "conn" tests;
  only the "router" setting is. A simple fix would be to
  change "conn" dependencies into router tags on the fly,
  or to implement "depends" throughout and treat "route"
  as a special case of depends.

* The SMTP network check violates the SMTP protocol by sending
  commands before the banner has appeared. Some servers
  recognize this as a spam client and refuse it with
  status 554, causing a red status.
  The correct fix would be to implement a full expect-send
  engine for the TCP tests (this would help with other things too).

* Make a common vmstat RRD layout, to allow for systems that
  grow more advanced. Use this to add Solaris I/O wait data.
  AIX also needs it. Will break compatibility with existing
  RRD files, unless we look at what datasets exist and drop
  data that cannot be stored.
  For AIX (bug-report Nov.10-14-16 2006 from Andy France):
  > There's a mix of "cpu_w" and "cpu_wait" definitions in the Xymon RRD
  > module, depending on what operating system the vmstat data comes from.
  > But all of the graph definitions in graphs.cfg use the cpu_wait
  > definition.
  Also, postponed from the 4.2 release:
  o vmstat columns on HP-UX 11.0 are different from 11.11i. Marco Avvisano March 10.
  o vmstat columns differ between various Red Hat Enterprise versions.
  o sar data parsing for IRIX client instead of vmstat data.

* A host cannot be configured to appear on multiple pages of an
  alternate pageset. Configuring this will cause it to appear
  twice on each page. Fixing this requires a complete redesign
  of how alternate pagesets are built, and probably also quite
  a bit of work on the internal data structures in xymongen.
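
Sketch for the first item in this list ("depends" not evaluated for
"conn" tests): this illustrates the second option from the analysis,
evaluating "depends" for every test, conn included. Python is used for
illustration only. The names effective_color, statuses and depends are
hypothetical, not Xymon identifiers, and the rule "a red test becomes
clear when a test it depends on is red" is an assumption taken from the
reporter's expectation.

```python
# Illustrative only: all names here are hypothetical, not Xymon identifiers.

def effective_color(host, test, statuses, depends):
    """Return the color to report for (host, test).

    statuses: {(host, test): color} holding the raw test results.
    depends:  {(host, test): [(host, test), ...]} parsed from "depends=" tags.
    Assumed rule: a red test is reported as "clear" when at least one
    test it depends on is itself red.
    """
    color = statuses[(host, test)]
    if color != "red":
        return color
    for dep in depends.get((host, test), []):
        if statuses.get(dep) == "red":
            return "clear"
    return color


# The two hosts from the bug report, both with red connectivity.
statuses = {
    ("host1.domain.com", "conn"): "red",
    ("host2.domain.com", "conn"): "red",
}
depends = {
    ("host1.domain.com", "conn"): [("host2.domain.com", "conn")],
}
```

With this rule host1's conn test is reported as clear, while host2
(which has no dependencies) stays red.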


Things I must remember to look at
---------------------------------
* IIS6Check: Log performance data in graphs.

* Scott Walters suggests larger RRAs for graphs:
  "I think 3 RRAs is good.  A month of 5m samples, 2 years
  of 1 hour samples, and 7 years of one day samples.
  This doesn't address keeping the MAXs, but is worth
  considering as a blanket change for all RRDs.
  You could then generate *real* 9AM-5PM Load average
  reports for the last year Monday - Friday."
  It will require regenerating all of the RRDs.

* configuration file for NCV.
  - filter out unwanted lines
  - more flexible DS configuration than the env settings

* Create a new xymond worker module off the stachg channel.
  This will dynamically receive status updates, and therefore
  it can have the full status of each PAGE without having to
  load the xymond board (should do so regularly just in case).
  This can be used to switch the overview pages to a CGI tool
  instead of generating the static pages. NB: Must be able to
  handle multiple page setups - or should we just have one
  worker per setup with different config files?
* "cpu" status determined by the non-idle time reported by
  vmstat, instead of the rather meaningless load average.
* xymond_client process/disk alarms to different people depending
  on *which* process/disk is in error.
* Process checks that relate to a group of hosts (process "foo"
  must exist on at least X of these Y nodes: HostA, HostB, HostC).
* Configuration of which graph(s) to show by default, including 
  limiting it to e.g. one of the 7 disk graphs. Ref. mail from
  Charles Jones 15-feb-2005.
  What we really want to do is customize on a per host/test
  basis which graphs appear for which tests. So this means some
  way of customizing svcstatus.cgi to include specific graphs.
* Something similar to larrd-graphs.cgi for picking out a bunch
  of graphs to show on one page.
* Move all of the xymonnet "badFOO" etc. stuff away from xymonnet
  and into xymond.
* On Fri, Aug 05, 2005 at 09:39:15AM +0200, Thomas Bergauer wrote:
  2. the NOPROP(RED|YELLOW|..) command in the hosts.cfg file works as
  announced, but I am looking for a possibility to tell NOPROP a "level"
  of propagation. This means that an alarm should propagate to its
  sub-page, but not further up to the main page.
* Dialog-style network tests. Currently when we connect, we immediately
  blast all of the SEND string to the remote end, which in many cases
  is a protocol violation (e.g. SMTP servers may refuse us because
  we send data before seeing their "220" greeting). Should do this
  right and also cater for multiple http exchanges to follow redirects.
* Better dependencies between tests. If you have multiple http tests
  for one host, it should be possible to make them depend on each
  other, so that one http test depends on another on the same host.
  Also direct alerts for one URL to one group, and for another URL
  to a different group (like GROUP in client handling).
  See http://www.xymon.com/archive/2006/06/msg00210.html
* Better selection of which graphs go with what statuses. 
* Easy way of grouping hosts for multi-graphs.
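
Sketch for the "Dialog-style network tests" item above: a minimal
expect-send loop that waits for the expected data before sending the
next string, instead of blasting the whole SEND string on connect.
Python is used for illustration only. run_dialog, SMTP_STEPS and the
host name xymon.example are hypothetical. The matching is a plain
substring search, not a real reply-code parser, so treat this as an
outline of the idea rather than a design.

```python
import socket


def run_dialog(sock, steps, timeout=5.0):
    """Run (expect, send) steps in order on a connected socket.

    For each step, wait until `expect` (bytes) appears in the data
    received during that step, then send `send` (bytes). Either
    element may be None to skip it. Returns True if every expectation
    was met, False on timeout or if the peer closed the connection.
    """
    sock.settimeout(timeout)
    for expect, send in steps:
        if expect is not None:
            buf = b""
            while expect not in buf:
                try:
                    chunk = sock.recv(4096)
                except socket.timeout:
                    return False
                if not chunk:  # peer closed before we saw `expect`
                    return False
                buf += chunk
        if send is not None:
            sock.sendall(send)
    return True


# Hypothetical SMTP dialog: wait for the banner before speaking.
SMTP_STEPS = [
    (b"220", b"EHLO xymon.example\r\n"),
    (b"250", b"QUIT\r\n"),
    (b"221", None),
]
```

The same step list could in principle describe other protocols, and a
sequence of http exchanges for following redirects would need the
steps to be generated from the previous response rather than fixed.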


Improvements
------------
* showgraph.cgi change to make zoom work in two dimensions. 
  Requires RRDtool 1.2.x.
* More reports: Check out bb-reports on deadcat
* Multi-line macros in alerts.cfg
* Allow for regexes in the TCP response match code.
* Merging of alerts based on some criteria, e.g. merge all
  purple alerts for a host into one message.
* Implement "--follow" in the new HTTP tester.
* https proxying (proxy CONNECT protocol)
* Optionally hide the URL and content output from HTTP/content
  checks for "security reasons". Marco Avvisano, 20-sep-2004.
* Set a "BASE" URL in the content status message, so the web
  page we show will link back to the original page for images etc.
* Provide a way of sending http status-messages with individual
  test (column) names for each URL - apparently, Big Sister does
  this. Suggested by Darshan Purandare. Repeated by Scott Walker.
* Provide a way for a "cont" check to NOT be included in the
  "http" column. Suggested by Kim Scarborough.
* Allow for enable/disable of TCP response check per host/service.
* Use the "acked" gif for subpage/page/etc. links when there
  are only acked tests on the page. Marco Avvisano.
* Handle "summary" pages for alternate pagesets. Need to find a way
  of detecting what color a page has when it was NOT generated by
  the current pageset (allowing for summaries across pagesets).
* Improve the history colorbars in cases where there are many
  shortlived statuses. They should not automatically be given 1
  pixel each, as that will cause the history graph to be *very*
  wide.
* Display-only tags should work on duplicate host-entries in hosts.cfg,
  e.g. you should be able to put a "NAME:foo" tag on a host and have it
  show up with different names for the same host.
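
Sketch for the history colorbar item above: one possible approach is
to fix the total bar width and divide it among the statuses in
proportion to their duration, rather than giving each status at least
1 pixel. Python is used for illustration only and bar_widths is a
hypothetical name. The largest-remainder rounding is a choice made for
this sketch; note that very short statuses then get 0 pixels and
disappear from the bar, which may or may not be acceptable.

```python
def bar_widths(durations, total_px):
    """Split total_px pixels among segments in proportion to duration.

    durations: list of non-negative integers (e.g. seconds per status).
    Uses the largest-remainder method, so the widths always sum to
    total_px. Very short segments may get 0 pixels.
    """
    total = sum(durations)
    if total <= 0 or total_px <= 0:
        return [0] * len(durations)
    # Integer share and remainder for each segment.
    shares = [divmod(d * total_px, total) for d in durations]
    widths = [quotient for quotient, _ in shares]
    leftover = total_px - sum(widths)
    # Hand the remaining pixels to the segments with the largest remainders.
    order = sorted(range(len(durations)),
                   key=lambda i: shares[i][1], reverse=True)
    for i in order[:leftover]:
        widths[i] += 1
    return widths
```

For example, 500 equally short statuses drawn into a 100 pixel bar
still produce a bar that is exactly 100 pixels wide.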

Various ideas that have appeared on the mailing lists
-----------------------------------------------------
* A report generator capable of displaying for a certain time frame:
  1) List of the top XX "host.service" state changes.  This is to help us
     understand what is barking the most in our environment and focus efforts
     on fixing chronic issues rather than band aiding.
  2) List of lowest XX "Availability" for host.service.
  And since I am throwing things out, how about embedding a 13 week rolling
  availability into the status/history page?
  Scott Walters, "Reporting based on history", Sep 9 2004
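
Sketch of the two lists requested above, assuming the history has
already been loaded as (host.service, timestamp, color) tuples, one per
state change. Python is used for illustration only. The tuple layout
and the function names are hypothetical and do not reflect the Xymon
history file format. Availability is taken here to mean the share of
the time frame not spent in red, which is also an assumption.

```python
from collections import Counter, defaultdict


def top_state_changes(events, start, end, n):
    """Return the n keys with the most state changes in [start, end)."""
    counts = Counter(key for key, ts, _ in events if start <= ts < end)
    return counts.most_common(n)


def lowest_availability(events, start, end, n):
    """Return the n keys with the lowest availability in [start, end).

    Availability is the percentage of the window not spent in "red".
    Each status is assumed to last until the next event for the same
    key; time before a key's first event counts as available.
    """
    by_key = defaultdict(list)
    for key, ts, color in events:
        by_key[key].append((ts, color))
    window = end - start
    result = []
    for key, changes in by_key.items():
        changes.sort()
        down = 0
        for i, (ts, color) in enumerate(changes):
            nxt = changes[i + 1][0] if i + 1 < len(changes) else end
            lo, hi = max(ts, start), min(nxt, end)  # clip to the window
            if color == "red" and hi > lo:
                down += hi - lo
        result.append((key, 100.0 * (window - down) / window))
    return sorted(result, key=lambda item: item[1])[:n]
```

A 13 week rolling availability would be the same calculation with the
window set to the last 13 weeks before the current time.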