<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of ANALYSIS.CFG</TITLE>
</HEAD><BODY>
<H1>ANALYSIS.CFG</H1>
Section: File Formats (5)<BR>Updated: Version 4.3.17: 23 Feb 2014<BR><A HREF="#index">Index</A>
<A HREF="../index.html">Return to Main Contents</A><HR>
<A NAME="lbAB"> </A>
<H2>NAME</H2>
analysis.cfg - Configuration file for the xymond_client module
<P>
<A NAME="lbAC"> </A>
<H2>SYNOPSIS</H2>
<B>~Xymon/server/etc/analysis.cfg</B>
<P>
<A NAME="lbAD"> </A>
<H2>DESCRIPTION</H2>
The analysis.cfg file controls what color is assigned to
the status-messages that are generated from the Xymon client
data - typically the cpu, disk, memory, procs- and msgs-columns. Color
is decided on the basis of some <B>settings</B> defined in this file;
settings apply to specific hosts through a set of <B>rules</B>.
<P>
Note: This file is only used on the Xymon server - it is not
used by the Xymon client, so there is no need to distribute
it to your client systems.
<P>
<A NAME="lbAE"> </A>
<H2>FILE FORMAT</H2>
Blank lines and lines starting with a hash mark (#) are treated as
comments and ignored.
<P>
<P>
<A NAME="lbAF"> </A>
<H2>CPU STATUS COLUMN SETTINGS</H2>
<P>
<B>LOAD warnlevel paniclevel</B>
<P>
If the system load exceeds "warnlevel" or "paniclevel", the "cpu"
status will go yellow or red, respectively. These are decimal
numbers.
<P>
Defaults: warnlevel=5.0, paniclevel=10.0
<P>
<B>UP bootlimit toolonglimit [color]</B>
<P>
The cpu status changes to "color" if the system has been up for less than
"bootlimit" time, or longer than "toolonglimit". The time is in
minutes, or you can add m/h/d/w for minutes/hours/days/weeks - e.g. "2h" for
two hours, or "4w" for 4 weeks.
<P>
Defaults: bootlimit=1h, toolonglimit=-1 (infinite), color=yellow.
<P>
<P>
<B>CLOCK max.offset [color]</B>
<P>
The cpu status goes yellow/red if the system clock on the client
differs by more than "max.offset" seconds from that of the Xymon
server. Note that this is not a particularly accurate test, since
it is affected by network delays between the client and the server,
and the load on both systems. You should therefore not rely on this
being accurate to more than +/- 5 seconds, but it will let you
catch a client clock that goes completely wrong. The default is
NOT to check the system clock.
<BR>
<B>NOTE:</B> Correct operation of this test obviously requires that
the system clock of the Xymon server is correct. You should therefore
make sure that the Xymon server is synchronized to the real clock,
e.g. by using NTP.
<P>
<P>
Example: Go yellow if the load average exceeds 5, and red if it
exceeds 10. Also, go yellow for 10 minutes after a reboot, and after
4 weeks uptime. Finally, check that the system clock is at most
15 seconds offset from the clock of the Xymon system and go red if
that is exceeded.
<DL COMPACT>
<DT><DD>
<PRE>
LOAD 5 10
UP 10m 4w yellow
CLOCK 15 red
</PRE>
</DL>
<P>
<P>
<A NAME="lbAG"> </A>
<H2>DISK STATUS COLUMN SETTINGS</H2>
<P>
<B>DISK filesystem warnlevel paniclevel</B>
<BR>
<B>DISK filesystem IGNORE</B>
<BR>
<B>INODE filesystem warnlevel paniclevel</B>
<BR>
<B>INODE filesystem IGNORE</B>
<P>
If the utilization of "filesystem" is reported to exceed "warnlevel"
or "paniclevel", the "disk" status will go yellow or red, respectively.
"warnlevel" and "paniclevel" are either the percentage used, or the
space available as reported by the local "df" command on the host.
For the latter type of check, the "warnlevel" must be followed by the
letter "U", e.g. "1024U".
<P>
The special keyword "IGNORE" causes this filesystem to be ignored
completely, i.e. it will not appear in the "disk" status column and
it will not be tracked in a graph. This is useful for e.g. removable
devices, backup-disks and similar hardware.
<P>
"filesystem" is the mount-point where the filesystem is mounted, e.g.
"/usr" or "/home". A filesystem-name that begins with "%" is interpreted
as a Perl-compatible regular expression; e.g. "%^/oracle.*" will match
any filesystem whose mount-point begins with "/oracle".
<P>
"INODE" works identically to "DISK", but uses the count of i-nodes in
the filesystem instead of the amount of disk space.
<P>
Defaults DISK: warnlevel=90%, paniclevel=95%<BR>
Defaults INODE: warnlevel=70%, paniclevel=90%
<P>
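Example (a sketch; the mount points are hypothetical): go yellow at 80% and red
at 90% on /home, let all filesystems mounted below /oracle fill up to 95%/98%
before alerting, ignore a removable backup disk, and watch i-node usage on /var:
<DL COMPACT>
<DT><DD>
<PRE>
DISK /home 80 90
DISK %^/oracle 95 98
DISK /mnt/backup IGNORE
INODE /var 60 80
</PRE>
</DL>
<P>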
<P>
<A NAME="lbAH"> </A>
<H2>MEMORY STATUS COLUMN SETTINGS</H2>
<P>
<B>MEMPHYS warnlevel paniclevel</B>
<BR>
<B>MEMACT warnlevel paniclevel</B>
<BR>
<B>MEMSWAP warnlevel paniclevel</B>
<P>
If the memory utilization exceeds the "warnlevel" or "paniclevel", the
"memory" status will change to yellow or red, respectively.
Note: The words "PHYS", "ACT" and "SWAP" are also recognized.
<P>
Example: Go yellow if more than 20% swap is used, and red if
more than 40% swap is used or the actual memory utilisation exceeds
90%. Don't alert on physical memory usage.
<DL COMPACT>
<DT><DD>
<PRE>
MEMSWAP 20 40
MEMACT 90 90
MEMPHYS 101 101
</PRE>
</DL>
<P>
Defaults:
<DL COMPACT>
<DT><DD>
<PRE>
MEMPHYS warnlevel=100 paniclevel=101 (i.e. it will never go red).
MEMSWAP warnlevel=50 paniclevel=80
MEMACT warnlevel=90 paniclevel=97
</PRE>
</DL>
<P>
<P>
<A NAME="lbAI"> </A>
<H2>PROCS STATUS COLUMN SETTINGS</H2>
<P>
<B>PROC processname minimumcount maximumcount color [TRACK=id] [TEXT=text]</B>
<P>
The "ps" listing sent by the client will be scanned for how many
processes containing "processname" are running, and this is then
matched against the min/max settings defined here. If the running
count is outside the thresholds, the color of the "procs" status
changes to "color".
<P>
To check for a process that must NOT be running: Set minimum and
maximum to 0.
<P>
"processname" can be a simple string, in which case this string must
show up in the "ps" listing as a command. The scanner will find
a ps-listing of e.g. "/usr/sbin/cron" if you only specify "processname"
as "cron".
"processname" can also be a Perl-compatible regular expression, e.g.
"%java.*inst[0123]" can be used to find entries in the ps-listing for
"java -Xmx512m inst2" and "java -Xmx256 inst3". In that case,
"processname" must begin with "%" followed by the regular expression.
Note that Xymon defaults to case-insensitive pattern matching; if that
is not what you want, put "(?-i)" between the "%" and the regular
expression to turn this off. E.g. "%(?-i)HTTPD" will match the
word HTTPD only when it is upper-case.
<BR>
If "processname" contains whitespace (blanks or TAB), you must enclose
the full string in double quotes - including the "%" if you use regular
expression matching. E.g.
<P>
<BR> PROC "%xymond_channel --channel=data.*xymond_rrd" 1 1 yellow
<P>
or
<P>
<BR> PROC "java -DCLASSPATH=/opt/java/lib" 2 5
<P>
You can have multiple "PROC" entries for the same host; all of the
checks are merged into the "procs" status, and the most severe
check defines the color of the status.
<P>
The optional <B>TRACK=id</B> setting causes Xymon to track the number of
processes found in an RRD file, and put this into a graph which is shown
on the "procs" status display. The <B>id</B> setting is a simple text string
which will be used as the legend for the graph, and also as part of the
RRD filename. It is recommended that you use only letters and digits for
the ID.
<BR>
Note that the tracked process counts are sampled only once per client
poll cycle - i.e. the counts represent snapshots of the system state,
not an average value over the poll cycle. Peaks or dips in the actual
process counts that occur between two polls will therefore not show up
in the graphs.
<P>
The optional <B>TEXT=text</B> setting is used in the summary of the "procs"
status. Normally, the summary will show the "processname" to identify the
process and the related count and limits. But this may be a regular
expression which is not easily recognizable, so if defined, the <B>text</B>
setting string will be used instead. This only affects the "procs" status
display - it has no effect on how the rule counts or recognizes processes
in the "ps" output.
<P>
Example: Check that "cron" is running:
<BR>
<TT> </TT>PROC cron<BR>
<P>
Example: Check that at least 5 "httpd" processes are running, but not more than 20:
<BR>
<TT> </TT>PROC httpd 5 20<BR>
<P>
Defaults:
<BR>
<TT> </TT>mincount=1, maxcount=-1 (unlimited), color="red".<BR>
<BR>
<TT> </TT>Note that no processes are checked by default.<BR>
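<P>
Example (the process names and the TRACK/TEXT labels are illustrative only):
require exactly two of the "java ... inst" processes described above and go
yellow if the count differs, graphing the count under the legend "javainst";
also go red if a telnet daemon is running at all:
<DL COMPACT>
<DT><DD>
<PRE>
PROC %java.*inst[0123] 2 2 yellow TRACK=javainst TEXT=JavaInstances
PROC telnetd 0 0 red
</PRE>
</DL>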
<P>
<A NAME="lbAJ"> </A>
<H2>MSGS STATUS COLUMN SETTINGS</H2>
<P>
<B>LOG logfilename pattern [COLOR=color] [IGNORE=excludepattern] [OPTIONAL]</B>
<P>
The Xymon client extracts interesting lines from one or
more logfiles - see the
<I><A HREF="../man5/client-local.cfg.5.html">client-local.cfg</A>(5)</I>
man-page for information about how to configure which
logs a client should look at.
<P>
The <B>LOG</B> setting determines how these extracts of log entries
are processed, and which warnings or alerts are triggered as a result.
<P>
"logfilename" is the name of the logfile. Only log entries from this file
will be matched against this rule. Note that "logfilename" can be a regular
expression (if prefixed with a '%' character).
<P>
"pattern" is a string or regular expression. If the logfile data matches
"pattern", it will trigger the "msgs" column to change color. If
no "color" parameter is present, the default is to go "red" when
the pattern is matched. To match against a regular expression, "pattern"
must begin with a '%' sign - e.g. "%WARNING|NOTICE" will match any lines
containing either of these two words.
Note that Xymon defaults to case-insensitive pattern matching; if that
is not what you want, put "(?-i)" between the "%" and the regular
expression to turn this off. E.g. "%(?-i)WARNING" will match the
word WARNING only when it is upper-case.
<P>
"excludepattern" is a string or regular expression that can be used to
filter out any unwanted strings that happen to match "pattern".
<P>
The <B>OPTIONAL</B> keyword causes the check to be skipped if the logfile
does not exist.
<P>
Example: Trigger a red alert when the string "ERROR" appears in the "/var/adm/syslog" file:
<BR>
<TT> </TT>LOG /var/adm/syslog ERROR<BR>
<P>
Example: Trigger a yellow warning on all occurrences of the word "WARNING"
or "NOTICE" in the "daemon.log" file, except those from the "lpr" system:
<BR>
<TT> </TT>LOG /var/log/daemon.log %WARNING|NOTICE COLOR=yellow IGNORE=lpr<BR>
<P>
Defaults:
<BR>
<TT> </TT>color="red", no "excludepattern".<BR>
<P>
Note that no logfiles are checked by default. Any log data reported by a client
will just show up on the "msgs" column with status OK (green).
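<P>
Example (the log file location is hypothetical): for any log file below
/var/log/myapp/, go red only on the upper-case word "FATAL", and skip the
check on hosts where no such log file exists:
<DL COMPACT>
<DT><DD>
<PRE>
LOG %^/var/log/myapp/.* %(?-i)FATAL COLOR=red OPTIONAL
</PRE>
</DL>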
<P>
<P>
<A NAME="lbAK"> </A>
<H2>FILES STATUS COLUMN SETTINGS</H2>
<P>
<B>FILE filename [color] [things to check] [OPTIONAL] [TRACK]</B>
<P>
<B>DIR directoryname [color] [size&lt;MAXSIZE] [size&gt;MINSIZE] [TRACK]</B>
<P>
These entries control the status of the "files" column. They allow you to
check on various data for files and directories.
<P>
<B>filename</B> and <B>directoryname</B> are names of files or directories,
with a full path. You can use a regular expression to match the names of
files and directories reported by the client, if you prefix the expression
with a '%' character.
<P>
<B>color</B> is the color that triggers when one or more of the checks fail.
<P>
The <B>OPTIONAL</B> keyword causes this check to be skipped if the file does
not exist. E.g. to verify that files which should be temporary really are
deleted, check that they are not older than the maximum time you would
expect them to stick around, and use OPTIONAL so that the state where no
such files exist is not flagged.
<P>
The <B>TRACK</B> keyword causes the size of the file or directory to be tracked
in an RRD file, and presented in a graph on the "files" status display.
<P>
For files, you can check one or more of the following:
<DL COMPACT>
<DT>noexist<DD>
triggers a warning if the file exists. By default,
a warning is triggered for files that have a FILE entry, but
which do not exist.
<DT>type=TYPE<DD>
where TYPE is one of "file", "dir", "char", "block",
"fifo", or "socket". Triggers a warning if the file is not of the
specified type.
<DT>ownerid=OWNER<DD>
triggers a warning if the owner does not match what is listed here.
OWNER is specified either with the numeric uid, or the user name.
<DT>groupid=GROUP<DD>
triggers a warning if the group does not match what is listed here.
GROUP is specified either with the numeric gid, or the group name.
<DT>mode=MODE<DD>
triggers a warning if the file permissions are not
as listed. MODE is written in the standard octal notation, e.g.
"644" for the rw-r--r-- permissions.
<DT>size&lt;MAX.SIZE and size&gt;MIN.SIZE<DD>
triggers a warning if the file size is greater than MAX.SIZE or
less than MIN.SIZE, respectively. For filesizes, you can use the
letters "K", "M", "G" or "T" to indicate that the filesize is in
Kilobytes, Megabytes, Gigabytes or Terabytes, respectively. If there
is no such modifier, Kilobytes is assumed. E.g. to warn if a file
grows larger than 1MB, use <B>size&lt;1M</B> (or, equivalently, <B>size&lt;1024</B>).
<DT>mtime&gt;MIN.MTIME mtime&lt;MAX.MTIME<DD>
checks how long ago the file was last modified (in seconds). E.g.
to check if a file was updated within the past 10 minutes (600
seconds): <B>mtime<600</B>. Or to check that a file has NOT been updated
in the past 24 hours: <B>mtime>86400</B>.
<DT>mtime=TIMESTAMP<DD>
checks if a file was last modified at TIMESTAMP. TIMESTAMP is a unix epoch
time (seconds since midnight Jan 1 1970 UTC).
<DT>ctime&gt;MIN.CTIME, ctime&lt;MAX.CTIME, ctime=TIMESTAMP<DD>
acts as the mtime checks, but for the ctime timestamp (when the file's
inode, i.e. its metadata, was last changed, e.g. by chown, chgrp or chmod).
<DT>md5=MD5SUM, sha1=SHA1SUM, rmd160=RMD160SUM<DD>
triggers a warning if the file checksum computed with the MD5, SHA1 or RMD160
message digest algorithm does not match the one configured here. Note:
The "file" entry in the
<I><A HREF="../man5/client-local.cfg.5.html">client-local.cfg</A>(5)</I>
file must specify which algorithm to use.
<P>
</DL>
<P>
For directories, you can check one or more of the following:
<DL COMPACT>
<DT>size&lt;MAX.SIZE and size&gt;MIN.SIZE<DD>
triggers a warning if the directory size is greater than MAX.SIZE or
less than MIN.SIZE, respectively. Directory sizes are reported in
whatever unit the <B>du</B> command on the client uses - often KB
or diskblocks - so MAX.SIZE and MIN.SIZE must be given in the same
unit.
<P>
</DL>
<P>
Experience shows that it can be difficult to get these rules right,
especially when defining minimum/maximum values for file sizes, modification
times etc. The one thing you must remember when setting up these checks
is that the rules describe criteria that must be met - only when they
are met will the status be green.
<P>
So "mtime&lt;600" means "the difference between current time and the mtime
of the file must be less than 600 seconds - if not, the file status will
go red" (or change to whatever color is configured for the check).
<P>
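Example (a sketch; all file and directory names are hypothetical, and each of
them must also be listed in client-local.cfg(5) for the client to report it):
/etc/passwd must be a regular file owned by root with mode 644; the application
log must have been updated within the last hour and stay below 500 MB, with its
size graphed; temporary batch files must not be older than 2 hours, but it is
fine if none exist; and the spool directory must stay below 100000 in whatever
unit "du" reports on that client:
<DL COMPACT>
<DT><DD>
<PRE>
FILE /etc/passwd yellow type=file ownerid=root mode=644
FILE /var/log/app.log red mtime&lt;3600 size&lt;500M TRACK
FILE %^/var/tmp/batch.* yellow mtime&lt;7200 OPTIONAL
DIR /var/spool/app yellow size&lt;100000 TRACK
</PRE>
</DL>
<P>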
<P>
<A NAME="lbAL"> </A>
<H2>PORTS STATUS COLUMN SETTINGS</H2>
<P>
<B>PORT criteria [MIN=mincount] [MAX=maxcount] [COLOR=color] [TRACK=id] [TEXT=displaytext]</B>
<P>
The "netstat" listing sent by the client will be scanned for how many
sockets match the <B>criteria</B> listed. The criteria you can use are:
<DL COMPACT>
<DT>LOCAL=addr<DD>
"addr" is a (partial) local address specification in the format used on
the output from netstat.
<DT>EXLOCAL=addr<DD>
Exclude certain local addresses from the rule.
<DT>REMOTE=addr<DD>
"addr" is a (partial) remote address specification in the format used on
the output from netstat.
<DT>EXREMOTE=addr<DD>
Exclude certain remote addresses from the rule.
<DT>STATE=state<DD>
Causes only the sockets in the specified state to be included. "state"
is usually LISTEN or ESTABLISHED, but can be any socket state reported by
the client's "netstat" command.
<DT>EXSTATE=state<DD>
Exclude certain states from the rule.
</DL>
<P>
"addr" is typically "10.0.0.1:80" for the IP 10.0.0.1, port 80.
Or "*:80" for any local address, port 80. Note that the Xymon clients
normally report only the numeric data for IP-addresses and port-numbers,
so you must specify the port number (e.g. "80") instead of the service
name ("www").
<BR>
"addr" and "state" can also be a Perl-compatible regular expression, e.g.
"LOCAL=%[.:](80|443)" can be used to find entries in the netstat local port for
both http (port 80) and https (port 443). In that case, the "addr" or "state"
value must begin with "%" followed by the regular expression.
<P>
The socket count found is then matched against the min/max settings defined
here. If the count is outside the thresholds, the color of the "ports"
status changes to "color". To check for a socket that must NOT exist: Set
minimum and maximum to 0.
<P>
The optional <B>TRACK=id</B> setting causes Xymon to track the number of
sockets found in an RRD file, and put this into a graph which is shown
on the "ports" status display. The <B>id</B> setting is a simple text string
which will be used as the legend for the graph, and also as part of the
RRD filename. It is recommended that you use only letters and digits for
the ID.
<BR>
Note that the tracked socket counts are sampled only once per client
poll cycle - i.e. the counts represent snapshots of the system state,
not an average value over the poll cycle. Peaks or dips in the actual
socket counts that occur between two polls will therefore not show up
in the graphs.
<P>
The <B>TEXT=displaytext</B> option affects how the port appears on the
"ports" status page. By default, the port is listed with the
local/remote/state rules as identification, but this may be somewhat
difficult to understand. You can then use e.g. "TEXT=Secure Shell" to make
these ports appear with the name "Secure Shell" instead.
<P>
Defaults: mincount=1, maxcount=-1 (unlimited), color="red".
Note: No ports are checked by default.
<P>
Example: Check that the SSH daemon is listening on port 22. Track the
number of active SSH connections, and warn if there are more than 5.
<BR>
<BR> PORT LOCAL=%[.:]22$ STATE=LISTEN "TEXT=SSH listener"
<BR>
<BR> PORT LOCAL=%[.:]22$ STATE=ESTABLISHED MAX=5 TRACK=ssh TEXT=SSH
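<P>
Example (the ports are chosen for illustration): go red if anything is
listening on the telnet port 23, and go yellow unless at least two listening
sockets are found on the http and https ports:
<DL COMPACT>
<DT><DD>
<PRE>
PORT LOCAL=%[.:]23$ STATE=LISTEN MIN=0 MAX=0 COLOR=red "TEXT=Telnet listener"
PORT LOCAL=%[.:](80|443)$ STATE=LISTEN MIN=2 COLOR=yellow TEXT=Web
</PRE>
</DL>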
<P>
<P>
<A NAME="lbAM"> </A>
<H2>SVCS status (Microsoft Windows clients)</H2>
<P>
<B>SVC servicename status=(started|stopped) [startup=automatic|disabled|manual]</B>
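<P>
Example (a sketch following the syntax above; "MyAppService" is a hypothetical
service name and must be replaced by the name your Windows client reports):
require the service to be running and configured to start automatically:
<DL COMPACT>
<DT><DD>
<PRE>
SVC MyAppService status=started startup=automatic
</PRE>
</DL>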
<P>
<A NAME="lbAN"> </A>
<H2>DS - RRD based status override</H2>
<P>
<B>DS column filename:dataset rules COLOR=colorname TEXT=explanation</B>
<P>
"column" is the statuscolumn that will be modified. "filename" is
the name of the RRD file holding the data you use for comparison.
"dataset" is the name of the dataset in the RRD file - the "rrdtool info"
command is useful when determining these.
"rules" determine when to apply the override. You can use
">", ">=", "<" or "<=" to compare the current measurement
value against one or more thresholds. "explanation" is a text
that will be shown to explain the override - you can use some
placeholders in the text: "&N" is replaced with the name of the
dataset, "&V" is replaced with the current value, "&L" is replaced
by the low threshold, "&U" is replaced with the upper threshold.
<P>
NOTE: This rule uses the raw data value from a client
to examine the rules. So this type of test is only really
suitable for datasets that are of the "GAUGE" type. It cannot
be used meaningfully for datasets that use "COUNTER" or
"DERIVE" - e.g. the datasets that are used to capture network
packet traffic - because the data stored in the RRD for
COUNTER-based datasets undergo a transformation (calculation)
when going into the RRD. Xymon does not have direct access to
the calculated data.
<P>
Example: Flag the "conn" status yellow if the response time exceeds
100 msec.
<BR>
<TT> </TT>DS conn tcp.conn.rrd:sec >0.1 COLOR=yellow TEXT="Response time &V exceeds &U seconds"<BR>
<P>
<A NAME="lbAO"> </A>
<H2>MQ Series SETTINGS</H2>
<P>
<B>MQ_QUEUE queuename [age-warning=N] [age-critical=N] [depth-warning=N] [depth-critical=N]</B>
<BR>
<B>MQ_CHANNEL channelname [warning=PATTERN] [alert=PATTERN]</B>
<P>
This is a set of checks for checking the health of IBM MQ message-queues.
It requires the "mq.sh" or similar collector module to run on a node with
access to the MQ "Queue Manager" so it can report the status of queues
and channels.
<P>
The MQ_QUEUE setting checks the health of a single queue: You can warn
(yellow) or alert (red) based on the depth of the queue, and/or the
age of the oldest entry in the queue. These values are taken directly
from the output generated by the "runmqsc" utility.
<P>
The MQ_CHANNEL setting checks the health of a single MQ channel: You
can warn or alert based on the reported status of the channel. The
PATTERN is a normal pattern, i.e. either a list of status keywords,
or a regular expression pattern.
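<P>
Example (the queue and channel names are hypothetical; use the names exactly
as your MQ collector reports them): set the queue depth warning and critical
levels to 100 and 500 messages, and for the channel warn while it is retrying
and alert when it has stopped:
<DL COMPACT>
<DT><DD>
<PRE>
MQ_QUEUE ORDERS.IN depth-warning=100 depth-critical=500
MQ_CHANNEL TO.PARTNER warning=%RETRYING alert=%STOPPED
</PRE>
</DL>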
<P>
<A NAME="lbAP"> </A>
<H2>CHANGING THE DEFAULT SETTINGS</H2>
If you would like to use different defaults for the settings described above,
then you can define the new defaults after a DEFAULT line. E.g. this would
explicitly define all of the default settings:
<DL COMPACT>
<DT><DD>
<PRE>
DEFAULT
UP 1h
LOAD 5.0 10.0
DISK * 90 95
INODE * 70 90
MEMPHYS 100 101
MEMSWAP 50 80
MEMACT 90 97
</DL>
<P>
<P>
<A NAME="lbAQ"> </A>
<H2>RULES TO SELECT HOSTS</H2>
All of the settings can be applied to a group of hosts, by preceding them with
rules. A rule defines one or more filters using these keywords (note that
this is identical to the rule definitions used in the
<I><A HREF="../man5/alerts.cfg.5.html">alerts.cfg</A>(5)</I>
file).
<P>
<B>PAGE=targetstring</B>
Rule matching a host by the name of the page in Xymon. "targetstring" is the path of
the page as defined in the hosts.cfg file. E.g. if you have this setup:
<DL COMPACT>
<DT><DD>
<PRE>
page servers All Servers
subpage web Webservers
10.0.0.1 www1.foo.com
subpage db Database servers
10.0.0.2 db1.foo.com
</PRE>
</DL>
<P>
Then the "All Servers" page is found with <B>PAGE=servers</B>, the
"Webservers" page is <B>PAGE=servers/web</B> and the "Database servers"
page is <B>PAGE=servers/db</B>. Note that you can also use regular expressions
to specify the page name, e.g. <B>PAGE=%.*/db</B> would find the "Database
servers" page regardless of where this page was placed in the hierarchy.
<P>
The top-level page has the fixed name <B>/</B>, e.g. <B>PAGE=/</B> would
match all hosts on the Xymon frontpage. If you need it in a regular
expression, use <B>PAGE=%^/</B> to avoid matching the forward-slash
present in subpage-names.
<P>
<B>EXPAGE=targetstring</B>
Rule excluding a host if the pagename matches.
<P>
<B>HOST=targetstring</B>
Rule matching a host by the hostname.
"targetstring" is either a comma-separated list of hostnames (from the hosts.cfg file),
"*" to indicate "all hosts", or a Perl-compatible regular expression.
E.g. "HOST=dns.foo.com,www.foo.com" identifies two specific hosts;
"HOST=%www.*.foo.com EXHOST=www-test.foo.com" matches all hosts with a name
beginning with "www", except the "www-test" host.
<P>
<B>EXHOST=targetstring</B>
Rule excluding a host by matching the hostname.
<P>
<B>CLASS=classname</B>
Rule matching a host by the client class-name. You specify the class-name
for a host when starting the client through the "--class=NAME"
option to the runclient.sh script. If no class is specified, the
host by default goes into a class named by the operating system.
<P>
<B>EXCLASS=classname</B>
Exclude all hosts belonging to "classname" from this rule.
<P>
<B>DISPLAYGROUP=groupstring</B>
Rule matching a host by the text of the display-group (text following the group,
group-only, group-except heading) in the hosts.cfg file. "groupstring" is the text
for the group, stripped of any HTML tags. E.g. if you have this setup:
<DL COMPACT>
<DT><DD>
<PRE>
group Web
10.0.0.1 www1.foo.com
10.0.0.2 www2.foo.com
group Production databases
10.0.1.1 db1.foo.com
</PRE>
</DL>
<P>
Then the hosts in the Web-group can be matched with <B>DISPLAYGROUP=Web</B>,
and the database servers can be matched with <B>DISPLAYGROUP="Production databases"</B>.
Note that you can also use regular expressions, e.g. <B>DISPLAYGROUP=%database</B>.
If there is no group-setting for the host, use "DISPLAYGROUP=NONE".
<P>
<B>EXDISPLAYGROUP=groupstring</B>
Rule excluding a group by matching the display-group string.
<P>
<B>TIME=timespecification</B>
Rule matching by the time-of-day. This is specified as the DOWNTIME
time specification in the hosts.cfg file. E.g. "TIME=W:0800:2200"
applied to a rule will make this rule active only on week-days between
8AM and 10PM.
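<P>
Example (a sketch; it reuses the hosts.cfg pages and display-groups shown
above, and "batch" is a hypothetical class name given with "--class=batch"):
check httpd on all Webservers except a test host, use relaxed load limits for
the batch hosts, and raise the disk limits for the production databases on
week-days between 8AM and 10PM:
<DL COMPACT>
<DT><DD>
<PRE>
PAGE=servers/web EXHOST=www-test.foo.com
PROC httpd 5 20
CLASS=batch
LOAD 8.0 16.0
DISPLAYGROUP="Production databases" TIME=W:0800:2200
DISK /db 95 98
</PRE>
</DL>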
<P>
<A NAME="lbAR"> </A>
<H2>DIRECTING ALERTS TO GROUPS</H2>
For some tests - e.g. "procs" or "msgs" - the right group of people
to alert in case of a failure may be different, depending on which
of the client rules actually detected a problem. E.g. if you have
PROC rules for a host checking both "httpd" and "sshd" processes,
then the Web admins should handle httpd-failures, whereas "sshd"
failures are handled by the Unix admins.
<P>
To handle this, all rules can have a "GROUP=groupname" setting.
When a rule with this setting triggers a yellow or red status,
the groupname is passed on to the Xymon alerts module, so you
can use it in the alert rule definitions in
<I><A HREF="../man5/alerts.cfg.5.html">alerts.cfg</A>(5)</I>
to direct alerts to the correct group of people.
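<P>
Example (the group names "webadmins" and "unixadmins" are hypothetical): the
httpd/sshd scenario above, with each PROC check tagged with the group that
should receive its alerts:
<DL COMPACT>
<DT><DD>
<PRE>
PROC httpd 5 20 red GROUP=webadmins HOST=www1.foo.com
PROC sshd 1 -1 red GROUP=unixadmins HOST=www1.foo.com
</PRE>
</DL>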
<P>
<A NAME="lbAS"> </A>
<H2>RULES: APPLYING SETTINGS TO SELECTED HOSTS</H2>
Rules must be placed after the settings, e.g.
<DL COMPACT>
<DT><DD>
<PRE>
LOAD 8.0 12.0 HOST=db.foo.com TIME=*:0800:1600
</PRE>
</DL>
<P>
<P>
If you have multiple settings that you want to apply the same rules to,
you can write the rules *only* on one line, followed by the settings. E.g.
<DL COMPACT>
<DT><DD>
<PRE>
HOST=%db.*.foo.com TIME=W:0800:1600
LOAD 8.0 12.0
DISK /db 98 100
PROC mysqld 1
</PRE>
</DL>
<P>
will apply the three settings to all of the "db" hosts on week-days between 8AM
and 4PM. This can be combined with per-setting rules, in which case the
per-setting rule overrides the general rule; e.g.
<DL COMPACT>
<DT><DD>
<PRE>
HOST=%.*.foo.com
LOAD 7.0 12.0 HOST=bax.foo.com
LOAD 3.0 8.0
</PRE>
</DL>
<P>
will result in the load-limits being 7.0/12.0 for the "bax.foo.com" host,
and 3.0/8.0 for all other foo.com hosts.
<P>
The entire file is evaluated from the top to bottom, and the first
match found is used. So you should put the specific settings first, and
the generic ones last.
<P>
<P>
<A NAME="lbAT"> </A>
<H2>NOTES</H2>
For the LOG, FILE and DIR checks, it is necessary also to configure the actual
file- and directory-names in the
<I><A HREF="../man5/client-local.cfg.5.html">client-local.cfg</A>(5)</I>
file. If the filenames are not listed there, the clients will not collect
any data about these files/directories, and the settings in the
analysis.cfg file will be silently ignored.
<P>
The ability to compute file checksums with MD5, SHA1 or RMD160 should not be
used for general-purpose file integrity checking, since the overhead of calculating
these on a large number of files can be significant. If you need this, look at
tools designed for this purpose - e.g. Tripwire or AIDE.
<P>
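At the time of writing (April 2006), the SHA-1 and RMD160 algorithms were considered
cryptographically safe. The MD5 algorithm has been shown to have some weaknesses, and
is not considered strong enough when a high level of security is required. Practical
collisions for SHA-1 have since been demonstrated (2017), so SHA-1 should no longer be
relied upon in that situation either.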
<P>
<P>
<A NAME="lbAU"> </A>
<H2>SEE ALSO</H2>
<A HREF="../man8/xymond_client.8.html">xymond_client</A>(8), <A HREF="../man5/client-local.cfg.5.html">client-local.cfg</A>(5), <A HREF="../man8/xymond.8.html">xymond</A>(8), <A HREF="../man7/xymon.7.html">xymon</A>(7)
<P>
<P>
<HR>
<A NAME="index"> </A><H2>Index</H2>
<DL>
<DT><A HREF="#lbAB">NAME</A><DD>
<DT><A HREF="#lbAC">SYNOPSIS</A><DD>
<DT><A HREF="#lbAD">DESCRIPTION</A><DD>
<DT><A HREF="#lbAE">FILE FORMAT</A><DD>
<DT><A HREF="#lbAF">CPU STATUS COLUMN SETTINGS</A><DD>
<DT><A HREF="#lbAG">DISK STATUS COLUMN SETTINGS</A><DD>
<DT><A HREF="#lbAH">MEMORY STATUS COLUMN SETTINGS</A><DD>
<DT><A HREF="#lbAI">PROCS STATUS COLUMN SETTINGS</A><DD>
<DT><A HREF="#lbAJ">MSGS STATUS COLUMN SETTINGS</A><DD>
<DT><A HREF="#lbAK">FILES STATUS COLUMN SETTINGS</A><DD>
<DT><A HREF="#lbAL">PORTS STATUS COLUMN SETTINGS</A><DD>
<DT><A HREF="#lbAM">SVCS status (Microsoft Windows clients)</A><DD>
<DT><A HREF="#lbAN">DS - RRD based status override</A><DD>
<DT><A HREF="#lbAO">MQ Series SETTINGS</A><DD>
<DT><A HREF="#lbAP">CHANGING THE DEFAULT SETTINGS</A><DD>
<DT><A HREF="#lbAQ">RULES TO SELECT HOSTS</A><DD>
<DT><A HREF="#lbAR">DIRECTING ALERTS TO GROUPS</A><DD>
<DT><A HREF="#lbAS">RULES: APPLYING SETTINGS TO SELECTED HOSTS</A><DD>
<DT><A HREF="#lbAT">NOTES</A><DD>
<DT><A HREF="#lbAU">SEE ALSO</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 09:41:02 GMT, February 23, 2014
</BODY>
</HTML>