=head1 NAME

B<developer-guide> - developer's guide to Spong

=head1 DESCRIPTION

This is the developer's guide to Spong. It documents the inner workings of the
client and server programs. It also describes the plug-in mechanism of the
B<spong-client> and B<spong-network> programs so that new check modules can be
developed for them.

=head1 PROTOCOLS

This section deals with the low-level communication protocols that the clients
use to talk with the B<spong-server>. The Spong and Big Brother protocols are
almost identical. They vary only in the data format.

=head2 SPONG PROTOCOL

The B<spong-server> listens on port 1998 for status updates from clients.
After a socket has been opened, the client program sends a message in the
following format:

    command host service color time[:TTL] summary (\n)
    detailed status message line 1 (\n)
    detailed status message line 2 (\n)
    ...
    detailed status message line n (\n)

Where:

=over 4

=item command

The command being sent to the spong server indicating
a type of update message or a change in operating status of the client.

=item host

The fully qualified domain name of the host the message
is for.

=item service

The name of the service that the update message is
for.

=item color

The status color of the service (green - ok, yellow
- warning, red - alert).

=item time

The date/time of the update message in epoch time format
(i.e. the number of seconds since 00:00:00 UTC on 1 January 1970).

=item TTL

This optional field is the time to live, in seconds, for the status message.
Normally a status message becomes stale (i.e. purple status) after 2 times
$SPONGSLEEP seconds, which is the default. See L<spong.conf/"$SPONGSLEEP">.
This field overrides the default and keeps the status message valid for a
longer period of time.

=item summary

The status summary message field. A short and to the point message that
summarizes the status being returned.

=item detailed status message

The remaining lines of the message, which hold the detailed information about
the status. Typically this is the output of the C<df> command, the top processes
by CPU utilization, or the detailed responses of network checks.

=back
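As an illustration of the message layout above, here is a minimal sketch that
assembles a status message and writes it to the server socket. The helper names
C<build_spong_message> and C<send_spong_message> are hypothetical and are not
part of Spong, and the sketch assumes that C<status> is the command used for a
status update. Only the port number and the field order come from the
description above.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: assemble a message in the layout described above.
# $ttl is optional; when given it is appended to the time field as "time:TTL".
sub build_spong_message {
    my ($cmd, $host, $service, $color, $time, $ttl, $summary, @detail) = @_;
    my $stamp  = defined $ttl ? "$time:$ttl" : $time;
    my $header = join ' ', $cmd, $host, $service, $color, $stamp, $summary;
    # Every line, including the header, is terminated by a newline.
    return join '', map { "$_\n" } $header, @detail;
}

# Hypothetical helper: open a socket to the server, send the message, close.
sub send_spong_message {
    my ($server, $message) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $server,
        PeerPort => 1998,        # Spong port given in the text above
        Proto    => 'tcp',
        Timeout  => 10,
    ) or die "cannot connect to $server: $@\n";
    print {$sock} $message;
    close $sock;
    return 1;
}
```

A caller would build the message first and then pass it, together with the
server name, to C<send_spong_message>. Keeping the two steps apart makes the
formatting easy to test without a running server.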

=head2 BIG BROTHER PROTOCOL

The B<spong-server> listens on port 1984 for status updates from Big Brother
clients. After a socket has been opened, the client sends a message
in the following format:

    command host service color time summary (\n)
    detailed status message line 1 (\n)
    detailed status message line 2 (\n)
    ...
    detailed status message line n (\n)

Where:

=over 4

=item command

The command being sent to the spong server indicating
a type of update message or a change in operating status of the client.
At present the only command defined is C<status> which indicates a
service status update message.

=item host

The fully qualified domain name of the host the message
is for.

=item service

The name of the service that the update message is
for.

=item color

The status color of the service (green - ok, yellow
- warning, red - alert).

=item time

The date/time of the update message in standard date
format (i.e. Thu Jan 1 00:00:00 UTC 1970)

=item summary

The status summary message field.

=item detailed status message

The remaining lines of the message, which hold the detailed information about
the status. Typically this is the output of the C<df> command, the top
processes by CPU utilization, or the detailed responses of network checks.

=back
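The only difference from the Spong format is the time field. The following
sketch shows one way to render an epoch time in the date format described
above and to build the first line of a Big Brother style update. The helper
names C<bb_date> and C<bb_header> are hypothetical. The date is built by hand
rather than with C<strftime> so that the result does not depend on the locale.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: render an epoch time as "Thu Jan 1 00:00:00 UTC 1970".
sub bb_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf '%s %s %d %02d:%02d:%02d UTC %d',
        $DAYS[$wday], $MONTHS[$mon], $mday, $hour, $min, $sec, $year + 1900;
}

# Hypothetical helper: the first line of a Big Brother style status update.
sub bb_header {
    my ($host, $service, $color, $epoch, $summary) = @_;
    return join(' ', 'status', $host, $service, $color,
                bb_date($epoch), $summary) . "\n";
}
```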

=head1 MODULES

B<spong-client>, B<spong-network>, B<spong-message> and B<spong-server> use
various routines which are coded as modules. When the programs are
initializing, they determine which modules are going to be required. The
programs then load each of those modules from the library directory.
When the modules are loaded they register themselves with the plug-in
registry. The plug-in registry is the mechanism that the programs use
to keep track of the modules in order to run them.

=head2 SERVER MODULES

B<spong-server> has a hook that gives external programs access to the incoming
status updates from Spong client programs. The hook takes the form of
Server Data modules, which are called after B<spong-server> stores the status
update in its database.

B<spong-server> passes all of the information in the update message, in addition
to the current event status duration, to the Data Module. The modules should
do any processing that they need to do in as short a time as possible. This
minimizes the resource overhead when many status updates arrive at the same
time.

Debugging messages and error messages can be printed by using the
C<E<amp>main::debug()> and C<E<amp>main::error()> functions respectively. If the
module encounters a fatal error, it should terminate using the C<die()> or
C<croak()> function, depending on one's preference. Modules should just return
upon a successful invocation.

NEW

There are two new types of Spong Server plugin modules: predata and postdata
modules. The predata modules are called just after a message is received but
before the normal Spong processing is done. The postdata modules are called
after the Server's normal processing.

Also, in contrast to the old Spong Server data plugin modules, all incoming
messages are sent to the new predata and postdata modules, not just status
messages. Predata modules can even flag a message to tell the Spong Server to
drop the message and ignore it. This gives you a great deal more access to,
and control of, the Server's incoming message processing.

Modules are called in a predictable, controllable order. They are run in the
order of the sorted registry hash keys. This allows you to create modules that
have run dependencies.

There is also a new API to go along with the new modules. Parameters are now
passed in a Perl hash, which simplifies the module interface to the server. The
message type is determined by the C<cmd> field. Details of the message hash are
in the L<spong-server-mod-template> document.

See L<spong-server-mod-template> for an example of how to code a
B<spong-server> Server Data module.
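The ordering and drop behaviour described above can be pictured with the
following stand-alone sketch. It is not the Spong Server code: the registry
variable, the module names, the C<host> key and the C<drop> key are all
assumptions made for illustration. Only the C<cmd> field and the sorted-key
run order come from the text. The real hash layout is documented in
L<spong-server-mod-template>.

```perl
use strict;
use warnings;

# Hypothetical registry: plug-in name => code reference.
my %registry;
my @log;    # filled in by the second module, for demonstration only

# A predata-style module that asks for messages from one host to be dropped.
# The "drop" and "host" keys are assumptions made for this sketch.
$registry{'10_filter'} = sub {
    my ($msg) = @_;
    $msg->{drop} = 1 if $msg->{host} eq 'test.example.com';
    return;
};

# A second module that only acts on status messages.
$registry{'20_log'} = sub {
    my ($msg) = @_;
    return unless $msg->{cmd} eq 'status';
    push @log, "status update for $msg->{host}";
    return;
};

# Run every registered module in sorted key order.
# Returns 0 as soon as a module flags the message, 1 otherwise.
sub run_modules {
    my ($msg) = @_;
    for my $name (sort keys %registry) {
        $registry{$name}->($msg);
        return 0 if $msg->{drop};
    }
    return 1;
}
```

Because the keys are sorted as strings, a numeric prefix such as C<10_> or
C<20_> is one simple way to make a module run before another one that depends
on it.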

=head2 CLIENT MODULES

Client modules define checks which are to be done on the host that the
B<spong-client> program is running on. The module's check function is called
without any parameters. Client modules are expected to issue any system
commands and parse the output in order to determine the service status.

Any threshold variables needed for warning and alert level triggers need to be
defined and placed into the F<SPONG/etc/spong.conf> file. The threshold
variables need to be uniquely named and should be named according to the type
of check being done (e.g. $DISKWARN or $DFWARN for disk checks and $CPUWARN for
CPU checks).

Once the service status and messages have been determined the module
can call the C<E<amp>main::status()> function in order to send the
information back to the spong-server. See L<"Status Function"> for more 
information.
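As a rough sketch of the parsing step in a disk check, the function below takes
the text output of a C<df -P> style command and works out the three values a
client module needs. The function name C<evaluate_df> and the threshold name
C<$DFCRIT> are assumptions made for this example, and the threshold values are
arbitrary. In a real module the thresholds would come from
F<SPONG/etc/spong.conf>.

```perl
use strict;
use warnings;

# Thresholds in percent. $DFWARN is a name suggested in the text above;
# $DFCRIT is an assumed companion for the alert level.
our $DFWARN = 90;
our $DFCRIT = 95;

# Parse `df -P` style output and return (color, summary, message).
sub evaluate_df {
    my ($df_output) = @_;
    my $color = 'green';
    my @problems;
    for my $line (split /\n/, $df_output) {
        # The last two columns are "capacity%" and the mount point.
        # The header line has no percentage and is skipped.
        my ($pct, $mount) = $line =~ /\s(\d+)%\s+(\S+)\s*$/ or next;
        if ($pct >= $DFCRIT) {
            $color = 'red';
            push @problems, "$mount at $pct%";
        }
        elsif ($pct >= $DFWARN) {
            $color = 'yellow' if $color ne 'red';
            push @problems, "$mount at $pct%";
        }
    }
    my $summary = @problems
        ? 'filesystems: ' . join(', ', @problems)
        : 'disk usage is OK';
    return ($color, $summary, $df_output);
}
```

A real client module would capture the command output itself and then hand the
three returned values to C<E<amp>main::status()>.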

=head2 NETWORK MODULES

Network modules define checks that are to be done on hosts over the network
to ensure that a network service is running. The modules are called with
the name of the host that is to be checked. The module is also expected
to put an alarm wrapper around the code that performs the check. This is
to prevent excessive delays due to lost communications. It is suggested
that 10 seconds be used for the alarm setting.

The module should not call the C<E<amp>main::status()> function directly.
B<spong-network> has a mechanism for rechecking services that are reported down
on an initial check. If the recheck mechanism is engaged, "red" statuses will
be downgraded to "yellow" until a failure count threshold is reached. Then the
services will be reported as "red".

After the status condition has been determined the check function should return
three parameters:

=over 4

=item STATUS

The status color either "red", "yellow", or "green".

=item SUMMARY

A short one-line summary of the status.

=item MESSAGE

More detailed information (can be multi-line).

=back

These parameters are the same ones that are passed to the
C<E<amp>main::status()> function. See L<"Status Function"> for more
information on these parameters.
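The alarm wrapper and the three return values can be combined as in the sketch
below, which uses the usual Perl C<eval>/C<alarm> idiom. The function name
C<check_with_alarm> is hypothetical, and the wording of the summary and message
strings is only an example. The C<$check> argument is a code reference that
performs the actual network test and returns the three values itself.

```perl
use strict;
use warnings;

# Run $check->($host), but give up after $timeout seconds.
# Always returns the three values a network check is expected to return.
sub check_with_alarm {
    my ($host, $timeout, $check) = @_;
    my @result = eval {
        # The handler aborts the eval block when the alarm fires.
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($timeout);
        my @r = $check->($host);
        alarm(0);               # check finished in time, cancel the alarm
        @r;
    };
    my $err = $@;
    if ($err) {
        alarm(0);               # make sure no alarm is left pending
        chomp $err;
        my $why = $err eq 'timeout'
            ? "timed out after $timeout seconds"
            : "failed: $err";
        return ('red', "check of $host $why", $why);
    }
    return @result;
}
```

With the suggested setting, a module would call this with a timeout of 10 and
return the result to B<spong-network> unchanged.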

The network modules have two support functions available,
C<E<amp>main::check_tcp()> and C<E<amp>main::check_simple()>, which can
simplify simple TCP port checks.

C<E<amp>main::check_tcp( host, port, data );>

Where the arguments are:

=over 4

=item HOST

The name or ip address of the host to be checked.

=item PORT

The name, or port number, of the service to connect to.

=item DATA

The data to be sent to the host after the port is opened.

=back

The function C<E<amp>main::check_tcp()> will make a connection to
the given PORT on the HOST and send a message (DATA). It then returns what
it gets back to the caller.

C<E<amp>main::check_simple( host, port, send, check, service)>

Where the arguments are:

=over 4

=item HOST

The name or ip address of the host to be checked.

=item PORT

The name of the port to connect with.

=item SEND

The message to send to the host after the port is opened.

=item CHECK

A Perl regular expression to be used to validate the response returned
by the host.

=item SERVICE

The name of the service being checked. It is used in the summary
and detailed status messages.

=back

The function C<E<amp>main::check_simple()> is a generic TCP port checking
routine. It connects to the given port (using C<E<amp>main::check_tcp()>) and
checks that the expected results come back. The function returns
three parameters: STATUS, SUMMARY and MESSAGE, as detailed above. The return
values of this function can be returned directly as the return values of
the check module.
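To make the send-then-validate idea concrete, here is a stand-alone sketch of
that kind of logic. It is not the Spong implementation of
C<E<amp>main::check_simple()>. The function name C<classify_response>, the
extra C<$fetch> argument (a code reference that stands in for
C<E<amp>main::check_tcp()>), the choice of colors for each case and the wording
of the summaries are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Stand-in for a check_simple()-like routine.
# $fetch->($host, $port, $send) plays the role of check_tcp() and returns
# whatever the remote host answered, or undef if nothing came back.
sub classify_response {
    my ($host, $port, $send, $check, $service, $fetch) = @_;
    my $response = $fetch->($host, $port, $send);

    # No answer at all: treat the service as down.
    if (!defined $response || $response eq '') {
        return ('red', "$service is down",
                "no response from $host on port $port");
    }
    # The answer matches the expected pattern: all is well.
    if ($response =~ /$check/) {
        return ('green', "$service is ok", $response);
    }
    # Something answered, but not what was expected.
    return ('yellow', "$service gave an unexpected response", $response);
}
```

Passing the fetch step in as a code reference keeps the sketch runnable without
a network connection. A real network module would simply call
C<E<amp>main::check_simple()> and return its result.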

=head2 MESSAGE MODULES

Message modules are functions called to send a notification message to
a contact about a specific host or service. The messaging functions are called
with the contact's identifier for the messaging service (e.g. the pager PIN
code of a paging provider). The messaging module is expected to take care of
all of the data formatting and communications logic needed to send a
notification message to the contact.

The messaging functions have access to these global variables in order to
format a notification message:

=over 4

=item I<$color>

The status color of the message

=item I<$host>

The fully qualified domain name of the host

=item I<$time>

The date and time of the message being sent, in epoch time format
(as returned by C<time()>).

=item I<$message>

A one line summary status line

=item I<$duration>

The current duration of the current status in seconds.
(a zero duration indicates a change in status)

=back
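A messaging function could turn these variables into a short notification text
along the following lines. This is only a sketch: the function name
C<format_notification> and the layout of the text are made up for the example,
and the variables are declared here with C<our> so that the code runs by
itself. In a real module they are set by the calling program.

```perl
use strict;
use warnings;

# Stand-ins for the global variables listed above.
our ($color, $host, $time, $message, $duration);

# Build a one-line notification text from the current globals.
sub format_notification {
    # scalar gmtime() gives a fixed, locale-independent date string.
    my $when  = scalar gmtime($time);
    # A zero duration means the status has just changed.
    my $state = $duration == 0
        ? 'status changed'
        : "unchanged for $duration seconds";
    return "[$color] $host: $message ($state, $when UTC)";
}
```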

There are two support functions that can be used to format a message and send
it via e-mail: C<E<amp>main::email_status()> and
C<E<amp>main::email_mini_status()>. Both functions format an e-mail message to
be sent to RECIPIENT, but C<email_mini_status()> sends out a shorter mail
message which is more suitable for SMS and smaller alpha pagers.

Both functions are called as follows:

C<E<amp>main::email_status( recipient, flags )>

C<E<amp>main::email_mini_status( recipient, flags )>

Where the arguments to the functions are:

=over 4

=item RECIPIENT

One or more e-mail recipients, which are placed in the to: line
of the mail message.

=item FLAGS

Flags to alter the formatting of the message.

=back

The only flag currently defined is 'shortsubject'. This prevents $color,
$hostname, and $summary from being placed on the "subject:" line.

=head1 CREATING MODULES

Creating the actual modules is straightforward. Create your module by
following the appropriate template from the list below.

=over 4

=item *

L<spong-client Module Template|spong-client-mod-template>

=item *

L<spong-network Module Template|spong-network-mod-template>

=item *

L<spong-message Module Template|spong-message-mod-template>

=item *

L<spong-server Module Template|spong-server-mod-template>

=back

Then place your module file into the appropriate directory below.

=over 4

=item *

B<spong-client> - F<LIBDIR/Spong/Client/plugins>

=item *

B<spong-network> - F<LIBDIR/Spong/Network/plugins>

=item *

B<spong-message> - F<LIBDIR/Spong/Message/plugins>

=item *

B<spong-server> - F<LIBDIR/Spong/plugins>

=back

Then test your module by running the program with the C<--debug> parameter to
see if it is operating properly.

=head1 Status Function

C<E<amp>main::status( SERVERADDR, HOST, SERVICE, COLOR, SUMMARY, MESSAGE )>

The arguments to the C<E<amp>main::status()> function are:

=over 4

=item SERVERADDR 

Should be I<$SPONGSERVER>.

=item HOST

The full hostname of the machine being reported.

=item SERVICE

A short name that describes the service
that you are reporting on.

=item COLOR

The color of the status being reported, either "green", "yellow", or "red".
"green" denotes an OK status: there are no problems and everything is within
normal parameters. "yellow" denotes a warning status: an abnormal situation
has arisen which may need to be looked at, or a parameter has changed (up or
down) towards a critical level. "red" denotes an alert status: a critical
situation has arisen that needs immediate attention, or a parameter has
changed (up or down) to a critical level.

=item SUMMARY

A short one-line summary of the status. This should be a concise
summary of the current situation of the service. The simplest form is to
say "Service is OK" or "Service is down". Another form is to display current
information (like system uptime, number of jobs and users) and additional
text for warnings and alerts (e.g. "uptime - 123, jobs - 123, users - 123, cpu
load level is at 3.2").

If you are reporting on multiple sets of like items (like file partitions or
processes), report the names of those items that are abnormal, (i.e.
"filesystems: / at 99%, /tmp at 100%"). 

=item MESSAGE

This is the place to put detailed information about the status of the service.
Typically this will be the output of the system commands or function calls.
For example, it could be the top 10 jobs by CPU usage from a C<ps> command, or
the output of a C<df> command for disk checking.

There are no limitations on the contents of the field. You can include URLs
that link to another monitoring package or take you to an administration web
page for the service in question.

=back
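The argument order can be seen in the following sketch. So that it runs by
itself, it defines a stand-in C<status> function that only records what it was
given. The real function is provided by the Spong programs and sends the
update to the server. The server name, host name and status values are
placeholders.

```perl
use strict;
use warnings;

# Placeholder value; normally $SPONGSERVER is set in the Spong configuration.
our $SPONGSERVER = 'spong.example.com';

# Records every call made to the stand-in below.
my @sent;

# Stand-in for the real status function, so that the sketch runs by itself.
sub status {
    my ($server, $host, $service, $color, $summary, $message) = @_;
    push @sent, {
        server  => $server,
        host    => $host,
        service => $service,
        color   => $color,
        summary => $summary,
        message => $message,
    };
    return;
}

# Report a warning for the disk service of one host.
&main::status(
    $SPONGSERVER,                   # SERVERADDR
    'www.example.com',              # HOST
    'disk',                         # SERVICE
    'yellow',                       # COLOR
    'filesystems: /tmp at 92%',     # SUMMARY
    "/dev/sda3 1000 920 80 92% /tmp\n",   # MESSAGE (detailed output)
);
```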

=head1 SEE ALSO

L<spong-server>, L<spong-client>, L<spong-network>, L<spong-message>,
L<spong-server-mod-template>, L<spong-client-mod-template>,
L<spong-network-mod-template>, L<spong-message-mod-template>, L<spong.conf>

=head1 AUTHOR

Stephen L Johnson, <F<sjohnson@monsters.org>>