File: networkoutages.html

package info (click to toggle)
nagios2 2.6-2%2Betch5
  • links: PTS
  • area: main
  • in suites: etch
  • size: 6,856 kB
  • ctags: 4,475
  • sloc: ansic: 64,870; sh: 4,676; makefile: 787; perl: 722
file content (138 lines) | stat: -rw-r--r-- 5,315 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<html>
<head>
<title>Network Outages</title>

<STYLE type="text/css">
<!--
        .Default { font-family: verdana,arial,serif; font-size: 8pt; }
        .PageTitle { font-family: verdana,arial,serif; font-size: 12pt; font-weight: bold; }
-->      
</STYLE>

</head>

<body bgcolor="#FFFFFF" text="black" class="Default">

<p>
<div align="center">
<h2 class="PageTitle">Network Outages</h2>
</div>
</p>

<hr>

<p>
<strong><u>Introduction</u></strong>
</p>

<p>
The <a href="cgis.html#outages_cgi">outages CGI</a> is designed to help pinpoint the cause of network outages.   For small networks this CGI may not be particularly useful, but for larger ones it will be.  Pinpointing the cause of outages will help admins to more quickly find and resolve problems which are causing the biggest impact on the network.
</p>

<p>
It should be noted that the outages CGI will not attempt to find the <i>exact</i> cause of the problem, but will rather locate the hosts on your network which seem to be causing the most problems.  Delving into the problem at a deeper level is left to the user, as there are any number of things which might actually be the cause of the problem.
</p>

<p>
<strong><u>Diagrams</u></strong>
</p>

<p>
The diagrams below help to show how the outages CGI goes about determining the cause of network outages.  You can click on either image for a larger version...
</p>

<table border=0 cellpadding=10 class="Default">

<tr>
<td valign=top><strong><u>Diagram 1</u></strong></td>
</tr>

<tr>
<td valign=top>
This diagram will serve as the basis for our example. All hosts shows in red are either down or unreachable (from the view of Nagios).  All other hosts are up.
</td>
</tr>

<tr>
<td align=center valign=top>
<a href="images/network-outage1.png"><img src="images/network-outage1.png" border=1 alt="Hosts That Are Down Or Unreachable" width=500 height=300></a>
</td>
</tr>

<tr>
<td valign=top><strong><u>Diagram 2</u></strong></td>
</tr>

<tr>
<td valign=top>
This diagram pinpoints the causes of the network outages (from the view of Nagios), and shows various groups of hosts which are affected by the outages.
</td>
</tr>

<tr>
<td align=center valign=top>
<a href="images/network-outage2.png"><img src="images/network-outage2.png" border=1 alt="Hosts That Are Causing Outages" width=500 height=300></a>
</td>
</tr>
</table>

<p>
<strong><u>Determining The Cause Of Network Outages</u></strong>
</p>

<p>
So how does the outages CGI determine which hosts are the source of problems?  <i>"Problem" hosts must be either in a DOWN or UNREACHABLE state <b>and</b> at least one of their immediate parent hosts must be UP</i>.  Hosts which fit this criteria are flagged as being potential problem hosts.
</p>
<p>
In order to determine whether these flagged hosts are causing network outages, we must performs some other tests...
</p>
<p>
If <i>all</i> of the immediate child hosts of one of these flagged hosts is DOWN or UNREACHABLE <i>and</i> has no immediate parent host that is up, the flagged host is the cause of a network outage.  If even one of the immediate children of a flagged host does <i>not</i> pass this test, then the flagged host is <i>not</i> the cause of a network outage.
</p>

<p>
<strong><u>Determining The Effects Of Network Outages</u></strong>
</p>

<p>
Along with telling you what hosts are causing problem on your network, the outages CGI will also tell you how many hosts and services are affected by a particular problem host.  How is this determined?  Take a look at diagram 2 above...
</p>

<p>
From the diagram it is clear that host 1 is blocking two child hosts (in domain A).  Host 2 is solely responsbile for blocking only itself (domain B) and host 3 is solely responsibly for blocking 7 hosts (domain C).  The outage effects of the two hosts in domain D are "shared" between hosts 2 and 3, since it is unclear as to which host is actually the cause of the outage.  If either host 2 or 3 was UP, the these hosts might not be blocked. 
</p>

<p>
The numbers of affected hosts for each problem host are as follows (the problem host is also included in these figures):
</p>

<p>
<ul>
<li>Host 1: 3 affected hosts
<li>Host 2: 3 affected hosts
<li>Host 3: 10 affected hosts
</ul>
</p>

<p>
<strong><u>Ranking Problems Based On Severity Level</u></strong>
</p>

<p>
The outages CGI will display all problem hosts, whether they are causing network outages or not.  However, the CGI will tell you how many of the problem hosts (if any) are causing network outages.
</p>

<p>
In order to display the problem hosts in a somewhat useful manner, they are sorted by the severity of the effect they are having on the network.  The severity level is determined by two things:  The number of hosts which are affected by problem host and the number of services which are affected.  Hosts hold a higher weight than services when it comes to calculating severity.  The current code sets this weight ratio at 4:1 (i.e. hosts are 4 times more important than individual services).
</p>

<p>
Assuming that all hosts in diagram 2 have an equal number of services associated with them, host 3 would be ranked as the most severe problem, while hosts 1 and 2 would have the same severity level.
</p>

<hr>

</body>
</html>