# Debugging

Cacti users sometimes complain about NaNs in their graphs. Unfortunately, there
are several possible causes. The following step-by-step procedure is
recommended for debugging.

## Check Cacti Log File

Your cacti log file should be located at `<path_cacti>/log/cacti.log`. If it is
not, see `Settings`, `Paths`. Check for this kind of error:

```console
SPINE: Host[...] DS[....] WARNING: SNMP timeout detected [500 ms], ignoring host '........'
```

For "reasonable" timeouts, this may be related to an snmpbulkwalk issue. To
change this, see `Settings`, `Poller` and lower the value for `The Maximum SNMP
OIDs Per SNMP Get Request`. Start at a value of 2 and, if the poller starts
working, increase it again. (A value of 1 or less disables snmpbulkwalk.) Some
agents don't have the horsepower to deliver that many OIDs at a time, so reduce
the number for those older or under-powered devices.
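
To see which hosts are affected most, the timeout warnings can be counted per
host. The following is a minimal sketch, assuming the log lines carry a
`Host[<id>]` tag as in the sample above; the function name
`count_snmp_timeouts` is made up for illustration.

```sh
# Count "SNMP timeout detected" warnings per Host[<id>] tag in a Cacti log.
# Prints "<tag> <count>", most affected host first.
# Usage: count_snmp_timeouts <path_cacti>/log/cacti.log
count_snmp_timeouts() {
    grep -F 'SNMP timeout detected' "$1" |
        grep -o 'Host\[[^]]*]' |
        sort | uniq -c | sort -rn |
        awk '{ print $2, $1 }'
}
```

Hosts that show up repeatedly are good candidates for a lower OID count.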

## Check Basic Data Gathering

For scripts, run them as cactiuser from CLI to check basic functionality. E.g.
for a Perl script named `your-perl-script.pl` with parameters "p1 p2" under *nix
this would look like:

```sh
su - cactiuser
/full/path/to/perl your-perl-script.pl p1 p2
# ... (check output)
```

For SNMP, snmpget the _exact_ OID you're asking for, using the same community
string and SNMP version as defined within Cacti. For an OID of
`.1.3.6.1.4.something`, a community string of `very-secret` and version 2c for
target host `target-host`, this would look like:

```sh
snmpget -c very-secret -v 2c target-host .1.3.6.1.4.something
# ... (check output)
```

## Check Cacti's poller

First make sure that crontab always shows poller.php. This program will call
either cmd.php, the PHP-based poller, _or_ spine, the fast alternative written
in C. Define the poller you're using at `Settings`, `Poller`. Spine has to be
installed separately; it does not come with Cacti by default.

Now, clear `./log/cacti.log` (or rename it to get a fresh start).

Then, change `Settings`, `Poller Logging Level` to DEBUG for _one_ polling
cycle. You may rename this log as well, so that subsequent polling cycles do
not append more output to it.

Now, find the host/data source in question. The `Host[<id>]` is given
numerically, the `<id>` being a specific number for that host. Find this `<id>`
from the `Devices` menu when editing the host: The URL contains a string like

```console
id=<id>
```

Check whether the output is as expected. If not, check your script (e.g.
`/full/path/to/perl`). If it is OK, proceed to the next step.
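
Picking one host out of a DEBUG log can be done with a fixed-string search. This
is a small sketch, assuming the `Host[<id>]` tag described above; the helper
name `host_log_lines` is hypothetical.

```sh
# Print only the log lines tagged Host[<id>].
# A fixed-string match keeps id 1 from also matching Host[12].
# Usage: host_log_lines <id> ./log/cacti.log
host_log_lines() {
    grep -F "Host[$1]" "$2"
}
```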

This procedure may be replaced by running the poller manually for the failing
host only. To do so, you need the `<id>`, again. If you're using cmd.php, set
the DEBUG logging level as defined above and run

```sh
php -q cmd.php <id> <id>
```

If you're using spine, you may override logging level when calling the poller:

```sh
./spine --verbosity=5 <id> <id>
```

All output is printed to STDOUT in both cases. This procedure allows for
repeated tests without waiting for the next polling interval, and there's no
need to manually search for the failing host among hundreds of lines of
output.

## Check MySQL updating

In most cases, this step can be skipped. You may want to return to it if the
next one fails (e.g. no `rrdtool update` to be found).

In the debug log, find the MySQL update statement for that host concerning table
`poller_output`. On very rare occasions, this will fail. Copy that SQL statement
and paste it into a MySQL session started from the CLI. This may as well be done
from some tool like phpMyAdmin. Check the SQL return code.

## Check RRDfile updating

Further down in the same log, you should find something like:

```sh
rrdtool update <filename> --template ...
```

You should find exactly one update statement for each file.

RRDfiles should be created by the poller. If it does not create them, it will
not fill them either. If it does, check your `Poller Cache` from Utilities and
search for your target. Does the query show up here?
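
The "exactly one update per file" rule can be checked mechanically. Below is a
sketch that assumes each update appears on one log line as
`rrdtool update <filename> ...` and that file names contain no spaces; the name
`rrd_update_counts` is made up for illustration.

```sh
# Count "rrdtool update" statements per RRDfile in a log covering ONE
# polling cycle. Prints "<count> <filename>", sorted by file name.
# Usage: rrd_update_counts ./log/cacti.log
rrd_update_counts() {
    awk '{
        for (i = 1; i <= NF - 2; i++)
            if ($i ~ /rrdtool$/ && $(i + 1) == "update") {
                count[$(i + 2)]++   # field after "update" is the file name
                break
            }
    }
    END { for (f in count) print count[f], f }' "$1" | sort -k2
}
```

Any file with a count other than 1, or a file missing from the list, deserves a
closer look.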

## Check RRDfile ownership

If RRDfiles were created e.g. with root ownership, a poller running as
cactiuser will not be able to update those files:

```sh
cd /var/www/html/cacti/rra
ls -l localhost*
-rw-r--r--  1 root      root      463824 May 31 12:40 localhost_load_1min_5.rrd
-rw-r--r--  1 cactiuser cactiuser 155584 Jun  1 17:10 localhost_mem_buffers_3.rrd
-rw-r--r--  1 cactiuser cactiuser 155584 Jun  1 17:10 localhost_mem_swap_4.rrd
-rw-r--r--  1 cactiuser cactiuser 155584 Jun  1 17:10 localhost_proc_7.rrd
-rw-r--r--  1 cactiuser cactiuser 155584 Jun  1 17:10 localhost_users_6.rrd
```

Run the following command to fix this problem:

```sh
chown cactiuser:cactiuser *.rrd
```
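
With many RRDfiles, scanning `ls -l` output by eye gets tedious. As a sketch,
`find` can list only the files with the wrong owner; the helper name
`rrd_wrong_owner` is hypothetical, and the directory and user are the ones from
the example above.

```sh
# List *.rrd files below a directory that are NOT owned by the given user.
# Usage: rrd_wrong_owner /var/www/html/cacti/rra cactiuser
rrd_wrong_owner() {
    find "$1" -type f -name '*.rrd' ! -user "$2"
}
```

An empty result means every RRDfile already belongs to the poller's user.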

## Check RRDfile numbers

You may wonder why this step is needed if the former one was OK. But due to a
data source's MINIMUM and MAXIMUM definitions, valid updates for RRDfiles may be
suppressed because MINIMUM was not reached or MAXIMUM was exceeded.

Assuming you've found a valid `rrdtool update` in the `Check RRDfile updating`
step, perform a

```sh
rrdtool fetch <RRDfile> AVERAGE
```

and look at the last 10-20 lines. If you find NaNs there, perform

```sh
rrdtool info <RRDfile>
```

and check the `ds[...].min` and `ds[...].max` entries, e.g.

```console
ds[loss].min = 0.0000000000e+00
ds[loss].max = 1.0000000000e+02
```

In this example, MINIMUM = 0 and MAXIMUM = 100. For a data source with
`ds[...].type = "GAUGE"`, verify that e.g. the number returned by the script
does not exceed `ds[...].max` (and, likewise, does not fall below
`ds[...].min`).

If you run into this, not only should you update the data source definition
within the Data Template, but also perform a:

```sh
rrdtool tune <RRDfile> --maximum <ds-name>:<new ds maximum>
```

for all existing RRDfiles belonging to that Data Template.
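
The range check itself can be sketched as a small filter over `rrdtool info`
output. It assumes lines of the form `ds[<name>].min = <number>` as shown above
and treats a limit printed as `NaN` as "no limit"; the function name
`ds_range_check` is made up for illustration.

```sh
# Read `rrdtool info` output on stdin and report whether VALUE lies within
# ds[NAME].min and ds[NAME].max.
# Usage: rrdtool info <RRDfile> | ds_range_check <ds-name> <value>
ds_range_check() {
    awk -v ds="$1" -v val="$2" '
        $1 == ("ds[" ds "].min") { min = $3 }
        $1 == ("ds[" ds "].max") { max = $3 }
        END {
            if (min != "" && min != "NaN" && val + 0 < min + 0)
                print "below minimum"
            else if (max != "" && max != "NaN" && val + 0 > max + 0)
                print "above maximum"
            else
                print "in range"
        }'
}
```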

At this step, it is wise to check `step` and `heartbeat` of the RRDfile as
well. For standard 300-second polling intervals (step=300), it is wise to set
`minimal_heartbeat` to 600 seconds. If a single update is missing and the next
one occurs less than 600 seconds after the last one, RRDtool will interpolate
the missing update. Thus, gaps are "filled" automatically by interpolation. Be
aware that this is not "real" data! Again, this must be changed in the Data
Template itself and applied with `rrdtool tune` to all existing RRDfiles of
this type.
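
The step/heartbeat relation can also be read from `rrdtool info`. The sketch
below generalizes the 300/600 example into "heartbeat at least twice the step",
which is an assumption made here as a rule of thumb and not a rule stated by
RRDtool; the name `heartbeat_check` is hypothetical.

```sh
# Read `rrdtool info` output on stdin and flag data sources whose
# minimal_heartbeat is less than twice the step (assumed rule of thumb).
# Prints "<key> <heartbeat> ok|short" for each data source.
# Usage: rrdtool info <RRDfile> | heartbeat_check
heartbeat_check() {
    awk '
        $1 == "step" { step = $3 }
        $1 ~ /^ds\[.*\]\.minimal_heartbeat$/ {
            status = ($3 + 0 >= 2 * step) ? "ok" : "short"
            print $1, $3, status
        }'
}
```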

## Check `rrdtool graph` statement

The last resort is to check that the correct data sources are used. Go to
`Graph Management` and select your Graph. Enable DEBUG Mode to see the whole
`rrdtool graph` statement. Note the `DEF` statements: they specify the RRDfile
and data source to be used. Check that all of them are as intended.
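
For long graph statements, the `DEF` entries can be pulled out with a short
filter. This is a sketch that assumes the arguments are separated by whitespace
and that the RRDfile paths contain no spaces; the name `graph_defs` is made up
for illustration.

```sh
# Read an `rrdtool graph` statement on stdin and print each DEF argument on
# its own line. CDEF and VDEF arguments are skipped on purpose.
# Usage: paste the statement from DEBUG Mode into: graph_defs
graph_defs() {
    tr -s ' \t' '\n\n' | grep '^DEF:'
}
```

Each printed line has the form `DEF:<vname>=<RRDfile>:<ds-name>:<CF>`, which
makes a mismatched file or data source name easy to spot.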

## Miscellaneous

Up to current cacti 0.8.6h, table `poller_output` may increase beyond reasonable
size.

This is commonly due to php.ini's default memory setting of 8 MB. Change this to
at least 64 MB.

To check this, run the following SQL from the MySQL CLI (or phpMyAdmin or the
like):

```sql
select count(*) from poller_output;
```

If the result is huge, you may get rid of those rows with:

```sql
truncate table poller_output;
```

As of current SVN code for upcoming cacti 0.9, I saw measures were taken on both
issues (memory size, truncating poller_output).

## RPM Installation

Most RPM installations will set up the crontab entry now. If you've followed the
installation instructions to the letter (which you should always do ;-) ), you
may now have two pollers running. That's not a good thing, though. Most RPM
installations will set up cron in `/etc/cron.d/cacti`.

Now check all your crontabs, especially `/etc/crontab` and the crontabs of users
root and cactiuser. Leave only one poller entry across all of them. Personally,
I've chosen `/etc/cron.d/cacti` to avoid problems when updating RPMs. Most
often, you won't remember this item when updating lots of RPMs, so I felt more
secure putting it there. And I've made some slight modifications, see:

```sh
shell> vi /etc/cron.d/cacti
```

```ini
*/5 * * * *     cactiuser       /usr/bin/php -q /var/www/html/cacti/poller.php > /var/local/log/poller.log 2>&1
```

This will produce a file `/var/local/log/poller.log`, which includes some
additional information from each poller run, such as RRDtool errors. It
occupies only a few bytes and will be overwritten each time.

If you're using the crontab of user "cactiuser" instead, this will look like:

```sh
shell> crontab -e -u cactiuser
```

```ini
*/5 * * * *     /usr/bin/php -q /var/www/html/cacti/poller.php > /var/local/log/poller.log 2>&1
```
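
Duplicate poller entries can be hunted down by scanning the crontab files in one
go. The following is a sketch; the function name `poller_cron_entries` is
hypothetical. Per-user crontabs are not plain files in a fixed place, so dump
them first (for example with `crontab -l -u cactiuser > /tmp/cactiuser.cron`)
and pass the dump as an additional argument.

```sh
# Print every active (non-comment) line that calls poller.php, prefixed with
# the file it was found in. Unreadable or missing files are skipped.
# Usage: poller_cron_entries /etc/crontab /etc/cron.d/*
poller_cron_entries() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        awk -v f="$f" '!/^[ \t]*#/ && /poller\.php/ { print f ": " $0 }' "$f"
    done
}
```

More than one line of output means more than one poller is being started.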

## Not NaN, but 0 (zero) values

Pay attention to custom scripts. External commands called from there must be
in the `$PATH` of the cactiuser running the poller. It is therefore
recommended to provide `/full/path/to/external/command`.

User "criggie" reported an issue with running smartctl. It was complaining "you
are not root" so a quick `chmod +s` on the script fixed that problem.
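
To find the full path to hard-code, or to make a script fail loudly instead of
silently returning 0, the lookup can be done explicitly. This is a sketch; the
helper name `require_cmd` is made up for illustration, and it should be run as
the cactiuser so that the same `$PATH` applies.

```sh
# Print the path of an external command, or complain on STDERR and fail.
# (For shell builtins, `command -v` prints just the name.)
# Usage: SMARTCTL=$(require_cmd smartctl) || exit 1
require_cmd() {
    command -v "$1" || { echo "missing command: $1" >&2; return 1; }
}
```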

---
<copy>Copyright (c) 2004-2023 The Cacti Group</copy>