# Operations

This section is a loose collection of various topics and external references for running Icinga DB on a day-to-day basis.
It covers topics such as self-monitoring, backups, and specifics of third-party components.

## Monitor Icinga DB

It is strongly recommended to monitor the monitoring.

There is a built-in [`icingadb` check command](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#icingadb) in the Icinga 2 ITL.
It covers several potential errors, including operations that take too long or invalid high availability scenarios.
Even if Icinga DB has crashed, checks still run and Icinga 2 still generates notifications.

In addition, both the Redis® and the relational database should be monitored.
There are predefined check commands in the ITL to choose from.

- [`redis`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#redis)
- [`mysql`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#mysql)
- [`mysql_health`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#mysql_health)
- [`postgres`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#postgres)

A simpler approach is to check whether the processes are running, e.g.,
with [`procs`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#procs) or
[`systemd`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#systemd).

## Backups

Only two things need to be backed up for Icinga DB:

1. The configuration file in `/etc/icingadb` and
2. the relational database, using `mysqldump`, `mariadb-dump` or `pg_dump`.

!!! warning

    When creating a database dump for MySQL or MariaDB with `mysqldump` or `mariadb-dump`,
    pass the [`--single-transaction` option](https://dev.mysql.com/doc/refman/8.4/en/mysqldump.html#option_mysqldump_single-transaction)
    so that the whole database is not locked while the backup is running.
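
The two backup items above could be scripted roughly as follows. This is a minimal sketch, not an official tool: the function name `backup_icingadb`, its arguments, and the file naming are assumptions made for illustration, and database credentials are expected to come from a client option file. For PostgreSQL, `pg_dump` would take the place of `mysqldump`.

```shell
# Sketch: dump the Icinga DB database and archive its configuration directory.
# Usage (hypothetical): backup_icingadb <database> <config-dir> <target-dir>
backup_icingadb() {
    db_name="$1"
    config_dir="$2"
    target_dir="$3"
    stamp="$(date +%Y%m%d-%H%M%S)"

    # --single-transaction keeps the dump from locking the whole database.
    mysqldump --single-transaction "$db_name" \
        > "$target_dir/$db_name-$stamp.sql" || return 1

    # Archive the configuration directory, e.g. /etc/icingadb.
    tar -czf "$target_dir/icingadb-config-$stamp.tar.gz" \
        -C "$(dirname "$config_dir")" "$(basename "$config_dir")" || return 1
}
```

A call could look like `backup_icingadb icingadb /etc/icingadb /var/backups`, where both the database name and the target directory are placeholders for your own values.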

## Third-Party Configuration

Icinga DB relies on external components to work.
The following collection is based on experience.
It is a target for continuous improvement.

### MySQL and MariaDB

#### `max_allowed_packet`

The `max_allowed_packet` system variable limits the size of messages between MySQL/MariaDB servers and clients.
More information is available in
[MySQL's "Replication and max_allowed_packet" documentation section](https://dev.mysql.com/doc/refman/8.4/en/replication-features-max-allowed-packet.html),
[MySQL's variable documentation](https://dev.mysql.com/doc/refman/8.4/en/server-system-variables.html#sysvar_max_allowed_packet) and
[MariaDB's variable documentation](https://mariadb.com/kb/en/server-system-variables/#max_allowed_packet).

The database configuration should have `max_allowed_packet` set to at least `64M`.
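
To verify the running server against this recommendation, something like the following sketch may help. The helper names `packet_size_ok` and `current_packet_bytes` are hypothetical, and the `mysql` client is assumed to find its credentials in an option file. The server reports `max_allowed_packet` in bytes, so `64M` corresponds to 64 × 1024 × 1024 = 67108864.

```shell
# Minimum recommended above: 64M expressed in bytes.
MIN_PACKET_BYTES=$((64 * 1024 * 1024))

# Succeeds if the given max_allowed_packet value (in bytes) is large enough.
packet_size_ok() {
    [ "$1" -ge "$MIN_PACKET_BYTES" ]
}

# Reads the global value from the running server.
# -N suppresses the column header, -B selects plain batch output.
current_packet_bytes() {
    mysql -N -B -e 'SELECT @@global.max_allowed_packet'
}
```

For example, `packet_size_ok "$(current_packet_bytes)" || echo "max_allowed_packet is below 64M"` prints a warning when the server value is too small.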

#### Amazon RDS for MySQL

When importing the MySQL schema into Amazon RDS for MySQL, the following error may occur:

```
Error 1419: You do not have the SUPER privilege and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable)
```

This error can be mitigated by creating and modifying a custom DB parameter group as described in the related [AWS Knowledge Center article](https://repost.aws/knowledge-center/rds-mysql-functions).

#### Galera Cluster

Since version 1.2.0, the Icinga DB daemon supports Galera clusters.
Its specific database configuration is described in the [Galera configuration section](03-Configuration.md#galera-cluster).

As mentioned in [MariaDB's known Galera cluster limitations](https://mariadb.com/kb/en/mariadb-galera-cluster-known-limitations/),
transactions are limited in both number of rows (128K) and size (2 GiB).
A busy Icinga setup can cause Icinga DB to create transactions that exceed these limits with the default configuration.

If you get an error like `Error 1105 (HY000): Maximum writeset size exceeded`
and your Galera node logs something like `WSREP: transaction size limit (2147483647) exceeded`,
decrease the values of `max_placeholders_per_statement` and `max_rows_per_transaction` in Icinga DB's
[Database Options](https://icinga.com/docs/icinga-db/latest/doc/03-Configuration/#database-options).

### Redis®

The official [Redis® administration documentation](https://redis.io/docs/latest/operate/oss_and_stack/management/admin/)
is the recommended general reference for operating Redis®.
The topics below are specific to Icinga setups.

#### Redis® requires memory overcommit on Linux

On Linux, enable [memory overcommitting](https://www.kernel.org/doc/Documentation/vm/overcommit-accounting).

```shell
sysctl vm.overcommit_memory=1
```

To persist this setting across reboots, add the following line to [`sysctl.conf(5)`](https://man7.org/linux/man-pages/man5/sysctl.conf.5.html).
If your distribution uses systemd, place the line in a configuration file under `/etc/sysctl.d/` instead, as described by
[`systemd-sysctl.service(8)`](https://www.freedesktop.org/software/systemd/man/latest/systemd-sysctl.service.html) and
[`sysctl.d(5)`](https://man7.org/linux/man-pages/man5/sysctl.d.5.html).

```
vm.overcommit_memory = 1
```
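
On a systemd-based distribution, creating the drop-in file could be sketched as follows. The function name `persist_overcommit` and the file name `90-redis-overcommit.conf` are arbitrary choices for this example, not names required by Redis® or Icinga DB.

```shell
# Sketch: write the overcommit setting to a sysctl.d drop-in file.
# The target directory defaults to /etc/sysctl.d and can be overridden.
persist_overcommit() {
    sysctl_dir="${1:-/etc/sysctl.d}"
    printf 'vm.overcommit_memory = 1\n' > "$sysctl_dir/90-redis-overcommit.conf"
}
```

Run as root, `persist_overcommit && sysctl --system` writes the file and reloads all sysctl configuration files. Afterwards, `sysctl vm.overcommit_memory` shows the value in effect.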

#### Huge memory footprint and IO usage in large setups

For large setups, the default Redis® configuration is problematic: Redis® periodically saves its state
to the filesystem, and in a busy setup the save is triggered so often that a save process is running almost permanently.
This can double the memory usage and causes a troublesome amount of IO.

As an improvement, Redis® can be [configured to produce the dump less often or not at all with the `save` setting](https://redis.io/docs/latest/operate/oss_and_stack/management/persistence) in its
configuration. Be warned that after a crash (of Redis® or the whole machine), all data written since the last dump
is lost, meaning that all events that had not yet been persisted by either Icinga DB or Redis® are gone.

To disable these dumps completely, the `save` setting must be given an empty argument:

```
save ""
```

Alternatively, to reduce the occurrences of dumps, something similar to

```
save 3600 1 900 100000
```

can be used.
In this example, a dump is performed every hour (3600s) if at least one change occurred in that time frame
and every fifteen minutes (900s) if at least 100,000 changes occurred.
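
The `save` policy can also be inspected and changed on a running server with `redis-cli`. The sketch below makes a few assumptions: the helper names are hypothetical, and port 6380 is used as the default because the Redis® instance packaged for Icinga DB commonly listens there, so adjust `REDIS_PORT` to your setup. A value set through `CONFIG SET` only lasts until the next restart unless the configuration file is updated as well.

```shell
# Port of the Redis® instance used by Icinga DB (assumed default: 6380).
REDIS_PORT="${REDIS_PORT:-6380}"

# Prints the save policy currently in effect.
show_save_policy() {
    redis-cli -p "$REDIS_PORT" CONFIG GET save
}

# Sets the save policy at runtime. Pass the policy as ONE string,
# e.g. "3600 1 900 100000", or "" to disable dumps.
set_save_policy() {
    redis-cli -p "$REDIS_PORT" CONFIG SET save "$1"
}
```

For example, `set_save_policy "3600 1 900 100000"` applies the reduced dump frequency shown above, and `show_save_policy` confirms the result.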