Troubleshooting
===============

I have a broken zone
--------------------

A zone is considered broken when it no longer receives updates.
Its status is "ERROR" if Designate detected the error condition;
otherwise it can remain stuck in "PENDING" for a long time.

Review the logs from the API, Central, Producer, Worker and MiniDNS.
Identify the transaction ID of the last successful change and the first
failing change. Using the ID, you can filter logs from the Designate
components that are related to the same transaction.
Look for log messages with ERROR level before and after
the first failing update.
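
The filtering step described above can be sketched in a few lines of Python.
This is only an illustration: it assumes the transaction ID appears in each
log line as ``req-<uuid>`` and that the level is printed as the standalone
word ``ERROR``. The function names are hypothetical and not part of
Designate; adapt the pattern to your logging configuration.

```python
import re
from typing import Iterable, List

# Assumption: the transaction ID is logged as "req-<uuid>".
TRANSACTION_ID_RE = re.compile(
    r"req-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
)


def lines_for_transaction(lines: Iterable[str], transaction_id: str) -> List[str]:
    """Return the log lines that mention the given transaction ID."""
    return [line for line in lines if transaction_id in line]


def error_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain the word ERROR."""
    return [line for line in lines if re.search(r"\bERROR\b", line)]


def transaction_ids_with_errors(lines: Iterable[str]) -> List[str]:
    """Return transaction IDs found on ERROR lines, in order of first use."""
    seen: List[str] = []
    for line in error_lines(lines):
        for transaction_id in TRANSACTION_ID_RE.findall(line):
            if transaction_id not in seen:
                seen.append(transaction_id)
    return seen
```

Feeding the combined log lines of all components to
``transaction_ids_with_errors`` gives candidate IDs for the first failing
change; ``lines_for_transaction`` then collects every message for one ID.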

Failures in updating a zone are usually related to problems in Producer,
Worker, MiniDNS or the database.

Ensure the services are running and network connectivity is not impaired.

Transient network issues can be the cause of a broken zone.
Producer and Worker are stateful services that retry restoring failing
zones over time. Restarting these services triggers new attempts.


I have a broken pool
--------------------

I deleted a zone but it's still in the database
-----------------------------------------------

This is expected. Deleted zones remain in the database, flagged with
"status" set to "DELETED" and "task" set to "NONE" once the deletion
process completes successfully.

What ports should be open?
--------------------------

Port numbers are configurable: review your designate.conf.

The default values are:

+------------------------+------------+--------------+
| Component              | Protocol   | Port numbers |
+========================+============+==============+
| API                    | TCP        | 9001         |
+------------------------+------------+--------------+
| Keystone (external)    | TCP        | 35357        |
+------------------------+------------+--------------+
| MiniDNS                | TCP        | 5354         |
+                        +------------+--------------+
|                        | UDP        | 5354         |
+------------------------+------------+--------------+
| MySQL                  | TCP        | 3306         |
+------------------------+------------+--------------+
| RabbitMQ               | TCP        | 5672         |
+------------------------+------------+--------------+
| Resolvers              | TCP        | 53           |
+                        +------------+--------------+
|                        | UDP        | 53           |
+------------------------+------------+--------------+
| ZooKeeper              | TCP        | 2181         |
+                        +------------+--------------+
|                        | TCP        | 2888, 3888   |
+------------------------+------------+--------------+
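
As a quick connectivity check, the sketch below tries a TCP connection to
the default port of each component. It is a minimal illustration, not a
Designate tool: the ``DEFAULT_TCP_PORTS`` mapping copies the defaults from
the table above, the function names are hypothetical, and the host names you
pass in are your own. UDP ports (MiniDNS and the resolvers) cannot be
verified this way.

```python
import socket
from typing import Dict

# Default TCP ports from the table above; adjust to match your designate.conf.
DEFAULT_TCP_PORTS = {
    "API": 9001,
    "MiniDNS": 5354,
    "MySQL": 3306,
    "RabbitMQ": 5672,
    "ZooKeeper": 2181,
}


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(hosts: Dict[str, str], timeout: float = 2.0) -> Dict[str, bool]:
    """Check the default TCP port of each component on the host given for it."""
    return {
        name: tcp_port_open(host, DEFAULT_TCP_PORTS[name], timeout)
        for name, host in hosts.items()
    }
```

Call ``check_ports`` with a mapping from component name to the host it runs
on; a ``False`` value points at a stopped service, a firewall rule or a
non-default port.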



What network protocols are used?
--------------------------------

HTTP or HTTPS is used by the API. Most components use RabbitMQ and the
MySQL protocol. The remaining protocols are DNS (resolution and XFR),
ZooKeeper and Memcached.

What needs access to the Database?
----------------------------------

Central and MiniDNS

What needs access to RabbitMQ?
------------------------------

The API, Central, Producer, Worker and MiniDNS

What needs access to ZooKeeper?
-------------------------------

Pool and Producer

What needs access to Memcached?
-------------------------------

API and Worker

How do I monitor Designate?
---------------------------

Designate can be monitored with any of the
`monitoring systems listed on the OpenStack wiki <https://wiki.openstack.org/wiki/Operations/Monitoring>`_.

What are useful metrics to monitor?
-----------------------------------

* General host monitoring, i.e. CPU load, memory usage, disk and network I/O
* MySQL performance, errors and free disk space
* Number of zones in ACTIVE, PENDING and ERROR status
* API queries per second, broken down by "read" and "write" operations on
  zones, records, etc.
* Zone change propagation time, i.e. how long it takes for a record update
  to reach the resolvers
* Log messages with "ERROR" level
* Quota utilization, i.e. the number of existing records/zones against the
  maximum allowed
* Memcached, RabbitMQ, ZooKeeper performance and errors
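
The zone status metric from the list above can be computed with a small
helper such as the following sketch. It assumes the zones are already
available as dictionaries with a "status" key, for example parsed from a
zone listing returned by the API; fetching that listing is left out, and
``count_zone_statuses`` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Statuses named in the metrics list; other statuses are ignored here.
TRACKED_STATUSES = ("ACTIVE", "PENDING", "ERROR")


def count_zone_statuses(zones: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count zones per tracked status, reporting zero when none match."""
    counts = Counter(zone["status"] for zone in zones)
    return {status: counts.get(status, 0) for status in TRACKED_STATUSES}
```

Exporting these three numbers at a regular interval makes it easy to alert
on a growing count of zones in ERROR or on zones that stay in PENDING.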


What are useful metrics to review first during an incident?
-----------------------------------------------------------

* Host, network and MySQL performance metrics
* Number of zones in ACTIVE, PENDING and ERROR status
* Log messages with "ERROR" level