.. _troubleshooting:

=========================================
Troubleshooting Ironic-Python-Agent (IPA)
=========================================

This document contains basic troubleshooting information for IPA.

Gaining Access to IPA on a node
===============================
To access a running IPA instance, a user must be added or enabled in the
image. The sections below cover several ways to do this.

Access via SSH
--------------

ironic-python-agent-builder
~~~~~~~~~~~~~~~~~~~~~~~~~~~
SSH access can be added to DIB-built IPA images with the dynamic-login [0]_
or devuser [1]_ element.

The dynamic-login element allows the operator to inject an SSH key when the
image boots. Kernel command line parameters are used to do this.

dynamic-login element example:

- Add ``sshkey="ssh-rsa BBA1..."`` to the ``kernel_append_params`` setting in
  the ``ironic.conf`` file
- Restart ironic-conductor with the command
  ``service ironic-conductor restart``
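
To build the ``sshkey`` parameter from an existing public key file instead of
pasting the key by hand, a minimal sketch could look like the following. The
function name ``sshkey_param`` is hypothetical, and the default key path
``~/.ssh/id_rsa.pub`` is an assumption; the printed fragment is what you would
append to ``kernel_append_params``.

.. code-block:: bash

   # Hypothetical helper: print the sshkey="..." kernel parameter for the
   # dynamic-login element from a public key file (default path assumed).
   sshkey_param() {
       local key_file="${1:-$HOME/.ssh/id_rsa.pub}"
       local key=""
       # Read only the first line: dynamic-login supports a single SSH key.
       # A missing file or an empty first line makes the function return 1.
       IFS= read -r key < "$key_file" || [ -n "$key" ] || return 1
       printf 'sshkey="%s"\n' "$key"
   }

For example, ``sshkey_param ~/.ssh/id_ed25519.pub`` prints a line of the form
``sshkey="ssh-ed25519 AAAA... user@host"``.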

Install ``ironic-python-agent-builder`` by following its installation guide [2]_.

devuser element example::

  export DIB_DEV_USER_USERNAME=username
  export DIB_DEV_USER_PWDLESS_SUDO=yes
  export DIB_DEV_USER_AUTHORIZED_KEYS=$HOME/.ssh/id_rsa.pub
  ironic-python-agent-builder -o /path/to/custom-ipa -e devuser debian

Access via console
------------------
Console access requires password login to be enabled. There are a couple of
ways to enable it, depending on how the IPA image was created:

ironic-python-agent-builder: dynamic-login
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users wishing to use password access can add the dynamic-login [0]_ or
devuser [1]_ element.

The dynamic-login element allows the operator to change the root password or
SSH key dynamically when the image boots. Kernel command line parameters
are used to do this.

Generate a password hash with the following command:

.. code-block:: console

    $ openssl passwd -1 -stdin | sed 's/\$/\$\$/g'

Add ``rootpwd="<openssl output>"`` or ``sshkey="<ssh public key>"`` to the
``kernel_append_params`` setting in the Ironic configuration file
(``/etc/ironic/ironic.conf``), then restart ironic-conductor, e.g. with:

.. code-block:: console

   $ sudo systemctl restart ironic-conductor

.. warning::

   * The ``sed`` command is used to escape the ``$`` symbols in the
     configuration file.

   * The quotation marks around the value are mandatory.

   * Only one password or one SSH key is supported.
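
The ``sed`` substitution can be checked on its own. The sketch below, with a
hypothetical function name ``rootpwd_param``, takes an already generated hash,
doubles every ``$`` and wraps the result in the quoted ``rootpwd=`` parameter.
The doubling is needed because ``ironic.conf`` is read by oslo.config, which
treats ``$`` as the start of a variable substitution and ``$$`` as a literal
dollar sign.

.. code-block:: bash

   # Hypothetical helper: wrap a crypt-style hash (e.g. the output of
   # "openssl passwd -1") in rootpwd="...", escaping "$" as "$$" so that
   # oslo.config does not try to substitute it.
   rootpwd_param() {
       local hash="$1"
       printf 'rootpwd="%s"\n' "$(printf '%s' "$hash" | sed 's/\$/\$\$/g')"
   }

For example, ``rootpwd_param '$1$abc$XYZ'`` prints ``rootpwd="$$1$$abc$$XYZ"``.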

ironic-python-agent-builder: devuser
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users can also be added to DIB built IPA images with the devuser element [1]_.
Install ``ironic-python-agent-builder`` following the guide [2]_.

Example:

.. code-block:: bash

  export DIB_DEV_USER_USERNAME=username
  export DIB_DEV_USER_PWDLESS_SUDO=yes
  export DIB_DEV_USER_PASSWORD=PASSWORD
  ironic-python-agent-builder -o /path/to/custom-ipa -e devuser debian

How to pause the IPA for debugging
----------------------------------
When debugging issues with the IPA, in particular with cleaning, it may be
necessary to log in to the RAM disk before the IPA actually starts (and delay
the launch of the IPA). One easy way to do this is to set ``maintenance``
on the node and then trigger cleaning. Ironic will boot the node into the
RAM disk, but the IPA will stall until the maintenance state is removed. This
opens a time window to log into the node.

Another way to do this is to add simple cleaning steps in a custom hardware
manager which sleep until a certain condition is met, e.g. until a given
file exists. Having several of these "barrier steps" lets you go through the
cleaning steps one at a time, with a breakpoint between them.
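
A real barrier step would be a clean step method in a custom (Python) hardware
manager. The shell sketch below only illustrates the waiting logic such a step
might use, assuming the condition is the existence of a marker file; the
function name, marker path and timeout are hypothetical. While it waits, you
can log in, inspect the node, and then create the marker file (for example
with ``touch``) to let the next step proceed.

.. code-block:: bash

   # Hypothetical "barrier": block until a marker file exists, checking once
   # per second. An optional second argument is a timeout in seconds
   # (0, the default, means wait forever); returns 1 on timeout.
   wait_for_file() {
       local marker="$1" timeout="${2:-0}" waited=0
       while [ ! -e "$marker" ]; do
           if [ "$timeout" -gt 0 ] && [ "$waited" -ge "$timeout" ]; then
               return 1
           fi
           sleep 1
           waited=$((waited + 1))
       done
       return 0
   }

For example, ``wait_for_file /tmp/ipa-continue 3600`` waits up to an hour for
``/tmp/ipa-continue`` to appear.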

Set IPA to debug logging
========================
Debug logging can be enabled in several different ways. The easiest is to add
``ipa-debug=1`` to the kernel command line. To do this:

- Append ``ipa-debug=1`` to the ``kernel_append_params`` setting in the
  ``ironic.conf`` file
- Restart ironic-conductor with the command
  ``service ironic-conductor restart``
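
Once the node has booted, you can check on the node itself that the parameter
actually reached the ramdisk by looking at the running kernel command line.
The helper below is a minimal sketch with a hypothetical name; it compares
whole whitespace-separated tokens from ``/proc/cmdline`` (or from another file
given as a second argument) against an exact ``key=value`` string.

.. code-block:: bash

   # Hypothetical helper: succeed if the exact token (e.g. "ipa-debug=1")
   # is on the kernel command line. Whole tokens are compared, so
   # "ipa-debug=1" does not match "ipa-debug=10". Quoted values containing
   # spaces (such as sshkey="...") are not handled by this simple split.
   has_kernel_param() {
       local param="$1" cmdline_file="${2:-/proc/cmdline}" token
       local -a tokens=()
       read -r -a tokens < "$cmdline_file" || [ "${#tokens[@]}" -gt 0 ] || return 1
       for token in "${tokens[@]}"; do
           [ "$token" = "$param" ] && return 0
       done
       return 1
   }

For example: ``has_kernel_param ipa-debug=1 && echo "debug enabled"``.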

If the node is already running and uses systemd, edit the service unit
instead:

- Run ``systemctl edit ironic-python-agent.service``
- In the drop-in file that opens, reset the command with an empty
  ``ExecStart=`` line, then add the original ``ExecStart`` command with
  ``--debug`` appended to its end
- Restart IPA. See the `Manually restart IPA`_ section below.
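
``systemctl edit`` writes a drop-in file instead of changing the unit itself.
In a drop-in, ``ExecStart=`` has to be reset with an empty assignment before
the new command line is given, because systemd refuses more than one
``ExecStart=`` for services that are not ``Type=oneshot``. The sketch below
writes such a drop-in non-interactively. The function name is hypothetical,
and the current ``ExecStart`` command line of your image is not known here:
look it up with ``systemctl cat ironic-python-agent.service`` and pass it as
the first argument.

.. code-block:: bash

   # Hypothetical helper: write a systemd drop-in that runs IPA with --debug.
   # $1: the unit's current ExecStart command line (taken from
   #     "systemctl cat ironic-python-agent.service").
   # $2: drop-in directory; defaults to the standard location under /etc.
   write_ipa_debug_dropin() {
       local exec_start="$1"
       local dropin_dir="${2:-/etc/systemd/system/ironic-python-agent.service.d}"
       mkdir -p "$dropin_dir" || return 1
       # The empty "ExecStart=" clears the command inherited from the unit
       # file; the next line sets the new command with --debug appended.
       printf '[Service]\nExecStart=\nExecStart=%s --debug\n' "$exec_start" \
           > "$dropin_dir/override.conf"
   }

With the default directory this must run as root, and
``systemctl daemon-reload`` is needed before restarting IPA so that systemd
picks up the new file.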

Where can I find the IPA logs
=============================

Retrieving the IPA logs will differ depending on which base image was used.

* Operating systems that do not use ``systemd`` (e.g. Ubuntu 14.04)

  - logs can be found in the ``/var/log/`` directory.

* Operating systems that use ``systemd`` (e.g. Fedora, CentOS, RHEL)

  - logs may be viewed with ``sudo journalctl -u ironic-python-agent``
  - if using a diskimage-builder ramdisk, it may be configured to output all
    contents of the journal, including ironic-python-agent logs, by enabling
    the `journal-to-console element <https://docs.openstack.org/diskimage-builder/latest/elements/journal-to-console/README.html>`_.

In addition, Ironic is configured by default to retrieve IPA logs upon failures.
You can learn more about this feature in the `Ironic troubleshooting guide <https://docs.openstack.org/ironic/latest/admin/troubleshooting.html#retrieving-logs-from-the-deploy-ramdisk>`_.

Manually restart IPA
====================

In some cases it is helpful to restart IPA, for example to enable debug mode
on a running node. If the system does not use systemd, IPA can be restarted
directly::

  sudo /usr/local/bin/ironic-python-agent [--debug]

If the system uses systemd then systemctl can be used to restart the service::

  sudo systemctl restart ironic-python-agent.service

Cleaning halted with ProtectedDeviceError
=========================================

The IPA service has halted cleaning because one of the block devices within or
attached to the bare metal node contains a class of filesystem whose
accidental removal **MAY** cause irreparable harm to a potentially running
cluster.

These filesystems *may* be used only for local storage, in which case they are
safe to erase. However, the device may instead be a shared block device, such
as one supplied via a Storage Area Network using protocols such as iSCSI or
Fibre Channel, and erasing it could damage data used by other hosts. The agent
cannot always tell these cases apart: the Host Bus Adapter (HBA) may not be an
entirely "detectable" entity given the hardware marketplace and devices such
as "SmartNICs" and Converged Network Adapters with specific offload functions
to support standards like "NVMe over Fabrics" (NVMe-oF).

By default, the agent prevents these filesystems from being deleted and halts
the cleaning process when it detects one. Cleaning can be re-triggered via
Ironic's state machine once one of the settings documented below has been used
to tell the agent that the filesystems may be erased.

What filesystems are looked for
-------------------------------

+-------------------------------------------+
| Filesystem                                |
+===========================================+
| IBM General Parallel File System (GPFS)   |
+-------------------------------------------+
| Red Hat Global File System 2 (GFS2)       |
+-------------------------------------------+
| VMware Virtual Machine File System (VMFS) |
+-------------------------------------------+
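
To get a rough idea of which devices on a node might trigger this check, you
can list filesystem types with ``lsblk``. The filter below is a sketch only:
the function name is hypothetical, and the type strings it matches
(``gpfs``, ``gfs2`` and ``vmfs`` prefixes, compared case-insensitively) are
assumptions about how these filesystems are reported, not the detection
logic IPA itself uses.

.. code-block:: bash

   # Hypothetical filter: read "NAME FSTYPE" lines (as printed by
   # "lsblk -rno NAME,FSTYPE") on stdin and print those whose type starts
   # with one of the assumed markers. Devices without a type are skipped.
   find_special_fs() {
       awk 'NF >= 2 && tolower($2) ~ /^(gpfs|gfs2|vmfs)/ { print $1, $2 }'
   }

Usage on a node: ``lsblk -rno NAME,FSTYPE | find_special_fs``.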

I'm okay with deleting, how do I tell IPA to clean the disk(s)?
---------------------------------------------------------------

There are four ways to signal this to IPA. Note that all of them require
either access to the node through Ironic's API, the ability to modify the
Ironic configuration, or (for the last option) the ability to rebuild the
ramdisk.

Via Ironic
~~~~~~~~~~

.. note:: This option requires a version of Ironic recent enough to
   understand this option and explicitly pass it to the agent.

Tell Ironic to pass the option to the agent::

  baremetal node set <node> --driver-info wipe_special_filesystems=True

Via a node's kernel_append_params setting
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This may be set per node by using the ``kernel_append_params`` override in
the node's ``driver_info``. Example::

  baremetal node set <node> --driver-info kernel_append_params="ipa-guard-special-filesystems=False"

Alternatively, if you wish to set this only once, you may use the
``instance_info`` field, which is cleared when the node is torn down.
Example::

  baremetal node set <node> --instance-info kernel_append_params="ipa-guard-special-filesystems=False"

Via Ironic's Boot time PXE parameters (Globally)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To apply this globally, add the ``ipa-guard-special-filesystems=False`` string
to the ``[pxe]kernel_append_params`` option in the ``ironic.conf``
configuration file and restart ironic-conductor.

.. warning::
   If you are running a multi-conductor deployment, the ``ironic.conf`` file
   on every conductor must be updated to match.

Via Ramdisk configuration
~~~~~~~~~~~~~~~~~~~~~~~~~

This option requires modifying the ramdisk and is the most complex, but it may
be advisable in a mixed environment where shared clustered filesystems are a
concern on some machines but not others.

.. warning::
   This requires rebuilding your agent ramdisk and modifying the embedded
   configuration file for ironic-python-agent. If this statement is at all
   confusing, this option is not for you.

Edit ``/etc/ironic_python_agent/ironic_python_agent.conf`` and set the
parameter ``[DEFAULT]guard_special_filesystems`` to ``False``.


References
==========
.. [0] dynamic-login DIB element: https://github.com/openstack/diskimage-builder/tree/master/diskimage_builder/elements/dynamic-login
.. [1] devuser DIB element: https://github.com/openstack/diskimage-builder/tree/master/diskimage_builder/elements/devuser
.. [2] ironic-python-agent-builder installation guide: https://docs.openstack.org/ironic-python-agent-builder/latest/install/index.html