.. index::
pair: XML element; status
Status
------
Pacemaker automatically generates a ``status`` section in the CIB (inside the
``cib`` element, at the same level as ``configuration``). The status is
transient, and is not stored to disk with the rest of the CIB.
The section's structure and contents are internal to Pacemaker and subject to
change from release to release. Its often obscure element and attribute names
are kept for historical reasons, to maintain compatibility with older versions
during rolling upgrades.
Users should not modify the section directly, though various command-line tool
options affect it indirectly.
.. index::
pair: XML element; node_state
single: node; state
Node State
##########
The ``status`` element contains a ``node_state`` element for each node in the
cluster (and potentially for nodes that have been removed from the
configuration since the cluster started). The ``node_state`` element has
attributes that allow the cluster to determine whether the node is healthy.
.. topic:: Example minimal node state entry
.. code-block:: xml
<node_state id="1" uname="cl-virt-1" in_ccm="1721760952" crmd="1721760952" crm-debug-origin="controld_update_resource_history" join="member" expected="member">
<transient_attributes id="1"/>
<lrm id="1"/>
</node_state>
.. list-table:: **Attributes of a node_state Element**
:class: longtable
:widths: 20 20 60
:header-rows: 1
* - Name
- Type
- Description
* - .. _node_state_id:
.. index::
pair: node_state; id
id
- :ref:`text <text>`
- Node ID (identical to ``id`` of corresponding ``node`` element in the
``configuration`` section)
* - .. _node_state_uname:
.. index::
pair: node_state; uname
uname
- :ref:`text <text>`
- Node name (identical to ``uname`` of corresponding ``node`` element in the
``configuration`` section)
* - .. _node_state_in_ccm:
.. index::
pair: node_state; in_ccm
in_ccm
- :ref:`epoch time <epoch_time>` *(since 2.1.7; previously boolean)*
- If the node's controller is currently in the cluster layer's membership,
this is the epoch time at which it joined (or 1 if the node is in the
process of leaving the cluster), otherwise 0 *(since 2.1.7; previously,
it was "true" or "false")*
* - .. _node_state_crmd:
.. index::
pair: node_state; crmd
crmd
- :ref:`epoch time <epoch_time>` *(since 2.1.7; previously an enumeration)*
- If the node's controller is currently in the cluster layer's controller
messaging group, this is the epoch time at which it joined, otherwise 0
*(since 2.1.7; previously, the value was either "online" or "offline")*
* - .. _node_state_crm_debug_origin:
.. index::
pair: node_state; crm-debug-origin
crm-debug-origin
- :ref:`text <text>`
- Name of the source code function that recorded this ``node_state``
element (for debugging)
* - .. _node_state_join:
.. index::
pair: node_state; join
join
- :ref:`enumeration <enumeration>`
- Current status of the node's controller join sequence (and thus whether
the node is eligible to run resources). Allowed values:
* ``down``: Not yet joined
* ``pending``: In the process of joining or leaving
* ``member``: Fully joined
* ``banned``: Rejected by DC
* - .. _node_state_expected:
.. index::
pair: node_state; expected
expected
- :ref:`enumeration <enumeration>`
- What the cluster expects ``join`` to be in the immediate future. Allowed
values are the same as for ``join``.
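The attribute descriptions above can be read programmatically. The following is
a minimal sketch, not Pacemaker's own logic: the function names and the
defaults used for missing attributes are assumptions made for illustration, and
the section's format may change between releases. It accepts both the epoch
time values and the older boolean and enumeration forms.

```python
import xml.etree.ElementTree as ET

# The example node_state entry from this section
NODE_STATE_XML = """\
<node_state id="1" uname="cl-virt-1" in_ccm="1721760952" crmd="1721760952"
            crm-debug-origin="controld_update_resource_history"
            join="member" expected="member">
  <transient_attributes id="1"/>
  <lrm id="1"/>
</node_state>
"""


def in_membership(in_ccm):
    """True if in_ccm says the node is in the cluster layer's membership."""
    if in_ccm in ("true", "false"):      # boolean form used before 2.1.7
        return in_ccm == "true"
    return int(in_ccm) > 0               # 0 = not a member, 1 = leaving


def in_messaging_group(crmd):
    """True if crmd says the controller is in the messaging group."""
    if crmd in ("online", "offline"):    # enumeration form used before 2.1.7
        return crmd == "online"
    return int(crmd) > 0


def summarize(node_state):
    """Summarize a node_state element (hypothetical helper)."""
    # Assumption: a missing attribute is treated like "0" or "down"
    in_ccm = node_state.get("in_ccm", "0")
    join = node_state.get("join", "down")
    return {
        "name": node_state.get("uname"),
        "in_membership": in_membership(in_ccm),
        "leaving": in_ccm == "1",
        "in_messaging_group": in_messaging_group(node_state.get("crmd", "0")),
        "fully_joined": join == "member",
        "join_matches_expected": join == node_state.get("expected"),
    }


summary = summarize(ET.fromstring(NODE_STATE_XML))
print(summary)
```

A script like this is only suitable for inspecting a saved copy of the CIB. The
cluster's own tools remain the supported way to query node health.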
.. _transient_attributes:
.. index::
pair: XML element; transient_attributes
single: node; transient attribute
single: node attribute; transient
Transient Node Attributes
#########################
The ``transient_attributes`` section specifies transient
:ref:`node_attributes`. In addition to any values set by the administrator or
resource agents using the ``attrd_updater`` or ``crm_attribute`` tools, the
cluster stores various state information here.
.. topic:: Example transient node attributes for a node
.. code-block:: xml
<transient_attributes id="cl-virt-1">
<instance_attributes id="status-cl-virt-1">
<nvpair id="status-cl-virt-1-pingd" name="pingd" value="3"/>
<nvpair id="status-cl-virt-1-fail-count-pingd:0.monitor_30000" name="fail-count-pingd:0#monitor_30000" value="1"/>
<nvpair id="status-cl-virt-1-last-failure-pingd:0" name="last-failure-pingd:0" value="1239009742"/>
</instance_attributes>
</transient_attributes>
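As an illustration of the structure shown above, the following sketch flattens
a ``transient_attributes`` element into a name/value mapping. The helper name
is hypothetical, and selecting fail counts by the ``fail-count-`` prefix is an
assumption based only on the attribute names in this example.

```python
import xml.etree.ElementTree as ET

# The example transient attributes from this section
TRANSIENT_XML = """\
<transient_attributes id="cl-virt-1">
  <instance_attributes id="status-cl-virt-1">
    <nvpair id="status-cl-virt-1-pingd" name="pingd" value="3"/>
    <nvpair id="status-cl-virt-1-fail-count-pingd:0.monitor_30000"
            name="fail-count-pingd:0#monitor_30000" value="1"/>
    <nvpair id="status-cl-virt-1-last-failure-pingd:0"
            name="last-failure-pingd:0" value="1239009742"/>
  </instance_attributes>
</transient_attributes>
"""


def transient_attributes(element):
    """Map each nvpair name to its value (hypothetical helper)."""
    return {nv.get("name"): nv.get("value") for nv in element.iter("nvpair")}


attrs = transient_attributes(ET.fromstring(TRANSIENT_XML))

# Assumption: fail counts are recognizable by their name prefix
fail_counts = {
    name: int(value)
    for name, value in attrs.items()
    if name.startswith("fail-count-")
}

print(attrs)
print(fail_counts)
```

For a live cluster, ``attrd_updater`` and ``crm_attribute`` are the appropriate
way to read or change these values. Parsing the XML directly is useful mainly
when examining a saved CIB.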
.. index::
pair: XML element; lrm
pair: XML element; lrm_resources
pair: node; history
Node History
############
Each ``node_state`` element contains an ``lrm`` element with a history of
certain resource actions performed on the node. The ``lrm`` element contains an
``lrm_resources`` element.
.. index::
pair: XML element; lrm_resource
pair: resource; history
Resource History
________________
The ``lrm_resources`` element contains an ``lrm_resource`` element for each
resource that has had an action performed on the node.
An ``lrm_resource`` entry has attributes allowing the cluster to stop the
resource safely even if it is removed from the configuration. Specifically, the
resource's ``id``, ``class``, ``type`` and ``provider`` are recorded.
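The recorded attributes are enough to identify the resource agent without
consulting the configuration. A minimal sketch follows. The
``lrm_resources`` fragment is assembled from the attributes of the two
examples later in this section, and the ``agent_spec`` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Fragment assembled for illustration from this section's examples
LRM_RESOURCES_XML = """\
<lrm_resources>
  <lrm_resource id="apcstonith" type="fence_apc_snmp" class="stonith"/>
  <lrm_resource id="pingd:0" type="pingd" class="ocf" provider="pacemaker"/>
</lrm_resources>
"""


def agent_spec(lrm_resource):
    """Join class, provider (if recorded), and type with colons."""
    parts = (
        lrm_resource.get("class"),
        lrm_resource.get("provider"),   # absent for some classes
        lrm_resource.get("type"),
    )
    return ":".join(part for part in parts if part)


agents = {
    rsc.get("id"): agent_spec(rsc)
    for rsc in ET.fromstring(LRM_RESOURCES_XML).iter("lrm_resource")
}
print(agents)
```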
.. index::
pair: XML element; lrm_rsc_op
pair: action; history
Action History
______________
Each ``lrm_resource`` element contains an ``lrm_rsc_op`` element for each
recorded action performed for that resource on that node. (Not all actions are
recorded, just enough to determine the resource's state.)
.. list-table:: **Attributes of an lrm_rsc_op Element**
:class: longtable
:widths: 20 20 60
:header-rows: 1
* - Name
- Type
- Description
* - .. _lrm_rsc_op_id:
.. index::
pair: lrm_rsc_op; id
id
- :ref:`text <text>`
- Identifier for the history entry constructed from the resource ID,
action name or history entry type, and action interval.
* - .. _lrm_rsc_op_operation_key:
.. index::
pair: lrm_rsc_op; operation_key
operation_key
- :ref:`text <text>`
- Identifier for the action that was executed, constructed from the
resource ID, action name, and action interval.
* - .. _lrm_rsc_op_operation:
.. index::
pair: lrm_rsc_op; operation
operation
- :ref:`text <text>`
- The name of the action the history entry is for
* - .. _lrm_rsc_op_crm_debug_origin:
.. index::
pair: lrm_rsc_op; crm-debug-origin
crm-debug-origin
- :ref:`text <text>`
- Name of the source code function that recorded this entry (for
debugging)
* - .. _lrm_rsc_op_crm_feature_set:
.. index::
pair: lrm_rsc_op; crm_feature_set
crm_feature_set
- :ref:`version <version>`
- The Pacemaker feature set used to record this entry.
* - .. _lrm_rsc_op_transition_key:
.. index::
pair: lrm_rsc_op; transition-key
transition-key
- :ref:`text <text>`
- A concatenation of the action's transition graph action number, the
transition graph number, the action's expected result, and the UUID of
the controller instance that scheduled it.
* - .. _lrm_rsc_op_transition_magic:
.. index::
pair: lrm_rsc_op; transition-magic
transition-magic
- :ref:`text <text>`
- A concatenation of ``op-status``, ``rc-code``, and ``transition-key``.
* - .. _lrm_rsc_op_exit_reason:
.. index::
pair: lrm_rsc_op; exit-reason
exit-reason
- :ref:`text <text>`
- An error message (if available) from the resource agent or Pacemaker if
the action did not return success.
* - .. _lrm_rsc_op_on_node:
.. index::
pair: lrm_rsc_op; on_node
on_node
- :ref:`text <text>`
- The name of the node that executed the action (identical to the
``uname`` of the enclosing ``node_state`` element)
* - .. _lrm_rsc_op_call_id:
.. index::
pair: lrm_rsc_op; call-id
call-id
- :ref:`integer <integer>`
- A node-specific counter used to determine the order in which actions
were executed.
* - .. _lrm_rsc_op_rc_code:
.. index::
pair: lrm_rsc_op; rc-code
rc-code
- :ref:`integer <integer>`
- The resource agent's exit status for this action. Refer to the *Resource
Agents* chapter of *Pacemaker Administration* for how these values are
interpreted.
* - .. _lrm_rsc_op_op_status:
.. index::
pair: lrm_rsc_op; op-status
op-status
- :ref:`integer <integer>`
- The execution status of this action. The meanings of these codes are
internal to Pacemaker.
* - .. _lrm_rsc_op_interval:
.. index::
pair: lrm_rsc_op; interval
interval
- :ref:`nonnegative integer <nonnegative_integer>`
- If the action is recurring, its interval (in milliseconds), otherwise
0.
* - .. _lrm_rsc_op_last_rc_change:
.. index::
pair: lrm_rsc_op; last-rc-change
last-rc-change
- :ref:`epoch time <epoch_time>`
- Node-local time at which the action first returned the current value of
``rc-code``.
* - .. _lrm_rsc_op_exec_time:
.. index::
pair: lrm_rsc_op; exec-time
exec-time
- :ref:`integer <integer>`
- Time (in seconds) that action execution took (if known)
* - .. _lrm_rsc_op_queue_time:
.. index::
pair: lrm_rsc_op; queue-time
queue-time
- :ref:`integer <integer>`
- Time (in seconds) that action was queued in the local executor (if known)
* - .. _lrm_rsc_op_op_digest:
.. index::
pair: lrm_rsc_op; op-digest
op-digest
- :ref:`text <text>`
- If present, this is a hash of the parameters passed to the action. If a
hash of the currently configured parameters does not match this, that
means the resource configuration changed since the action was performed,
and the resource must be reloaded or restarted.
* - .. _lrm_rsc_op_op_restart_digest:
.. index::
pair: lrm_rsc_op; op-restart-digest
op-restart-digest
- :ref:`text <text>`
- If present, the resource agent supports reloadable parameters, and this
is a hash of the non-reloadable parameters passed to the action. This
allows the cluster to choose between reload and restart when one is
needed.
* - .. _lrm_rsc_op_op_secure_digest:
.. index::
pair: lrm_rsc_op; op-secure-digest
op-secure-digest
- :ref:`text <text>`
- If present, the resource agent marks some parameters as sensitive, and
this is a hash of the non-sensitive parameters passed to the action.
This allows the value of sensitive parameters to be removed from a saved
copy of the CIB while still allowing scheduler simulations to be
performed on that copy.
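The relationship between ``transition-magic`` and the other attributes can be
checked mechanically. The sketch below assumes the separators seen in this
section's ``apcstonith`` example (a colon between ``op-status`` and
``rc-code``, then a semicolon before ``transition-key``). The format is
internal to Pacemaker and may differ in other releases, and the function name
is hypothetical.

```python
def build_transition_magic(op):
    """Concatenate op-status, rc-code and transition-key.

    Assumption: the separators are those seen in this section's example.
    """
    return "{}:{};{}".format(
        op["op-status"], op["rc-code"], op["transition-key"]
    )


# Attribute values copied from the apcstonith example in this section
op = {
    "op-status": "0",
    "rc-code": "7",
    "transition-key": "22:2:7:2668bbeb-06d5-40f9-936d-24cb7f87006a",
    "transition-magic": "0:7;22:2:7:2668bbeb-06d5-40f9-936d-24cb7f87006a",
}

magic_matches = build_transition_magic(op) == op["transition-magic"]
print(magic_matches)
```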
Simple Operation History Example
________________________________
.. topic:: A monitor operation (determines current state of the ``apcstonith`` resource)
.. code-block:: xml
<lrm_resource id="apcstonith" type="fence_apc_snmp" class="stonith">
<lrm_rsc_op id="apcstonith_monitor_0" operation="monitor" call-id="2"
rc-code="7" op-status="0" interval="0"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
op-digest="2e3da9274d3550dc6526fb24bfcbcba0"
transition-key="22:2:7:2668bbeb-06d5-40f9-936d-24cb7f87006a"
transition-magic="0:7;22:2:7:2668bbeb-06d5-40f9-936d-24cb7f87006a"
last-rc-change="1239008085" exec-time="10" queue-time="0"/>
</lrm_resource>
The above example shows the history entry for a probe (non-recurring monitor
operation) for the ``apcstonith`` resource.
The cluster schedules probes for every configured resource on a node when
the node first starts, in order to determine the resource's current state
before it takes any further action.
From the ``transition-key``, we can see that this was the 22nd action of
the 2nd graph produced by this instance of the controller
(2668bbeb-06d5-40f9-936d-24cb7f87006a).
The third field of the ``transition-key`` contains a 7, which indicates
that the cluster expects to find the resource inactive. By looking at the
``rc-code`` property, we see that this was the case.
As that is the only action recorded for this resource on this node, we can
conclude that the cluster started the resource elsewhere.
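The reading of the ``transition-key`` above can be sketched in a few lines.
The ``TransitionKey`` type and its field names are hypothetical. Only the
order of the four colon-separated fields comes from this section.

```python
from typing import NamedTuple


class TransitionKey(NamedTuple):
    """Fields of a transition-key, in the order described in this section."""
    action: int            # action number within the transition graph
    graph: int             # transition graph number
    expected_rc: int       # result the cluster expected
    controller_uuid: str   # controller instance that scheduled the action


def parse_transition_key(key):
    """Split a transition-key into its four fields."""
    action, graph, expected_rc, uuid = key.split(":", 3)
    return TransitionKey(int(action), int(graph), int(expected_rc), uuid)


# Values from the apcstonith probe above
key = parse_transition_key("22:2:7:2668bbeb-06d5-40f9-936d-24cb7f87006a")
rc_code = 7

# The probe found what the cluster expected: the resource was inactive
result_as_expected = key.expected_rc == rc_code
print(key, result_as_expected)
```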
Complex Operation History Example
_________________________________
.. topic:: Resource history of a ``pingd`` clone with multiple entries
.. code-block:: xml
<lrm_resource id="pingd:0" type="pingd" class="ocf" provider="pacemaker">
<lrm_rsc_op id="pingd:0_monitor_30000" operation="monitor" call-id="34"
rc-code="0" op-status="0" interval="30000"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
transition-key="10:11:0:2668bbeb-06d5-40f9-936d-24cb7f87006a"
last-rc-change="1239009741" exec-time="10" queue-time="0"/>
<lrm_rsc_op id="pingd:0_stop_0" operation="stop"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" call-id="32"
rc-code="0" op-status="0" interval="0"
transition-key="11:11:0:2668bbeb-06d5-40f9-936d-24cb7f87006a"
last-rc-change="1239009741" exec-time="10" queue-time="0"/>
<lrm_rsc_op id="pingd:0_start_0" operation="start" call-id="33"
rc-code="0" op-status="0" interval="0"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
transition-key="31:11:0:2668bbeb-06d5-40f9-936d-24cb7f87006a"
last-rc-change="1239009741" exec-time="10" queue-time="0" />
<lrm_rsc_op id="pingd:0_monitor_0" operation="monitor" call-id="3"
rc-code="7" op-status="0" interval="0"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
transition-key="23:2:7:2668bbeb-06d5-40f9-936d-24cb7f87006a"
last-rc-change="1239008085" exec-time="20" queue-time="0"/>
</lrm_resource>
When more than one history entry exists, it is important to first sort
them by ``call-id`` before interpreting them.
Once sorted, the above example can be summarized as:
#. A non-recurring monitor operation returning 7 (not running), with a
``call-id`` of 3
#. A stop operation returning 0 (success), with a ``call-id`` of 32
#. A start operation returning 0 (success), with a ``call-id`` of 33
#. A recurring monitor returning 0 (success), with a ``call-id`` of 34
The cluster processes each history entry to build up a picture of the
resource's state. After the first and second entries, it is
considered stopped, and after the third it is considered active.
Based on the last operation, we can tell that the resource is
currently active.
Additionally, from the presence of a ``stop`` operation with a lower
``call-id`` than that of the ``start`` operation, we can conclude that the
resource has been restarted. Specifically, this occurred as part of
actions 11 and 31 of transition 11 from the controller instance with the key
``2668bbeb...``. This information can be helpful for locating the
relevant section of the logs when looking for the source of a failure.
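The sort-then-replay reasoning above can be sketched as follows. This is a
deliberately simplified illustration, not the scheduler's algorithm: it looks
only at ``operation``, ``call-id`` and ``rc-code``, ignores failures,
``op-status``, promotion and migration, and uses an abridged copy of the
history with the return codes listed in the summary above. The ``replay``
function is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged history for pingd:0, with return codes as summarized above
HISTORY_XML = """\
<lrm_resource id="pingd:0" type="pingd" class="ocf" provider="pacemaker">
  <lrm_rsc_op id="pingd:0_monitor_30000" operation="monitor" call-id="34"
              rc-code="0" interval="30000"/>
  <lrm_rsc_op id="pingd:0_stop_0" operation="stop" call-id="32"
              rc-code="0" interval="0"/>
  <lrm_rsc_op id="pingd:0_start_0" operation="start" call-id="33"
              rc-code="0" interval="0"/>
  <lrm_rsc_op id="pingd:0_monitor_0" operation="monitor" call-id="3"
              rc-code="7" interval="0"/>
</lrm_resource>
"""

RC_OK = 0            # success
RC_NOT_RUNNING = 7   # resource is not running


def replay(lrm_resource):
    """Return (entry IDs in call-id order, active?, restarted?)."""
    ops = sorted(
        lrm_resource.iter("lrm_rsc_op"),
        key=lambda op: int(op.get("call-id")),
    )
    active = None                    # unknown until an entry says otherwise
    last_stop = last_start = None
    for op in ops:
        name = op.get("operation")
        rc = int(op.get("rc-code"))
        call_id = int(op.get("call-id"))
        if name == "monitor":
            if rc == RC_OK:
                active = True
            elif rc == RC_NOT_RUNNING:
                active = False
        elif name == "stop" and rc == RC_OK:
            active, last_stop = False, call_id
        elif name == "start" and rc == RC_OK:
            active, last_start = True, call_id
    # A successful stop followed by a successful start indicates a restart
    restarted = (
        last_stop is not None
        and last_start is not None
        and last_stop < last_start
    )
    return [op.get("id") for op in ops], active, restarted


order, active, restarted = replay(ET.fromstring(HISTORY_XML))
print(order, active, restarted)
```

Sorting numerically matters: comparing the ``call-id`` values as strings would
place ``"3"`` after ``"32"``, ``"33"`` and ``"34"``, so the probe would be
replayed last instead of first.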