Design for adding a node to a cluster
=====================================

.. contents:: :depth: 3


Note
----

Closely related to this design is the more recent design
:doc:`node security <design-node-security>`, which extends and changes
some of the aspects mentioned in this document. Make sure to read the
more recent design as well to get an up-to-date picture of Ganeti's
procedure for adding new nodes.


Current state and shortcomings
------------------------------

Before a node can be added to a cluster, its SSH daemon must be
re-configured to use the cluster-wide SSH host key. Ganeti 2.3.0 changed
the way this is done by moving all related code to a separate script,
``tools/setup-ssh``, which uses Paramiko. Previously, all such
configuration was done from ``lib/bootstrap.py`` using the system's own
SSH client and a shell script passed to that client via parameters.

Both solutions control all actions from the connecting machine; the
newly added node merely executes commands. This implies and requires
tight coupling and equality between nodes (e.g. file paths must be
identical). Most of the logic and error handling is also done on the
connecting machine.

Once a node's SSH daemon has been configured, more than 25 files need
to be copied using ``scp`` before the node daemon can be started. No
verification is done before the files are copied. Once the node daemon
is started, an opcode is submitted to the master daemon, which then
copies more files, such as the configuration and job queue for master
candidates, using RPC. This process is somewhat fragile and requires
initiating many SSH connections.

Proposed changes
----------------

SSH
~~~

The main goal is to move more logic to the newly added node. Instead
of having a relatively large script executed on the master node, most
of that logic is moved over to the added node.

A new script named ``prepare-node-join`` is added. It receives a JSON
data structure (defined :ref:`below <prepare-node-join-json>`) on its
standard input. Once the data has been successfully decoded, it proceeds
to configure the local node's SSH daemon and root's SSH settings, after
which the SSH daemon is restarted.
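
To make the flow concrete, below is a minimal sketch of such a script,
assuming Python and the :ref:`input format defined below
<prepare-node-join-json>`; the file paths, error handling and init
script name are illustrative assumptions, not the actual
implementation.

.. code-block:: python

  #!/usr/bin/env python
  # Hypothetical sketch of prepare-node-join's input handling; not
  # the actual Ganeti implementation.
  import json
  import subprocess
  import sys

  # Assumed location of a previously configured cluster name
  LOCAL_CLUSTER_NAME_FILE = "/var/lib/ganeti/ssconf_cluster_name"

  def _WriteKey(path, private, public):
    # Simplified; real code must set strict permissions before
    # writing private key material
    with open(path, "w") as fd:
      fd.write(private)
    with open(path + ".pub", "w") as fd:
      fd.write(public)

  def main():
    data = json.load(sys.stdin)

    # "cluster_name" is the only required key; abort if it doesn't
    # match a locally configured cluster name
    cluster_name = data["cluster_name"]
    try:
      local_name = open(LOCAL_CLUSTER_NAME_FILE).read().strip()
    except IOError:
      local_name = None
    if local_name is not None and local_name != cluster_name:
      sys.exit("Cluster name %s doesn't match local name %s" %
               (cluster_name, local_name))

    # Write host keys; each entry is [variant, private, public]
    for (variant, private, public) in data.get("ssh_host_key", []):
      _WriteKey("/etc/ssh/ssh_host_%s_key" % variant, private, public)

    # Restart the SSH daemon so the new host key takes effect (the
    # init script name varies between distributions)
    subprocess.check_call(["/etc/init.d/ssh", "restart"])

  if __name__ == "__main__":
    main()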

All the master node has to do to add a new node is to gather all
required data, build the data structure, and invoke the script on the
node to be added. This will enable us to once again use the system's
own SSH client and to drop the dependency on Paramiko for Ganeti itself
(``ganeti-listrunner`` will continue to use Paramiko).
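
A hedged sketch of this master-side invocation, using the system's
``ssh`` binary through ``subprocess``; the remote script path shown is
an assumption. The same single-connection pattern applies to the
``node-daemon-setup`` program described below.

.. code-block:: python

  # Hypothetical master-side invocation; the remote path and the
  # payload contents are illustrative only.
  import json
  import subprocess

  def RunPrepareNodeJoin(node, data):
    """Pipes JSON-serialized setup data to the remote script."""
    # Host key checking is relaxed because the node does not have
    # the cluster-wide host key yet
    proc = subprocess.Popen(
      ["ssh", "-oStrictHostKeyChecking=no", "root@%s" % node,
       "/usr/lib/ganeti/tools/prepare-node-join"],
      stdin=subprocess.PIPE)
    proc.communicate(json.dumps(data).encode("utf-8"))
    if proc.returncode:
      raise RuntimeError("prepare-node-join failed on %s" % node)

  RunPrepareNodeJoin("node2.example.com", {
    "cluster_name": "cluster.example.com",
    # "ssh_host_key" and "ssh_root_key" omitted for brevity
    })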

Eventually ``setup-ssh`` can be removed.


Node daemon
~~~~~~~~~~~

Similar to the SSH setup changes, the process of copying files and
starting the node daemon will be moved into a dedicated program. On its
standard input it will receive a standardized JSON structure (defined
:ref:`below <node-daemon-setup-json>`). Once the input data has been
successfully decoded and the received values have been verified for
sanity, the program proceeds to write the values to files and then
starts the node daemon (``ganeti-noded``).
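
As an illustration, a minimal sketch of this verify-then-write flow,
assuming Python; the validation rules, paths and init script name are
assumptions rather than Ganeti's actual code.

.. code-block:: python

  # Hypothetical sketch of node-daemon-setup's core steps
  import json
  import re
  import subprocess
  import sys

  SSCONF_DIR = "/var/lib/ganeti"  # assumed data directory

  def Verify(data):
    """Sanity-checks decoded input before anything is written."""
    if not isinstance(data.get("cluster_name"), str):
      raise ValueError("cluster_name is required and must be a string")
    for (name, value) in data.get("ssconf", {}).items():
      # Restrict names so values can't be written outside SSCONF_DIR
      if not re.match(r"^[a-z0-9_]+$", name):
        raise ValueError("Invalid ssconf name: %r" % name)
      if not isinstance(value, str):
        raise ValueError("ssconf values must be strings")

  def main():
    data = json.load(sys.stdin)
    Verify(data)

    # Values are written only after all checks have passed
    for (name, value) in data.get("ssconf", {}).items():
      with open("%s/ssconf_%s" % (SSCONF_DIR, name), "w") as fd:
        fd.write(value)

    if data.get("start_node_daemon"):
      # Init script name is an assumption
      subprocess.check_call(["/etc/init.d/ganeti", "restart"])

  if __name__ == "__main__":
    main()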

To add a new node to the cluster, the master node will have to gather
all values, build the data structure, and then invoke the newly added
``node-daemon-setup`` program via SSH. In this way only a single SSH
connection is needed and the values can be verified before being written
to files.

If the program exits successfully, the node is ready to be added to the
master daemon's configuration. The node daemon will be running, but
``OpNodeAdd`` needs to be run before the node becomes a full member of
the cluster. The opcode will copy more files, such as the :doc:`RAPI
certificate <rapi>`.


Data structures
---------------

.. _prepare-node-join-json:

JSON structure for SSH setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The data is given in an object containing the keys described below.
Unless specified otherwise, all entries are optional.

``cluster_name``
  Required string with the cluster name. If a local cluster name is
  found, the join process is aborted unless the passed cluster name
  matches the local name.
``node_daemon_certificate``
  Public part of the cluster's node daemon certificate in PEM format.
  If a local node certificate and key are found, the join process is
  aborted unless the passed public part can be verified with the local
  key.
``ssh_host_key``
  List containing the public and private parts of the SSH host key.
  See below for the definition.
``ssh_root_key``
  List containing the public and private parts of root's key for SSH
  authorization. See below for the definition.

Each entry in an SSH key list is a three-element list: the key variant
(``rsa`` or ``dsa``), followed by the private and the public part of
the key. Example:

.. highlight:: javascript

::

  [
    ("rsa", "-----BEGIN RSA PRIVATE KEY-----...", "ssh-rss AAAA..."),
    ("dsa", "-----BEGIN DSA PRIVATE KEY-----...", "ssh-dss AAAA..."),
  ]
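
For illustration, the master side could assemble such a list from its
own key files along the following lines; the paths and the helper name
are hypothetical.

.. code-block:: python

  # Hypothetical helper building the "ssh_host_key" value from key
  # files already present on the master
  def ReadSshHostKeys(variants=("rsa", "dsa")):
    result = []
    for variant in variants:
      base = "/etc/ssh/ssh_host_%s_key" % variant
      private = open(base).read()
      public = open(base + ".pub").read().strip()
      # Each entry mirrors the structure described above:
      # [variant, private part, public part]
      result.append([variant, private, public])
    return result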


.. _node-daemon-setup-json:

JSON structure for node daemon setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The data is given in an object containing the keys described below.
Unless specified otherwise, all entries are optional.

``cluster_name``
  Required string with the cluster name. If a local cluster name is
  found, the join process is aborted unless the passed cluster name
  matches the local name. The cluster name is also included in the
  dictionary given via the ``ssconf`` entry.
``node_daemon_certificate``
  Public and private parts of the cluster's node daemon certificate in
  PEM format. If a local node certificate is found, the process is
  aborted unless it matches the passed certificate.
``ssconf``
  Dictionary with ssconf names and their values. Both are strings.
  Example:

  .. highlight:: javascript

  ::

    {
      "cluster_name": "cluster.example.com",
      "master_ip": "192.168.2.1",
      "master_netdev": "br0",
      // ...
    }

``start_node_daemon``
  Boolean denoting whether the node daemon should be started (or
  restarted if it was already running).

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: