# Online cluster deployment and maintenance
The cluster component deploys a production cluster as quickly as playground deploys a local one, and it provides far more powerful management capabilities than playground, including cluster upgrades, scale-in, scale-out, and even operation auditing. It supports a large number of commands:
```bash
$ tiup cluster
The component `cluster` is not installed; downloading from repository.
download https://tiup-mirrors.pingcap.com/cluster-v0.4.9-darwin-amd64.tar.gz 15.32 MiB / 15.34 MiB 99.90% 10.04 MiB p/s
Starting component `cluster`: /Users/joshua/.tiup/components/cluster/v0.4.9/cluster
Deploy a TiDB cluster for production
Usage:
  tiup cluster [flags]
  tiup cluster [command]

Available Commands:
  deploy      Deploy a cluster for production
  start       Start a deployed cluster
  stop        Stop a cluster
  restart     Restart a cluster
  scale-in    Scale in a cluster
  scale-out   Scale out a cluster
  destroy     Destroy a cluster
  upgrade     Upgrade a cluster
  exec        Run commands on one or more machines in the cluster
  display     Display information of a cluster
  list        List all clusters
  audit       View the operation audit log of a cluster
  import      Import a cluster deployed by TiDB-Ansible
  edit-config Edit the configuration of a TiDB cluster
  reload      Reload a cluster's configuration when necessary
  patch       Replace a deployed component of the cluster with a temporary package
  help        Print help information

Flags:
  -h, --help               Help information
      --ssh-timeout int    SSH connection timeout
  -y, --yes                Skip all confirmation steps
```
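The operation auditing mentioned above is exposed through the audit subcommand in the list. A brief, hedged example follows; the exact output format may differ between tiup-cluster versions:
```bash
# List recorded cluster operations; each record carries an ID that can be
# passed back as `tiup cluster audit <audit-id>` to inspect its details.
tiup cluster audit
```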
## Deploying a cluster
The command used to deploy a cluster is `tiup cluster deploy`, and its general usage is:
```bash
tiup cluster deploy <cluster-name> <version> <topology.yaml> [flags]
```
This command requires the cluster name, the TiDB version to use, and a topology file for the cluster, which can be written with reference to the [example](/examples/topology.example.yaml). Take the simplest possible topology as an example:
```yaml
---
pd_servers:
  - host: 172.16.5.134
    name: pd-134
  - host: 172.16.5.139
    name: pd-139
  - host: 172.16.5.140
    name: pd-140
tidb_servers:
  - host: 172.16.5.134
  - host: 172.16.5.139
  - host: 172.16.5.140
tikv_servers:
  - host: 172.16.5.134
  - host: 172.16.5.139
  - host: 172.16.5.140
grafana_servers:
  - host: 172.16.5.134
monitoring_servers:
  - host: 172.16.5.134
```
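The minimal topology above only lists hosts. A topology file can also carry cluster-wide defaults; a hedged sketch, assuming the `global` section with `user`, `deploy_dir`, and `data_dir` keys found in typical tiup-cluster topology files (they are not shown in this document):
```yaml
---
# Hypothetical cluster-wide defaults; the global section and its keys are
# assumptions based on common tiup-cluster topology files.
global:
  user: tidb                # user that runs the deployed services
  deploy_dir: /data/deploy  # default deployment directory
  data_dir: /data/data      # default data directory
pd_servers:
  - host: 172.16.5.134
```
The minimal topology above is all that the rest of this walkthrough uses.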
Save the file as `/tmp/topology.yaml`. If we want to deploy TiDB v3.0.12 with the cluster name prod-cluster, run:
```shell
tiup cluster deploy prod-cluster v3.0.12 /tmp/topology.yaml
```
During execution, TiUP asks you to reconfirm the topology and prompts for the root password of the target machines:
```bash
Please confirm your topology:
TiDB Cluster: prod-cluster
TiDB Version: v3.0.12
Type Host Ports Directories
---- ---- ----- -----------
pd 172.16.5.134 2379/2380 deploy/pd-2379,data/pd-2379
pd 172.16.5.139 2379/2380 deploy/pd-2379,data/pd-2379
pd 172.16.5.140 2379/2380 deploy/pd-2379,data/pd-2379
tikv 172.16.5.134 20160/20180 deploy/tikv-20160,data/tikv-20160
tikv 172.16.5.139 20160/20180 deploy/tikv-20160,data/tikv-20160
tikv 172.16.5.140 20160/20180 deploy/tikv-20160,data/tikv-20160
tidb 172.16.5.134 4000/10080 deploy/tidb-4000
tidb 172.16.5.139 4000/10080 deploy/tidb-4000
tidb 172.16.5.140 4000/10080 deploy/tidb-4000
prometheus 172.16.5.134 9090 deploy/prometheus-9090,data/prometheus-9090
grafana 172.16.5.134 3000 deploy/grafana-3000
Attention:
1. If the topology is not what you expected, check your yaml file.
2. Please confirm there is no port/directory conflicts in same host.
Do you want to continue? [y/N]:
```
After you enter the password, tiup-cluster downloads the required components and deploys them to the corresponding machines. The following message indicates a successful deployment:
```bash
Deployed cluster `prod-cluster` successfully
```
## View cluster list
Once the cluster is deployed, we can see it in the cluster list via `tiup cluster list`:
```bash
[user@localhost ~]# tiup cluster list
Starting /root/.tiup/components/cluster/v0.4.5/cluster list
Name User Version Path PrivateKey
---- ---- ------- ---- ----------
prod-cluster tidb v3.0.12 /root/.tiup/storage/cluster/clusters/prod-cluster /root/.tiup/storage/cluster/clusters/prod-cluster/ssh/id_rsa
```
## Starting the cluster
If you have forgotten the name of a deployed cluster, you can check it with `tiup cluster list`. The command to start the cluster is:
```shell
tiup cluster start prod-cluster
```
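The command list shown earlier also includes stop and restart, which take the cluster name in the same way:
```bash
# Stop or restart the deployed cluster by name (both subcommands appear
# in the `tiup cluster` help output above).
tiup cluster stop prod-cluster
tiup cluster restart prod-cluster
```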
## Checking cluster status
We often want to know the operating status of each component in a cluster, and checking machine by machine is obviously inefficient. This is where `tiup cluster display` comes in, used as follows:
```bash
[user@localhost ~]# tiup cluster display prod-cluster
Starting /root/.tiup/components/cluster/v0.4.5/cluster display prod-cluster
TiDB Cluster: prod-cluster
TiDB Version: v3.0.12
ID Role Host Ports Status Data Dir Deploy Dir
-- ---- ---- ----- ------ -------- ----------
172.16.5.134:3000 grafana 172.16.5.134 3000 Up - deploy/grafana-3000
172.16.5.134:2379 pd 172.16.5.134 2379/2380 Healthy|L data/pd-2379 deploy/pd-2379
172.16.5.139:2379 pd 172.16.5.139 2379/2380 Healthy data/pd-2379 deploy/pd-2379
172.16.5.140:2379 pd 172.16.5.140 2379/2380 Healthy data/pd-2379 deploy/pd-2379
172.16.5.134:9090 prometheus 172.16.5.134 9090 Up data/prometheus-9090 deploy/prometheus-9090
172.16.5.134:4000 tidb 172.16.5.134 4000/10080 Up - deploy/tidb-4000
172.16.5.139:4000 tidb 172.16.5.139 4000/10080 Up - deploy/tidb-4000
172.16.5.140:4000 tidb 172.16.5.140 4000/10080 Up - deploy/tidb-4000
172.16.5.134:20160 tikv 172.16.5.134 20160/20180 Up data/tikv-20160 deploy/tikv-20160
172.16.5.139:20160 tikv 172.16.5.139 20160/20180 Up data/tikv-20160 deploy/tikv-20160
172.16.5.140:20160 tikv 172.16.5.140 20160/20180 Up data/tikv-20160 deploy/tikv-20160
```
For most components, the Status column shows "Up" or "Down" to indicate whether the service is running normally. For PD, the Status column shows "Healthy" or "Down", possibly followed by "|L" to indicate that the PD node is the Leader.
## Scaling in
Sometimes the business volume drops and the cluster no longer needs as many resources as before, so we want to safely release some nodes and shrink the cluster; this is called scaling in. Scaling in takes services offline: it eventually removes the specified nodes from the cluster and deletes the data files they leave behind. Because taking TiKV and Binlog components offline is asynchronous (it is done through an API) and time-consuming (we have to keep checking whether the node has actually gone offline), TiKV and Binlog get special treatment:
- Operation on TiKV and Binlog components
  - tiup-cluster takes the node offline through the API and exits directly, without waiting for the process to finish
  - Later, whenever a cluster operation command is executed, it checks whether any TiKV or Binlog node has finished going offline. If not, the specified operation continues as usual; if so, the following steps are performed first:
    - Stop the service of the node that has gone offline
    - Clean up the data files associated with that node
    - Update the cluster topology and remove the node
- Operation on other components
  - When a PD component is taken offline, the specified node is removed from the cluster through the API (a quick process), then its service is stopped and the data files associated with that node are cleaned up
  - Other components are taken offline by directly stopping the service and cleaning up the node's data files
Basic usage of the scale-in command:
```bash
tiup cluster scale-in <cluster-name> -N <node-id>
```
At least two parameters must be specified: the cluster name and the node ID, which can be obtained with the `tiup cluster display` command described in the previous section. For example, to take the TiKV node on 172.16.5.140 offline, run the scale-in command shown below and then watch its progress with `tiup cluster display`.
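A concrete invocation, using the node ID 172.16.5.140:20160 reported by `tiup cluster display` in the previous section:
```bash
# Take the TiKV node listening on 172.16.5.140:20160 offline.
tiup cluster scale-in prod-cluster -N 172.16.5.140:20160
```
After the command returns, `tiup cluster display prod-cluster` shows the node in the Offline state while PD migrates its data away: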
```bash
[user@localhost ~]# tiup cluster display prod-cluster
Starting /root/.tiup/components/cluster/v0.4.5/cluster display prod-cluster
TiDB Cluster: prod-cluster
TiDB Version: v3.0.12
ID Role Host Ports Status Data Dir Deploy Dir
-- ---- ---- ----- ------ -------- ----------
172.16.5.134:3000 grafana 172.16.5.134 3000 Up - deploy/grafana-3000
172.16.5.134:2379 pd 172.16.5.134 2379/2380 Healthy|L data/pd-2379 deploy/pd-2379
172.16.5.139:2379 pd 172.16.5.139 2379/2380 Healthy data/pd-2379 deploy/pd-2379
172.16.5.140:2379 pd 172.16.5.140 2379/2380 Healthy data/pd-2379 deploy/pd-2379
172.16.5.134:9090 prometheus 172.16.5.134 9090 Up data/prometheus-9090 deploy/prometheus-9090
172.16.5.134:4000 tidb 172.16.5.134 4000/10080 Up - deploy/tidb-4000
172.16.5.139:4000 tidb 172.16.5.139 4000/10080 Up - deploy/tidb-4000
172.16.5.140:4000 tidb 172.16.5.140 4000/10080 Up - deploy/tidb-4000
172.16.5.134:20160 tikv 172.16.5.134 20160/20180 Up data/tikv-20160 deploy/tikv-20160
172.16.5.139:20160 tikv 172.16.5.139 20160/20180 Up data/tikv-20160 deploy/tikv-20160
172.16.5.140:20160 tikv 172.16.5.140 20160/20180 Offline data/tikv-20160 deploy/tikv-20160
```
After PD has scheduled the node's data to other TiKV nodes, the node is automatically removed.
## Scaling out
The internal logic of scaling out is similar to deployment: tiup-cluster first makes sure it can reach the new node over SSH, creates the necessary directories on the target node, then performs the deployment and starts the services. A new PD node joins the cluster through a join operation, and the configuration of the services associated with PD is updated; other services are added to the cluster directly. All services are validated for correctness during scale-out, and the final result tells you whether the operation succeeded.
For example, to add a TiKV node and a PD node to the cluster tidb-test:
### 1. Create a scale.yaml file with the IPs of the new TiKV and PD nodes
> **Note**
>
> Create a new topology file that describes only the nodes to be added, not the existing nodes.
```yaml
---
pd_servers:
  - host: 172.16.5.140
tikv_servers:
  - host: 172.16.5.140
```
### 2. Perform the scale-out operation
tiup-cluster adds the corresponding nodes to the cluster according to the ports, directories, and other information declared in the scale.yaml file:
```shell
tiup cluster scale-out tidb-test scale.yaml
```
After execution, you can check the expanded cluster status with the `tiup cluster display tidb-test` command.
## Rolling upgrade
The rolling upgrade feature leverages TiDB's distributed capabilities to keep the upgrade process as transparent to front-end business as possible. During the upgrade, the tool upgrades the cluster node by node, performing different operations for different node types.
### Operations on different node types
- Upgrading PD
  - Non-Leader nodes are upgraded first
  - The Leader node is upgraded only after all non-Leader nodes have been upgraded:
    - The tool sends a command to PD to migrate the Leader to a node that has already been upgraded
    - Once the Leader has switched to another node, the old Leader node is upgraded
  - If an unhealthy node is found during the upgrade, the tool aborts the upgrade and exits; manual inspection and repair are needed before running the upgrade again
- Upgrading TiKV
  - First, a scheduling operation is added to PD to migrate the Region leaders off the TiKV node being upgraded, so that the upgrade does not affect front-end business
  - After the leader migration finishes, the TiKV node is upgraded
  - After the upgraded TiKV starts normally, the leader-migration scheduling is removed
- Upgrading other services
  - The service is stopped and updated normally
### Upgrade operation
The upgrade command parameters are as follows:
```bash
Usage:
  tiup cluster upgrade <cluster-name> <version> [flags]

Flags:
      --force                   Force the upgrade without transferring the leader (dangerous)
  -h, --help                    Help information
      --transfer-timeout int    Timeout of transferring the leader

Global Flags:
      --ssh-timeout int    SSH connection timeout
  -y, --yes                Skip all confirmation steps
```
For example, to upgrade a cluster to v4.0.0-rc, you need only one command:
```bash
$ tiup cluster upgrade tidb-test v4.0.0-rc
```
## Update configuration
Sometimes we want to dynamically update a component's configuration. tiup-cluster saves a copy of the current configuration for each cluster; to edit it, execute `tiup cluster edit-config <cluster-name>`. For example:
```bash
tiup cluster edit-config prod-cluster
```
tiup-cluster then opens the configuration file in vi; edit it and save. At this point the configuration has not been applied to the cluster yet. To make it take effect, execute:
```bash
tiup cluster reload prod-cluster
```
This operation sends the configuration to the target machines and restarts the cluster so that the new configuration takes effect.
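For illustration, a hedged sketch of the kind of change one might make in the edit-config step, assuming the saved configuration exposes a `server_configs` section for component-wide settings (the section and key names are assumptions, not taken from this document):
```yaml
# Hypothetical snippet edited via `tiup cluster edit-config`; the
# server_configs section and these keys are assumptions.
server_configs:
  tidb:
    log.level: warn          # raise the TiDB log level
  tikv:
    raftstore.sync-log: true # make TiKV fsync raft log writes
```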
## Patching components
Regular cluster upgrades can use the upgrade command, but in some scenarios (e.g. debugging) you may need to replace a running component with a temporary package; in that case you can use the patch command:
```bash
[user@localhost ~]# tiup cluster patch --help
Replace the remote package with a specified package and restart the service
Usage:
tiup cluster patch <cluster-name> <package-path> [flags]
Flags:
  -h, --help                    Help information
  -N, --node strings            Specify the nodes to be replaced
      --overwrite               Use the specified temporary package in future scale-out operations as well
  -R, --role strings            Specify the roles (service types) to be replaced
      --transfer-timeout int    Timeout of transferring the leader

Global Flags:
      --ssh-timeout int    SSH connection timeout
  -y, --yes                Skip all confirmation steps
```
For example, if there is a TiDB hotfix package at /tmp/tidb-hotfix.tar.gz and we want to replace all the TiDB instances in the cluster, we can run:
```bash
tiup cluster patch test-cluster /tmp/tidb-hotfix.tar.gz -R tidb
```
Or replace just one of the TiDB instances:
```bash
tiup cluster patch test-cluster /tmp/tidb-hotfix.tar.gz -N 172.16.4.5:4000
```
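Based on the `--overwrite` flag shown in the help above, the temporary package can also be kept for future scale-out operations:
```bash
# Patch all TiDB instances and keep using this package when the cluster
# is scaled out later (see the --overwrite flag in the help above).
tiup cluster patch test-cluster /tmp/tidb-hotfix.tar.gz -R tidb --overwrite
```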
## Importing TiDB-Ansible clusters
Before TiUP, clusters were generally deployed with TiDB-Ansible; the import command brings such clusters under TiUP management.
Usage of the import command:
```bash
[user@localhost ~]# tiup cluster import --help
Import an existing TiDB cluster from TiDB-Ansible
Usage:
tiup cluster import [flags]
Flags:
  -d, --dir string         Directory of TiDB-Ansible (default is the current directory)
  -h, --help               Help information for import
      --inventory string   Name of the inventory file (default "inventory.ini")
      --no-backup          Don't back up the Ansible directory; useful when it contains multiple inventory files
  -r, --rename NAME        Rename the imported cluster

Global Flags:
      --ssh-timeout int    SSH connection timeout
  -y, --yes                Skip all confirmation steps
```
Example: Importing a cluster:
```bash
cd tidb-ansible
tiup cluster import
```
or
```bash
tiup cluster import --dir=/path/to/tidb-ansible
```
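If the imported cluster should be managed under a different name, the `-r/--rename` flag from the help above can be added; `legacy-cluster` below is just a hypothetical name:
```bash
# Import the TiDB-Ansible cluster and manage it under the name legacy-cluster.
tiup cluster import --dir=/path/to/tidb-ansible --rename legacy-cluster
```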