=====================
Administrator's Guide
=====================
-------------------------
Defining Storage Policies
-------------------------
Defining Storage Policies in Swift is straightforward. However, it is important
that the administrator understand the concepts behind Storage Policies
before creating and using them, in order to get the most benefit out
of the feature and, more importantly, to avoid having to make unnecessary changes
once a set of policies has been deployed to a cluster.
It is highly recommended that the reader fully read and comprehend
:doc:`overview_policies` before proceeding with administration of
policies. Plan carefully, and experiment first on a non-production
cluster to be certain that the desired configuration meets the needs
of the users. See :ref:`upgrade-policy`
before planning the upgrade of your existing deployment.
The following is a high-level view of the few steps it takes to configure
policies once you have decided what you want to do:
#. Define your policies in ``/etc/swift/swift.conf``
#. Create the corresponding object rings
#. Communicate the names of the Storage Policies to cluster users
For a specific example that takes you through these steps, please see
:doc:`policies_saio`.
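As a rough illustration of step 1, the sketch below parses a ``swift.conf``
fragment and checks two basic properties: policy names are unique and at most
one policy is marked as the default. The policy names ``gold`` and ``silver``
and the helper ``check_policies`` are made up for this example, and the checks
are a simplification; :doc:`overview_policies` remains the reference for the
rules Swift actually enforces.

```python
import configparser

# Hypothetical swift.conf fragment with two storage policies.
SAMPLE_CONF = """
[storage-policy:0]
name = gold
default = yes

[storage-policy:1]
name = silver
"""


def check_policies(conf_text):
    """Return {index: name} for the policies found.

    Raises ValueError on duplicate names (compared case-insensitively)
    or when more than one policy is marked as default.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    policies = {}
    defaults = []
    for section in parser.sections():
        if not section.startswith("storage-policy:"):
            continue
        index = int(section.split(":", 1)[1])
        name = parser.get(section, "name")
        if name.lower() in (n.lower() for n in policies.values()):
            raise ValueError("duplicate policy name: %s" % name)
        policies[index] = name
        if parser.getboolean(section, "default", fallback=False):
            defaults.append(index)
    if len(defaults) > 1:
        raise ValueError("more than one default policy: %s" % defaults)
    return policies


print(check_policies(SAMPLE_CONF))
```

Running a check like this on a non-production copy of the file is a cheap way
to catch typos before the configuration is distributed.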
------------------
Managing the Rings
------------------
You may build the storage rings on any server with the appropriate
version of Swift installed. Once built or changed (rebalanced), you
must distribute the rings to all the servers in the cluster. Storage
rings contain information about all the Swift storage partitions and
how they are distributed between the different nodes and disks.
Swift 1.6.0 is the last version to use a Python pickle format.
Subsequent versions use a different serialization format. **Rings
generated by Swift versions 1.6.0 and earlier may be read by any
version, but rings generated after 1.6.0 may only be read by Swift
versions greater than 1.6.0.** So when upgrading from version 1.6.0 or
earlier to a version greater than 1.6.0, either upgrade Swift on your
ring building server **last** after all Swift nodes have been successfully
upgraded, or refrain from generating rings until all Swift nodes have
been successfully upgraded.
If you need to downgrade from a version of Swift greater than 1.6.0 to
a version less than or equal to 1.6.0, first downgrade your ring-building
server, generate new rings, push them out, then continue with the rest
of the downgrade.
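The compatibility rule above can be written down as a small predicate. This is
only a restatement of the rule for planning purposes; the function names are
hypothetical and version strings are assumed to be plain dotted integers.

```python
PICKLE_CUTOFF = (1, 6, 0)  # last Swift version that writes pickle-format rings


def parse_version(text):
    """Turn a dotted version such as '1.6.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def can_read_ring(built_by, read_by):
    """True if a ring built by Swift `built_by` is readable by Swift `read_by`."""
    if parse_version(built_by) <= PICKLE_CUTOFF:
        # Rings from 1.6.0 and earlier may be read by any version.
        return True
    # Newer rings need a reader that is itself newer than 1.6.0.
    return parse_version(read_by) > PICKLE_CUTOFF


# A ring built on an upgraded builder cannot be read by a node still on 1.6.0,
# which is why the ring-building server is upgraded last.
print(can_read_ring("1.7.0", "1.6.0"))
```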
For more information see :doc:`overview_ring`.
Removing a device from the ring::
swift-ring-builder <builder-file> remove <ip_address>/<device_name>
Removing a server from the ring::
swift-ring-builder <builder-file> remove <ip_address>
Adding devices to the ring:
See :ref:`ring-preparing`
See what devices for a server are in the ring::
swift-ring-builder <builder-file> search <ip_address>
Once you are done with all changes to the ring, the changes need to be
"committed"::
swift-ring-builder <builder-file> rebalance
Once the new rings are built, they should be pushed out to all the servers
in the cluster.
Optionally, if invoked as 'swift-ring-builder-safe', the directory containing
the specified builder file will be locked (via a .lock file in the parent
directory). This provides a basic safeguard against multiple instances
of the swift-ring-builder (or other utilities that observe this lock)
attempting to write to or read the builder/ring files while operations are in
progress. This can be useful in environments where ring management has been
automated but the operator still needs to interact with the rings manually.
If the ring builder is not producing the balances that you are
expecting, you can gain visibility into what it's doing with the
``--debug`` flag::
swift-ring-builder <builder-file> rebalance --debug
This produces a great deal of output that is mostly useful if you are
either (a) attempting to fix the ring builder, or (b) filing a bug
against the ring builder.
You may notice in the rebalance output a 'dispersion' number. What this
number means is explained in :ref:`ring_dispersion`, but in essence it
is the percentage of partitions in the ring that have too many replicas
within a particular failure domain. You can ask 'swift-ring-builder' what
the dispersion is with::
swift-ring-builder <builder-file> dispersion
This will give you the percentage again. If you want a detailed view of
the dispersion, simply add ``--verbose``::
swift-ring-builder <builder-file> dispersion --verbose
This will not only display the percentage but will also display a dispersion
table that lists partition dispersion by tier. You can use this table to figure
out where you need to add capacity or to help tune a :ref:`ring_overload` value.
Now let's take an example with 1 region, 3 zones and 4 devices. Each device has
the same weight, and the ``dispersion --verbose`` might show the following::
    Dispersion is 50.000000, Balance is 0.000000, Overload is 0.00%
    Required overload is 33.333333%
    Worst tier is 50.000000 (r1z3)
    --------------------------------------------------------------------------
    Tier                           Parts      %    Max     0     1     2     3
    --------------------------------------------------------------------------
    r1                               256   0.00      3     0     0     0   256
    r1z1                             192   0.00      1    64   192     0     0
    r1z1-127.0.0.1                   192   0.00      1    64   192     0     0
    r1z1-127.0.0.1/sda               192   0.00      1    64   192     0     0
    r1z2                             192   0.00      1    64   192     0     0
    r1z2-127.0.0.2                   192   0.00      1    64   192     0     0
    r1z2-127.0.0.2/sda               192   0.00      1    64   192     0     0
    r1z3                             256  50.00      1     0   128   128     0
    r1z3-127.0.0.3                   256  50.00      1     0   128   128     0
    r1z3-127.0.0.3/sda               192   0.00      1    64   192     0     0
    r1z3-127.0.0.3/sdb               192   0.00      1    64   192     0     0
The first line reports that there are 256 partitions with 3 copies in region 1;
and this is an expected output in this case (single region with 3 replicas) as
reported by the "Max" value.
However, there is some imbalance in the cluster, more precisely in zone 3. The
"Max" column reports a maximum of 1 copy in this zone; however, 50.00% of the
partitions are storing 2 replicas in this zone (which is somewhat expected,
because there are more disks in this zone).
You can now either add more capacity to the other zones, decrease the total
weight in zone 3 or set the overload to a value `greater than` 33.333333% -
only as much overload as needed will be used.
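The two headline numbers in this example can be reproduced with a little
arithmetic. The sketch below is a simplified reading of the example, not
necessarily the ring builder's exact algorithm: it assumes replicas should be
spread evenly over zones, that "required overload" is the relative extra a
device must take beyond its weight share to make that possible, and that the
``%`` column is the share of placed partitions holding more replicas than
``Max``. Function and variable names are hypothetical.

```python
from fractions import Fraction

# Device weights per zone, as in the example: one device each in zones 1
# and 2, two devices in zone 3, all with the same weight.
ZONES = {"r1z1": [1], "r1z2": [1], "r1z3": [1, 1]}


def required_overload(zone_weights, replicas):
    """Largest relative excess any device needs to spread replicas by zone."""
    total = sum(sum(ws) for ws in zone_weights.values())
    per_zone = Fraction(replicas, len(zone_weights))  # even spread over zones
    worst = Fraction(0)
    for ws in zone_weights.values():
        zone_total = sum(ws)
        for w in ws:
            wanted = per_zone * Fraction(w, zone_total)  # share for dispersion
            by_weight = Fraction(replicas * w, total)    # share by weight alone
            worst = max(worst, wanted / by_weight - 1)
    return worst


def tier_dispersion(max_replicas, histogram):
    """Percent of placed partitions with more than max_replicas in a tier.

    histogram[i] is the number of partitions with i replicas in the tier.
    """
    placed = sum(n for count, n in enumerate(histogram) if count > 0)
    over = sum(n for count, n in enumerate(histogram) if count > max_replicas)
    return 100.0 * over / placed


print("%.6f%%" % (100 * float(required_overload(ZONES, 3))))
print(tier_dispersion(1, [0, 128, 128, 0]))  # the r1z3 row
```

Under these assumptions the devices in zones 1 and 2 each carry a quarter of
the weight (0.75 of a replica) but must hold a full replica, which gives the
one-third overload reported in the output above.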
-----------------------
Scripting Ring Creation
-----------------------
You can create scripts to create the account and container rings and rebalance. Here's an example script for the Account ring. Use similar commands to create a make-container-ring.sh script on the proxy server node.
1. Create a script file called make-account-ring.sh on the proxy
server node with the following content::
#!/bin/bash
cd /etc/swift
rm -f account.builder account.ring.gz backups/account.builder backups/account.ring.gz
swift-ring-builder account.builder create 18 3 1
swift-ring-builder account.builder add r1z1-<account-server-1>:6202/sdb1 1
swift-ring-builder account.builder add r1z2-<account-server-2>:6202/sdb1 1
swift-ring-builder account.builder rebalance
You need to replace the values of <account-server-1>,
<account-server-2>, etc. with the IP addresses of the account
servers used in your setup. You can have as many account servers as
you need. All account servers are assumed to be listening on port
6202, and have a storage device called "sdb1" (this is a directory
name created under /drives when we set up the account server). The
"z1", "z2", etc. designate zones, and you can choose whether you
put devices in the same or different zones. The "r1" designates
the region, with different regions specified as "r1", "r2", etc.
2. Make the script file executable and run it to create the account ring file::
chmod +x make-account-ring.sh
sudo ./make-account-ring.sh
3. Copy the resulting ring file /etc/swift/account.ring.gz to all the
account server nodes in your Swift environment, and put them in the
/etc/swift directory on these nodes. Make sure that every time you
change the account ring configuration, you copy the resulting ring
file to all the account nodes.
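Since the container ring script differs from the account one only in the ring
name and port, the script text can also be generated. The following is a
minimal sketch: ``ring_script`` is a hypothetical helper, the ``192.0.2.x``
addresses are placeholders, and port 6201 for container servers is an
assumption that should be checked against your own configuration.

```python
def ring_script(ring, port, servers, device="sdb1",
                part_power=18, replicas=3, min_part_hours=1):
    """Return the text of a make-<ring>-ring.sh script.

    Each server is placed in its own zone (z1, z2, ...) of region r1 and
    given weight 1, mirroring the account ring example above.
    """
    builder = "%s.builder" % ring
    lines = [
        "#!/bin/bash",
        "cd /etc/swift",
        "rm -f {0}.builder {0}.ring.gz backups/{0}.builder "
        "backups/{0}.ring.gz".format(ring),
        "swift-ring-builder %s create %d %d %d"
        % (builder, part_power, replicas, min_part_hours),
    ]
    for zone, server in enumerate(servers, start=1):
        lines.append("swift-ring-builder %s add r1z%d-%s:%d/%s 1"
                     % (builder, zone, server, port, device))
    lines.append("swift-ring-builder %s rebalance" % builder)
    return "\n".join(lines) + "\n"


# Placeholder addresses; replace them with your container servers.
print(ring_script("container", 6201, ["192.0.2.11", "192.0.2.12"]))
```

The function only produces text; nothing is executed until the resulting
script is saved and run as described in step 2.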
-----------------------
Handling System Updates
-----------------------
It is recommended that system updates and reboots are done a zone at a time.
This allows the update to happen while the Swift cluster stays available
and responsive to requests. It is also advisable, after updating a zone, to
let it run for a while before updating the other zones to make sure the update
doesn't have any adverse effects.
----------------------
Handling Drive Failure
----------------------
In the event that a drive has failed, the first step is to make sure the drive
is unmounted. This will make it easier for Swift to work around the failure
until it has been resolved. If the drive is going to be replaced immediately,
then it is best to simply replace the drive, format it, remount it, and let
replication fill it up.
After the drive is unmounted, make sure the mount point is owned by root
(root:root 755). This ensures that rsync will not try to replicate into the
root drive once the failed drive is unmounted.
If the drive can't be replaced immediately, then it is best to leave it
unmounted, and set the device weight to 0. This will allow all the
replicas that were on that drive to be replicated elsewhere until the drive
is replaced. Once the drive is replaced, the device weight can be increased
again. Setting the device weight to 0 instead of removing the drive from the
ring gives Swift the chance to replicate data from the failing disk too (in case
it is still possible to read some of the data).
Setting the device weight to 0 (or removing a failed drive from the ring) has
another benefit: all partitions that were stored on the failed drive are
distributed over the remaining disks in the cluster, and each disk only needs to
store a few new partitions. This is much faster compared to replicating all
partitions to a single, new disk. It decreases the time to recover from a
degraded number of replicas significantly, and becomes more and more important
with bigger disks.
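A back-of-the-envelope sketch of why spreading the partitions is faster. The
disk and partition counts below are invented, and the model is deliberately
crude: it ignores weights, zones and replication concurrency and simply counts
how many partitions the busiest receiving disk has to write.

```python
import math


def partitions_per_receiver(parts_on_failed_disk, receiving_disks):
    """Partitions each receiving disk must write, assuming an even spread."""
    return math.ceil(parts_on_failed_disk / receiving_disks)


FAILED_DISK_PARTS = 3000   # hypothetical partitions on the failed drive
REMAINING_DISKS = 99       # hypothetical number of healthy disks

# Weight set to 0: the partitions are spread over all remaining disks.
print(partitions_per_receiver(FAILED_DISK_PARTS, REMAINING_DISKS))
# Drive swapped in place: one new disk has to receive everything.
print(partitions_per_receiver(FAILED_DISK_PARTS, 1))
```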
-----------------------
Handling Server Failure
-----------------------
If a server is having hardware issues, it is a good idea to make sure the
swift services are not running. This will allow Swift to work around the
failure while you troubleshoot.
If the server just needs a reboot, or a small amount of work that should
only last a couple of hours, then it is probably best to let Swift work
around the failure and get the machine fixed and back online. When the
machine comes back online, replication will make sure that anything that
was missed during the downtime gets updated.
If the server has more serious issues, then it is probably best to remove
all of the server's devices from the ring. Once the server has been repaired
and is back online, the server's devices can be added back into the ring.
It is important that the devices are reformatted before putting them back
into the ring, as they are likely to be responsible for a different set of
partitions than before.
-----------------------
Detecting Failed Drives
-----------------------
It has been our experience that when a drive is about to fail, error messages
will spew into `/var/log/kern.log`. There is a script called
`swift-drive-audit` that can be run via cron to watch for bad drives. If
errors are detected, it will unmount the bad drive, so that Swift can
work around it. The script takes a configuration file with the following
settings:
[drive-audit]

==================  ==============  ===========================================
Option              Default         Description
------------------  --------------  -------------------------------------------
user                swift           Drop privileges to this user for non-root
                                    tasks
log_facility        LOG_LOCAL0      Syslog log facility
log_level           INFO            Log level
device_dir          /srv/node       Directory devices are mounted under
minutes             60              Number of minutes to look back in
                                    `/var/log/kern.log`
error_limit         1               Number of errors to find before a device
                                    is unmounted
log_file_pattern    /var/log/kern*  Location of the log file with globbing
                                    pattern to check against device errors
regex_pattern_X     (see below)     Regular expression patterns to be used to
                                    locate device blocks with errors in the
                                    log file
==================  ==============  ===========================================

The default regex patterns used to locate device blocks with errors are
`\berror\b.*\b(sd[a-z]{1,2}\d?)\b` and `\b(sd[a-z]{1,2}\d?)\b.*\berror\b`.
You can override the defaults above by providing new expressions
using the format `regex_pattern_X = regex_expression`, where `X` is a number.
This script has been tested on Ubuntu 10.04 and Ubuntu 12.04, so if you are
using a different distro or OS, some care should be taken before using in production.
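To get a feel for what the two default patterns match before relying on them,
they can be tried against sample log lines. The sketch below is not the
`swift-drive-audit` implementation: the log lines are invented, each line is
counted at most once, and treating a device as failed once its count reaches
``error_limit`` is an assumption based on the option description above.

```python
import re
from collections import Counter

# The two default patterns quoted above.
DEFAULT_PATTERNS = [
    re.compile(r"\berror\b.*\b(sd[a-z]{1,2}\d?)\b"),
    re.compile(r"\b(sd[a-z]{1,2}\d?)\b.*\berror\b"),
]

# Invented kernel log lines, for illustration only.
SAMPLE_LINES = [
    "kernel: end_request: I/O error, dev sdb, sector 12345",
    "kernel: sd 0:0:0:0: [sda] Unhandled error code",
    "kernel: EXT4-fs (sdc1): mounted filesystem",
]


def count_device_errors(lines, patterns=DEFAULT_PATTERNS):
    """Count matching lines per device name captured by the patterns."""
    errors = Counter()
    for line in lines:
        for pattern in patterns:
            match = pattern.search(line)
            if match:
                errors[match.group(1)] += 1
                break  # count each log line once
    return errors


def devices_over_limit(errors, error_limit=1):
    """Devices whose error count has reached error_limit (assumed rule)."""
    return sorted(dev for dev, count in errors.items() if count >= error_limit)


print(devices_over_limit(count_device_errors(SAMPLE_LINES)))
```

If your kernel logs name devices differently (for example NVMe devices), the
same harness can be used to test a replacement ``regex_pattern_X`` value.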
------------------------------
Preventing Disk Full Scenarios
------------------------------
Prevent disk full scenarios by ensuring that the ``proxy-server`` blocks PUT
requests and rsync prevents replication to the specific drives.
You can prevent ``proxy-server`` PUT requests to low space disks by ensuring
``fallocate_reserve`` is set in the ``object-server.conf``. By default,
``fallocate_reserve`` is set to 1%. This blocks PUT requests that leave the
free disk space below 1% of the disk.
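The effect of a percentage reserve can be illustrated with a small
calculation. This is only a model of the rule as described above, with
made-up disk sizes; the function name is hypothetical and the exact check
Swift performs may differ in detail.

```python
GIB = 2 ** 30


def put_allowed(total_bytes, free_bytes, object_bytes, reserve_percent=1.0):
    """True if storing the object still leaves the reserve free on the disk."""
    reserve = total_bytes * reserve_percent / 100.0
    return free_bytes - object_bytes >= reserve


# Hypothetical 1000 GiB disk with 12 GiB free; the 1% reserve is 10 GiB.
print(put_allowed(1000 * GIB, 12 * GIB, 1 * GIB))  # leaves 11 GiB free
print(put_allowed(1000 * GIB, 12 * GIB, 3 * GIB))  # would leave 9 GiB free
```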
In order to prevent rsync replication to specific drives, first
set up ``rsync_module`` per disk in your ``object-replicator``.
Set this in ``object-server.conf``:
.. code::
[object-replicator]
rsync_module = {replication_ip}::object_{device}
Set the individual drives in ``rsync.conf``. For example:
.. code::
[object_sda]
max connections = 4
lock file = /var/lock/object_sda.lock
[object_sdb]
max connections = 4
lock file = /var/lock/object_sdb.lock
Finally, monitor the disk space of each disk and set the rsync
``max connections`` for a drive to ``-1`` when it runs low on space. We
recommend utilising your existing monitoring solution to achieve this. The
following is an example script:
.. code-block:: python

    #!/usr/bin/env python
    import os
    import errno

    RESERVE = 500 * 2 ** 20  # 500 MiB

    DEVICES = '/srv/node1'

    path_template = '/etc/rsync.d/disable_%s.conf'
    config_template = '''
    [object_%s]
    max connections = -1
    '''

    def disable_rsync(device):
        with open(path_template % device, 'w') as f:
            f.write(config_template.lstrip() % device)


    def enable_rsync(device):
        try:
            os.unlink(path_template % device)
        except OSError as e:
            # ignore file does not exist
            if e.errno != errno.ENOENT:
                raise


    for device in os.listdir(DEVICES):
        path = os.path.join(DEVICES, device)
        st = os.statvfs(path)
        free = st.f_bavail * st.f_frsize
        if free < RESERVE:
            disable_rsync(device)
        else:
            enable_rsync(device)

For the above script to work, ensure ``/etc/rsync.d/`` conf files are
included, by specifying ``&include`` in your ``rsync.conf`` file:
.. code::
&include /etc/rsync.d
Use this in conjunction with a cron job to periodically run the script, for example:
.. code::
# /etc/cron.d/devicecheck
* * * * * root /some/path/to/disable_rsync.py
.. _dispersion_report:
-----------------
Dispersion Report
-----------------
There is a swift-dispersion-report tool for measuring overall cluster health.
This is accomplished by checking if a set of deliberately distributed
containers and objects are currently in their proper places within the cluster.
For instance, a common deployment has three replicas of each object. The health
of that object can be measured by checking if each replica is in its proper
place. If only 2 of the 3 are in place, the object's health can be said to be at
66.66%, where 100% would be perfect.
A single object's health, especially an older object, usually reflects the
health of the entire partition the object is in. If we make enough objects on
a distinct percentage of the partitions in the cluster, we can get a reasonably
good estimate of the overall cluster health. In practice, about 1% partition
coverage seems to balance well between accuracy and the amount of time it takes
to gather results.
The first thing that needs to be done to provide this health value is create a
new account solely for this usage. Next, we need to place the containers and
objects throughout the system so that they are on distinct partitions. The
swift-dispersion-populate tool does this by making up random container and
object names until they fall on distinct partitions. Last, and repeatedly for
the life of the cluster, we need to run the swift-dispersion-report tool to
check the health of each of these containers and objects.
These tools need direct access to the entire cluster and to the ring files
(installing them on a proxy server will probably do). Both
swift-dispersion-populate and swift-dispersion-report use the same
configuration file, /etc/swift/dispersion.conf. Example conf file::
[dispersion]
auth_url = http://localhost:8080/auth/v1.0
auth_user = test:tester
auth_key = testing
endpoint_type = internalURL
There are also options for the conf file for specifying the dispersion coverage
(defaults to 1%), retries, concurrency, etc. though usually the defaults are
fine. If you want to use keystone v3 for authentication there are options like
auth_version, user_domain_name, project_domain_name and project_name.
Once the configuration is in place, run `swift-dispersion-populate` to populate
the containers and objects throughout the cluster.
Now that those containers and objects are in place, you can run
`swift-dispersion-report` to get a dispersion report, or the overall health of
the cluster. Here is an example of a cluster in perfect health::
$ swift-dispersion-report
Queried 2621 containers for dispersion reporting, 19s, 0 retries
100.00% of container copies found (7863 of 7863)
Sample represents 1.00% of the container partition space
Queried 2619 objects for dispersion reporting, 7s, 0 retries
100.00% of object copies found (7857 of 7857)
Sample represents 1.00% of the object partition space
Now I'll deliberately double the weight of a device in the object ring (with
replication turned off) and rerun the dispersion report to show what impact
that has::
$ swift-ring-builder object.builder set_weight d0 200
$ swift-ring-builder object.builder rebalance
...
$ swift-dispersion-report
Queried 2621 containers for dispersion reporting, 8s, 0 retries
100.00% of container copies found (7863 of 7863)
Sample represents 1.00% of the container partition space
Queried 2619 objects for dispersion reporting, 7s, 0 retries
There were 1763 partitions missing one copy.
77.56% of object copies found (6094 of 7857)
Sample represents 1.00% of the object partition space
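The percentage in the report follows directly from the counts. As a quick
check of the output above, assuming three replicas per object (the helper name
and its keyword arguments are made up for this sketch):

```python
def copies_report(partitions, replicas, missing_one=0, missing_two=0,
                  missing_all=0):
    """Return (copies_found, copies_expected, percent_found)."""
    expected = partitions * replicas
    missing = missing_one + 2 * missing_two + replicas * missing_all
    found = expected - missing
    return found, expected, round(100.0 * found / expected, 2)


# 2619 sampled object partitions, 1763 of them missing one copy.
print(copies_report(2619, 3, missing_one=1763))
```

With 1763 partitions each missing a single copy, 6094 of the 7857 expected
copies remain, which is the 77.56% shown in the report.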
You can see the health of the objects in the cluster has gone down
significantly. Of course, I only have four devices in this test environment; in
a production environment with many devices, the impact of one device change
is much less. Next, I'll run the replicators to get everything put back into
place and then rerun the dispersion report::
... start object replicators and monitor logs until they're caught up ...
$ swift-dispersion-report
Queried 2621 containers for dispersion reporting, 17s, 0 retries
100.00% of container copies found (7863 of 7863)
Sample represents 1.00% of the container partition space
Queried 2619 objects for dispersion reporting, 7s, 0 retries
100.00% of object copies found (7857 of 7857)
Sample represents 1.00% of the object partition space
You can also run the report for only containers or objects::
$ swift-dispersion-report --container-only
Queried 2621 containers for dispersion reporting, 17s, 0 retries
100.00% of container copies found (7863 of 7863)
Sample represents 1.00% of the container partition space
$ swift-dispersion-report --object-only
Queried 2619 objects for dispersion reporting, 7s, 0 retries
100.00% of object copies found (7857 of 7857)
Sample represents 1.00% of the object partition space
Alternatively, the dispersion report can also be output in JSON format. This
allows it to be more easily consumed by third-party utilities::
$ swift-dispersion-report -j
{"object": {"retries:": 0, "missing_two": 0, "copies_found": 7863, "missing_one": 0, "copies_expected": 7863, "pct_found": 100.0, "overlapping": 0, "missing_all": 0}, "container": {"retries:": 0, "missing_two": 0, "copies_found": 12534, "missing_one": 0, "copies_expected": 12534, "pct_found": 100.0, "overlapping": 15, "missing_all": 0}}
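As a minimal sketch of such a consumer, the snippet below parses the JSON
shown above and lists the report sections whose ``pct_found`` falls below a
threshold. The function name and the threshold are arbitrary choices for this
example; in practice the text would come from running
``swift-dispersion-report -j`` rather than from a constant.

```python
import json

# The JSON output shown above, pasted as a string.
SAMPLE_REPORT = (
    '{"object": {"retries:": 0, "missing_two": 0, "copies_found": 7863, '
    '"missing_one": 0, "copies_expected": 7863, "pct_found": 100.0, '
    '"overlapping": 0, "missing_all": 0}, '
    '"container": {"retries:": 0, "missing_two": 0, "copies_found": 12534, '
    '"missing_one": 0, "copies_expected": 12534, "pct_found": 100.0, '
    '"overlapping": 15, "missing_all": 0}}'
)


def degraded_sections(report_text, minimum_pct=100.0):
    """Return the report sections whose pct_found is below minimum_pct."""
    report = json.loads(report_text)
    return sorted(kind for kind, stats in report.items()
                  if stats["pct_found"] < minimum_pct)


print(degraded_sections(SAMPLE_REPORT))
```

A monitoring check could raise an alert whenever the returned list is not
empty.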
Note that you may select which storage policy to use by setting the option
'--policy-name silver' or '-P silver' (silver is the example policy name here).
If no policy is specified, the default will be used per the swift.conf file.
When you specify a policy, the containers created also include the policy index;
thus, even when running a container-only report, you will need to specify the
policy if it is not the default.
-----------------------------------------------
Geographically Distributed Swift Considerations
-----------------------------------------------
Swift provides two features that may be used to distribute replicas of objects
across multiple geographically distributed data-centers: with
:doc:`overview_global_cluster` object replicas may be dispersed across devices
from different data-centers by using `regions` in ring device descriptors; with
:doc:`overview_container_sync` objects may be copied between independent Swift
clusters in each data-center. The operation and configuration of each are
described in their respective documentation. The following points should be
considered when selecting the feature that is most appropriate for a particular
use case:
#. Global Clusters allows the distribution of object replicas across
data-centers to be controlled by the cluster operator on per-policy basis,
since the distribution is determined by the assignment of devices from
each data-center in each policy's ring file. With Container Sync the end
user controls the distribution of objects across clusters on a
per-container basis.
#. Global Clusters requires an operator to coordinate ring deployments across
multiple data-centers. Container Sync allows for independent management of
separate Swift clusters in each data-center, and for existing Swift
clusters to be used as peers in Container Sync relationships without
deploying new policies/rings.
#. Global Clusters seamlessly supports features that may rely on
cross-container operations such as large objects and versioned writes.
Container Sync requires the end user to ensure that all required
containers are sync'd for these features to work in all data-centers.
#. Global Clusters makes objects available for GET or HEAD requests in both
data-centers even if a replica of the object has not yet been
asynchronously migrated between data-centers, by forwarding requests
between data-centers. Container Sync is unable to serve requests for an
object in a particular data-center until the asynchronous sync process has
copied the object to that data-center.
#. Global Clusters may require less storage capacity than Container Sync to
achieve equivalent durability of objects in each data-center. Global
Clusters can restore replicas that are lost or corrupted in one
data-center using replicas from other data-centers. Container Sync
requires each data-center to independently manage the durability of
objects, which may result in each data-center storing more replicas than
with Global Clusters.
#. Global Clusters executes all account/container metadata updates
synchronously to account/container replicas in all data-centers, which may
incur delays when making updates across WANs. Container Sync only copies
objects between data-centers and all Swift internal traffic is
confined to each data-center.
#. Global Clusters does not yet guarantee the availability of objects stored
in Erasure Coded policies when one data-center is offline. With Container
Sync the availability of objects in each data-center is independent of the
state of other data-centers once objects have been synced. Container Sync
also allows objects to be stored using different policy types in different
data-centers.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Checking handoff partition distribution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can check if handoff partitions are piling up on a server by
comparing the expected number of partitions with the actual number on
your disks. First get the number of partitions that are currently
assigned to a server using the ``dispersion`` command from
``swift-ring-builder``::
    swift-ring-builder sample.builder dispersion --verbose
    Dispersion is 0.000000, Balance is 0.000000, Overload is 0.00%
    Required overload is 0.000000%
    --------------------------------------------------------------------------
    Tier                           Parts      %    Max     0     1     2     3
    --------------------------------------------------------------------------
    r1                              8192   0.00      2     0     0  8192     0
    r1z1                            4096   0.00      1  4096  4096     0     0
    r1z1-172.16.10.1                4096   0.00      1  4096  4096     0     0
    r1z1-172.16.10.1/sda1           4096   0.00      1  4096  4096     0     0
    r1z2                            4096   0.00      1  4096  4096     0     0
    r1z2-172.16.10.2                4096   0.00      1  4096  4096     0     0
    r1z2-172.16.10.2/sda1           4096   0.00      1  4096  4096     0     0
    r1z3                            4096   0.00      1  4096  4096     0     0
    r1z3-172.16.10.3                4096   0.00      1  4096  4096     0     0
    r1z3-172.16.10.3/sda1           4096   0.00      1  4096  4096     0     0
    r1z4                            4096   0.00      1  4096  4096     0     0
    r1z4-172.16.20.4                4096   0.00      1  4096  4096     0     0
    r1z4-172.16.20.4/sda1           4096   0.00      1  4096  4096     0     0
    r2                              8192   0.00      2     0  8192     0     0
    r2z1                            4096   0.00      1  4096  4096     0     0
    r2z1-172.16.20.1                4096   0.00      1  4096  4096     0     0
    r2z1-172.16.20.1/sda1           4096   0.00      1  4096  4096     0     0
    r2z2                            4096   0.00      1  4096  4096     0     0
    r2z2-172.16.20.2                4096   0.00      1  4096  4096     0     0
    r2z2-172.16.20.2/sda1           4096   0.00      1  4096  4096     0     0
As you can see from the output, each server should store 4096 partitions, and
each region should store 8192 partitions. This example used a partition power
of 13 and 3 replicas.
With write_affinity enabled, expect a higher number of partitions on disk
than the value reported by the swift-ring-builder dispersion command. The
number of additional (handoff) partitions in region r1 depends on your
cluster size, the amount of incoming data, and the replication speed.
Let's use the example from above with 6 nodes in 2 regions, and write_affinity
configured to write to region r1 first. `swift-ring-builder` reported that
each node should store 4096 partitions::

    Expected partitions for region r2: 8192
    Handoffs stored across 4 nodes in region r1: 8192 / 4 = 2048
    Maximum number of partitions on each server in region r1: 2048 + 4096 = 6144

In the worst case, handoff partitions in region r1 are populated with new
object replicas faster than replication can move them to region r2.
In that case you will see about 6144 partitions per
server in region r1. Your actual number should be lower, somewhere
between 4096 and 6144 partitions (preferably toward the lower end).
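
The arithmetic above can be sketched in a few lines of Python. This is only an
illustration under the same assumption as the worked example (handoffs for the
remote region are spread evenly across the local nodes); the function name
`expected_partition_range` is hypothetical and not part of Swift.

```python
def expected_partition_range(parts_per_server, remote_region_parts, local_nodes):
    """Return (low, high) partition counts expected on one local-region server.

    low:  the value reported by the ring (no handoffs on disk)
    high: the worst case, with every remote-region partition held as a
          handoff and spread evenly over the local nodes (an assumption)
    """
    max_handoffs = remote_region_parts // local_nodes
    return parts_per_server, parts_per_server + max_handoffs


low, high = expected_partition_range(4096, 8192, 4)
print(low, high)  # 4096 6144
```
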
Now count the number of object partitions on a given server in region 1,
for example on 172.16.10.1. Note that the pathnames might be
different; `/srv/node/` is the default mount location, and `objects`
applies only to storage policy 0 (storage policy 1 would use
`objects-1` and so on)::
find -L /srv/node/ -maxdepth 3 -type d -wholename "*objects/*" | wc -l
If this number stays at the upper end of the expected partition
number range (4096 to 6144) or keeps increasing, check your
replication speed and consider disabling write_affinity.
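
If you prefer to take this count from a script (for example to record it over
time), the same directory walk can be approximated in Python. This is a rough
sketch rather than a Swift tool: the helper name `count_partitions` is
hypothetical, and it assumes the `<mount>/<device>/<datadir>/<partition>`
layout described above.

```python
from pathlib import Path


def count_partitions(devices_root="/srv/node", datadir="objects"):
    """Count partition directories under <devices_root>/<device>/<datadir>/.

    Use datadir="objects-1" for storage policy 1, and so on.
    """
    total = 0
    for device in Path(devices_root).iterdir():
        data_path = device / datadir
        if not data_path.is_dir():
            continue  # this device holds no data for the chosen policy
        # is_dir() follows symlinks, similar to ``find -L``
        total += sum(1 for entry in data_path.iterdir() if entry.is_dir())
    return total
```

Calling `count_partitions()` on a storage node should give roughly the same
number as the `find` command, which can then be compared against the expected
range of 4096 to 6144.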
Refer to the next section for how to collect metrics from Swift, and
especially to :ref:`swift-recon -r <recon-replication>` for how to check
replication stats.
--------------------------------
Cluster Telemetry and Monitoring
--------------------------------
Various metrics and telemetry can be obtained from the account, container, and
object servers using the recon server middleware and the swift-recon CLI. To do
so, update your account, container, or object server pipelines to include recon
and add the associated filter config.
object-server.conf sample::
[pipeline:main]
pipeline = recon object-server
[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift
container-server.conf sample::
[pipeline:main]
pipeline = recon container-server
[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift
account-server.conf sample::
[pipeline:main]
pipeline = recon account-server
[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift
The recon_cache_path simply sets the directory where stats for a few items will
be stored. Depending on the method of deployment you may need to create this
directory manually and ensure that swift has read/write access.
Finally, if you also wish to track asynchronous pendings on your object
servers, you will need to set up a cron job to run the swift-recon-cron script
periodically on those servers::
*/5 * * * * swift /usr/bin/swift-recon-cron /etc/swift/object-server.conf
Once the recon middleware is enabled, a GET request for
"/recon/<metric>" to the backend object server will return a
JSON-formatted response::
fhines@ubuntu:~$ curl -i http://localhost:6030/recon/async
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 20
Date: Tue, 18 Oct 2011 21:03:01 GMT
{"async_pending": 0}
Note that the default port for the object server is 6200, except on a
Swift All-In-One installation, which uses 6010, 6020, 6030, and 6040.
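
The same request can be made from Python using only the standard library. The
following is a minimal sketch, not part of Swift: the helper names are
hypothetical, and the host and port must be adjusted to your backend server.

```python
import json
from urllib.request import urlopen


def recon_url(host, port, metric):
    """Build the URL of a recon metric on a backend server."""
    return "http://%s:%d/recon/%s" % (host, port, metric)


def parse_recon(body):
    """Decode the JSON document returned by the recon middleware."""
    return json.loads(body)


def fetch_recon(host, port, metric, timeout=5):
    """GET /recon/<metric> and return the decoded JSON document."""
    with urlopen(recon_url(host, port, metric), timeout=timeout) as resp:
        return parse_recon(resp.read().decode("utf-8"))
```

For the request shown above, `fetch_recon("localhost", 6030, "async")` would
return the dictionary decoded from the JSON body.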
The following metrics and telemetry are currently exposed:
========================= ========================================================================================
Request URI               Description
------------------------- ----------------------------------------------------------------------------------------
/recon/load               returns 1, 5, and 15 minute load averages
/recon/mem                returns /proc/meminfo
/recon/mounted            returns *ALL* currently mounted filesystems
/recon/unmounted          returns all unmounted drives if mount_check = True
/recon/diskusage          returns disk utilization for storage devices
/recon/ringmd5            returns object/container/account ring md5sums
/recon/quarantined        returns # of quarantined objects/accounts/containers
/recon/sockstat           returns consumable info from /proc/net/sockstat|6
/recon/devices            returns list of devices and devices dir i.e. /srv/node
/recon/async              returns count of async pending
/recon/replication        returns object replication info (for backward compatibility)
/recon/replication/<type> returns replication info for given type (account, container, object)
/recon/auditor/<type>     returns auditor stats on last reported scan for given type (account, container, object)
/recon/updater/<type>     returns last updater sweep times for given type (container, object)
========================= ========================================================================================
Note that 'object_replication_last' and 'object_replication_time' in object
replication info are considered transitional and will be removed in
a subsequent release. Use 'replication_last' and 'replication_time' instead.
This information can also be queried via the swift-recon command line utility::
fhines@ubuntu:~$ swift-recon -h
Usage:
usage: swift-recon <server_type> [-v] [--suppress] [-a] [-r] [-u] [-d]
[-l] [-T] [--md5] [--auditor] [--updater] [--expirer] [--sockstat]
<server_type> account|container|object
Defaults to object server.
ex: swift-recon container -l --auditor
Options:
-h, --help show this help message and exit
-v, --verbose Print verbose info
--suppress Suppress most connection related errors
-a, --async Get async stats
-r, --replication Get replication stats
--auditor Get auditor stats
--updater Get updater stats
--expirer Get expirer stats
-u, --unmounted Check cluster for unmounted devices
-d, --diskusage Get disk usage stats
-l, --loadstats Get cluster load average stats
-q, --quarantined Get cluster quarantine stats
--md5 Get md5sum of servers ring and compare to local copy
--sockstat Get cluster socket usage stats
-T, --time Check time synchronization
--all Perform all checks. Equal to
-arudlqT --md5 --sockstat --auditor --updater
--expirer --driveaudit --validate-servers
-z ZONE, --zone=ZONE Only query servers in specified zone
-t SECONDS, --timeout=SECONDS
Time to wait for a response from a server
--swiftdir=SWIFTDIR Default = /etc/swift
.. _recon-replication:
For example, to obtain container replication info from all hosts in zone "3"::
fhines@ubuntu:~$ swift-recon container -r --zone 3
===============================================================================
--> Starting reconnaissance on 1 hosts
===============================================================================
[2012-04-02 02:45:48] Checking on replication
[failure] low: 0.000, high: 0.000, avg: 0.000, reported: 1
[success] low: 486.000, high: 486.000, avg: 486.000, reported: 1
[replication_time] low: 20.853, high: 20.853, avg: 20.853, reported: 1
[attempted] low: 243.000, high: 243.000, avg: 243.000, reported: 1
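
If you need these summary values in a script, the bracketed lines can be picked
apart with a regular expression. The sketch below is an assumption based only
on the sample output shown here; the output is meant for humans, its format may
change between releases, and `parse_summary_line` is a hypothetical helper.

```python
import re

# Matches lines such as:
# [replication_time] low: 20.853, high: 20.853, avg: 20.853, reported: 1
_SUMMARY = re.compile(
    r"\[(?P<name>[^\]]+)\]\s+low:\s*(?P<low>[\d.]+),\s*high:\s*(?P<high>[\d.]+),"
    r"\s*avg:\s*(?P<avg>[\d.]+),\s*reported:\s*(?P<reported>\d+)"
)


def parse_summary_line(line):
    """Return (name, stats) for a summary line, or None for any other line."""
    match = _SUMMARY.search(line)
    if match is None:
        return None
    stats = {key: float(match.group(key)) for key in ("low", "high", "avg")}
    stats["reported"] = int(match.group("reported"))
    return match.group("name"), stats
```
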
---------------------------
Reporting Metrics to StatsD
---------------------------
If you have a StatsD_ server running, Swift may be configured to send it
real-time operational metrics. To enable this, set the following
configuration entries (see the sample configuration files)::
log_statsd_host = localhost
log_statsd_port = 8125
log_statsd_default_sample_rate = 1.0
log_statsd_sample_rate_factor = 1.0
log_statsd_metric_prefix = [empty-string]
If `log_statsd_host` is not set, this feature is disabled. The default values
for the other settings are given above. The `log_statsd_host` can be a
hostname, an IPv4 address, or an IPv6 address (not surrounded with brackets, as
this is unnecessary since the port is specified separately). If a hostname
resolves to an IPv4 address, an IPv4 socket will be used to send StatsD UDP
packets, even if the hostname would also resolve to an IPv6 address.
.. _StatsD: http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
.. _Graphite: http://graphite.wikidot.com/
.. _Ganglia: http://ganglia.sourceforge.net/
The sample rate is a real number between 0 and 1 which defines the
probability of sending a sample for any given event or timing measurement.
This sample rate is sent with each sample to StatsD, which uses it to scale
the value back up by the inverse of the rate. For example, with a sample rate
of 0.5, StatsD will multiply that counter's value by 2 when flushing the metric
to an upstream monitoring system (Graphite_, Ganglia_, etc.).
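
To make the sampling behavior concrete, here is a small sketch of what a client
and server do with a sample rate. It illustrates the StatsD counter line format
(`name:value|c|@rate`) and is not Swift's own client code; all function names
are hypothetical.

```python
import random


def format_counter(name, value=1, sample_rate=1.0):
    """Format a counter sample in the StatsD line protocol."""
    line = "%s:%d|c" % (name, value)
    if sample_rate < 1:
        line += "|@%s" % sample_rate  # tell the server how we sampled
    return line


def should_send(sample_rate, rng=random.random):
    """Client side: emit the sample with probability sample_rate."""
    return sample_rate >= 1 or rng() < sample_rate


def scaled_value(value, sample_rate):
    """Server side: scale a received sample by the inverse of its rate."""
    return value / sample_rate


print(format_counter("proxy-server.errors", 1, 0.5))  # proxy-server.errors:1|c|@0.5
print(scaled_value(1, 0.5))  # 2.0
```
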
Some relatively high-frequency metrics have a default sample rate less than
one. If you want to override the default sample rate for all metrics whose
default sample rate is not specified in the Swift source, you may set
`log_statsd_default_sample_rate` to a value less than one. This is NOT
recommended (see next paragraph). A better way to reduce StatsD load is to
adjust `log_statsd_sample_rate_factor` to a value less than one. Any sample
rate (either the global default or one specified by the actual metric logging
call in the Swift source) is multiplied by `log_statsd_sample_rate_factor`
before it is used. In other words, this one tunable can lower the frequency of
all StatsD logging by a proportional amount.
To get the best data, start with the default `log_statsd_default_sample_rate`
and `log_statsd_sample_rate_factor` values of 1 and only lower
`log_statsd_sample_rate_factor` if needed. The
`log_statsd_default_sample_rate` should not be used and remains for backward
compatibility only.
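
The interaction between the two settings can be summarized as follows. This is
a sketch of the rule as described in this section, not a copy of Swift's code,
and `effective_sample_rate` is a hypothetical name.

```python
def effective_sample_rate(metric_rate=None, default_rate=1.0, factor=1.0):
    """Return the rate actually used for one metric.

    metric_rate:  rate given by the logging call, or None if unspecified
    default_rate: log_statsd_default_sample_rate
    factor:       log_statsd_sample_rate_factor
    """
    base = default_rate if metric_rate is None else metric_rate
    return base * factor


print(effective_sample_rate(factor=0.5))                   # 0.5
print(effective_sample_rate(metric_rate=0.5, factor=0.5))  # 0.25
```

Lowering only the factor keeps the relative rates between metrics intact,
which is why it is the preferred knob.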
The metric prefix will be prepended to every metric sent to the StatsD server.
For example, with::
log_statsd_metric_prefix = proxy01
the metric `proxy-server.errors` would be sent to StatsD as
`proxy01.proxy-server.errors`. This is useful for distinguishing between
servers when sending statistics to a central StatsD server. If you run a local
StatsD server per node, you could configure a per-node metrics prefix there and
leave `log_statsd_metric_prefix` blank.
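
As a small illustration of the naming rule (a sketch only; `apply_prefix` is a
hypothetical helper, not a Swift function):

```python
def apply_prefix(metric, prefix=""):
    """Prepend the configured prefix, if any, to a metric name."""
    return "%s.%s" % (prefix, metric) if prefix else metric


print(apply_prefix("proxy-server.errors", "proxy01"))  # proxy01.proxy-server.errors
print(apply_prefix("proxy-server.errors"))             # proxy-server.errors
```
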
Note that metrics reported to StatsD are counters or timing data (which are
sent in units of milliseconds). StatsD usually expands timing data out to min,
max, avg, count, and 90th percentile per timing metric, but the details of
this behavior will depend on the configuration of your StatsD server. Some
important "gauge" metrics may still need to be collected using another method.
For example, the `object-server.async_pendings` StatsD metric counts the generation
of async_pendings in real-time, but will not tell you the current number of
async_pending container updates on disk at any point in time.
Note also that the set of metrics collected, their names, and their semantics
are not locked down and will change over time.
Metrics for `account-auditor`:
========================== =========================================================
Metric Name Description
-------------------------- ---------------------------------------------------------
`account-auditor.errors` Count of audit runs (across all account databases) which
caught an Exception.
`account-auditor.passes` Count of individual account databases which passed audit.
`account-auditor.failures` Count of individual account databases which failed audit.
`account-auditor.timing` Timing data for individual account database audits.
========================== =========================================================
Metrics for `account-reaper`:
============================================== ====================================================
Metric Name Description
---------------------------------------------- ----------------------------------------------------
`account-reaper.errors` Count of devices failing the mount check.
`account-reaper.timing` Timing data for each reap_account() call.
`account-reaper.return_codes.X` Count of HTTP return codes from various operations
(e.g. object listing, container deletion, etc.). The
value for X is the first digit of the return code
(2 for 201, 4 for 404, etc.).
`account-reaper.containers_failures` Count of failures to delete a container.
`account-reaper.containers_deleted` Count of containers successfully deleted.
`account-reaper.containers_remaining` Count of containers which failed to delete with
zero successes.
`account-reaper.containers_possibly_remaining` Count of containers which failed to delete with
at least one success.
`account-reaper.objects_failures` Count of failures to delete an object.
`account-reaper.objects_deleted` Count of objects successfully deleted.
`account-reaper.objects_remaining` Count of objects which failed to delete with zero
successes.
`account-reaper.objects_possibly_remaining` Count of objects which failed to delete with at
least one success.
============================================== ====================================================
Metrics for `account-server` ("Not Found" is not considered an error and requests
which increment `errors` are not included in the timing data):
======================================== =======================================================
Metric Name Description
---------------------------------------- -------------------------------------------------------
`account-server.DELETE.errors.timing` Timing data for each DELETE request resulting in an
error: bad request, not mounted, missing timestamp.
`account-server.DELETE.timing` Timing data for each DELETE request not resulting in
an error.
`account-server.PUT.errors.timing` Timing data for each PUT request resulting in an error:
bad request, not mounted, conflict, recently-deleted.
`account-server.PUT.timing` Timing data for each PUT request not resulting in an
error.
`account-server.HEAD.errors.timing` Timing data for each HEAD request resulting in an
error: bad request, not mounted.
`account-server.HEAD.timing` Timing data for each HEAD request not resulting in
an error.
`account-server.GET.errors.timing` Timing data for each GET request resulting in an
error: bad request, not mounted, bad delimiter,
account listing limit too high, bad accept header.
`account-server.GET.timing` Timing data for each GET request not resulting in
an error.
`account-server.REPLICATE.errors.timing` Timing data for each REPLICATE request resulting in an
error: bad request, not mounted.
`account-server.REPLICATE.timing` Timing data for each REPLICATE request not resulting
in an error.
`account-server.POST.errors.timing` Timing data for each POST request resulting in an
error: bad request, bad or missing timestamp, not
mounted.
`account-server.POST.timing` Timing data for each POST request not resulting in
an error.
======================================== =======================================================
Metrics for `account-replicator`:
===================================== ====================================================
Metric Name Description
------------------------------------- ----------------------------------------------------
`account-replicator.diffs` Count of syncs handled by sending differing rows.
`account-replicator.diff_caps` Count of "diffs" operations which failed because
"max_diffs" was hit.
`account-replicator.no_changes` Count of accounts found to be in sync.
`account-replicator.hashmatches` Count of accounts found to be in sync via hash
comparison (`broker.merge_syncs` was called).
`account-replicator.rsyncs` Count of completely missing accounts which were sent
via rsync.
`account-replicator.remote_merges` Count of syncs handled by sending entire database
via rsync.
`account-replicator.attempts` Count of database replication attempts.
`account-replicator.failures` Count of database replication attempts which failed
due to corruption (quarantined) or inability to read
as well as attempts to individual nodes which
failed.
`account-replicator.removes.<device>` Count of databases on <device> deleted because the
delete_timestamp was greater than the put_timestamp
and the database had no rows or because it was
successfully sync'ed to other locations and doesn't
belong here anymore.
`account-replicator.successes` Count of replication attempts to an individual node
which were successful.
`account-replicator.timing` Timing data for each database replication attempt
not resulting in a failure.
===================================== ====================================================
Metrics for `container-auditor`:
============================ ====================================================
Metric Name Description
---------------------------- ----------------------------------------------------
`container-auditor.errors` Incremented when an Exception is caught in an audit
pass (only once per pass, max).
`container-auditor.passes` Count of individual containers passing an audit.
`container-auditor.failures` Count of individual containers failing an audit.
`container-auditor.timing` Timing data for each container audit.
============================ ====================================================
Metrics for `container-replicator`:
======================================= ====================================================
Metric Name Description
--------------------------------------- ----------------------------------------------------
`container-replicator.diffs` Count of syncs handled by sending differing rows.
`container-replicator.diff_caps` Count of "diffs" operations which failed because
"max_diffs" was hit.
`container-replicator.no_changes` Count of containers found to be in sync.
`container-replicator.hashmatches` Count of containers found to be in sync via hash
comparison (`broker.merge_syncs` was called).
`container-replicator.rsyncs` Count of completely missing containers which were sent
via rsync.
`container-replicator.remote_merges` Count of syncs handled by sending entire database
via rsync.
`container-replicator.attempts` Count of database replication attempts.
`container-replicator.failures` Count of database replication attempts which failed
due to corruption (quarantined) or inability to read
as well as attempts to individual nodes which
failed.
`container-replicator.removes.<device>` Count of databases deleted on <device> because the
delete_timestamp was greater than the put_timestamp
and the database had no rows or because it was
successfully sync'ed to other locations and doesn't
belong here anymore.
`container-replicator.successes` Count of replication attempts to an individual node
which were successful.
`container-replicator.timing` Timing data for each database replication attempt
not resulting in a failure.
======================================= ====================================================
Metrics for `container-server` ("Not Found" is not considered an error and requests
which increment `errors` are not included in the timing data):
========================================== ====================================================
Metric Name Description
------------------------------------------ ----------------------------------------------------
`container-server.DELETE.errors.timing` Timing data for DELETE request errors: bad request,
not mounted, missing timestamp, conflict.
`container-server.DELETE.timing` Timing data for each DELETE request not resulting in
an error.
`container-server.PUT.errors.timing` Timing data for PUT request errors: bad request,
missing timestamp, not mounted, conflict.
`container-server.PUT.timing` Timing data for each PUT request not resulting in an
error.
`container-server.HEAD.errors.timing` Timing data for HEAD request errors: bad request,
not mounted.
`container-server.HEAD.timing` Timing data for each HEAD request not resulting in
an error.
`container-server.GET.errors.timing` Timing data for GET request errors: bad request,
not mounted, parameters not utf8, bad accept header.
`container-server.GET.timing` Timing data for each GET request not resulting in
an error.
`container-server.REPLICATE.errors.timing` Timing data for REPLICATE request errors: bad
request, not mounted.
`container-server.REPLICATE.timing` Timing data for each REPLICATE request not resulting
in an error.
`container-server.POST.errors.timing` Timing data for POST request errors: bad request,
bad x-container-sync-to, not mounted.
`container-server.POST.timing` Timing data for each POST request not resulting in
an error.
========================================== ====================================================
Metrics for `container-sync`:
=============================== ====================================================
Metric Name Description
------------------------------- ----------------------------------------------------
`container-sync.skips` Count of containers skipped because they don't have
sync'ing enabled.
`container-sync.failures` Count of failures sync'ing individual containers.
`container-sync.syncs` Count of individual containers sync'ed successfully.
`container-sync.deletes` Count of container database rows sync'ed by
deletion.
`container-sync.deletes.timing` Timing data for each container database row
synchronization via deletion.
`container-sync.puts` Count of container database rows sync'ed by PUTing.
`container-sync.puts.timing` Timing data for each container database row
synchronization via PUTing.
=============================== ====================================================
Metrics for `container-updater`:
============================== ====================================================
Metric Name Description
------------------------------ ----------------------------------------------------
`container-updater.successes` Count of containers which successfully updated their
account.
`container-updater.failures` Count of containers which failed to update their
account.
`container-updater.no_changes` Count of containers which didn't need to update
their account.
`container-updater.timing` Timing data for processing a container; only
includes timing for containers which needed to
update their accounts (i.e. "successes" and
"failures" but not "no_changes").
============================== ====================================================
Metrics for `object-auditor`:
============================ ====================================================
Metric Name Description
---------------------------- ----------------------------------------------------
`object-auditor.quarantines` Count of objects failing audit and quarantined.
`object-auditor.errors` Count of errors encountered while auditing objects.
`object-auditor.timing` Timing data for each object audit (does not include
any rate-limiting sleep time for
max_files_per_second, but does include rate-limiting
sleep time for max_bytes_per_second).
============================ ====================================================
Metrics for `object-expirer`:
======================== ====================================================
Metric Name Description
------------------------ ----------------------------------------------------
`object-expirer.objects` Count of objects expired.
`object-expirer.errors` Count of errors encountered while attempting to
expire an object.
`object-expirer.timing` Timing data for each object expiration attempt,
including ones resulting in an error.
======================== ====================================================
Metrics for `object-reconstructor`:
====================================================== ======================================================
Metric Name Description
------------------------------------------------------ ------------------------------------------------------
`object-reconstructor.partition.delete.count.<device>` A count of partitions on <device> which were
reconstructed and synced to another node because they
didn't belong on this node. This metric is tracked
per-device to allow for "quiescence detection" for
object reconstruction activity on each device.
`object-reconstructor.partition.delete.timing` Timing data for partitions reconstructed and synced to
another node because they didn't belong on this node.
This metric is not tracked per device.
`object-reconstructor.partition.update.count.<device>` A count of partitions on <device> which were
reconstructed and synced to another node, but also
belong on this node. As with delete.count, this metric
is tracked per-device.
`object-reconstructor.partition.update.timing` Timing data for partitions reconstructed which also
belong on this node. This metric is not tracked
per-device.
`object-reconstructor.suffix.hashes` Count of suffix directories whose hash (of filenames)
was recalculated.
`object-reconstructor.suffix.syncs` Count of suffix directories reconstructed with ssync.
====================================================== ======================================================
Metrics for `object-replicator`:
=================================================== ====================================================
Metric Name Description
--------------------------------------------------- ----------------------------------------------------
`object-replicator.partition.delete.count.<device>` A count of partitions on <device> which were
replicated to another node because they didn't
belong on this node. This metric is tracked
per-device to allow for "quiescence detection" for
object replication activity on each device.
`object-replicator.partition.delete.timing` Timing data for partitions replicated to another
node because they didn't belong on this node. This
metric is not tracked per device.
`object-replicator.partition.update.count.<device>` A count of partitions on <device> which were
replicated to another node, but also belong on this
node. As with delete.count, this metric is tracked
per-device.
`object-replicator.partition.update.timing` Timing data for partitions replicated which also
belong on this node. This metric is not tracked
per-device.
`object-replicator.suffix.hashes` Count of suffix directories whose hash (of filenames)
was recalculated.
`object-replicator.suffix.syncs` Count of suffix directories replicated with rsync.
=================================================== ====================================================
Metrics for `object-server`:
======================================= ====================================================
Metric Name Description
--------------------------------------- ----------------------------------------------------
`object-server.quarantines` Count of objects (files) found bad and moved to
quarantine.
`object-server.async_pendings` Count of container updates saved as async_pendings
(may result from PUT or DELETE requests).
`object-server.POST.errors.timing` Timing data for POST request errors: bad request,
missing timestamp, delete-at in past, not mounted.
`object-server.POST.timing` Timing data for each POST request not resulting in
an error.
`object-server.PUT.errors.timing` Timing data for PUT request errors: bad request,
not mounted, missing timestamp, object creation
constraint violation, delete-at in past.
`object-server.PUT.timeouts` Count of object PUTs which exceeded max_upload_time.
`object-server.PUT.timing` Timing data for each PUT request not resulting in an
error.
`object-server.PUT.<device>.timing` Timing data per kB transferred (ms/kB) for each
non-zero-byte PUT request on each device.
Useful for monitoring problematic devices; higher is worse.
`object-server.GET.errors.timing` Timing data for GET request errors: bad request,
not mounted, header timestamps before the epoch,
precondition failed.
File errors resulting in a quarantine are not
counted here.
`object-server.GET.timing` Timing data for each GET request not resulting in an
error. Includes requests which couldn't find the
object (including disk errors resulting in file
quarantine).
`object-server.HEAD.errors.timing` Timing data for HEAD request errors: bad request,
not mounted.
`object-server.HEAD.timing` Timing data for each HEAD request not resulting in
an error. Includes requests which couldn't find the
object (including disk errors resulting in file
quarantine).
`object-server.DELETE.errors.timing` Timing data for DELETE request errors: bad request,
missing timestamp, not mounted, precondition
failed. Includes requests which couldn't find or
match the object.
`object-server.DELETE.timing` Timing data for each DELETE request not resulting
in an error.
`object-server.REPLICATE.errors.timing` Timing data for REPLICATE request errors: bad
request, not mounted.
`object-server.REPLICATE.timing` Timing data for each REPLICATE request not resulting
in an error.
======================================= ====================================================
Metrics for `object-updater`:
============================ ====================================================
Metric Name Description
---------------------------- ----------------------------------------------------
`object-updater.errors` Count of drives not mounted or async_pending files
with an unexpected name.
`object-updater.timing` Timing data for object sweeps to flush async_pending
container updates. Does not include object sweeps
which did not find an existing async_pending storage
directory.
`object-updater.quarantines` Count of async_pending container updates which were
corrupted and moved to quarantine.
`object-updater.successes` Count of successful container updates.
`object-updater.failures` Count of failed container updates.
`object-updater.unlinks` Count of async_pending files unlinked. An
async_pending file is unlinked either when it is
successfully processed or when the replicator sees
that there is a newer async_pending file for the
same object.
============================ ====================================================
Metrics for `proxy-server` (in the table, `<type>` is the proxy-server
controller responsible for the request and will be one of "account",
"container", or "object"):
======================================== ====================================================
Metric Name Description
---------------------------------------- ----------------------------------------------------
`proxy-server.errors` Count of errors encountered while serving requests
before the controller type is determined. Includes
invalid Content-Length, errors finding the internal
controller to handle the request, invalid utf8, and
bad URLs.
`proxy-server.<type>.handoff_count` Count of node hand-offs; only tracked if log_handoffs
is set in the proxy-server config.
`proxy-server.<type>.handoff_all_count` Count of times *only* hand-off locations were
utilized; only tracked if log_handoffs is set in the
proxy-server config.
`proxy-server.<type>.client_timeouts` Count of client timeouts (client did not read within
`client_timeout` seconds during a GET or did not
supply data within `client_timeout` seconds during
a PUT).
`proxy-server.<type>.client_disconnects` Count of detected client disconnects during PUT
operations (does NOT include caught Exceptions in
the proxy-server which caused a client disconnect).
======================================== ====================================================
Metrics for `proxy-logging` middleware (in the table, `<type>` is either the
proxy-server controller responsible for the request: "account", "container",
"object", or the string "SOS" if the request came from the `Swift Origin Server`_
middleware. The `<verb>` portion will be one of "GET", "HEAD", "POST", "PUT",
"DELETE", "COPY", "OPTIONS", or "BAD_METHOD". The list of valid HTTP methods
is configurable via the `log_statsd_valid_http_methods` config variable and
the default setting yields the above behavior):

.. _Swift Origin Server: https://github.com/dpgoetz/sos

==================================================== ============================================
Metric Name                                          Description
---------------------------------------------------- --------------------------------------------
`proxy-server.<type>.<verb>.<status>.timing`         Timing data for requests, start to finish.
                                                     The <status> portion is the numeric HTTP
                                                     status code for the request (e.g. "200" or
                                                     "404").
`proxy-server.<type>.GET.<status>.first-byte.timing` Timing data up to completion of sending the
                                                     response headers (only for GET requests).
                                                     <status> and <type> are as for the main
                                                     timing metric.
`proxy-server.<type>.<verb>.<status>.xfer`           This counter metric is the sum of bytes
                                                     transferred in (from clients) and out (to
                                                     clients) for requests. The <type>, <verb>,
                                                     and <status> portions of the metric are just
                                                     like the main timing metric.
==================================================== ============================================

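As a minimal sketch of how the metric names in the table above are composed, the following helper builds the main timing metric name. The function name and the constant are hypothetical and not part of Swift; the verb list is the default described above and would differ if `log_statsd_valid_http_methods` is changed.

```python
# Default verbs listed in the text above; the real list is configurable via
# log_statsd_valid_http_methods.
DEFAULT_VALID_METHODS = ("GET", "HEAD", "POST", "PUT", "DELETE", "COPY",
                         "OPTIONS")


def timing_metric_name(req_type, verb, status,
                       valid_methods=DEFAULT_VALID_METHODS):
    """Build a name of the form proxy-server.<type>.<verb>.<status>.timing.

    Hypothetical helper for illustration only; it is not part of Swift.
    """
    if verb not in valid_methods:
        # Verbs outside the configured list are reported as BAD_METHOD.
        verb = "BAD_METHOD"
    return "proxy-server.%s.%s.%d.timing" % (req_type, verb, status)


print(timing_metric_name("object", "GET", 200))
print(timing_metric_name("container", "PATCH", 405))
```

Grouping unknown verbs under a single name keeps the number of distinct StatsD metrics bounded even when clients send arbitrary methods.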
The `proxy-logging` middleware also groups these metrics by policy (the
`<policy-index>` portion represents a policy index):

========================================================================== =====================================
Metric Name                                                                Description
-------------------------------------------------------------------------- -------------------------------------
`proxy-server.object.policy.<policy-index>.<verb>.<status>.timing`         Timing data for requests, aggregated
                                                                           by policy index.
`proxy-server.object.policy.<policy-index>.GET.<status>.first-byte.timing` Timing data up to completion of
                                                                           sending the response headers,
                                                                           aggregated by policy index.
`proxy-server.object.policy.<policy-index>.<verb>.<status>.xfer`           Sum of bytes transferred in and out,
                                                                           aggregated by policy index.
========================================================================== =====================================

Metrics for `tempauth` middleware (in the table, `<reseller_prefix>` represents
the actual configured reseller_prefix or "`NONE`" if the reseller_prefix is the
empty string):

========================================= ====================================================
Metric Name                               Description
----------------------------------------- ----------------------------------------------------
`tempauth.<reseller_prefix>.unauthorized` Count of regular requests which were denied with
                                          HTTPUnauthorized.
`tempauth.<reseller_prefix>.forbidden`    Count of regular requests which were denied with
                                          HTTPForbidden.
`tempauth.<reseller_prefix>.token_denied` Count of token requests which were denied.
`tempauth.<reseller_prefix>.errors`       Count of errors.
========================================= ====================================================

------------------------
Debugging Tips and Tools
------------------------
When a request is made to Swift, it is given a unique transaction id. This
id should be in every log line that has to do with that request. This can
be useful when looking at all the services that are hit by a single request.
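
As a rough illustration of tracing one request by its transaction id, the sketch below filters log lines for a single id. The log line layout and the ids (`txEXAMPLE0001`, `txEXAMPLE0002`) are made up for the example; real Swift log formats depend on the service and its configuration.

```python
def lines_for_transaction(lines, txn_id):
    """Return the log lines that mention the given transaction id."""
    return [line for line in lines if txn_id in line]


# Made-up log lines; only the presence of the id matters here.
sample_log = [
    "proxy-server: GET /v1/AUTH_demo/c/o 200 txEXAMPLE0001",
    "object-server: GET /sda/123/AUTH_demo/c/o 200 txEXAMPLE0001",
    "container-server: PUT /sdb/456/AUTH_demo/c 201 txEXAMPLE0002",
]

for line in lines_for_transaction(sample_log, "txEXAMPLE0001"):
    print(line)
```

In practice the same filter is usually applied across the logs of every node involved, since a single request touches several services.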
If you need to know where a specific account, container or object is in the
cluster, `swift-get-nodes` will show the location where each replica should be.

If you are looking at an object on the server and need more info,
`swift-object-info` will display the account, container, replica locations
and metadata of the object.

If you are looking at a container on the server and need more info,
`swift-container-info` will display all the information like the account,
container, replica locations and metadata of the container.

If you are looking at an account on the server and need more info,
`swift-account-info` will display the account, replica locations
and metadata of the account.

If you want to audit the data for an account, `swift-account-audit` can be
used to crawl the account, checking that all containers and objects can be
found.

-----------------
Managing Services
-----------------
Swift services are generally managed with `swift-init`. The general usage is
``swift-init <service> <command>``, where service is the swift service to
manage (for example object, container, account, proxy) and command is one of:

========== ===============================================
Command    Description
---------- -----------------------------------------------
start      Start the service
stop       Stop the service
restart    Restart the service
shutdown   Attempt to gracefully shut down the service
reload     Attempt to gracefully restart the service
========== ===============================================

A graceful shutdown or reload will finish any current requests before
completely stopping the old service. There is also a special case of
``swift-init all <command>``, which will run the command for all swift services.

In cases where there are multiple configs for a service, a specific config
can be managed with ``swift-init <service>.<config> <command>``.
For example, when a separate replication network is used, there might be
`/etc/swift/object-server/public.conf` for the object server and
`/etc/swift/object-server/replication.conf` for the replication services.
In this case, the replication services could be restarted with
``swift-init object-server.replication restart``.
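
The ``<service>.<config>`` form can be sketched as a small argument-list builder. The helper name and the `TABLE_COMMANDS` set are hypothetical; the set only mirrors the commands in the table above, and `swift-init` itself may accept commands not listed there.

```python
# Commands from the table above; swift-init may accept others not listed here.
TABLE_COMMANDS = {"start", "stop", "restart", "shutdown", "reload"}


def swift_init_argv(service, command, config=None):
    """Build the argument list for swift-init <service>[.<config>] <command>."""
    if command not in TABLE_COMMANDS:
        raise ValueError("unexpected command: %r" % command)
    # A specific config is addressed as <service>.<config>.
    target = service if config is None else "%s.%s" % (service, config)
    return ["swift-init", target, command]


print(swift_init_argv("proxy-server", "reload"))
print(swift_init_argv("object-server", "restart", config="replication"))
# To actually run one of these on a Swift node:
#   import subprocess; subprocess.run(argv, check=True)
```

Building a list rather than a shell string avoids quoting problems when the result is passed to `subprocess.run`.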
--------------
Object Auditor
--------------
On system failures, the XFS file system can sometimes truncate files it's
trying to write and produce zero-byte files. The object-auditor will catch
these problems, but after a system crash it is advisable to run an extra,
less rate-limited sweep to check for these specific files. You can run this
command as follows::

    swift-object-auditor /path/to/object-server/config/file.conf once -z 1000

"-z" means to only check for zero-byte files at 1000 files per second.

At times it is useful to be able to run the object auditor on a specific
device or set of devices. You can run the object-auditor as follows::

    swift-object-auditor /path/to/object-server/config/file.conf once --devices=sda,sdb

This will run the object auditor on only the sda and sdb devices. This
parameter accepts a comma-separated list of values.

-----------------
Object Replicator
-----------------
At times it is useful to be able to run the object replicator on a specific
device or partition. You can run the object-replicator as follows::

    swift-object-replicator /path/to/object-server/config/file.conf once --devices=sda,sdb

This will run the object replicator on only the sda and sdb devices. You can
likewise run that command with ``--partitions``. Both parameters accept a
comma-separated list of values. If both are specified, they are ANDed together.
These options can only be used in "once" mode.
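
For scripting targeted replication passes, the command line above can be assembled programmatically. This is only a sketch: the helper name is hypothetical and the partition numbers in the usage lines are made-up values.

```python
def replicator_once_argv(conf_path, devices=None, partitions=None):
    """Build a once-mode swift-object-replicator command line."""
    argv = ["swift-object-replicator", conf_path, "once"]
    if devices:
        # Comma-separated list of device names, e.g. sda,sdb.
        argv.append("--devices=" + ",".join(devices))
    if partitions:
        # Comma-separated list of partition numbers.
        argv.append("--partitions=" + ",".join(str(p) for p in partitions))
    return argv


conf = "/path/to/object-server/config/file.conf"
print(" ".join(replicator_once_argv(conf, devices=["sda", "sdb"])))
print(" ".join(replicator_once_argv(conf, devices=["sda"],
                                    partitions=[1234, 5678])))
```

When both options are given, the replicator is described above as ANDing them, so the second command would only cover the listed partitions on `sda`.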
-------------
Swift Orphans
-------------
Swift Orphans are processes left over after a reload of a Swift server.

For example, when upgrading a proxy server you would probably finish
with a `swift-init proxy-server reload` or `/etc/init.d/swift-proxy
reload`. This kills the parent proxy server process and leaves the
child processes running to finish processing whatever requests they
might be handling at the time. It then starts up a new parent proxy
server process and its children to handle new incoming requests. This
allows zero-downtime upgrades with no impact to existing requests.

The orphaned child processes may take a while to exit, depending on
the length of the requests they were handling. However, sometimes an
old process can be hung up due to some bug or hardware issue. In these
cases, the orphaned processes will hang around forever. `swift-orphans`
can be used to find and kill these orphans.

`swift-orphans` with no arguments will just list the orphans it finds
that were started more than 24 hours ago. You shouldn't really check
for orphans until 24 hours after you perform a reload, as some
requests can take a long time to process. `swift-orphans -k TERM` will
send the SIGTERM signal to the orphan processes, or you can `kill
-TERM` the pids yourself if you prefer.

You can run `swift-orphans --help` for more options.

------------
Swift Oldies
------------
Swift Oldies are processes that have just been around for a long
time. There's nothing necessarily wrong with this, but it might
indicate a hung process if you regularly upgrade and reload/restart
services. You might have so many servers that you don't notice when a
reload/restart fails; `swift-oldies` can help with this.

For example, if you upgraded and reloaded/restarted everything 2 days
ago, and you've already cleaned up any orphans with `swift-orphans`,
you can run `swift-oldies -a 48` to find any Swift processes still
around that were started more than 2 days ago and then investigate
them accordingly.
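
The age check itself can be illustrated with a short sketch that converts the elapsed-time format printed by `ps -o etime` (`[[dd-]hh:]mm:ss`) into hours and keeps only processes older than a threshold. This is not necessarily how `swift-oldies` is implemented; the function names and the sample process list are made up for the example.

```python
def etime_to_hours(etime):
    """Convert a ps etime string ([[dd-]hh:]mm:ss) to hours as a float."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    # Pad missing leading fields (hours) with zeros.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return days * 24 + hours + minutes / 60.0 + seconds / 3600.0


def older_than(processes, min_hours):
    """Keep (etime, pid, args) tuples whose age is at least min_hours."""
    return [p for p in processes if etime_to_hours(p[0]) >= min_hours]


# Made-up sample of (etime, pid, args) tuples.
sample = [
    ("2-01:00:00", 1234, "swift-proxy-server"),
    ("05:30", 2345, "swift-object-server"),
]
print(older_than(sample, 48))
```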
-------------------
Custom Log Handlers
-------------------
Swift supports setting up custom log handlers for services by specifying a
comma-separated list of functions to invoke when logging is set up. It does so
via the `log_custom_handlers` configuration option. Logger hooks invoked are
passed the same arguments as Swift's get_logger function (as well as the
getLogger and LogAdapter objects):

============== ===============================================
Name           Description
-------------- -----------------------------------------------
conf           Configuration dict to read settings from
name           Name of the logger received
log_to_console (optional) Write log messages to console on stderr
log_route      Route for the logging received
fmt            Override log format received
logger         The logging.getLogger object
adapted_logger The LogAdapter object
============== ===============================================

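A basic example that sets up a custom logger might look like the
following:

.. code-block:: python

    def my_logger(conf, name, log_to_console, log_route, fmt, logger,
                  adapted_logger):
        # Read a custom option from the service's configuration.
        my_conf_opt = conf.get('some_custom_setting')
        # third_party_logstore_handler is a placeholder for whatever
        # logging.Handler factory your log store provides.
        my_handler = third_party_logstore_handler(my_conf_opt)
        logger.addHandler(my_handler)
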
See :ref:`custom-logger-hooks-label` for sample use cases.
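
For experimenting outside a Swift deployment, the hook signature can be exercised with only the standard library. The sketch below is a stand-in, not Swift's own wiring: the hook name `buffer_logger`, the `demo_log_level` setting, and the in-memory `CAPTURED` buffer are all hypothetical, and the call at the bottom merely imitates how a hook would be invoked with the arguments from the table above.

```python
import io
import logging

# In-memory destination so the example has no external dependencies.
CAPTURED = io.StringIO()


def buffer_logger(conf, name, log_to_console, log_route, fmt, logger,
                  adapted_logger):
    """Custom log hook: attach a handler that writes to CAPTURED."""
    handler = logging.StreamHandler(CAPTURED)
    # 'demo_log_level' is a made-up setting used only in this sketch.
    handler.setLevel(conf.get("demo_log_level", "INFO"))
    if fmt:
        handler.setFormatter(logging.Formatter(fmt))
    logger.addHandler(handler)


# Imitate the hook being invoked during logging setup.
demo_logger = logging.getLogger("demo-route")
demo_logger.setLevel(logging.DEBUG)
buffer_logger({"demo_log_level": "WARNING"}, "demo", False, "demo-route",
              "%(levelname)s %(message)s", demo_logger, None)

demo_logger.info("below the handler level, not captured")
demo_logger.warning("disk sdb unmounted")
print(CAPTURED.getvalue())
```

In a real deployment the hook would live in an importable module and be named in `log_custom_handlers`; the handler would typically forward records to an external log store rather than a buffer.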
------------------------
Securing OpenStack Swift
------------------------
Please refer to the security guides at:

* http://docs.openstack.org/sec/
* http://docs.openstack.org/security-guide/content/object-storage.html