.. _MonitoringSystemPerformance:
Monitoring System Performance
#############################
.. contents::
This chapter describes the performance monitoring tools available in Performance Co-Pilot (PCP). This product provides a group of commands and tools
for measuring system performance. Each tool is described completely by its own man page. The man pages are accessible through the **man** command.
For example, the man page for the tool **pmrep** is viewed by entering the following command::
man pmrep
The following major sections are covered in this chapter:
Section 4.1, “`The pmstat Command`_”, discusses **pmstat**, a utility that provides a periodic one-line summary of system performance.
Section 4.2, “`The pmrep Command`_”, discusses **pmrep**, a utility that shows the current values for named performance metrics.
Section 4.3, “`The pmval Command`_”, describes **pmval**, a utility that displays performance metrics in a textual format.
Section 4.4, “`The pminfo Command`_”, describes **pminfo**, a utility that displays information about performance metrics.
Section 4.5, “`The pmstore Command`_”, describes the use of the **pmstore** utility to arbitrarily set or reset selected performance metric values.
The following sections describe the various graphical and text-based PCP tools used to monitor local or remote system performance.
The pmstat Command
******************
The **pmstat** command provides a periodic, one-line summary of system performance. This command is intended to monitor system performance at the highest
level, after which other tools may be used for examining subsystems to observe potential performance problems in greater detail. After entering the
**pmstat** command, you see output similar to the following, with successive lines appearing periodically:
.. sourcecode:: none

   pmstat
   @ Thu Aug 15 09:25:56 2017
    loadavg                      memory      swap        io    system         cpu
      1 min   swpd   free   buff  cache   pi   po   bi   bo   in   cs  us  sy  id
       1.29 833960  5614m 144744 265824    0    0    0 1664  13K  23K   6   7  81
       1.51 833956  5607m 144744 265712    0    0    0 1664  13K  24K   5   7  83
       1.55 833956  5595m 145196 271908    0    0  14K 1056  13K  24K   7   7  74
An additional line of output is added every five seconds. The **-t** *interval* option may be used to vary the update interval (i.e. the sampling interval).
The output from **pmstat** is directed to standard output, and the columns in the report are interpreted as follows:
**loadavg**
The 1-minute load average (runnable processes).
**memory**
The **swpd** column indicates average swap space used during the interval (all columns reported in Kbytes unless otherwise indicated). The **free**
column indicates average free memory during the interval. The **buff** column indicates average buffer memory in use during the interval. The **cache**
column indicates average cached memory in use during the interval.
**swap**
Reports the average number of pages that are paged-in (**pi**) and paged-out (**po**) per second during the interval. It is normal for the paged-in values
to be non-zero, but the system is suffering memory stress if the paged-out values are non-zero over an extended period.
**io**
The **bi** and **bo** columns indicate the average rate per second of block input and block output operations, respectively, during the interval.
These rates are independent of the I/O block size. If the values become large, they are reported as thousands of operations per second (K suffix)
or millions of operations per second (M suffix).
**system**
Context switch rate (**cs**) and interrupt rate (**in**). Rates are expressed as average operations per second during the interval. Note that the
interrupt rate is normally at least HZ interrupts per second, where HZ is the clock interrupt rate (available as the **kernel.all.hz** metric).
**cpu**
Percentage of CPU time spent executing user code (**us**), system and interrupt code (**sy**), and the idle loop (**id**).
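When **pmstat** output is post-processed by a script, the scaled rate columns need to be expanded back into plain numbers. The following is a minimal sketch that handles only the K (thousands) and M (millions) suffixes described above. The function name ``expand_rate`` is hypothetical and not part of PCP, and treating ``?`` as a missing value is an assumption of this sketch:

```python
def expand_rate(field):
    """Expand a pmstat rate field such as '13K' into operations per second.

    Assumption: a '?' field means that no value was available.
    """
    multipliers = {"K": 1_000, "M": 1_000_000}
    if field == "?":
        return None
    suffix = field[-1]
    if suffix in multipliers:
        # Strip the suffix and scale the remaining number.
        return float(field[:-1]) * multipliers[suffix]
    return float(field)


print(expand_rate("13K"))   # 13000.0
print(expand_rate("1664"))  # 1664.0
```

The memory columns use a different convention (Kbytes unless otherwise indicated), so this helper is intended for the rate columns only.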
As with most PCP utilities, real-time metrics and archives are interchangeable.
For example, the following command uses a local system PCP archive *20170731* and the timezone of the host (**smash**) from which performance metrics
in the archive were collected:
.. sourcecode:: none

   pmstat -a ${PCP_LOG_DIR}/pmlogger/smash/20170731 -t 2hour -A 1hour -z
   Note: timezone set to local timezone of host "smash"
   @ Wed Jul 31 10:00:00 2017
    loadavg                      memory      swap        io    system         cpu
      1 min   swpd   free   buff  cache   pi   po   bi   bo   in   cs  us  sy  id
       3.90  24648  6234m 239176  2913m    ?    ?    ?    ?    ?    ?   ?   ?   ?
       1.72  24648  5273m 239320  2921m    0    0    4   86  11K  19K   5   5  84
       3.12  24648  5194m 241428  2969m    0    0    0   84  10K  19K   5   5  85
       1.97  24644  4945m 244004  3146m    0    0    0   84  10K  19K   5   5  84
       3.82  24640  4908m 244116  3147m    0    0    0   83  10K  18K   5   5  85
       3.38  24620  4860m 244116  3148m    0    0    0   83  10K  18K   5   4  85
       2.89  24600  4804m 244120  3149m    0    0    0   83  10K  18K   5   4  85
   pmFetch: End of PCP archive
For complete information on **pmstat** usage and command line options, see the **pmstat(1)** man page.
The pmrep Command
******************
The **pmrep** command displays performance metrics in ASCII tables, suitable for export into databases or report generators. It is a flexible command
that reports whichever metrics are named on the command line. For example, the following command continuously reports the load averages and the
context switch rate on a host named **surf**:
.. sourcecode:: none

   pmrep -p -h surf kernel.all.load kernel.all.pswitch
             k.a.load  k.a.load  k.a.load  k.a.pswitch
             1 minute  5 minute  15 minut
                                               count/s
   10:41:37     0.160     0.170     0.180          N/A
   10:41:38     0.160     0.170     0.180     1427.016
   10:41:39     0.160     0.170     0.180     2129.040
   10:41:40     0.160     0.170     0.180     5335.163
   10:41:41     0.160     0.170     0.180      723.125
   10:41:42     0.140     0.160     0.180      591.859
See the **pmrep(1)** man page for more information.
The pmval Command
******************
The **pmval** command dumps the current values for the named performance metrics. For example, the following command reports the value of performance
metric **proc.nprocs** once per second (by default), and produces output similar to this:
.. sourcecode:: none
pmval proc.nprocs
metric: proc.nprocs
host: localhost
semantics: instantaneous value
units: none
samples: all
interval: 1.00 sec
81
81
82
81
In this example, the number of running processes was reported once per second.
Where the semantics of the underlying performance metrics indicate that it would be sensible, **pmval** reports the rate of change or resource utilization.
For example, the following command reports idle processor utilization for each of four CPUs on the remote host **dove**, taking four samples five seconds apart,
and produces output of this form:
.. sourcecode:: none
pmval -h dove -t 5sec -s 4 kernel.percpu.cpu.idle
metric: kernel.percpu.cpu.idle
host: dove
semantics: cumulative counter (converting to rate)
units: millisec (converting to time utilization)
samples: 4
interval: 5.00 sec
cpu:1.1.0.a cpu:1.1.0.c cpu:1.1.1.a cpu:1.1.1.c
1.000 0.9998 0.9998 1.000
1.000 0.9998 0.9998 1.000
0.8989 0.9987 0.9997 0.9995
0.9568 0.9998 0.9996 1.000
Similarly, the following command reports disk I/O read rate every minute for just the disk **/dev/disk1**, and produces output similar to the following:
.. sourcecode:: none
pmval -t 1min -i disk1 disk.dev.read
metric: disk.dev.read
host: localhost
semantics: cumulative counter (converting to rate)
units: count (converting to count / sec)
samples: indefinite
interval: 60.00 sec
disk1
33.67
48.71
52.33
11.33
2.333
The **-r** flag may be used to suppress the rate calculation (for metrics with counter semantics) and display the raw values of the metrics.
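The conversions named in the **pmval** headers above ("converting to rate", "converting to time utilization") amount to simple arithmetic on successive samples of a counter. The sketch below is illustrative only: the function names and the raw sample values are hypothetical, and the code does not call any PCP API.

```python
def counter_rate(prev, curr, interval_sec):
    """Rate of change of a cumulative counter, in units per second."""
    return (curr - prev) / interval_sec


def time_utilization(prev_ms, curr_ms, interval_sec):
    """Fraction of the interval accounted for by a millisecond counter."""
    return (curr_ms - prev_ms) / (interval_sec * 1000.0)


# A count counter that advanced by 2020 operations over 60 seconds.
print(round(counter_rate(10_000, 12_020, 60.0), 2))   # 33.67

# An idle-time counter that advanced by 4999 ms over a 5 second interval.
print(round(time_utilization(0, 4_999, 5.0), 4))      # 0.9998
```

With rate conversion suppressed, the raw counter values (10000 and 12020 in this sketch) would be displayed instead of the derived rate.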
In the example below, the **-g** option runs **pmval** under the control of **pmtime**, and the position in time within the archive is manipulated through time control messages exchanged between the two.
::
pmval -g -a ${PCP_LOG_DIR}/pmlogger/myserver/20170731 kernel.all.load
The **pmval** command is documented by the **pmval(1)** man page, and annotated examples of the use of **pmval** can be found in the *PCP Tutorials and Case Studies*
companion document.
The pminfo Command
*******************
The **pminfo** command displays various types of information about performance metrics available through the Performance Co-Pilot (PCP) facilities.
The **-T** option is extremely useful; it provides help text about performance metrics:
.. sourcecode:: none
pminfo -T mem.util.cached
mem.util.cached
Help:
Memory used by the page cache, including buffered file data.
This is in-memory cache for files read from the disk (the pagecache)
but doesn't include SwapCached.
The **-t** option displays the one-line help text associated with the selected metrics. The **-T** option prints more verbose help text.
Without any options, **pminfo** verifies that the specified metrics exist in the namespace, and echoes those names. Metrics may be specified as arguments
to **pminfo** using their full metric names. For example, this command returns the following response::
pminfo hinv.ncpu network.interface.total.bytes
hinv.ncpu
network.interface.total.bytes
A group of related metrics in the namespace may also be specified. For example, to list all of the **hinv** metrics you would use this command::
pminfo hinv
hinv.physmem
hinv.pagesize
hinv.ncpu
hinv.ndisk
hinv.nfilesys
hinv.ninterface
hinv.nnode
hinv.machine
hinv.map.scsi
hinv.map.cpu_num
hinv.map.cpu_node
hinv.map.lvname
hinv.cpu.clock
hinv.cpu.vendor
hinv.cpu.model
hinv.cpu.stepping
hinv.cpu.cache
hinv.cpu.bogomips
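Selecting a group of metrics amounts to matching a prefix made of whole dot-separated components of the metric name. A minimal sketch of that idea follows, using a hypothetical helper ``in_group`` and a hand-written list of names rather than a live namespace:

```python
def in_group(metric, group):
    """True if metric is the group itself or lies beneath it in the name tree."""
    return metric == group or metric.startswith(group + ".")


names = ["hinv.ncpu", "hinv.cpu.clock", "hinv.map.cpu_num", "kernel.all.load"]

print([n for n in names if in_group(n, "hinv")])
# ['hinv.ncpu', 'hinv.cpu.clock', 'hinv.map.cpu_num']
print([n for n in names if in_group(n, "hinv.cpu")])
# ['hinv.cpu.clock']
```

Matching on ``group + "."`` rather than on the bare string keeps a name such as ``hinv.map.cpu_num`` out of the ``hinv.cpu`` group.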
If no metrics are specified, **pminfo** displays the entire collection of metrics. This can be useful for searching for metrics, when only part of the
full name is known. For example, this command returns the following response::
pminfo | grep nfs
nfs.client.calls
nfs.client.reqs
nfs.server.calls
nfs.server.reqs
nfs3.client.calls
nfs3.client.reqs
nfs3.server.calls
nfs3.server.reqs
nfs4.client.calls
nfs4.client.reqs
nfs4.server.calls
nfs4.server.reqs
The **-d** option causes **pminfo** to display descriptive information about metrics (refer to the **pmLookupDesc(3)** man page for an explanation of this metadata information).
The following command and response show use of the **-d** option:
.. sourcecode:: none
pminfo -d proc.nprocs disk.dev.read filesys.free
proc.nprocs
Data Type: 32-bit unsigned int InDom: PM_INDOM_NULL 0xffffffff
Semantics: instant Units: none
disk.dev.read
Data Type: 32-bit unsigned int InDom: 60.1 0xf000001
Semantics: counter Units: count
filesys.free
Data Type: 64-bit unsigned int InDom: 60.5 0xf000005
Semantics: instant Units: Kbyte
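Each instance domain in the output above appears in two forms: a dotted *domain.serial* pair and a hexadecimal number. The sketch below shows how the two forms relate, assuming that the identifier packs a 22-bit serial number below the domain number. That bit layout is an assumption made here for illustration (it is consistent with the values shown, but check the PCP headers before relying on it), and the function names are hypothetical:

```python
SERIAL_BITS = 22  # assumed width of the serial field


def indom_to_int(domain, serial):
    """Pack a domain.serial pair into a single integer."""
    return (domain << SERIAL_BITS) | serial


def int_to_indom(value):
    """Unpack an integer into its (domain, serial) pair."""
    return value >> SERIAL_BITS, value & ((1 << SERIAL_BITS) - 1)


print(hex(indom_to_int(60, 1)))   # 0xf000001
print(int_to_indom(0xF000005))    # (60, 5)
```

The special value ``PM_INDOM_NULL`` (``0xffffffff``) marks a metric with no instance domain and is not handled by this sketch.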
The **-l** option causes **pminfo** to display labels about metrics (refer to the **pmLookupLabels(3)** man page for an explanation of this metadata
information). If the metric has an instance domain, the labels associated with each instance of the metric are printed. The following command and
response show use of the **-l** option:
.. sourcecode:: none
pminfo -l -h shard kernel.percpu.cpu.sys
kernel.percpu.cpu.sys
inst [0 or "cpu0"] labels
{"agent":"linux","cpu":0,"device_type":"cpu","domainname":"acme.com","groupid":1000,"hostname":"shard","indom_name":"per cpu","userid":1000}
inst [1 or "cpu1"] labels
{"agent":"linux","cpu":1,"device_type":"cpu","domainname":"acme.com","groupid":1000,"hostname":"shard","indom_name":"per cpu","userid":1000}
inst [2 or "cpu2"] labels
{"agent":"linux","cpu":2,"device_type":"cpu","domainname":"acme.com","groupid":1000,"hostname":"shard","indom_name":"per cpu","userid":1000}
inst [3 or "cpu3"] labels
{"agent":"linux","cpu":3,"device_type":"cpu","domainname":"acme.com","groupid":1000,"hostname":"shard","indom_name":"per cpu","userid":1000}
inst [4 or "cpu4"] labels
{"agent":"linux","cpu":4,"device_type":"cpu","domainname":"acme.com","groupid":1000,"hostname":"shard","indom_name":"per cpu","userid":1000}
inst [5 or "cpu5"] labels
{"agent":"linux","cpu":5,"device_type":"cpu","domainname":"acme.com","groupid":1000,"hostname":"shard","indom_name":"per cpu","userid":1000}
inst [6 or "cpu6"] labels
{"agent":"linux","cpu":6,"device_type":"cpu","domainname":"acme.com","groupid":1000,"hostname":"shard","indom_name":"per cpu","userid":1000}
inst [7 or "cpu7"] labels
{"agent":"linux","cpu":7,"device_type":"cpu","domainname":"acme.com","groupid":1000,"hostname":"shard","indom_name":"per cpu","userid":1000}
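Each label set printed by **pminfo -l** is a JSON object, so it can be read with any JSON parser. A minimal sketch using Python's standard ``json`` module on one line copied from the output above:

```python
import json

line = ('{"agent":"linux","cpu":0,"device_type":"cpu","domainname":"acme.com",'
        '"groupid":1000,"hostname":"shard","indom_name":"per cpu","userid":1000}')

# Decode the label set into a dictionary keyed by label name.
labels = json.loads(line)
print(labels["hostname"], labels["cpu"])   # shard 0
```

Numeric labels such as ``cpu`` decode to integers, while labels such as ``hostname`` decode to strings.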
The **-f** option to **pminfo** forces the current value of each named metric to be fetched and printed. In the example below, all metrics in the group **hinv**
are selected:
.. sourcecode:: none
pminfo -f hinv
hinv.physmem
value 15701
hinv.pagesize
value 16384
hinv.ncpu
value 4
hinv.ndisk
value 6
hinv.nfilesys
value 2
hinv.ninterface
value 8
hinv.nnode
value 2
hinv.machine
value "IP35"
hinv.map.cpu_num
inst [0 or "cpu:1.1.0.a"] value 0
inst [1 or "cpu:1.1.0.c"] value 1
inst [2 or "cpu:1.1.1.a"] value 2
inst [3 or "cpu:1.1.1.c"] value 3
hinv.map.cpu_node
inst [0 or "node:1.1.0"] value "/dev/hw/module/001c01/slab/0/node"
inst [1 or "node:1.1.1"] value "/dev/hw/module/001c01/slab/1/node"
hinv.cpu.clock
inst [0 or "cpu:1.1.0.a"] value 800
inst [1 or "cpu:1.1.0.c"] value 800
inst [2 or "cpu:1.1.1.a"] value 800
inst [3 or "cpu:1.1.1.c"] value 800
hinv.cpu.vendor
inst [0 or "cpu:1.1.0.a"] value "GenuineIntel"
inst [1 or "cpu:1.1.0.c"] value "GenuineIntel"
inst [2 or "cpu:1.1.1.a"] value "GenuineIntel"
inst [3 or "cpu:1.1.1.c"] value "GenuineIntel"
hinv.cpu.model
inst [0 or "cpu:1.1.0.a"] value "0"
inst [1 or "cpu:1.1.0.c"] value "0"
inst [2 or "cpu:1.1.1.a"] value "0"
inst [3 or "cpu:1.1.1.c"] value "0"
hinv.cpu.stepping
inst [0 or "cpu:1.1.0.a"] value "6"
inst [1 or "cpu:1.1.0.c"] value "6"
inst [2 or "cpu:1.1.1.a"] value "6"
inst [3 or "cpu:1.1.1.c"] value "6"
hinv.cpu.cache
inst [0 or "cpu:1.1.0.a"] value 0
inst [1 or "cpu:1.1.0.c"] value 0
inst [2 or "cpu:1.1.1.a"] value 0
inst [3 or "cpu:1.1.1.c"] value 0
hinv.cpu.bogomips
inst [0 or "cpu:1.1.0.a"] value 1195.37
inst [1 or "cpu:1.1.0.c"] value 1195.37
inst [2 or "cpu:1.1.1.a"] value 1195.37
inst [3 or "cpu:1.1.1.c"] value 1195.37
The **-h** option directs **pminfo** to retrieve information from the specified host. If the metric has an instance domain,
the value associated with each instance of the metric is printed:
.. sourcecode:: none
pminfo -h dove -f filesys.mountdir
filesys.mountdir
inst [0 or "/dev/xscsi/pci00.01.0/target81/lun0/part3"] value "/"
inst [1 or "/dev/xscsi/pci00.01.0/target81/lun0/part1"] value "/boot/efi"
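The per-instance lines printed by **pminfo -f** follow a regular pattern: an internal instance number, an external instance name in quotes, and the value. The following sketch extracts those three parts with a regular expression. It is based only on the output shown in this chapter, the helper name ``parse_instance`` is hypothetical, and instance names that themselves contain a double quote are not handled:

```python
import re

# Matches lines such as: inst [0 or "cpu:1.1.0.a"] value 800
INST_LINE = re.compile(r'inst \[(\d+) or "([^"]*)"\] value (.*)')


def parse_instance(line):
    """Return (internal id, external name, raw value text), or None."""
    match = INST_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3).strip()


sample = 'inst [1 or "/dev/xscsi/pci00.01.0/target81/lun0/part1"] value "/boot/efi"'
print(parse_instance(sample))
# (1, '/dev/xscsi/pci00.01.0/target81/lun0/part1', '"/boot/efi"')
```

The value is returned as raw text, quotes included, because its interpretation depends on the data type of the metric.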
The **-m** option prints the Performance Metric Identifiers (PMIDs) of the selected metrics. This is useful for finding out which PMDA supplies the metric.
For example, the output below identifies the PMDA supporting domain 4 (the leftmost part of the PMID) as the one supplying information for the metric
**environ.extrema.mintemp**::
pminfo -m environ.extrema.mintemp
environ.extrema.mintemp PMID: 4.0.3
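Because the domain is the leftmost field of the PMID, it can be extracted with a simple split. The sketch below assumes that the three dotted fields are the domain, cluster, and item numbers (this chapter itself names only the domain), and ``pmid_fields`` is a hypothetical helper:

```python
def pmid_fields(pmid):
    """Split a dotted PMID string into (domain, cluster, item) integers."""
    domain, cluster, item = (int(part) for part in pmid.split("."))
    return domain, cluster, item


line = "environ.extrema.mintemp PMID: 4.0.3"
name, _, pmid = line.partition(" PMID: ")
print(name, pmid_fields(pmid))   # environ.extrema.mintemp (4, 0, 3)
```

The first element of the returned tuple is the domain number that identifies the PMDA.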
The **-v** option verifies that metric definitions in the PMNS correspond with supported metrics, and checks that a value is available for the metric.
Descriptions and values are fetched, but not printed. Only errors are reported.
Complete information on the **pminfo** command is found in the **pminfo(1)** man page. There are further examples of the use of **pminfo** in the
*PCP Tutorials and Case Studies*.
The pmstore Command
********************
From time to time you may wish to change the value of a particular metric. Some metrics are counters that may need to be reset, and some are simply
control variables for agents that collect performance metrics. When you need to change the value of a metric for any reason, the command to use is **pmstore**.
.. note::
For obvious reasons, the ability to arbitrarily change the value of a performance metric is not supported. Rather, PCP collectors selectively allow some metrics to be modified in a very controlled fashion.
The basic syntax of the command is as follows::
pmstore metricname value
There are also command line flags to further specify the action. For example, the **-i** option restricts the change to one or more instances of the
performance metric.
The *value* may be in one of several forms, according to the following rules:
1. If the metric has an integer type, then *value* should consist of an optional leading hyphen, followed either by decimal digits or by “0x” and some hexadecimal digits; “0X” is also accepted in place of “0x”.
2. If the metric has a floating point type, then *value* should be in the form of an integer (described above), a fixed point number, or a number in scientific notation.
3. If the metric has a string type, then *value* is interpreted as a literal string of ASCII characters.
4. If the metric has an aggregate type, then an attempt is made to interpret *value* as an integer, a floating point number, or a string. In the first two cases, the minimal word length encoding is used; for example, “123” would be interpreted as a four-byte aggregate, and “0x100000000” would be interpreted as an eight-byte aggregate.
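Rules 1, 2, and 4 above can be sketched in a few lines of Python. This is an illustration of the rules as worded here, not the parser that **pmstore** actually uses; all function names are hypothetical, and the exact 32-bit boundary used for aggregates is an assumption:

```python
import re

# Rule 1: optional hyphen, then decimal digits or 0x/0X plus hexadecimal digits.
INTEGER = re.compile(r"-?(?:\d+|0[xX][0-9a-fA-F]+)")


def parse_integer(text):
    """Return the integer denoted by text, or None if rule 1 does not match."""
    if INTEGER.fullmatch(text) is None:
        return None
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    base = 16 if digits[:2] in ("0x", "0X") else 10
    number = int(digits, base)
    return -number if negative else number


def parse_float(text):
    """Rule 2: accept an integer, a fixed point number, or scientific notation."""
    number = parse_integer(text)
    if number is not None:
        return float(number)
    try:
        # Note: float() is more permissive than rule 2 (it accepts "inf").
        return float(text)
    except ValueError:
        return None


def aggregate_width(text):
    """Rule 4 for integers: 4 bytes if the value fits in 32 bits, else 8.

    The handling of the exact boundary is an assumption of this sketch.
    """
    number = parse_integer(text)
    if number is None:
        return None
    return 4 if -2**31 <= number < 2**32 else 8


print(parse_integer("-0X1f"))          # -31
print(parse_float("1.5e3"))            # 1500.0
print(aggregate_width("123"))          # 4
print(aggregate_width("0x100000000"))  # 8
```

The last two calls reproduce the four-byte and eight-byte cases given as examples in rule 4.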
The following example illustrates the use of **pmstore** to enable performance metrics collection in the **txmon** PMDA (see ``${PCP_PMDAS_DIR}/txmon``
for the source code of the txmon PMDA). When the metric **txmon.control.level** has the value 0, no performance metrics are collected.
Values greater than 0 enable progressively more verbose instrumentation.
::
pminfo -f txmon.count
txmon.count
No value(s) available!
pmstore txmon.control.level 1
txmon.control.level old value=0 new value=1
pminfo -f txmon.count
txmon.count
inst [0 or "ord-entry"] value 23
inst [1 or "ord-enq"] value 11
inst [2 or "ord-ship"] value 10
inst [3 or "part-recv"] value 3
inst [4 or "part-enq"] value 2
inst [5 or "part-used"] value 1
inst [6 or "b-o-m"] value 0
For complete information on **pmstore** usage and syntax, see the **pmstore(1)** man page.