1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168
|
<html>
<head>
<link rel=stylesheet href="style.css" type="text/css">
<title>collectl - Disk Info</title>
</head>
<body>
<center><h1>Disk Monitoring</h1></center>
<p>
<h3>Introduction</h3>
As with other subsystems that have specific devices, collectl can report disk summary as well as detail data.
Like other summary data, disk summary data represents the total activity across all disks or to be more precise, those
enumerated in /proc/diskstats, with the following caveats:
<ul>
<li>Partition data is skipped by default for both summary and detail data, noting filters contain a trailing <i>letter-space</i></li>
<li>Only those disk names that explicitly match the pattern in DiskFilter (see collectl.conf) are collected and subseqently
included in the results. You can also override this on demand by specifying <i>--rawdskfilt</i>. Note that in both cases
any disks not selected will not have any data recorded for them.</li>
<li>You can force collectl to collect/report partition level details by specifying a disk filter without the trailing space.
If you do so, the partition data will NOT be included in the summary stats since that would result in double counting.</li>
<li>If you specify a filter with <i>--diskfilt</i>, these filters are only applied to output and allow you to report on a
subset of disks for which data has been collected.
<li>Device mapper disks, while listed in the detail data are <i>NOT</i> included in the summary data to avoid
double counting</li>
</ul>
The three key counters for disk activity are bytes, iops and merges, though only bytes and iops are reported
in brief summary mode. The average I/O size which is also reportd in verbose and detail modes may be optionally
included in brief mode by including --iosize or --dskopts i. If you're not sure why/when you'd care about summary
data, be sure to read <a href=WhySummary.html>this</a>.
<p>
Disk detail goes a step further and in addition to including the same information that's reported for summary data
also includes key device specific metrics relating to queue lengths, wait and service times as well as utilization.
For those familiar with iostat, this is the same data it reports. These numbers will help you determine if your
individual disks are operating properly since high wait and/or service times are a bad thing and indicate something
is causing an undesired delay somewhere.
<p><b>Basic Filtering</b>
<br>If you'd like to limit the disks included in either the detail output or the summary
totals, you can explicity include or exclude them using <i>--dskfilt</i>. The target of
this switch is actually one or more perl expressions, but if you don't know perl all you
really need to know is these are strings that are compared to each disk name. If the first
(or only) name is preceded with a ^, disks that match the string(s) will be excluded.
<p><i>No filtering...</i>
<div class=terminal>
<pre>
collectl -sD
# DISK STATISTICS (/sec)
# <---------reads---------><---------writes---------><--------averages--------> Pct
#Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Util
sda 0 0 0 0 291 67 5 58 58 0 0 0 0
sdb 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-0 0 0 0 0 291 0 73 4 4 0 1 0 0
dm-1 0 0 0 0 0 0 0 0 0 0 0 0 0
hda 0 0 0 0 0 0 0 0 0 0 0 0 0
</pre>
</div>
<p><i>Only include sd disks...</i>
<div class=terminal>
<pre>
collectl -sD --dskfilt sd
# DISK STATISTICS (/sec)
# <---------reads---------><---------writes---------><--------averages--------> Pct
#Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Util
sda 0 0 0 0 0 0 0 0 0 0 0 0 0
sdb 0 0 0 0 0 0 0 0 0 0 0 0 0
</pre>
</div>
<p><i>Exclude sd and dm disks...</i>
<div class=terminal>
<pre>
collectl -sD --dskfilt ^sd,dm
# DISK STATISTICS (/sec)
# <---------reads---------><---------writes---------><--------averages--------> Pct
#Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Util
hda 0 0 0 0 0 0 0 0 0 0 0 0 0
</pre>
</div>
<p><i>Exclude disks with the letter 'a' in their name...</i>
<div class=terminal>
<pre>
collectl -sD --dskfilt ^a
# DISK STATISTICS (/sec)
# <---------reads---------><---------writes---------><--------averages--------> Pct
#Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Util
sdb 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-0 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-1 0 0 0 0 0 0 0 0 0 0 0 0 0
</pre>
</div>
<p><b>Raw Filtering</b>
<br>
As mentioned in the previous section, basic disk fitering is applied after the data is collected,
so if you don't collect it in the first place there's nothing to filter. So what about special
situations where you maye have a disk collectl doesn't know about in it's default filtering
string which is specified in /etc/collectl.conf as DiskFilter (and commented out since that is
the default)? OR what if you'd like to see partition level data which is also filtered out?
<p>
If you want to override this filtering you actually have 2 choices available to you. Either edit
the collectl.conf file or simply use --rawdskfilt which essentially redefines DiskFilter. It can
be a handy way to specify filtering via a switch in case you don't want to have to modify the conf
file.
<p><i>Show stats for unknown disk named nvme1n1 - the wrong way</i>
<p>
Since we only specified a partial name, we see everything that matches including partition.
<div class=terminal>
<pre>
collectl.pl -sD --rawdskfilt nvme -c1
# DISK STATISTICS (/sec)
# <---------reads---------------><---------writes--------------><--------averages--------> Pct
#Name KBytes Merged IOs Size Wait KBytes Merged IOs Size Wait RWSize QLen Wait SvcTim Util
nvme1n1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
nvme1n1p1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
nvme1n1p2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
nvme0n1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
</pre>
</div>
<p><i>Show stats for unknown disk named nvme1n1 - the right way</i>
<p>
If we want to just see nvme0n1 and nvme1n1, we need to be more specific and be sure to include a
space at the end of the pattern which will also require the string to be quoted.
<div class=terminal>
<pre>
collectl.pl -sD --rawdskfilt 'nvme\dn\d '
# DISK STATISTICS (/sec)
# <---------reads---------------><---------writes--------------><--------averages--------> Pct
#Name KBytes Merged IOs Size Wait KBytes Merged IOs Size Wait RWSize QLen Wait SvcTim Util
nvme1n1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
nvme0n1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
</pre>
</div>
<p><i>Show stats for specific partition(s)</i>
<p>
<div class=terminal>
<pre>
collectl.pl -sD --rawdskfilt sda1
# DISK STATISTICS (/sec)
# <---------reads---------------><---------writes--------------><--------averages--------> Pct
#Name KBytes Merged IOs Size Wait KBytes Merged IOs Size Wait RWSize QLen Wait SvcTim Util
sda1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
</pre>
</div>
<p><b>Dynamic Disk Discovery</b>
<br>
Dynamic disks are handled by the exact same algorithms that are applied to dynamic networks and
while they have not yet been found to have the same problems as netoworks do with potentially
hundreds of orphaned names no longer in use, the same logic for dealing with stale disks has
been added to netowrk processing data and rather than be repetitious, read the descrption
for <a href=Network.html#dynamic>dynamic network processing</a> and learn how the new disk
option <i>--dskopts o</i> would be applied.
<p><table width=100%><tr><td align=right><i>updated Nov 8, 2016</i></td></tr></colgroup></table>
</body>
</html>
|