title: topplot
class: animation-fade
layout: true
<!-- This slide will serve as the base layout for all your slides -->
.bottom-bar[
.middle[<img src="logo.png" style="width:50px;"/> {{title}}]
]
---
class: impact
# <img src="logo.png" style="width:300px;"/> {{title}}
## Munge top logs into graphs
### https://gitlab.com/ebardie/topplot
### Jonathan Sambrook / ebardie
---
## Why I wrote topplot
The customer's bug report had 300,000 lines of `top` logs attached to it.
--
.center["These might be helpful," the customer said.]
--
.center[They weren't.]
???
Or at least not in that format.
Humans are good at visual pattern recognition, but we're not so hot on high volumes of text.
--
I looked for an existing graphing tool for top logs.
???
Think about what googling for "top" and any other keyword(s) results in.
--
+ "Top 8 Log Analyzers - LinuxLinks"
+ "Top 10+ Log Analysis Tools - Making Data-Driven Decisions"
+ "Best Log Management Tools: 51 Useful Tools for Log Management, Monitoring, Analytics, and More"
???
There may well be programmes out there.
In any case, I needed something immediately, couldn't find anything, so now there is something. Or another something.
---
# What is top?
???
I've assumed until now that you know what top is.
---
`top` takes over your terminal and looks like:
.x-small[
```
top - 16:34:36 up 1:58, 0 users, load average: 0.13, 0.28, 0.41
Tasks: 264 total, 2 running, 262 sleeping, 0 stopped, 0 zombie
%Cpu0 : 5.6 us, 16.7 sy, 0.0 ni, 77.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 11.1 us, 0.0 sy, 0.0 ni, 88.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 11.1 us, 0.0 sy, 0.0 ni, 88.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 15717.0 total, 8962.4 free, 3779.8 used, 2974.8 buff/cache
MiB Swap: 15792.0 total, 15792.0 free, 0.0 used. 10673.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
31426 jonatha+ 20 0 3361260 617024 188372 R 11.8 3.8 6:51.65 3 /usr/lib/thunderbird/thunderbird --sm-client-id 10e46f696e000157313890600000150050013
15760 jonatha+ 20 0 9328 3780 3128 R 5.9 0.0 0:00.02 0 top -b -n 1
31238 jonatha+ 20 0 239376 30516 22376 S 5.9 0.2 0:04.95 3 /usr/lib/ibus/ibus-x11 --kill-daemon
31696 jonatha+ 20 0 3118340 573160 235376 S 5.9 3.6 2:26.61 3 /opt/firefox/firefox-bin -contentproc -childID 1 -isForBrowser -prefsLen 1 -prefMapSize 233062 -par+
1 root 20 0 167004 10872 7904 S 0.0 0.1 0:01.79 2 /sbin/init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 2 [kthreadd]
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 0 [rcu_gp]
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 0 [rcu_par_gp]
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 0 [kworker/0:0H-kblockd]
12 root rt 0 0 0 0 S 0.0 0.0 0:00.03 0 [migration/0]
680 systemd+ 20 0 91964 6272 5420 S 0.0 0.0 0:00.36 3 /lib/systemd/systemd-timesyncd
682 _rpc 20 0 6828 3696 3268 S 0.0 0.0 0:00.00 2 /sbin/rpcbind -f -w
683 root 20 0 8088 4800 1608 S 0.0 0.0 0:03.38 2 /usr/sbin/haveged --Foreground --verbose=1 -w 1024
738 root 0 -20 2276 72 0 S 0.0 0.0 0:01.32 3 /usr/sbin/atopacctd
741 root 20 0 116096 20468 11500 S 0.0 0.1 0:01.80 1 /opt/lenovo_fix/venv/bin/python3 /opt/lenovo_fix/lenovo_fix.py
742 root 20 0 2316 824 756 S 0.0 0.0 0:08.21 1 /usr/sbin/acpid
753 root 20 0 82072 3668 3304 S 0.0 0.0 0:00.58 3 /usr/sbin/irqbalance --foreground
754 root 20 0 25516 8596 6940 S 0.0 0.1 0:00.01 0 /usr/sbin/cupsd -l
755 message+ 20 0 8432 5776 3588 S 0.0 0.0 0:02.78 2 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog+
756 root 20 0 326516 20948 14176 S 0.0 0.1 0:01.09 3 /usr/sbin/NetworkManager --no-daemon
757 root 20 0 13452 5176 4568 S 0.0 0.0 0:00.02 3 /sbin/wpa_supplicant -u -s -O /run/wpa_supplicant
760 root 20 0 392668 13020 10896 S 0.0 0.1 0:00.27 1 /usr/lib/udisks2/udisksd
763 avahi 20 0 6056 3552 3084 S 0.0 0.0 0:00.74 0 avahi-daemon: running [yoink.local]
765 root 20 0 11292 5768 5292 S 0.0 0.0 0:00.05 3 /usr/lib/bluetooth/bluetoothd
```
]
???
What does it look like?
Summary section at the top, information about tasks underneath.
--
It will sit there, refreshing the display every couple of seconds until you press: .small[`<Ctrl+C>`]
???
264 processes, customer's device had >2k
---
From the man page:
--
> "The top program provides a dynamic real-time view of a running system.<br><br>It can display system summary information as well as a list of processes or threads currently being managed by the Linux kernel.<br><br>The types of system summary information shown and the types, order and size of information displayed for processes are all user configurable and that configuration can be made persistent across restarts."
.right[\- top(1)]
---
### Configuring top
Run top in its normal, interactive mode and type `?` to see how to toggle various settings.
???
As the man page said, top is configurable.
--
Settings you'll probably want to turn on:
- cpu summary: split into user, system, nice, idle, wait <i>et al.</i>
- cpu summary by cpu core: on
- cpu core column: on
--
Press `W` to write the config file.
--
.col-10[.em[Top tip:]] .col-90[ Configure to taste on the target system; take a copy of the config file produced; install this on freshly wiped/reinstalled systems.]
???
Geddit? Please yourselves...
---
### Batch mode
#### Problems:
.indent[In interactive mode `top` only displays a screenful of information: the _top_ of the list, according to the current sort criterion.
Worse, it spits out control characters to get the terminal to jump through various hoops.
]
--
#### Solution:
.indent[To collect clean, full logs, run top in batch mode:
```
top -b -d 2 -n 300 > top.log
```
This collects about ten minutes' worth of complete top logs: 300 samples, one every two seconds.
]
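A batch-mode log is just one snapshot after another, each starting with a `top - HH:MM:SS` summary line. As a minimal sketch of how such a log could be split back into samples (the `split_samples` helper is hypothetical, not `topplot`'s actual parser):

```python
import re

# Each batch-mode sample begins with a summary line such as
# "top - 16:34:36 up 1:58, 0 users, load average: ..."
HEADER = re.compile(r"^top - (\d{2}:\d{2}:\d{2})")


def split_samples(lines):
    """Group log lines into (timestamp, lines) tuples, one per sample.

    Hypothetical helper for illustration only; lines appearing before
    the first header are ignored.
    """
    samples = []
    for line in lines:
        match = HEADER.match(line)
        if match:
            # A new header line starts a new sample.
            samples.append((match.group(1), []))
        elif samples:
            samples[-1][1].append(line.rstrip("\n"))
    return samples
```

Fed the lines of `top.log`, this would yield one entry per refresh, so the command above should produce 300 of them.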
???
Since top uses resources itself, and you may be using it to diagnose problems on a resource-poor system, you don't want to run it too frequently.
We've had a look at top. Why would you *not* want to use it?
---
## Why you *don't* want to be using top/topplot
.col-10[.em[Top tip:]] .col-90[For everyday use `htop` has a better interactive mode, and `atop` displays a wider range of information.]
<br>
<br>
<br>
--
Top is designed for displaying the data to a human.
Other approaches might be more efficient. (`sysstat`, `munin`, `cacti`, `nagios` or whatnot.)
--
If you really want to see what's going on, two- or three-second granularity is not fine-grained enough, and you'll want to bring in the big guns, e.g. `lttng`.
???
So if you have a choice, investigate what's out there before jumping on top.
--
But when you're handed 300,000 lines of top logs...
---
# Demo
--
.center[.huge[<ftttzzzzz/>?]]
---
# Filtering
`topplot` command-line options enable filtering according to:
- time
- cpu usage (total or peak)
- mem usage (ditto)
- regex
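To give a flavour of what regex and peak-cpu filtering amount to, here is a minimal sketch. The `filter_processes` function and its data layout (command string mapped to a list of %CPU readings) are assumptions made for illustration; they are not `topplot`'s internals or its command-line options.

```python
import re


def filter_processes(readings_by_command, pattern=None, min_peak_cpu=0.0):
    """Keep processes matching a regex whose peak %CPU reaches a threshold.

    readings_by_command: dict mapping a command string to a list of
    %CPU readings (a hypothetical layout, chosen for this sketch).
    """
    regex = re.compile(pattern) if pattern else None
    return {
        command: readings
        for command, readings in readings_by_command.items()
        if (regex is None or regex.search(command))
        and max(readings, default=0.0) >= min_peak_cpu
    }


data = {
    "/opt/firefox/firefox-bin": [5.9, 12.0],
    "/sbin/init": [0.0, 0.0],
    "top -b -n 1": [5.9],
}
busy = filter_processes(data, min_peak_cpu=5.0)
```

Here `busy` keeps the firefox and top entries and drops `/sbin/init`, whose peak never reaches 5%.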
---
# Lessons learned
## Choose carefully
I chose gnuplot because I thought matplotlib (mpl) was lacking interactive functionality, and gnuplot seemed superficially simpler to use.
But complex gnuplot scripting is arcane and quite limiting.
--
It turns out that mpl has *different* interactive functionality and is much more extensible.
---
# Notes
### Porting from gnuplot to matplotlib
`gnuplot` required the data to be munged into temporary files, and a script to be generated and passed to `gnuplot`.
???
So `topplot` constructed a template `gnuplot` script whilst parsing the data, did some post-processing, and at the last minute f-stringed key info into place and passed the script over to `gnuplot`.
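A rough idea of the f-string approach, as a sketch only: the `build_gnuplot_script` helper, its parameters and the data file layout are hypothetical, and the real `topplot` template was considerably more involved.

```python
def build_gnuplot_script(data_path, title, column):
    """Return a tiny gnuplot script as a string.

    Hypothetical helper: assumes a whitespace-separated data file with
    the x value in column 1 and the series of interest in `column`.
    """
    return (
        f'set title "{title}"\n'
        f'set xlabel "sample"\n'
        f'plot "{data_path}" using 1:{column} with lines title "{title}"\n'
    )


script = build_gnuplot_script("cpu.dat", "cpu", 2)
```

The resulting string would then be written to a file, or piped to `gnuplot`.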
--
So far I've ported the graphical side of things, but `topplot` is still using its original approach to parsing and to producing the data structures needed by `gnuplot`.
???
The data is then converted into Pandas DataFrames because Pandas plays nicely (mostly) with mpl.
It might be worth reworking data munging to make earlier and better use of DataFrames and skip the temporary files.
--
mpl has various arcanities of its own (e.g. getting legends to be interactive requires jumping through hoops, and even reaching behind the mpl API to the fancy stuff).
---
# Notes
### Contributing to mplcursors and pandas
mplcursors (an annotations extension library for mpl) is a small project with a relaxed approach to fixes/contributions.
--
- one line fix, with suggestion that maintainer might want to generalize the approach, which they did
--
Pandas (data analysis toolkit) is much larger in scope, so is more structured. And slower at dealing with peripheral contributions. Much slower.
--
- add a new function -> drive-by suggestion in PR review -> reworked the code and added a test -> awaiting reviewer requested by triager
---
# Future directions
#### GUI enhancement
- Add one-to-many legend to toggle processes across all graphs on a _per cpu_ figure.
--
#### Cross platform
- Run on Windows/Mac (Some benighted companies develop for Linux on other platforms.)
- Parse QNX formatted input.
---
# Tail
.em[Source:] https://gitlab.com/ebardie/topplot
.em[Install:] `pip3 install topplot`
Feedback welcome.
## Any questions?