title: topplot
class: animation-fade
layout: true

<!-- This slide will serve as the base layout for all your slides -->
.bottom-bar[
.middle[<img src="logo.png" style="width:50px;"/>&nbsp;&nbsp;{{title}}]
]

---

class: impact

# <img src="logo.png" style="width:300px;"/> {{title}}
## Munge top logs into graphs
### https://gitlab.com/ebardie/topplot
### Jonathan Sambrook / ebardie

---

## Why I wrote topplot

The customer's bug report had 300,000 lines of `top` logs attached to it.

--

.center["These might be helpful," the customer said.]

--

.center[They weren't.]

???

Or at least not in that format.

Humans are good at visual pattern recognition, but we're not so hot on high volumes of text.

--

I looked for an existing graphing tool for top logs.

???

Think about what googling for "top" and any other keyword(s) results in.


--

+ "Top 8 Log Analyzers - LinuxLinks"
+ "Top 10+ Log Analysis Tools - Making Data-Driven Decisions" 
+ "Best Log Management Tools: 51 Useful Tools for Log Management, Monitoring, Analytics, and More"

???

There may well be programmes out there.

In any case, I needed something immediately, couldn't find anything, so now there is something. Or another something.

---

# What is top?

???

I've assumed until now that you know what top is.

---

`top` takes over your terminal and looks like:
.x-small[
```
top - 16:34:36 up  1:58,  0 users,  load average: 0.13, 0.28, 0.41
Tasks: 264 total,   2 running, 262 sleeping,   0 stopped,   0 zombie
%Cpu0  :  5.6 us, 16.7 sy,  0.0 ni, 77.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 11.1 us,  0.0 sy,  0.0 ni, 88.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 11.1 us,  0.0 sy,  0.0 ni, 88.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15717.0 total,   8962.4 free,   3779.8 used,   2974.8 buff/cache
MiB Swap:  15792.0 total,  15792.0 free,      0.0 used.  10673.7 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ P COMMAND
31426 jonatha+  20   0 3361260 617024 188372 R  11.8   3.8   6:51.65 3 /usr/lib/thunderbird/thunderbird --sm-client-id 10e46f696e000157313890600000150050013
15760 jonatha+  20   0    9328   3780   3128 R   5.9   0.0   0:00.02 0 top -b -n 1
31238 jonatha+  20   0  239376  30516  22376 S   5.9   0.2   0:04.95 3 /usr/lib/ibus/ibus-x11 --kill-daemon
31696 jonatha+  20   0 3118340 573160 235376 S   5.9   3.6   2:26.61 3 /opt/firefox/firefox-bin -contentproc -childID 1 -isForBrowser -prefsLen 1 -prefMapSize 233062 -par+
    1 root      20   0  167004  10872   7904 S   0.0   0.1   0:01.79 2 /sbin/init
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 2 [kthreadd]
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 0 [rcu_gp]
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 0 [rcu_par_gp]
    6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 0 [kworker/0:0H-kblockd]
   12 root      rt   0       0      0      0 S   0.0   0.0   0:00.03 0 [migration/0]
  680 systemd+  20   0   91964   6272   5420 S   0.0   0.0   0:00.36 3 /lib/systemd/systemd-timesyncd
  682 _rpc      20   0    6828   3696   3268 S   0.0   0.0   0:00.00 2 /sbin/rpcbind -f -w
  683 root      20   0    8088   4800   1608 S   0.0   0.0   0:03.38 2 /usr/sbin/haveged --Foreground --verbose=1 -w 1024
  738 root       0 -20    2276     72      0 S   0.0   0.0   0:01.32 3 /usr/sbin/atopacctd
  741 root      20   0  116096  20468  11500 S   0.0   0.1   0:01.80 1 /opt/lenovo_fix/venv/bin/python3 /opt/lenovo_fix/lenovo_fix.py
  742 root      20   0    2316    824    756 S   0.0   0.0   0:08.21 1 /usr/sbin/acpid
  753 root      20   0   82072   3668   3304 S   0.0   0.0   0:00.58 3 /usr/sbin/irqbalance --foreground
  754 root      20   0   25516   8596   6940 S   0.0   0.1   0:00.01 0 /usr/sbin/cupsd -l
  755 message+  20   0    8432   5776   3588 S   0.0   0.0   0:02.78 2 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog+
  756 root      20   0  326516  20948  14176 S   0.0   0.1   0:01.09 3 /usr/sbin/NetworkManager --no-daemon
  757 root      20   0   13452   5176   4568 S   0.0   0.0   0:00.02 3 /sbin/wpa_supplicant -u -s -O /run/wpa_supplicant
  760 root      20   0  392668  13020  10896 S   0.0   0.1   0:00.27 1 /usr/lib/udisks2/udisksd
  763 avahi     20   0    6056   3552   3084 S   0.0   0.0   0:00.74 0 avahi-daemon: running [yoink.local]
  765 root      20   0   11292   5768   5292 S   0.0   0.0   0:00.05 3 /usr/lib/bluetooth/bluetoothd
```
]
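
A minimal Python sketch of how one row of the task table above could be split into named fields. It assumes the thirteen-column layout shown in the header line (`PID` through `COMMAND`); `parse_task_row` is a hypothetical helper for illustration, not part of `top` or `topplot`.

```python
# Column names copied from the header line of the sample output above.
COLUMNS = ["PID", "USER", "PR", "NI", "VIRT", "RES", "SHR",
           "S", "%CPU", "%MEM", "TIME+", "P", "COMMAND"]


def parse_task_row(line):
    """Split one task row into a dict keyed by column name.

    Hypothetical helper: assumes the 13-column layout above, where only
    COMMAND (the last column) may contain spaces.
    """
    # maxsplit keeps the whole command line together in the last field.
    fields = line.split(None, len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"unexpected task row: {line!r}")
    row = dict(zip(COLUMNS, fields))
    # Convert the fields that are plainly numeric in this layout.
    row["PID"] = int(row["PID"])
    row["%CPU"] = float(row["%CPU"])
    row["%MEM"] = float(row["%MEM"])
    row["COMMAND"] = row["COMMAND"].rstrip()
    return row


row = parse_task_row(
    "15760 jonatha+  20   0    9328   3780   3128 R   5.9   0.0   0:00.02 0 top -b -n 1"
)
print(row["PID"], row["%CPU"], row["COMMAND"])
```

Splitting with a maximum number of fields matters because the command line is the only column that may itself contain whitespace. A different top configuration changes the columns, so the list would need adjusting to match.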

???

What does it look like? 

Summary section at the top, information about tasks underneath.

--

It will sit there, refreshing the display every couple of seconds until you press: .small[`<Ctrl+C>`]

???

264 processes, customer's device had >2k

---

From the man page:

--

> "The top program provides a dynamic real-time view of a running system.<br><br>It can display system summary information as well as a list of processes or threads currently being managed by the Linux kernel.<br><br>The types of system summary information shown and the types, order and size of information displayed for processes are all user configurable and that configuration can be made persistent across restarts."

.right[\- top(1)]

---

### Configuring top

Run top in its normal, interactive mode and type `?` to see how to toggle various settings.

???

As the man page said, top is configurable.

--

Settings you'll probably want to turn on:

- cpu summary: split into user, system, nice, idle, wait <i>et al.</i>
- cpu summary by cpu core: on
- cpu core column: on

--

Press `W` to write the config file. 

--


.col-10[.em[Top tip:]] .col-90[ Configure to taste on the target system; take a copy of the config file produced; install this on freshly wiped/reinstalled systems.]

???

Geddit? Please yourselves...

---

### Batch mode

#### Problems:
.indent[In interactive mode `top` displays only a screenful of information: the _top_ of the list, according to the current sort criterion.

Worse, it spits out control characters to get the terminal to jump through various hoops. 
]

--

#### Solution:
.indent[To collect clean, full logs, run top in batch mode:

```
top -b -d 2 -n 300 > top.log
```

This collects 300 complete snapshots, one every two seconds: ten minutes' worth of top logs.
]
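
A batch-mode log is just the snapshots printed one after another, so a first processing step is to cut it back into snapshots. The Python sketch below does this, assuming every snapshot starts with a `top - HH:MM:SS` line as in the sample output shown earlier. `split_snapshots` is a hypothetical helper, not topplot's actual parser.

```python
import re

# Each snapshot starts with a line such as "top - 16:34:36 up  1:58, ...".
SNAPSHOT_START = re.compile(r"^top - (\d{2}:\d{2}:\d{2})\b")


def split_snapshots(lines):
    """Yield (timestamp, snapshot_lines) for each snapshot in a batch log."""
    timestamp, current = None, []
    for line in lines:
        match = SNAPSHOT_START.match(line)
        if match:
            # A new header line closes the previous snapshot, if any.
            if timestamp is not None:
                yield timestamp, current
            timestamp, current = match.group(1), []
        # Lines before the first header are ignored.
        if timestamp is not None:
            current.append(line.rstrip("\n"))
    if timestamp is not None:
        yield timestamp, current


# Tiny made-up log: two truncated snapshots taken two seconds apart.
sample = """\
top - 16:34:36 up  1:58,  0 users,  load average: 0.13, 0.28, 0.41
Tasks: 264 total,   2 running, 262 sleeping,   0 stopped,   0 zombie
top - 16:34:38 up  1:58,  0 users,  load average: 0.13, 0.28, 0.41
Tasks: 264 total,   1 running, 263 sleeping,   0 stopped,   0 zombie
""".splitlines()

for stamp, body in split_snapshots(sample):
    print(stamp, len(body))
```

Because the function only iterates over lines, an open file object such as `open("top.log")` can be passed in place of the sample list, which avoids loading a large log into memory at once.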

???

Since top uses resources itself, and you may be using it to diagnose problems on a resource-poor system, you don't want to run it too frequently.

We've had a look at top. Why would you *not* want to use it?

---

## Why you *don't* want to be using top/topplot

.col-10[.em[Top tip:]] .col-90[For everyday use `htop` has a better interactive mode, and `atop` displays a wider range of information.]
<br>
<br>
<br>
--

Top is designed for displaying the data to a human.

Other approaches might be more efficient (`sysstat`, `munin`, `cacti`, `nagios` or whatnot).

--

If you really want to see what's going on, two- or three-second granularity is not fine-grained enough, and you'll want to bring in the big guns, e.g. `lttng`.

???

So if you have a choice, investigate what's out there before jumping on top.

--

But when you're handed 300,000 lines of top logs...

---

# Demo

--

.center[.huge[&lt;ftttzzzzz/&gt;?]]

---

# Filtering

`topplot` command-line options enable filtering according to:

- time
- cpu usage (total or peak)
- mem usage (ditto)
- regex
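
As a rough illustration of what such filtering involves (not topplot's actual code or option names), the Python sketch below keeps only processes whose command matches a regex and whose peak CPU usage reaches a threshold. The `samples` records and the `filter_processes` helper are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical parsed samples: (timestamp, command, %CPU).
samples = [
    ("16:34:36", "thunderbird", 11.8),
    ("16:34:36", "firefox-bin", 5.9),
    ("16:34:38", "thunderbird", 2.0),
    ("16:34:38", "firefox-bin", 40.1),
    ("16:34:38", "cupsd", 0.0),
]


def filter_processes(samples, pattern=".", min_peak_cpu=0.0):
    """Return {command: peak %CPU} for commands passing both filters."""
    regex = re.compile(pattern)
    peaks = defaultdict(float)
    for _timestamp, command, cpu in samples:
        # Regex filter: only track commands that match the pattern.
        if regex.search(command):
            peaks[command] = max(peaks[command], cpu)
    # Peak filter: drop commands that never reached the threshold.
    return {cmd: peak for cmd, peak in peaks.items() if peak >= min_peak_cpu}


print(filter_processes(samples, pattern=r"fox|bird", min_peak_cpu=10.0))
```

Filtering on the peak rather than on each sample keeps a process's whole time series once it has been interesting at any point, which is usually what a graph needs.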

---

# Lessons learned
## Choose carefully

I chose gnuplot because I thought matplotlib (mpl) was lacking interactive functionality, and gnuplot seemed superficially simpler to use. 

But complex gnuplot scripting is arcane and quite limiting. 

--

It turns out that mpl has *different* interactive functionality and is much more extensible.

---

# Notes
### Porting from gnuplot to matplotlib

`gnuplot` required the data to be munged into temporary files, and a script to be generated and passed to `gnuplot`.

???

So `topplot` constructed a template `gnuplot` script whilst parsing the data, did some post-processing, and at the last minute f-stringed key info into place and passed the script over to `gnuplot`.

--

So far I've ported the graphical side of things, but `topplot` is still using its original approach to parsing and producing data structures necessary for `gnuplot`.

???

The data is then converted into Pandas DataFrames because Pandas plays nicely (mostly) with mpl.

It might be worth reworking data munging to make earlier and better use of DataFrames and skip the temporary files.

--

mpl has various arcanities of its own (e.g. getting legends to be interactive requires jumping through hoops, and even reaching behind the mpl API to the fancy stuff).

---
# Notes
### Contributing to mplcursors and pandas

mplcursors (an annotations extension library for mpl) is a small project with a relaxed approach to fixes/contributions.

--

- one line fix, with suggestion that maintainer might want to generalize the approach, which they did

--

Pandas (data analysis toolkit) is much larger in scope, so is more structured. And slower at dealing with peripheral contributions. Much slower.

--

- add a new function -> drive-by suggestion in PR review -> reworked the code and added a test -> awaiting reviewer requested by triager

---
# Future directions
#### GUI enhancement
- Add one-to-many legend to toggle processes across all graphs on a _per cpu_ figure.

--

#### Cross platform
- Run on Windows/Mac (Some benighted companies develop for Linux on other platforms.)
- Parse QNX-formatted input.

---

# Tail

.em[Source:] https://gitlab.com/ebardie/topplot

.em[Install:] `pip3 install topplot`

Feedback welcome.

## Any questions?