=========================
Tracing Ceph With LTTng
=========================
Configuring Ceph with LTTng
===========================
If you compile Ceph from source, use the ``-DWITH_LTTNG`` option (default: ON)::
./do_cmake -DWITH_LTTNG=ON
If your Ceph deployment is package-based (YUM, DNF, APT) rather than containerized, install the required packages for the modules you want to trace; otherwise, missing ``*tp.so`` library files may cause a core dump::
librbd-devel
librgw-devel
librados-devel
The tracing option for each module must be set to true in ``ceph.conf``.
The following options are currently available::
bluestore_tracing
event_tracing (-DWITH_EVENTTRACE)
osd_function_tracing (-DWITH_OSD_INSTRUMENT_FUNCTIONS)
osd_objectstore_tracing (actually filestore tracing)
rbd_tracing
osd_tracing
rados_tracing
rgw_op_tracing
rgw_rados_tracing
Testing Trace
=============
Start the LTTng session daemon::
lttng-sessiond --daemonize
Run a vstart cluster with the trace options enabled::
../src/vstart.sh -d -n -l -e -o "osd_tracing = true"
List available tracepoints::
lttng list --userspace
You will get something like::
UST events:
-------------
PID: 100859 - Name: /path/to/ceph-osd
pg:queue_op (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
osd:do_osd_op_post (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
osd:do_osd_op_pre_unknown (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
osd:do_osd_op_pre_copy_from (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
osd:do_osd_op_pre_copy_get (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
...
Create a tracing session, enable the tracepoints, and start tracing::
lttng create trace-test
lttng enable-event --userspace osd:*
lttng start
Perform some Ceph operations::
rados bench -p ec 5 write
Stop tracing and view the result::
lttng stop
lttng view
Destroy the tracing session::
lttng destroy
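The session steps above can also be scripted. The following is a minimal Python sketch, not part of Ceph or LTTng: the helper names ``trace_commands`` and ``run_traced`` are hypothetical, and the script only issues the same ``lttng`` and ``rados`` commands shown above. Passing each command as an argument list keeps the shell from expanding ``osd:*``.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence


def trace_commands(session: str, events: Iterable[str]) -> List[List[str]]:
    """Build the lttng commands that create a session and start tracing."""
    cmds = [["lttng", "create", session]]
    for event in events:
        # One enable-event call per userspace tracepoint pattern.
        cmds.append(["lttng", "enable-event", "--userspace", event])
    cmds.append(["lttng", "start"])
    return cmds


def run_traced(
    session: str,
    events: Iterable[str],
    workload: Sequence[str],
    run: Callable[[List[str]], object] = subprocess.check_call,
) -> None:
    """Trace one workload command, then view and destroy the session.

    `run` is a hypothetical hook so the sequence can be inspected without
    executing anything; by default each command is executed for real.
    """
    for cmd in trace_commands(session, events):
        run(cmd)
    try:
        run(list(workload))
        run(["lttng", "stop"])
        run(["lttng", "view"])
    finally:
        # Clean up the session even if an earlier step raised an error.
        run(["lttng", "destroy"])
```

For example, ``run_traced("trace-test", ["osd:*"], ["rados", "bench", "-p", "ec", "5", "write"])`` would replay the walkthrough above, assuming ``lttng-sessiond`` and the vstart cluster are already running.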
=========================
Tracing Ceph With Blkin
=========================
Ceph can use Blkin, a library created by Marios Kogias and others,
which enables tracking a specific request from the time it enters
the system at higher levels until it is finally served by RADOS.
In general, Blkin implements the Dapper_ tracing semantics
in order to show the causal relationships between the different
processing phases that an IO request may trigger. The goal is an
end-to-end visualisation of the request's route in the system,
accompanied by information concerning latencies in each processing
phase. Thanks to LTTng, this can happen with minimal overhead and
in real time. The LTTng traces can then be visualized with Twitter's
Zipkin_.
.. _Dapper: http://static.googleusercontent.com/media/research.google.com/el//pubs/archive/36356.pdf
.. _Zipkin: https://zipkin.io/
Configuring Ceph with Blkin
===========================
Use the ``-DWITH_BLKIN`` option (which requires ``-DWITH_LTTNG``)::
./do_cmake -DWITH_LTTNG=ON -DWITH_BLKIN=ON
The Blkin option for each component must be set to true in ``ceph.conf``.
The following options are currently available::
rbd_blkin_trace_all
osd_blkin_trace_all
osdc_blkin_trace_all
Testing Blkin
=============
It's easy to test Ceph's Blkin tracing. Let's assume that Ceph is not
already running, and that you compiled Ceph with Blkin support but
did not install it. Launch Ceph with the ``vstart.sh`` script
in Ceph's src directory so that you can see the available tracepoints::
OSD=3 MON=3 RGW=1 ../src/vstart.sh -n -o "rbd_blkin_trace_all"
lttng list --userspace
You'll see something like the following::
UST events:
-------------
PID: 8987 - Name: ./ceph-osd
zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
zipkin:keyval_integer (loglevel: TRACE_WARNING (4)) (type: tracepoint)
zipkin:keyval_string (loglevel: TRACE_WARNING (4)) (type: tracepoint)
lttng_ust_tracelog:TRACE_DEBUG (loglevel: TRACE_DEBUG (14)) (type: tracepoint)
PID: 8407 - Name: ./ceph-mon
zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
zipkin:keyval_integer (loglevel: TRACE_WARNING (4)) (type: tracepoint)
zipkin:keyval_string (loglevel: TRACE_WARNING (4)) (type: tracepoint)
lttng_ust_tracelog:TRACE_DEBUG (loglevel: TRACE_DEBUG (14)) (type: tracepoint)
...
Next, stop Ceph so that the tracepoints can be enabled::
../src/stop.sh
Start up an LTTng session and enable the tracepoints::
lttng create blkin-test
lttng enable-event --userspace zipkin:timestamp
lttng enable-event --userspace zipkin:keyval_integer
lttng enable-event --userspace zipkin:keyval_string
lttng start
Then start up Ceph again::
OSD=3 MON=3 RGW=1 ../src/vstart.sh -n -o "rbd_blkin_trace_all"
You may want to check that Ceph is up::
ceph status
Now store an object with ``rados``, check that it arrived, read it back, and remove it::
ceph osd pool create test-blkin
rados put test-object-1 ../src/vstart.sh --pool=test-blkin
rados -p test-blkin ls
ceph osd map test-blkin test-object-1
rados get test-object-1 ./vstart-copy.sh --pool=test-blkin
md5sum vstart*
rados rm test-object-1 --pool=test-blkin
You could also use the example in ``examples/librados/`` or ``rados bench``.
Then stop the LTTng session and see what was collected::
lttng stop
lttng view
You'll see something like::
[15:33:08.884275486] (+0.000225472) ubuntu zipkin:timestamp: { cpu_id = 53 }, { trace_name = "op", service_name = "Objecter", port_no = 0, ip = "0.0.0.0", trace_id = 5485970765435202833, span_id = 5485970765435202833, parent_span_id = 0, event = "osd op reply" }
[15:33:08.884614135] (+0.000002839) ubuntu zipkin:keyval_integer: { cpu_id = 10 }, { trace_name = "", service_name = "Messenger", port_no = 6805, ip = "0.0.0.0", trace_id = 7381732770245808782, span_id = 7387710183742669839, parent_span_id = 1205040135881905799, key = "tid", val = 2 }
[15:33:08.884616431] (+0.000002296) ubuntu zipkin:keyval_string: { cpu_id = 10 }, { trace_name = "", service_name = "Messenger", port_no = 6805, ip = "0.0.0.0", trace_id = 7381732770245808782, span_id = 7387710183742669839, parent_span_id = 1205040135881905799, key = "entity type", val = "client" }
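Each line of this output carries the ``trace_id``, ``span_id``, and ``parent_span_id`` that tie events together. As a rough illustration, the sketch below parses lines in the text layout shown above and groups them by ``trace_id``. It assumes the layout matches this sample exactly (other LTTng or babeltrace versions may print differently), and the names ``parse_line`` and ``group_by_trace`` are hypothetical helpers, not part of Blkin.

```python
import re
from collections import defaultdict

# Header: [time] (+delta) host provider:event: { ... }, { ... }
LINE_RE = re.compile(
    r'^\[(?P<time>[^\]]+)\]\s+\((?P<delta>[^)]+)\)\s+'
    r'(?P<host>\S+)\s+(?P<name>[\w:]+):\s+(?P<body>.*)$'
)
# Fields look like `key = 123` or `key = "text"`.
FIELD_RE = re.compile(r'(\w+) = ("(?:[^"\\]|\\.)*"|-?\d+)')


def parse_line(line):
    """Return a dict for one trace line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = {"time": match["time"], "host": match["host"], "name": match["name"]}
    for key, raw in FIELD_RE.findall(match["body"]):
        # Quoted values lose their quotes (escapes are left as-is);
        # everything else in this layout is an integer.
        record[key] = raw[1:-1] if raw.startswith('"') else int(raw)
    return record


def group_by_trace(lines):
    """Group parsed records by trace_id, skipping lines that do not parse."""
    traces = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is not None and "trace_id" in record:
            traces[record["trace_id"]].append(record)
    return dict(traces)
```

Feeding the saved output of ``lttng view`` to ``group_by_trace`` gives one list of events per request, which is a quick way to check that the expected tracepoints fired before setting up Zipkin.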
Install Zipkin
===============
One reason to use Blkin is that you can view the traces
in Zipkin. Run Zipkin both as a trace collector and
as a web service. The executable jar runs a collector on port 9410 and
the web interface on port 9411.
Download the Zipkin package::
git clone https://github.com/openzipkin/zipkin && cd zipkin
wget -O zipkin.jar 'https://search.maven.org/remote_content?g=io.zipkin.java&a=zipkin-server&v=LATEST&c=exec'
java -jar zipkin.jar
Or launch the Docker image (Docker repository names must be lowercase)::
docker run -d -p 9411:9411 openzipkin/zipkin
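Before sending traces, it can help to confirm that the ports mentioned above accept connections. The following is a small sketch using only the Python standard library; ``port_is_open`` is a hypothetical helper name, and the host and port values are whatever applies to your Zipkin setup. Note that the Docker command above publishes only port 9411.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

For example, ``port_is_open("127.0.0.1", 9411)`` checks the web interface and ``port_is_open("127.0.0.1", 9410)`` checks the collector, assuming Zipkin runs on the local host.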
Show Ceph's Blkin Traces in Zipkin-web
======================================
Download the babeltrace-zipkin project. This project takes the traces
generated with Blkin and sends them to a Zipkin collector using Scribe::
git clone https://github.com/vears91/babeltrace-zipkin
cd babeltrace-zipkin
Send the LTTng data to Zipkin::
python3 babeltrace_zipkin.py ${lttng-traces-dir}/${blkin-test}/ust/uid/0/64-bit/ -p ${zipkin-collector-port(9410 by default)} -s ${zipkin-collector-ip}
Example::
python3 babeltrace_zipkin.py ~/lttng-traces-dir/blkin-test-20150225-160222/ust/uid/0/64-bit/ -p 9410 -s 127.0.0.1
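Session directories carry a timestamp suffix, as in the example above, so the newest one can be found by sorting on the name. The sketch below does that and builds the argument list for ``babeltrace_zipkin.py``. It assumes the ``ust/uid/<uid>/64-bit`` layout from the example; the helper names ``newest_trace_dir`` and ``babeltrace_zipkin_argv`` are hypothetical, and only the ``-p`` and ``-s`` flags shown above are used.

```python
from pathlib import Path


def newest_trace_dir(traces_root, session, uid=0, bits="64-bit"):
    """Return the userspace trace directory of the newest matching session.

    Session directories are named `<session>-YYYYMMDD-HHMMSS`, so sorting
    by name also sorts by creation time.
    """
    root = Path(traces_root).expanduser()
    candidates = sorted(p for p in root.glob(f"{session}-*") if p.is_dir())
    if not candidates:
        raise FileNotFoundError(f"no '{session}-*' session under {root}")
    return candidates[-1] / "ust" / "uid" / str(uid) / bits


def babeltrace_zipkin_argv(trace_dir, server="127.0.0.1", port=9410):
    """Build the command line shown above as an argument list."""
    return [
        "python3", "babeltrace_zipkin.py", str(trace_dir),
        "-p", str(port), "-s", server,
    ]
```

The resulting list can be passed to ``subprocess.run`` from inside the ``babeltrace-zipkin`` checkout. The ``uid`` default of 0 matches the example above, which traced as root.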
Check the Ceph traces on the web page::
Browse http://${zipkin-collector-ip}:9411
Click "Find traces"