File: profiling.rst

package info (click to toggle)
mistral 21.0.0-6
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 16,140 kB
  • sloc: python: 54,052; sh: 701; makefile: 58
file content (298 lines) | stat: -rw-r--r-- 13,000 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
Profiling Mistral
=================

What Is Profiling?
------------------
Profiling is a procedure for gathering runtime statistics about certain code
snippets like:

 - The maximum run time
 - The minimum run time
 - The average run time
 - The number of runs

Such info is a key to understanding performance bottlenecks residing in
a system. Having these metrics, we can focus on places in code that slow
down the system most and come up with optimisations to improve them.

A typical code snippet eligible for gathering this kind of information is a
function or a method since, most popular engineering techniques encourage
developers to decompose code into functions/methods representing well defined
parts of program logic. However, any arbitrary piece of code may be a target
for measuring.

'osprofiler' Project
--------------------

`osprofiler <https://osprofiler.readthedocs.io/en/latest/>`_ is a project
created within the OpenStack ecosystem to do profiling. The paragraphs below
explain how Mistral uses 'osprofiler' for profiling. The central concept of
'osprofiler' is a profile trace. A developer can mark code snippets with
profiler traces and 'osprofiler' will be tracking them. In general,
'osprofiler' allows cross-service profiling, that is, tracking a chain of
calls that belong to different RESTful services but related with the same
user request. However, this guide doesn't cover this more complex use case
and focus on profiling within just one service, Mistral.

Profiler Traces
---------------

The most common way to create a profiler trace in the code is adding a
special ''@trace"

 .. code-block:: python

    from osprofiler import profiler


    class DefaultEngine(base.Engine):
        ...

        @profiler.trace('engine-on-action-complete', hide_args=True)
        def on_action_complete(self, action_ex_id, result, wf_action=False,
                               async_=False):
            with db_api.transaction():
                if wf_action:
                    action_ex = db_api.get_workflow_execution(action_ex_id)

                    if result is None:
                        result = ml_actions.Result(data=action_ex.output)
                else:
                    action_ex = db_api.get_action_execution(action_ex_id)

                action_handler.on_action_complete(action_ex, result)

                return action_ex.get_clone()

In this example, we applied a special decorator to a method that adds a
profiling trace. The most important argument of the decorator is a trace
name. Its value is 'engine-on-action-complete' in our case. The second
argument 'hide_args' defines whether 'osprofiler' needs to pass method
argument values down to other layers. More specifically, there's a notion
metrics collector in 'osprofiler' that accumulates info about traces
in any desirable form, it depends on a particular implementation. This
topic though is out of the scope of this document. For our purposes, it's
better to set this argument to **True** which will not lead to loosing
performance on processing additional data (argument values of all method
calls).

Another way of adding a profiling trace is the following:

 .. code-block:: python

    try:
        profiler.start("engine-on-action-complete")

        action_handler.on_action_complete(action_ex, result)
    finally:
        profiler.stop()


Here we don't decorate the entire method, we only want to profile just one
line of code. But like in the previous example, we added a profiling trace.
The obvious advantage of using the decorator is that it can live in code
permanently because it doesn't pollute it too much and we can use them any
time we want to profile the system.

Even simpler and more concise way to achieve the same is use a special
context manager from 'osprofiler':

 .. code-block:: python

    with profiler.Trace('engine-on-action-complete'):
        action_handler.on_action_complete(action_ex, result)

Configuring Mistral for Profiling
---------------------------------

To start a profiling session, one needs to make the steps below.

Mistral Configuration File
^^^^^^^^^^^^^^^^^^^^^^^^^^

Make these change in the config file:

 .. code-block:: cfg

    [DEFAULT]
    log_config_append = wf_trace_logging.conf

    [profiler]
    enabled = True
    hmac_keys = secret_word

Defining the 'log_config_append' property allows to have all the logging
configuration in a separate file. In the example above, it's called
'wf_trace_logging.conf' but it can have a different name, if needed.
'[profiler]' group directly refers to the 'osprofiler' project and is
brought by it. The property 'enabled' is self-explaining, but the other one
is not. The value of the property 'hmac_keys' basically needs to be known
by someone who wants to start a profiling session. This value needs to be
passed as part of the user request. It will be shown a bit later.

Logging Configuration File
^^^^^^^^^^^^^^^^^^^^^^^^^^

The content of the logging configuration file conforms the documentation for
the standard 'logging' Python module. Find more details at
https://docs.python.org/3/library/logging.config.html#configuration-file-format

This particular example of the logging file configures three different loggers
and their corresponding counterparts like handlers. For the purpose of this
document though we only need to pay attention how 'profiler_trace' logger is
configure. Every entity starting with 'profiler' is related to profiling
configuration. The reason why other loggers are also included here is to show
how different loggers can coexist within one configuration file and how they
can reuse same entities.


 .. code-block:: cfg

    [loggers]
    keys=workflow_trace,profiler_trace,root

    [handlers]
    keys=consoleHandler, wfTraceFileHandler, profilerFileHandler, fileHandler

    [formatters]
    keys=wfFormatter, profilerFormatter, simpleFormatter, verboseFormatter

    [logger_workflow_trace]
    level=INFO
    handlers=consoleHandler, wfTraceFileHandler
    qualname=workflow_trace
    propagate=0

    [logger_profiler_trace]
    level=INFO
    handlers=profilerFileHandler
    qualname=profiler_trace

    [logger_root]
    level=DEBUG
    handlers=fileHandler

    [handler_fileHandler]
    class=FileHandler
    level=DEBUG
    formatter=verboseFormatter
    args=("/tmp/mistral.log",)

    [handler_consoleHandler]
    class=StreamHandler
    level=INFO
    formatter=simpleFormatter
    args=(sys.stdout,)

    [handler_wfTraceFileHandler]
    class=FileHandler
    level=INFO
    formatter=wfFormatter
    args=("/tmp/mistral_wf_trace.log",)

    [handler_profilerFileHandler]
    class=FileHandler
    level=INFO
    formatter=profilerFormatter
    args=("/tmp/mistral_osprofile.log",)

    [formatter_verboseFormatter]
    format=%(asctime)s %(thread)s %(levelname)s %(module)s [-] %(message)s
    datefmt=

    [formatter_simpleFormatter]
    format=%(asctime)s - %(message)s
    datefmt=%y-%m-%d %H:%M:%S

    [formatter_wfFormatter]
    format=%(asctime)s WF [-] %(message)s
    datefmt=

    [formatter_profilerFormatter]
    format=%(message)s
    datefmt=%H:%M:%S


Triggering Profiling Sessions
-----------------------------

Once Mistral is configured like explained above, in order to start a
profiling session we need to make a user request to Mistral that we
want to analyse but adding one property to it. The name of the property
is 'profile' and it needs to be set to the value of the 'hmac_keys'
property from the main configuration file.

.. code-block:: bash

    $ mistral execution-create my_slow_workflow --profile secret_word

Profiling Session Result
------------------------

When started in a profiling mode like just shown, Mistral will be writing
info about the profiling traces into the configured file. In our case it is
'/tmp/mistral_osprofile.log'.

 .. code-block:: cfg

    2020-02-27T08:04:25.789433          f12e75d5-5d59-4cbc-b74d-357f19290dd7 f12e75d5-5d59-4cbc-b74d-357f19290dd7 b9b29981-0916-4635-af18-d6c92f991f46 engine-start-workflow-start
    2020-02-27T08:04:25.790232          f12e75d5-5d59-4cbc-b74d-357f19290dd7 b9b29981-0916-4635-af18-d6c92f991f46 3cdd41b5-318a-4926-a38e-63344b6aef7a workflow-handler-start-workflow-start
    2020-02-27T08:04:25.812879          f12e75d5-5d59-4cbc-b74d-357f19290dd7 3cdd41b5-318a-4926-a38e-63344b6aef7a 603f1fab-be78-438d-af13-d94ed3b7e416 workflow-start-start
    2020-02-27T08:04:25.954502          f12e75d5-5d59-4cbc-b74d-357f19290dd7 603f1fab-be78-438d-af13-d94ed3b7e416 b1d0a77a-52f5-4415-a6c4-f16b3591a47d workflow-set-state-start
    2020-02-27T08:04:25.961298 0.006782 f12e75d5-5d59-4cbc-b74d-357f19290dd7 603f1fab-be78-438d-af13-d94ed3b7e416 b1d0a77a-52f5-4415-a6c4-f16b3591a47d workflow-set-state-stop
    2020-02-27T08:04:25.961769          f12e75d5-5d59-4cbc-b74d-357f19290dd7 603f1fab-be78-438d-af13-d94ed3b7e416 27b58351-aebe-4e37-9cec-91fdbef5c68b wf-controller-get-controller-start
    2020-02-27T08:04:25.962041 0.000267 f12e75d5-5d59-4cbc-b74d-357f19290dd7 603f1fab-be78-438d-af13-d94ed3b7e416 27b58351-aebe-4e37-9cec-91fdbef5c68b wf-controller-get-controller-stop
    2020-02-27T08:04:25.962311          f12e75d5-5d59-4cbc-b74d-357f19290dd7 603f1fab-be78-438d-af13-d94ed3b7e416 605ebfc2-a2bb-4fe1-8159-fc16f6741f5f workflow-controller-continue-workflow-start
    2020-02-27T08:04:26.023134 0.060832 f12e75d5-5d59-4cbc-b74d-357f19290dd7 603f1fab-be78-438d-af13-d94ed3b7e416 605ebfc2-a2bb-4fe1-8159-fc16f6741f5f workflow-controller-continue-workflow-stop
    2020-02-27T08:04:26.023600          f12e75d5-5d59-4cbc-b74d-357f19290dd7 603f1fab-be78-438d-af13-d94ed3b7e416 3a5a384a-9598-4844-a740-981f92e604af dispatcher-dispatch-commands-start
    2020-02-27T08:04:26.023918          f12e75d5-5d59-4cbc-b74d-357f19290dd7 3a5a384a-9598-4844-a740-981f92e604af d84a13e4-4763-4321-ab08-8cbd19656f2f task-handler-run-task-start
    2020-02-27T08:04:26.024179          f12e75d5-5d59-4cbc-b74d-357f19290dd7 d84a13e4-4763-4321-ab08-8cbd19656f2f 7878e4f8-aaaa-4b9b-b15a-35848b5cdd61 task-handler-build-task-from-command-start
    2020-02-27T08:04:26.024422 0.000243 f12e75d5-5d59-4cbc-b74d-357f19290dd7 d84a13e4-4763-4321-ab08-8cbd19656f2f 7878e4f8-aaaa-4b9b-b15a-35848b5cdd61 task-handler-build-task-from-command-stop

So any time Mistral runs code marked as a profiling trace it prints two
entries into the file: right before the code snippet starts and right
after its completion. Notice also that for the corresponding "-stop" entry
(the suffix going after the trace name) Mistral prints an additional number
in the second column. This is a duration of the code snippet.

This content of this file itself is probably not so useful (although, it
might be for some purpose) but based on it we can build the following
report:

 .. code-block:: bash

    Total time | Max time | Avg time | Occurrences | Trace name
    -------------------------------------------------------------------------------------------
    2948.326     8.612      1.218      2420          engine-on-action-complete
    2859.172     8.516      1.181      2420          action-handler-on-action-complete
    2812.726     8.482      1.162      2420          task-handler-on-action-complete
    2767.836     8.412      1.144      2420          regular-task-on-action-complete
    2766.199     8.411      1.143      2420          task-complete
    2702.764     8.351      0.460      5878          task-run
    2506.531     8.354      0.850      2948          dispatcher-dispatch-commands
    2503.398     8.353      0.437      5735          task-handler-run-task
    2488.940     8.350      0.434      5735          task-run-new
    1669.179     54.737     0.881      1894          default-executor-run-action
    1201.582     3.687      0.497      2420          regular-task-get-action-input
    1126.351     2.093      0.476      2366          ad-hoc-action-validate-input
    1125.129     2.092      0.238      4732          ad-hoc-action-prepare-input
    687.619      7.594      0.651      1056          task-handler-refresh-task-state
    387.622      3.872      0.300      1291          workflow-handler-check-and-fix-integrity
    234.231      4.068      0.392      597           workflow-handler-check-and-complete
    224.026      4.042      0.375      597           workflow-check-and-complete
    210.184      6.694      1.470      143           task-run-existing
    160.118      8.343      0.304      526           workflow-action-schedule
    141.398      4.546      0.268      528           workflow-handler-start-workflow
    109.641      4.361      0.208      528           workflow-start
    78.683       2.004      0.077      1024          direct-wf-controller-get-join-logical-state

    ...

To generate this report, run:

 .. code-block:: bash

    $ python tools/rank_profiled_methods.py /tmp/mistral_osprofile.log report.txt

And this report is somewhat really useful when it comes to analysing
performance bottlenecks. All times are shown in seconds.