| 12
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
 100
 101
 102
 103
 104
 105
 106
 107
 108
 109
 110
 111
 112
 113
 114
 115
 116
 117
 118
 119
 120
 121
 122
 123
 124
 125
 126
 127
 128
 129
 130
 131
 132
 133
 134
 135
 136
 137
 138
 139
 140
 141
 142
 143
 144
 145
 146
 147
 148
 149
 150
 151
 152
 153
 154
 155
 156
 157
 158
 159
 160
 161
 162
 163
 164
 165
 166
 167
 168
 169
 170
 171
 172
 173
 174
 175
 176
 177
 178
 179
 180
 181
 182
 183
 184
 185
 186
 187
 188
 189
 190
 191
 192
 193
 194
 195
 196
 197
 198
 199
 200
 201
 202
 203
 204
 205
 206
 207
 208
 209
 210
 211
 212
 213
 214
 215
 216
 217
 218
 219
 220
 221
 222
 223
 224
 225
 226
 227
 228
 229
 230
 231
 232
 233
 234
 235
 236
 237
 238
 239
 240
 241
 242
 243
 244
 245
 246
 247
 248
 249
 250
 251
 252
 253
 254
 255
 256
 257
 258
 259
 260
 261
 262
 263
 264
 265
 266
 267
 268
 269
 270
 271
 272
 273
 274
 275
 276
 277
 278
 279
 280
 281
 282
 283
 284
 285
 286
 287
 288
 289
 290
 291
 292
 293
 294
 295
 296
 297
 298
 299
 300
 301
 302
 303
 304
 305
 306
 307
 308
 309
 310
 311
 312
 313
 314
 315
 316
 317
 318
 319
 320
 321
 322
 323
 324
 325
 326
 327
 328
 329
 330
 331
 332
 333
 334
 335
 336
 337
 338
 339
 340
 341
 342
 343
 344
 345
 346
 347
 348
 349
 350
 351
 352
 353
 354
 355
 356
 357
 358
 359
 360
 361
 362
 363
 364
 365
 366
 367
 368
 369
 370
 371
 372
 373
 374
 375
 376
 377
 378
 379
 380
 381
 382
 383
 384
 385
 386
 387
 388
 389
 390
 391
 392
 393
 394
 395
 396
 397
 398
 399
 400
 401
 402
 403
 404
 405
 406
 407
 408
 409
 410
 411
 412
 413
 414
 415
 416
 417
 418
 419
 420
 421
 422
 423
 424
 425
 426
 427
 428
 429
 430
 431
 432
 433
 434
 435
 436
 437
 438
 439
 440
 441
 442
 443
 444
 445
 446
 447
 448
 449
 450
 451
 452
 453
 454
 455
 456
 457
 458
 459
 460
 461
 462
 463
 464
 465
 466
 467
 468
 469
 470
 471
 472
 473
 474
 475
 476
 477
 478
 479
 480
 481
 482
 483
 484
 485
 486
 487
 488
 489
 490
 491
 492
 493
 494
 495
 496
 497
 498
 499
 500
 501
 502
 503
 504
 505
 506
 507
 508
 509
 510
 511
 512
 513
 514
 515
 516
 517
 518
 519
 520
 521
 522
 523
 524
 525
 526
 527
 528
 529
 530
 531
 532
 533
 534
 535
 536
 537
 538
 539
 540
 541
 542
 543
 544
 545
 546
 547
 548
 549
 550
 551
 552
 553
 554
 555
 556
 557
 558
 559
 560
 561
 562
 563
 564
 565
 566
 567
 568
 569
 570
 571
 572
 573
 574
 575
 576
 577
 578
 579
 580
 581
 582
 583
 584
 585
 586
 587
 588
 589
 590
 591
 592
 593
 594
 595
 596
 597
 598
 599
 600
 601
 602
 603
 604
 605
 606
 607
 608
 609
 610
 611
 612
 613
 614
 615
 616
 617
 618
 619
 620
 621
 622
 623
 624
 625
 626
 627
 628
 629
 630
 631
 632
 633
 634
 635
 636
 637
 638
 639
 640
 641
 642
 643
 644
 645
 646
 647
 648
 649
 650
 651
 652
 653
 654
 655
 656
 657
 658
 
 | HOW-TO use rt-app and workgen to emulate a workload ?
**** rt-app and workgen ****
rt-app/workgen is a tool that can be used to emulate a use case. Not only the
sleep and run pattern can be emulated but also the dependency between tasks
like accessing same critical resources, creating sequencial wake up or syncing
the wake up of threads. The use case is described in a json like file which is
first parsed by workgen then rt-app.
workgen is a python script that will parse, check and update the json like file
in order to strictly match the json grammar before calling rt-app that will
execute the sequence described in it.
Even if the json protocol is used in rt-app, workgen enables some freedom
compared to stricter json grammar rules. For examples,
- You don't have to ensure uniqueness of key in json object, workgen will check
the file and ensure that each key is unique by appending an index if necessary.
The feature is quite useful when you describe a sequence made of same kind of
events e.g. a sequence of multiple sleep and run events.
- Some values can be omitted and workgen will add them when it parses the file
before starting the use case.
**** Compiling and cross-compiling rt-app ****
rt-app uses libjson-c to parse the .json file
You can either install the libjson-c2 and libjson-c-dev deb packages or compile
from source code available here: https://github.com/json-c/json-c
The following sequence can be used to cross compile rt-app with static linkage
for aarch64:
For libjson-c:
export ac_cv_func_malloc_0_nonnull=yes
export ac_cv_func_realloc_0_nonnull=yes
./autogen.sh
./configure --host=aarch64-linux-gnu --disable-shared --enable-static
make
For rt-app:
export ac_cv_lib_json_c_json_object_from_file=yes
./autogen.sh
./configure --host=aarch64-linux-gnu LDFLAGS=" --static	-L<path to parent of json repo>/json-c/." CFLAGS="-I<path to parent of json repo>" --with-deadline
make
e.g, with a directory structure like the following:
$ tree -d
.
├── autom4te.cache
├── build
├── build-aux
├── doc
│   └── examples
│       ├── cpufreq_governor_efficiency
│       ├── merge
│       └── tutorial
├── json-c
│   ├── autom4te.cache
│   ├── doc
│   │   └── html
│   └── tests
├── libdl
├── m4
└── src
the configure step becomes
./configure --host=aarch64-linux-gnu LDFLAGS="--static -L./../json-c/." CFLAGS="-I./.." --with-deadline
**** json file skeleton ****
The json file that describes a workload is made on 3 main objects: tasks,
resources and global objects. Only tasks object is mandatory; default value will
be used if global object is not defined and resources object is kept for
backward compatibility with old version of rt-app but it doesn't give any
benefit to declare it as the resources are now dynamically created by rt-app
when it parses the file.
**** global object ****
The global object defines parameters for the whole use case:
* duration : Integer. Duration of the use case in seconds. All the threads will
be killed once the duration has elapsed. if -1 has been set, the use case will
run indefinitly until all threads kill themselves (as an example if a finite
number of loop has been defined in their running pattern) or if a signal
is received to stop the use case.
* calibration : String or Integer: A String defines the CPU that will be used
to calibrate the ns per loop value. "CPU0" is the default value (see run event
section for details about ns per loop value). A integer skips the calibration
step and uses the integer value as ns per loop.
* default_policy : String. Default scheduling policy of threads. Default
value is SCHED_OTHER.
* pi_enabled: Boolean. Enable the prority inheritance of mutex. Default value
is False.
* lock_pages : Boolean. Lock the mem page in RAM. Locking the page in RAM
ensures that your RT thread will not be stalled until a page is moved from swap
to RAM.  The lock of the page is only possible for non CFS tasks. Default value
is False. 
* logdir : String. Path to store the various log files. The default path is
the current directory (./).
* log_basename : String. Prefix used for all log files of the use case.
"rt-app-" is used by default.
* log_size : String or Integer. A Integer defines a fix size in MB of the
temporary buffer (size per thread) that will be used to store the log data
before saving them in a file. This temporary buffer is used as a cicular
buffer so the oldest data will be lost in case of overflow. A string is used
to set a predifined behavior:
  - "file" will be used to store the log data directly in the file without
	using a temporary buffer.
  - "Disable" will disable the log mecahnism.
  - "Auto" will let rt-app compute the buffer size to not overflow the latter 
	during the use case.
The use of a temporary buffer prevents the threads of unexpected wait during
io access. The "Auto" mode is not implemented yet and fallback to "file" mode
for the moment.
* ftrace: Boolean. If enable, rt-app logs in ftrace the main events of the use
case. Default value is False.
* gnuplot : Boolean. if True, it will create a gnu plot compatible file for
each threads (see gnuplot section for more details). Default value is False.
"io_device" : Text. Path to the file which will be written to create IO-bounded
busy loop. Specify it carefully since it might damage the specified file.
Default value is "/dev/null".
"mem_buffer_size" : Integer. The size of per-thread memory buffer in byte being
used to create IO-bounded and memory-bounded busy loop. Default value is
4194304(4MB).
* cumulative_slack : Boolean. Accumulate slack (see below) measured during
  successive timer events in a phase. Default value is False (time between the
  end of last event and the end of the phase).
*** default global object:
	"global" : {
		"duration" : -1,
		"calibration" : "CPU0",
		"default_policy" : "SCHED_OTHER",
		"pi_enabled" : false,
		"lock_pages" : false,
		"logdir" : "./",
		"log_size" : "file",
		"log_basename" : "rt-app",
		"ftrace" : false,
		"gnuplot" : false,
		"io_device" : "/dev/null"
		"mem_buffer_size" : 4194304,
		"cumulative_slack" : false
	}
**** tasks object ****
 The tasks object is made of sub-objects that describe threads. No other kind
 of object than thread object is allowed. The key value of each object will be
 used as thread's name.
"tasks" : {
	"thread_name1" : {
		...
	},
	"thread_name2" : {
		...
	},
	"thread_name3" : {
		...
	},
	...
}
*** thread object ***
Thread object describes the behavior of one kind of thread which means that
several threads can be created with the same object (see instance parameter
below).
* instance : Integer. Define the number of threads that will be created with
the properties of this thread object. Default value is 1.
*** policy : String: Define the scheduling policy of the thread. default_policy
is used if not defined
* priority : Integer. Define the scheduling priority of the thread. The value
must be aligned with the range allowed by the policy. The default priority is
0 for sched_other class and 10 for other classes
* dl-runtime : Integer: Define the runtime budget for deadline scheduling class.
Default value is 0. The unit is usec.
* dl-period : Integer. Define the period duration for deadline scheduling class.
Default value is runtime. The unit is usec.
* dl-deadline : Integer. Define the deadline parameter for deadline scheduling
class. Default value is period. The unit is usec.
* cpus: Array of Integer. Define the CPU affinity of the thread. Default
value is all CPUs of the system. An example :  "cpus" : [0, 2, 3]
* delay: Integer. Initial delay before a thread starts execution. The unit
is usec.
* phases: Object. The phases object described TBF
If there is only 1 phase, the sequence of events can be directly put in the
thread object instead of creating a phases object. As an example, the 2 objects
below are equals:
"task" : {
	"thread1" : {
		"phases" : {
			"phase1" : {
				"run" : 10,
				"sleep" : 10
			}
		}
	}
}
"task" : {
	"thread1" : {
		"run" : 10,
		"sleep" : 10
	}
}
"cpus" can be specified at task level or phase level. As an example, below
creates two threads. One thread will run with affinity of CPU 2 and 3. The
second task will run first phase with affinity to CPU 0 and 1, second phase with
affinity to CPU 2, and the third phase with affinity to CPU 4, 5, and 6 (it will
inherit the affinity from the task level):
"task" : {
	"thread1" : {
		"cpus" : [2,3],
		"phases" : {
			"phase1" : {
				"run" : 10,
				"sleep" : 10
			}
		}
	}
	"thread2" : {
		"cpus" : [4,5,6],
		"phases" : {
			"phase1" : {
				"cpus" : [0,1],
				"run" : 10,
				"sleep" : 10
			}
			"phase2" : {
				"cpus" : [2],
				"run" : 10,
				"sleep" : 10
			}
			"phase3" : {
				"run" : 10,
				"sleep" : 10
			}
		}
	}
}
* loop: Integer. Define the number of times the parent object must be run.
The parent object can be a phase or a thread. For phase object, the loop
defines the number of time the phase will be executed before starting the next
phase. For thread object, the loop defines the number of times that the
complete "phases" object will be executed. The default value is -1.
*** events ***
events are simple action that will be performed by the thread or on the
thread. They have to be listed by execution order. 
* run : Integer. Emulate the execution of a load. The duration is defined in
usec but the run event will effectively run a number of time a loop that waste
cpu cycles. When the run event is executed, the duration is transformed in a
number of loop thanks to the ns per loop value (see calibration). This way of
working enables to emulate a load with a duration that will vary with the
frequency or the compute capacity of the CPU.
* runtime : Integer.  The duration is define in usec.  Similar to the
run event, it emulates the execution of a load.  Unlike run, runtime
runs for a specific amount of time irrespective of the compute
capacity of the CPU or the frequency.
* sleep : Integer. Emulate the sleep of a task. The duration is defined in
usec.
* mem : Integer. Emulate the memory write operation. The value defines the size
in byte to be written into the memory buffer. The size of the memory buffer is
defined by "mem_buffer_size" in "global" object.
* iorun : Integer. Emulate the IO write operation. The value defined the size
in byte to be write into the IO device specified by "io_device" in "global"
object.
* timer : Object. Emulate the wake up of the thread by a timer. Timer differs
from sleep event by the start time of the timer duration. Sleep duration starts
at the beginning of the sleep event whereas timer duration starts at the end of
the last use of the timer. So Timer event are immunized against preemption,
frequency scaling and computing capacity of a CPU. The initial starting time of
the timer is set during the 1st use of the latter.
          |<------19---------->|<------19---------->|<------19---------->|
task A ...|run 5|timer 19      |------run 5|timer 19|run 5|timer 19      |
task B                      |run 10  |   
As an example to point out the difference between sleep and timer, let consider
a task A that run 5 and then sleep 10. The period of task A should be 15.
Let's add a task B that runs 5 and use a timer to wakes up every 19. 
         |<------15------>|<------15------>|<------19---------->|<------15------>|<------16------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|
taskA ...|run 5|sleep 10  |run 5|sleep 10  |----run 5|sleep 10  |run 5|sleep 10  |-run 5|sleep 10  |----run 5|sleep 10  |----run 5|sleep 10  |----run 5|sleep 10  |----run 5|sleep 10  |----run 5|sleep 10  |
taskB ...|------run 5|timer 19|--run 5|timer 19    |run 5|timer 19      |run 5|timer 19      |run 5|timer 19      |run 5|timer 19      |run 5|timer 19      |run 5|timer 19      |run 5|timer 19      |
         |<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|
We can see that task B period stays to 19 even if the run event is delayed
because of scheduling preemption whereas the period of task A starts at 15 but
increases to 19 because of the scheduling delay.
"unique" timer device: When a thread that uses a timer is instanciated, the
timer will be shared across the thread instance which disturbs the original
sequence. In order to have 1 timer per instance, you can use the "unique"
prefix so a new timer will be created for each instance. Uniqueness
applies to the thread boundary which means that using the same unique
name in the sequence of a thread will refer to the same timer like the example
below:
	"phases" : {
		"light" : {
			"loop" : 10,
			"run" :   1000,
			"timer" : { "ref" : "unique", "period" : 30000 }
		},
		"heavy" : {
			"loop" : 10,
			"run" :   4000,
			"timer" : { "ref" : "unique", "period" : 30000 }
		}
	}
If you want to refer to different timer you must use different name like
below:
	"phases" : {
		"light" : {
			"loop" : 10,
			"run" :   1000,
			"timer" : { "ref" : "uniqueA", "period" : 30000 }
		},
		"heavy" : {
			"loop" : 10,
			"run" :   4000,
			"timer" : { "ref" : "uniqueB", "period" : 400000 }
		}
	}
Timers can work with a "relative" or an "absolute" reference. By default they
work in "relative" mode, but this mode can also be explicity specified as the
following:
	"phases" : {
		"phase0" : {
			"loop" : 10,
			"run0" :  10000,
			"timer0" : { "ref" : "unique", "period" : 20000, "mode" : "relative" },
	}
"relative" mode means that the reference for setting the next timer event is
relative to the end of the current phase. This in turn means that if, for some
reason (i.e., clock frequency was too low), events in a certain phase took too
long to execute and the timer of that phase couldn't actually fire at all, the
next phase won't be affected. For example:
  +---- +     +-----+     +-------------------+-----+     +---
  |r0   |     |r0   |     |r0                 |r0   |     |r0
  |     |     |     |     |                   |     |     |
  o-----------o-----------o-------------------o-----------o------->
  0    10    20    30    40    50    60    70    80   100   120
              ^           ^           ^                   ^
              |           |           | MISS!             |
              +           +           +                   +
            Timer0      Timer0       Timer0              Timer0
In this example character "o" denotes when phases finish/start. Third
activation of Timer0 is missed, since r0 executed for more that 20ms. However
the next phase is not affected as Timer0 was set considering the instant of
time when the misbehaving r0 finished executing.
"absolute" mode is specified as the following:
	"phases" : {
		"phase0" : {
			"loop" : 10,
			"run0" :  10000,
			"timer0" : { "ref" : "unique", "period" : 20000, "mode" : "absolute" },
	}
"absolute" mode means that the reference for setting the next timer event is
fixed and always consider the starting time of the first phase. This means that
if, for some reason (i.e., clock frequency was too low), events in a certain
phase took too long to execute and the timer of that phase couldn't actually
fire at all, the next phase (and potentially other subsequent phases) _will_ be
affected. For example, considering again the example above:
  +---- +     +-----+     +-------------------+-----+-----+   +---
  |r0   |     |r0   |     |r0                 |r0   |r0   |   |r0
  |     |     |     |     |                   |     |     |   |
  o-----------o-----------o-------------------o-----o---------o---->
  0    10    20    30    40    50    60    70    80   100   120
              ^           ^           ^           ^           ^
              |           |           | MISS!     | MISS!     |
              +           +           +           +           +
            Timer0      Timer0       Timer0      Timer0      Timer0
Third activation of Timer0 is missed, since r0 executed for more that 20ms.
Even if 4th activation of r0 executes for 10ms (as specified in the
configuration), 4th Timer0 is still missed because the reference didn't change.
In this example 5th activation of r0 then managed to recover, but in general it
depends on how badly a certain phase misbehaves.
* lock : String. Lock the mutex defined by the string value.
* unlock : String. Unlock the mutex defined by the string value.
* wait : Object {"ref" : String, "mutex" : String }. Block the calling thread
until another thread sends a wake up signal to the resource defined by "ref".
The mutex defined by "mutex" is used during the block sequence: See
pthread_cond_wait() for more details about the sequence (especially regarding
the use of the mutex).
* signal : String. Send a wake up signal to the resource defined by the
string. The 1st thread in the wait list will be wokn up. See
pthread_cond_signal() for more details.
* broad : String. Send a wake up signal to the resource defined by the
string. All threads that are blocked on the resource wil wake up. See
pthread_cond_broadcast() for more details.
* sync : Object {"ref" : String, "mutex" : String }. Atomically wakes 
up a blocked thread and then blocks the calling thread on the condition. 
taskA ...|run 5|wait|------------|run 5| wait |----------------
taskB ...-------------------|sync|-------- 
The sync event "sync" : {"ref" : "CondA", "mutex" : "mutexA" } generates the
following sequence:
{
	"lock" : "mutexA",
	"signal" : "CondA",
	"wait" : { "ref" : "condA", "mutex" : "mutexA" },
	"unlock" : "mutexA"
}
* barrier : String. Used as at least a pair where the name must match.
Any number of matching uses will cause all threads hitting the barrier event
to wait for a signal. The number of users is recorded, so that when the last user
hits the barrier event, that thread will broadcast and continue to the next
step. This is conceptually exactly the same as a pthread_barrier_wait operation
however, using a pthread_barrier would impose some strict conditions on usage
around thread cleanups - primarily that you cannot cancel an in-progress barrier
operation which would mean that we have to restrict cleanup to only be possible
at the end of a loop cycle (i.e. all phases are complete). This would be too
restrictive for most uses so here we use an alternative.
In this implementation, the barrier event manages its own mutex and uses a
variable shared between users to track waiting tasks, protected by the mutex.
You must use a unique name for each sync event since the number of users is
taken from the number of references to the name in the input json. i.e. each
name must represent a single sync point, and be shared amongst all threads
which wish to syncronise at that point.
The barrier event "barrier" : "SyncPointA"
generates the following sequence:
{
    "lock" : "SyncPointA" (internal mutex),
    If the shared variable is 0:
      "signal" : "SyncPointA" (internal condvar),
    Else:
      decrement the shared variable,
      "wait" : { "ref" : "SyncPointA" (internal condvar),
                 "mutex" : "SyncPointA" (internal mutex) },
      increment the shared variable,
    "unlock" : "SyncPointA" (internal mutex)
}
* suspend : String. Block the calling thread until another thread wakes it up
with resume. The String can be let empty as it will be filled by workgen with
the right thread's name before starting the use case.
* resume : String. Wake up the thread defined by the string.
taskA ...|run 5|suspend |----------------|run 5|suspend |----------------
taskB ...-------------------|resume taskA|run 10    |-------------------- 
* yield: String. Calls pthread_yield(), freeing the CPU for other tasks. This has a 
special meaning for SCHED_DEADLINE tasks. String can be empty.
**** Trace and Log ****
Some traces and log hooks have been added to ease the debug and monitor various
metrics of the use cases
*** Trace ***
A trace can be inserted into ftrace buffer for each event in order to sync
kernel events like sched_switch with the use case's sequence.
A simple example of ftrace log:
         thread0-23345 [003] 115591.560884: tracing_mark_write: [0] starts
         thread0-23345 [003] 115591.560890: tracing_mark_write: [0] begins loop 0 phase 0 step 0
         thread0-23345 [003] 115591.560894: tracing_mark_write: [0] executing 0
         thread0-23345 [003] 115591.580212: tracing_mark_write: [0] executing 1
         thread0-23345 [003] 115591.580217: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
          <idle>-0     [003] 115591.670198: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120
         thread0-23345 [003] 115591.670243: tracing_mark_write: [0] end loop 0 phase 0 step 0
         thread0-23345 [003] 115591.670251: tracing_mark_write: [0] begins loop 0 phase 0 step 1
         thread0-23345 [003] 115591.670254: tracing_mark_write: [0] executing 0
         thread0-23345 [003] 115591.688081: tracing_mark_write: [0] executing 1
         thread0-23345 [003] 115591.688085: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
          <idle>-0     [003] 115591.778063: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120
         thread0-23345 [003] 115591.778108: tracing_mark_write: [0] end loop 0 phase 0 step 1
         thread0-23345 [003] 115591.778116: tracing_mark_write: [0] begins loop 0 phase 0 step 2
         thread0-23345 [003] 115591.778119: tracing_mark_write: [0] executing 0
         thread0-23345 [003] 115591.794619: tracing_mark_write: [0] executing 1
         thread0-23345 [003] 115591.794623: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
          <idle>-0     [003] 115591.884607: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120
         thread0-23345 [003] 115591.884652: tracing_mark_write: [0] end loop 0 phase 0 step 2
		 ...
         thread0-23345 [003] 115593.394022: tracing_mark_write: [0] begins loop 0 phase 0 step 17
         thread0-23345 [003] 115593.394025: tracing_mark_write: [0] executing 0
         thread0-23345 [003] 115593.410583: tracing_mark_write: [0] executing 1
         thread0-23345 [003] 115593.410594: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
          <idle>-0     [003] 115593.500567: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120
         thread0-23345 [003] 115593.500612: tracing_mark_write: [0] end loop 0 phase 0 step 17
         thread0-23345 [003] 115593.500620: tracing_mark_write: [0] begins loop 0 phase 0 step 18
         thread0-23345 [003] 115593.500623: tracing_mark_write: [0] executing 0
         thread0-23345 [003] 115593.520198: tracing_mark_write: [0] executing 1
         thread0-23345 [003] 115593.520202: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
          <idle>-0     [003] 115593.610185: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120
         thread0-23345 [003] 115593.610230: tracing_mark_write: [0] end loop 0 phase 0 step 18
         thread0-23345 [003] 115593.610258: tracing_mark_write: [0] exiting
The main events are :
* "tracing_mark_write: [0] starts" indicates the start of the thread with index 0
* "tracing_mark_write: [0] begins loop A phase B loop C" indicates that the
thread with index 0 starts to execute the loop C of the phase B of the main
loop A
* "tracing_mark_write: [0] executing X": indicates that the thread with index 0
execute the event X of the current phase
-"tracing_mark_write: [0] end loop A phase B step C" indicates that the
thread with index 0 ends to execute the event C of the phase B of the loop A
*"tracing_mark_write: [0] exiting": indiscates that the trhead with index 0 has
finished to execute its events.
*** Log and gnuplot ***
You can log some metrics for each phase that is executed by a thread. These
metrics are:
- perf: number of loop that has been executed by run events
- run: time spent by the thread to execute the run events
- period: duration to execute the complte phase
- start/end : absolute start and end time of a phase. Same time base is used in
  ftrace
- rel_st: start time of a phase relatively to the beg of the use case
- slack: if global option "cumulative_slack" (see above) is false, time between
  the end of last event and the end of the phase, e.g.
  taskA ...|-- run5 --|- sleep5 -|-- run5--|..timer20.|-- run5 --|- sleep5 -|-- run6 --|.timer20.|
            <--------------- period 20 --------------> <--------------- period 20 -------------->
                                            <-slack5->                                  <slack4->
  it can also be negative if the execution of a phases' events overshoots the
  current period, e.g.
  taskA ...|-- run5 --|- sleep5 -|------- run30 ------xxxxxxxxxx|
            <--------------- period 20 -------------->
                                                       <slack-5>
  if global option "cumulative_slack" is true, all the intermediate slacks of a
  phase with multiple timers are accumulated and reported when the phase
  completes
- c_duration: sum of the configured duration of run/runtime events
- c_period: sum of the timer(s) period(s)
- wu_lat: sum of wakeup latencies after timer events
Below is an extract of a log:
# Policy : SCHED_OTHER priority : 0
#idx     perf      run   period           start             end          rel_st      slack c_duration   c_period     wu_lat
   0    92164    19935    98965    504549567051    504549666016            2443      78701      20000     100000        266
   0    92164    19408    99952    504549666063    504549766015          101455      80217      20000     100000        265
   0    92164    19428    99952    504549766062    504549866014          201454      80199      20000     100000        264
   0    92164    19438    99955    504549866060    504549966015          301452      80190      20000     100000        265
   0    92164    19446    99952    504549966061    504550066013          401453      80093      20000     100000        264
   0    92164    19415    99953    504550066060    504550166013          501452      80215      20000     100000        263
   0    92164    19388    99954    504550166059    504550266013          601451      80242      20000     100000        264
   0    92164    19444    99956    504550266060    504550366015          701452      80185      20000     100000        265
Some gnuplot files are also created to generate charts based on the log files 
for each thread and for each kind of metrics. The format of the chart that
will be generated by gnuplot is eps (text art has been used for the chart
below)
                  Measured thread0 Loop stats
                                                Load [nb loop]
  120000 ++--+----+---+----+---+---+----+---+----+--++ 506000     load  ******
         +   +    +   +    +   +   +    +   +    +   +             run  ######
         |$$ :    :   :    :   $$$ :    :   :    :   |          period  $$$$$$
  110000 ++.$$$........$$$$...$...$..........$$$.$$$$+
         |   : $$$$$$$$    $$ $:   $ $$$$$$$$   $:   $
         |   :    :   :    : $ :   :$   :   :    :  ++ 504000
         |   :    :   :    :   :   :    :   :    :   |
  100000 ++.........................................++
         |   :    :   :    :   :   :    :   :    :   |
         |   :    :   :    :   :   :    :   :    :   |
   90000 ++.........................................++
         |   :    :   :    :   :   :    :   :    :  ++ 502000
         |   :    :   :    :   :   :    :   :    :   |
   80000 ++.........................................++
         |   :    :   :    :   :   :    :   :    :   |
         |   :    :   :    :   :   :    :   :    :   |
   70000 +******************************************** 500000
         |   :    :   :    :   :   :    :   :    :   |
         |   :    :   :    :   :   :    :   :    :   |
         |   :    :   :    :   :   :    :   :    :   |
   60000 ++.........................................++
         |   :    :   :    :   :   :    :   :    :   |
         |   :    :   :    :   :   :    :   :    :  ++ 498000
   50000 ++.........................................++
         |   :    :   :    :   :   :    :   :    :   |
         |   :    :   :    :   :   :    :   :    :   |
   40000 ++.........................................++
         |   :    :   :    :   :   :    :   :    :  ++ 496000
         |#  :    :   :    :   :   :    :   :    :   |
         | # :    :   :    :   ### :    :   :    :   |
   30000 ++.###.###....####...#...#..........###.####+
         |   : #  :####    ### :   # ########   #:   #
         +   +    +   +    +   +   +#   +   +    +   +
   20000 ++--+----+---+----+---+---+----+---+----+--++ 494000
       17560556057560556057560656065606756065606756075607
                    Loop start time [msec]
 |