HOW-TO use rt-app and workgen to emulate a workload ? **** rt-app and workgen **** rt-app/workgen is a tool that can be used to emulate a use case. Not only the sleep and run pattern can be emulated but also the dependency between tasks like accessing same critical resources, creating sequencial wake up or syncing the wake up of threads. The use case is described in a json like file which is first parsed by workgen then rt-app. workgen is a python script that will parse, check and update the json like file in order to strictly match the json grammar before calling rt-app that will execute the sequence described in it. Even if the json protocol is used in rt-app, workgen enables some freedom compared to stricter json grammar rules. For examples, - You don't have to ensure uniqueness of key in json object, workgen will check the file and ensure that each key is unique by appending an index if necessary. The feature is quite useful when you describe a sequence made of same kind of events e.g. a sequence of multiple sleep and run events. - Some values can be omitted and workgen will add them when it parses the file before starting the use case. **** Compiling and cross-compiling rt-app **** rt-app uses libjson-c to parse the .json file You can either install the libjson-c2 and libjson-c-dev deb packages or compile from source code available here: https://github.com/json-c/json-c The following sequence can be used to cross compile rt-app with static linkage for aarch64: For libjson-c: export ac_cv_func_malloc_0_nonnull=yes export ac_cv_func_realloc_0_nonnull=yes ./autogen.sh ./configure --host=aarch64-linux-gnu --disable-shared --enable-static make For rt-app: export ac_cv_lib_json_c_json_object_from_file=yes ./autogen.sh ./configure --host=aarch64-linux-gnu LDFLAGS=" --static -L/json-c/. libs/" CFLAGS="-I" --with-deadline make e.g, with a directory structure like the following: $ tree -d . ├── autom4te.cache ├── build ├── build-aux ├── doc │   └── examples │   ├── cpufreq_governor_efficiency │   ├── merge │   └── tutorial ├── json-c │   ├── autom4te.cache │   ├── doc │   │   └── html │   └── tests ├── libdl ├── m4 └── src the configure step becomes ./configure --host=aarch64-linux-gnu LDFLAGS="--static -L./../json-c/." CFLAGS="-I./.." --with-deadline **** json file skeleton **** The json file that describes a workload is made on 3 main objects: tasks, resources and global objects. Only tasks object is mandatory; default value will be used if global object is not defined and resources object is kept for backward compatibility with old version of rt-app but it doesn't give any benefit to declare it as the resources are now dynamically created by rt-app when it parses the file. **** global object **** The global object defines parameters for the whole use case: * duration : Integer. Duration of the use case in seconds. All the threads will be killed once the duration has elapsed. if -1 has been set, the use case will run indefinitly until all threads kill themselves (as an example if a finite number of loop has been defined in their running pattern) or if a signal is received to stop the use case. * calibration : String or Integer: A String defines the CPU that will be used to calibrate the ns per loop value. "CPU0" is the default value (see run event section for details about ns per loop value). A integer skips the calibration step and uses the integer value as ns per loop. * default_policy : String. Default scheduling policy of threads. Default value is SCHED_OTHER. * pi_enabled: Boolean. Enable the prority inheritance of mutex. Default value is False. * lock_pages : Boolean. Lock the mem page in RAM. Locking the page in RAM ensures that your RT thread will not be stalled until a page is moved from swap to RAM. The lock of the page is only possible for non CFS tasks. Default value is False. * logdir : String. Path to store the various log files. The default path is the current directory (./). * log_basename : String. Prefix used for all log files of the use case. "rt-app-" is used by default. * log_size : String or Integer. A Integer defines a fix size in MB of the temporary buffer (size per thread) that will be used to store the log data before saving them in a file. This temporary buffer is used as a cicular buffer so the oldest data will be lost in case of overflow. A string is used to set a predifined behavior: - "file" will be used to store the log data directly in the file without using a temporary buffer. - "Disable" will disable the log mecahnism. - "Auto" will let rt-app compute the buffer size to not overflow the latter during the use case. The use of a temporary buffer prevents the threads of unexpected wait during io access. The "Auto" mode is not implemented yet and fallback to "file" mode for the moment. * ftrace: Boolean. If enable, rt-app logs in ftrace the main events of the use case. Default value is False. * gnuplot : Boolean. if True, it will create a gnu plot compatible file for each threads (see gnuplot section for more details). Default value is False. "io_device" : Text. Path to the file which will be written to create IO-bounded busy loop. Specify it carefully since it might damage the specified file. Default value is "/dev/null". "mem_buffer_size" : Integer. The size of per-thread memory buffer in byte being used to create IO-bounded and memory-bounded busy loop. Default value is 4194304(4MB). * cumulative_slack : Boolean. Accumulate slack (see below) measured during successive timer events in a phase. Default value is False (time between the end of last event and the end of the phase). *** default global object: "global" : { "duration" : -1, "calibration" : "CPU0", "default_policy" : "SCHED_OTHER", "pi_enabled" : false, "lock_pages" : false, "logdir" : "./", "log_size" : "file", "log_basename" : "rt-app", "ftrace" : false, "gnuplot" : false, "io_device" : "/dev/null" "mem_buffer_size" : 4194304, "cumulative_slack" : false } **** tasks object **** The tasks object is made of sub-objects that describe threads. No other kind of object than thread object is allowed. The key value of each object will be used as thread's name. "tasks" : { "thread_name1" : { ... }, "thread_name2" : { ... }, "thread_name3" : { ... }, ... } *** thread object *** Thread object describes the behavior of one kind of thread which means that several threads can be created with the same object (see instance parameter below). * instance : Integer. Define the number of threads that will be created with the properties of this thread object. Default value is 1. *** policy : String: Define the scheduling policy of the thread. default_policy is used if not defined * priority : Integer. Define the scheduling priority of the thread. The value must be aligned with the range allowed by the policy. The default priority is 0 for sched_other class and 10 for other classes * dl-runtime : Integer: Define the runtime budget for deadline scheduling class. Default value is 0. The unit is usec. * dl-period : Integer. Define the period duration for deadline scheduling class. Default value is runtime. The unit is usec. * dl-deadline : Integer. Define the deadline parameter for deadline scheduling class. Default value is period. The unit is usec. * cpus: Array of Integer. Define the CPU affinity of the thread. Default value is all CPUs of the system. An example : "cpus" : [0, 2, 3] * delay: Integer. Initial delay before a thread starts execution. The unit is usec. * phases: Object. The phases object described TBF If there is only 1 phase, the sequence of events can be directly put in the thread object instead of creating a phases object. As an example, the 2 objects below are equals: "task" : { "thread1" : { "phases" : { "phase1" : { "run" : 10, "sleep" : 10 } } } } "task" : { "thread1" : { "run" : 10, "sleep" : 10 } } * loop: Integer. Define the number of times the parent object must be run. The parent object can be a phase or a thread. For phase object, the loop defines the number of time the phase will be executed before starting the next phase. For thread object, the loop defines the number of times that the complete "phases" object will be executed. The default value is -1. *** events *** events are simple action that will be performed by the thread or on the thread. They have to be listed by execution order. * run : Integer. Emulate the execution of a load. The duration is defined in usec but the run event will effectively run a number of time a loop that waste cpu cycles. When the run event is executed, the duration is transformed in a number of loop thanks to the ns per loop value (see calibration). This way of working enables to emulate a load with a duration that will vary with the frequency or the compute capacity of the CPU. * runtime : Integer. The duration is define in usec. Similar to the run event, it emulates the execution of a load. Unlike run, runtime runs for a specific amount of time irrespective of the compute capacity of the CPU or the frequency. * sleep : Integer. Emulate the sleep of a task. The duration is defined in usec. * mem : Integer. Emulate the memory write operation. The value defines the size in byte to be written into the memory buffer. The size of the memory buffer is defined by "mem_buffer_size" in "global" object. * iorun : Integer. Emulate the IO write operation. The value defined the size in byte to be write into the IO device specified by "io_device" in "global" object. * timer : Object. Emulate the wake up of the thread by a timer. Timer differs from sleep event by the start time of the timer duration. Sleep duration starts at the beginning of the sleep event whereas timer duration starts at the end of the last use of the timer. So Timer event are immunized against preemption, frequency scaling and computing capacity of a CPU. The initial starting time of the timer is set during the 1st use of the latter. |<------19---------->|<------19---------->|<------19---------->| task A ...|run 5|timer 19 |------run 5|timer 19|run 5|timer 19 | task B |run 10 | As an example to point out the difference between sleep and timer, let consider a task A that run 5 and then sleep 10. The period of task A should be 15. Let's add a task B that runs 5 and use a timer to wakes up every 19. |<------15------>|<------15------>|<------19---------->|<------15------>|<------16------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->| taskA ...|run 5|sleep 10 |run 5|sleep 10 |----run 5|sleep 10 |run 5|sleep 10 |-run 5|sleep 10 |----run 5|sleep 10 |----run 5|sleep 10 |----run 5|sleep 10 |----run 5|sleep 10 |----run 5|sleep 10 | taskB ...|------run 5|timer 19|--run 5|timer 19 |run 5|timer 19 |run 5|timer 19 |run 5|timer 19 |run 5|timer 19 |run 5|timer 19 |run 5|timer 19 |run 5|timer 19 | |<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->| We can see that task B period stays to 19 even if the run event is delayed because of scheduling preemption whereas the period of task A starts at 15 but increases to 19 because of the scheduling delay. "unique" timer device: When a thread that uses a timer is instanciated, the timer will be shared across the thread instance which disturbs the original sequence. In order to have 1 timer per instance, you can use the "unique" prefix so a new timer will be created for each instance. Uniqueness applies to the thread boundary which means that using the same unique name in the sequence of a thread will refer to the same timer like the example below: "phases" : { "light" : { "loop" : 10, "run" : 1000, "timer" : { "ref" : "unique", "period" : 30000 } }, "heavy" : { "loop" : 10, "run" : 4000, "timer" : { "ref" : "unique", "period" : 30000 } } } If you want to refer to different timer you must use different name like below: "phases" : { "light" : { "loop" : 10, "run" : 1000, "timer" : { "ref" : "uniqueA", "period" : 30000 } }, "heavy" : { "loop" : 10, "run" : 4000, "timer" : { "ref" : "uniqueB", "period" : 400000 } } } Timers can work with a "relative" or an "absolute" reference. By default they work in "relative" mode, but this mode can also be explicity specified as the following: "phases" : { "phase0" : { "loop" : 10, "run0" : 10000, "timer0" : { "ref" : "unique", "period" : 20000, "mode" : "relative" }, } "relative" mode means that the reference for setting the next timer event is relative to the end of the current phase. This in turn means that if, for some reason (i.e., clock frequency was too low), events in a certain phase took too long to execute and the timer of that phase couldn't actually fire at all, the next phase won't be affected. For example: +---- + +-----+ +-------------------+-----+ +--- |r0 | |r0 | |r0 |r0 | |r0 | | | | | | | | o-----------o-----------o-------------------o-----------o-------> 0 10 20 30 40 50 60 70 80 100 120 ^ ^ ^ ^ | | | MISS! | + + + + Timer0 Timer0 Timer0 Timer0 In this example character "o" denotes when phases finish/start. Third activation of Timer0 is missed, since r0 executed for more that 20ms. However the next phase is not affected as Timer0 was set considering the instant of time when the misbehaving r0 finished executing. "absolute" mode is specified as the following: "phases" : { "phase0" : { "loop" : 10, "run0" : 10000, "timer0" : { "ref" : "unique", "period" : 20000, "mode" : "absolute" }, } "absolute" mode means that the reference for setting the next timer event is fixed and always consider the starting time of the first phase. This means that if, for some reason (i.e., clock frequency was too low), events in a certain phase took too long to execute and the timer of that phase couldn't actually fire at all, the next phase (and potentially other subsequent phases) _will_ be affected. For example, considering again the example above: +---- + +-----+ +-------------------+-----+-----+ +--- |r0 | |r0 | |r0 |r0 |r0 | |r0 | | | | | | | | | o-----------o-----------o-------------------o-----o---------o----> 0 10 20 30 40 50 60 70 80 100 120 ^ ^ ^ ^ ^ | | | MISS! | MISS! | + + + + + Timer0 Timer0 Timer0 Timer0 Timer0 Third activation of Timer0 is missed, since r0 executed for more that 20ms. Even if 4th activation of r0 executes for 10ms (as specified in the configuration), 4th Timer0 is still missed because the reference didn't change. In this example 5th activation of r0 then managed to recover, but in general it depends on how badly a certain phase misbehaves. * lock : String. Lock the mutex defined by the string value. * unlock : String. Unlock the mutex defined by the string value. * wait : Object {"ref" : String, "mutex" : String }. Block the calling thread until another thread sends a wake up signal to the resource defined by "ref". The mutex defined by "mutex" is used during the block sequence: See pthread_cond_wait() for more details about the sequence (especially regarding the use of the mutex). * signal : String. Send a wake up signal to the resource defined by the string. The 1st thread in the wait list will be wokn up. See pthread_cond_signal() for more details. * broad : String. Send a wake up signal to the resource defined by the string. All threads that are blocked on the resource wil wake up. See pthread_cond_broadcast() for more details. * sync : Object {"ref" : String, "mutex" : String }. Atomically wakes up a blocked thread and then blocks the calling thread on the condition. taskA ...|run 5|wait|------------|run 5| wait |---------------- taskB ...-------------------|sync|-------- The sync event "sync" : {"ref" : "CondA", "mutex" : "mutexA" } generates the following sequence: { "lock" : "mutexA", "signal" : "CondA", "wait" : { "ref" : "condA", "mutex" : "mutexA" }, "unlock" : "mutexA" } * suspend : String. Block the calling thread until another thread wakes it up with resume. The String can be let empty as it will be filled by workgen with the right thread's name before starting the use case. * resume : String. Wake up the thread defined by the string. taskA ...|run 5|suspend |----------------|run 5|suspend |---------------- taskB ...-------------------|resume taskA|run 10 |-------------------- * yield: String. Calls pthread_yield(), freeing the CPU for other tasks. This has a special meaning for SCHED_DEADLINE tasks. String can be empty. **** Trace and Log **** Some traces and log hooks have been added to ease the debug and monitor various metrics of the use cases *** Trace *** A trace can be inserted into ftrace buffer for each event in order to sync kernel events like sched_switch with the use case's sequence. A simple example of ftrace log: thread0-23345 [003] 115591.560884: tracing_mark_write: [0] starts thread0-23345 [003] 115591.560890: tracing_mark_write: [0] begins loop 0 phase 0 step 0 thread0-23345 [003] 115591.560894: tracing_mark_write: [0] executing 0 thread0-23345 [003] 115591.580212: tracing_mark_write: [0] executing 1 thread0-23345 [003] 115591.580217: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120 -0 [003] 115591.670198: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120 thread0-23345 [003] 115591.670243: tracing_mark_write: [0] end loop 0 phase 0 step 0 thread0-23345 [003] 115591.670251: tracing_mark_write: [0] begins loop 0 phase 0 step 1 thread0-23345 [003] 115591.670254: tracing_mark_write: [0] executing 0 thread0-23345 [003] 115591.688081: tracing_mark_write: [0] executing 1 thread0-23345 [003] 115591.688085: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120 -0 [003] 115591.778063: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120 thread0-23345 [003] 115591.778108: tracing_mark_write: [0] end loop 0 phase 0 step 1 thread0-23345 [003] 115591.778116: tracing_mark_write: [0] begins loop 0 phase 0 step 2 thread0-23345 [003] 115591.778119: tracing_mark_write: [0] executing 0 thread0-23345 [003] 115591.794619: tracing_mark_write: [0] executing 1 thread0-23345 [003] 115591.794623: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120 -0 [003] 115591.884607: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120 thread0-23345 [003] 115591.884652: tracing_mark_write: [0] end loop 0 phase 0 step 2 ... thread0-23345 [003] 115593.394022: tracing_mark_write: [0] begins loop 0 phase 0 step 17 thread0-23345 [003] 115593.394025: tracing_mark_write: [0] executing 0 thread0-23345 [003] 115593.410583: tracing_mark_write: [0] executing 1 thread0-23345 [003] 115593.410594: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120 -0 [003] 115593.500567: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120 thread0-23345 [003] 115593.500612: tracing_mark_write: [0] end loop 0 phase 0 step 17 thread0-23345 [003] 115593.500620: tracing_mark_write: [0] begins loop 0 phase 0 step 18 thread0-23345 [003] 115593.500623: tracing_mark_write: [0] executing 0 thread0-23345 [003] 115593.520198: tracing_mark_write: [0] executing 1 thread0-23345 [003] 115593.520202: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120 -0 [003] 115593.610185: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120 thread0-23345 [003] 115593.610230: tracing_mark_write: [0] end loop 0 phase 0 step 18 thread0-23345 [003] 115593.610258: tracing_mark_write: [0] exiting The main events are : * "tracing_mark_write: [0] starts" indicates the start of the thread with index 0 * "tracing_mark_write: [0] begins loop A phase B loop C" indicates that the thread with index 0 starts to execute the loop C of the phase B of the main loop A * "tracing_mark_write: [0] executing X": indicates that the thread with index 0 execute the event X of the current phase -"tracing_mark_write: [0] end loop A phase B step C" indicates that the thread with index 0 ends to execute the event C of the phase B of the loop A *"tracing_mark_write: [0] exiting": indiscates that the trhead with index 0 has finished to execute its events. *** Log and gnuplot *** You can log some metrics for each phase that is executed by a thread. These metrics are: - perf: number of loop that has been executed by run events - run: time spent by the thread to execute the run events - period: duration to execute the complte phase - start/end : absolute start and end time of a phase. Same time base is used in ftrace - rel_st: start time of a phase relatively to the beg of the use case - slack: if global option "cumulative_slack" (see above) is false, time between the end of last event and the end of the phase, e.g. taskA ...|-- run5 --|- sleep5 -|-- run5--|..timer20.|-- run5 --|- sleep5 -|-- run6 --|.timer20.| <--------------- period 20 --------------> <--------------- period 20 --------------> <-slack5-> it can also be negative if the execution of a phases' events overshoots the current period, e.g. taskA ...|-- run5 --|- sleep5 -|------- run30 ------xxxxxxxxxx| <--------------- period 20 --------------> if global option "cumulative_slack" is true, all the intermediate slacks of a phase with multiple timers are accumulated and reported when the phase completes - c_duration: sum of the configured duration of run/runtime events - c_period: sum of the timer(s) period(s) - wu_lat: sum of wakeup latencies after timer events Below is an extract of a log: # Policy : SCHED_OTHER priority : 0 #idx perf run period start end rel_st slack c_duration c_period wu_lat 0 92164 19935 98965 504549567051 504549666016 2443 78701 20000 100000 266 0 92164 19408 99952 504549666063 504549766015 101455 80217 20000 100000 265 0 92164 19428 99952 504549766062 504549866014 201454 80199 20000 100000 264 0 92164 19438 99955 504549866060 504549966015 301452 80190 20000 100000 265 0 92164 19446 99952 504549966061 504550066013 401453 80093 20000 100000 264 0 92164 19415 99953 504550066060 504550166013 501452 80215 20000 100000 263 0 92164 19388 99954 504550166059 504550266013 601451 80242 20000 100000 264 0 92164 19444 99956 504550266060 504550366015 701452 80185 20000 100000 265 Some gnuplot files are also created to generate charts based on the log files for each thread and for each kind of metrics. The format of the chart that will be generated by gnuplot is eps (text art has been used for the chart below) Measured thread0 Loop stats Load [nb loop] 120000 ++--+----+---+----+---+---+----+---+----+--++ 506000 load ****** + + + + + + + + + + + run ###### |$$ : : : : $$$ : : : : | period $$$$$$ 110000 ++.$$$........$$$$...$...$..........$$$.$$$$+ | : $$$$$$$$ $$ $: $ $$$$$$$$ $: $ | : : : : $ : :$ : : : ++ 504000 | : : : : : : : : : | 100000 ++.........................................++ | : : : : : : : : : | | : : : : : : : : : | 90000 ++.........................................++ | : : : : : : : : : ++ 502000 | : : : : : : : : : | 80000 ++.........................................++ | : : : : : : : : : | | : : : : : : : : : | 70000 +******************************************** 500000 | : : : : : : : : : | | : : : : : : : : : | | : : : : : : : : : | 60000 ++.........................................++ | : : : : : : : : : | | : : : : : : : : : ++ 498000 50000 ++.........................................++ | : : : : : : : : : | | : : : : : : : : : | 40000 ++.........................................++ | : : : : : : : : : ++ 496000 |# : : : : : : : : : | | # : : : : ### : : : : | 30000 ++.###.###....####...#...#..........###.####+ | : # :#### ### : # ######## #: # + + + + + + +# + + + + 20000 ++--+----+---+----+---+---+----+---+----+--++ 494000 17560556057560556057560656065606756065606756075607 Loop start time [msec]