1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220
|
#!/usr/bin/env tesh
p Testing a simple master/worker example application handling failures TCP crosstraffic DISABLED
! output sort 19
$ ${bindir:=.}/c-platform-failures --log=xbt_cfg.thres:critical --log=no_loc ${platfdir}/small_platform_failures.xml ${srcdir:=.}/../../cpp/platform-failures/s4u-platform-failures_d.xml --cfg=network/crosstraffic:0 "--log=root.fmt:[%10.6r]%e(%i:%a@%h)%e%m%n" --log=res_cpu.t:verbose
> [ 0.000000] (0:maestro@) Cannot launch actor 'worker' on failed host 'Fafard'
> [ 0.000000] (0:maestro@) Starting actor worker(Fafard) failed because its host is turned off.
> [ 0.000000] (1:master@Tremblay) Got 5 workers and 20 tasks to process
> [ 0.000000] (1:master@Tremblay) Send a message to worker-0
> [ 0.010309] (1:master@Tremblay) Send to worker-0 completed
> [ 0.010309] (2:worker@Tremblay) Start execution...
> [ 0.000000] (2:worker@Tremblay) Waiting a message on worker-0
> [ 0.000000] (3:worker@Jupiter) Waiting a message on worker-1
> [ 0.000000] (5:worker@Ginette) Waiting a message on worker-3
> [ 0.000000] (6:worker@Bourassa) Waiting a message on worker-4
> [ 0.010309] (1:master@Tremblay) Send a message to worker-1
> [ 1.000000] (0:maestro@) Restart actors on host Fafard
> [ 1.000000] (7:worker@Fafard) Waiting a message on worker-2
> [ 1.000000] (1:master@Tremblay) Mmh. The communication with 'worker-1' failed. Nevermind. Let's keep going!
> [ 1.000000] (1:master@Tremblay) Send a message to worker-2
> [ 2.000000] (1:master@Tremblay) Mmh. The communication with 'worker-2' failed. Nevermind. Let's keep going!
> [ 2.000000] (0:maestro@) Restart actors on host Jupiter
> [ 2.000000] (1:master@Tremblay) Send a message to worker-3
> [ 2.000000] (8:worker@Jupiter) Waiting a message on worker-1
> [ 2.010309] (2:worker@Tremblay) Execution complete.
> [ 2.010309] (2:worker@Tremblay) Waiting a message on worker-0
> [ 3.030928] (1:master@Tremblay) Send to worker-3 completed
> [ 3.030928] (1:master@Tremblay) Send a message to worker-4
> [ 3.030928] (5:worker@Ginette) Start execution...
> [ 4.061856] (1:master@Tremblay) Send to worker-4 completed
> [ 4.061856] (1:master@Tremblay) Send a message to worker-0
> [ 4.061856] (6:worker@Bourassa) Start execution...
> [ 4.072165] (1:master@Tremblay) Send to worker-0 completed
> [ 4.072165] (1:master@Tremblay) Send a message to worker-1
> [ 4.072165] (2:worker@Tremblay) Start execution...
> [ 5.030928] (5:worker@Ginette) Execution complete.
> [ 5.030928] (5:worker@Ginette) Waiting a message on worker-3
> [ 5.103093] (1:master@Tremblay) Send to worker-1 completed
> [ 5.103093] (1:master@Tremblay) Send a message to worker-2
> [ 5.103093] (8:worker@Jupiter) Start execution...
> [ 6.061856] (6:worker@Bourassa) Execution complete.
> [ 6.061856] (6:worker@Bourassa) Waiting a message on worker-4
> [ 6.072165] (2:worker@Tremblay) Execution complete.
> [ 6.072165] (2:worker@Tremblay) Waiting a message on worker-0
> [ 7.103093] (8:worker@Jupiter) Execution complete.
> [ 7.103093] (8:worker@Jupiter) Waiting a message on worker-1
> [ 15.103093] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
> [ 15.103093] (1:master@Tremblay) Send a message to worker-3
> [ 15.103093] (1:master@Tremblay) Mmh. The communication with 'worker-3' failed. Nevermind. Let's keep going!
> [ 15.103093] (1:master@Tremblay) Send a message to worker-4
> [ 15.103093] (5:worker@Ginette) Mmh. Something went wrong. Nevermind. Let's keep going!
> [ 15.103093] (5:worker@Ginette) Waiting a message on worker-3
> [ 16.134021] (1:master@Tremblay) Send to worker-4 completed
> [ 16.134021] (1:master@Tremblay) Send a message to worker-0
> [ 16.134021] (6:worker@Bourassa) Start execution...
> [ 16.144330] (1:master@Tremblay) Send to worker-0 completed
> [ 16.144330] (1:master@Tremblay) Send a message to worker-1
> [ 16.144330] (2:worker@Tremblay) Start execution...
> [ 17.175258] (1:master@Tremblay) Send to worker-1 completed
> [ 17.175258] (1:master@Tremblay) Send a message to worker-2
> [ 17.175258] (8:worker@Jupiter) Start execution...
> [ 18.134021] (6:worker@Bourassa) Execution complete.
> [ 18.134021] (6:worker@Bourassa) Waiting a message on worker-4
> [ 18.144330] (2:worker@Tremblay) Execution complete.
> [ 18.144330] (2:worker@Tremblay) Waiting a message on worker-0
> [ 19.175258] (8:worker@Jupiter) Execution complete.
> [ 19.175258] (8:worker@Jupiter) Waiting a message on worker-1
> [ 27.175258] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
> [ 27.175258] (1:master@Tremblay) Send a message to worker-3
> [ 28.206186] (1:master@Tremblay) Send to worker-3 completed
> [ 28.206186] (1:master@Tremblay) Send a message to worker-4
> [ 28.206186] (1:master@Tremblay) Mmh. The communication with 'worker-4' failed. Nevermind. Let's keep going!
> [ 28.206186] (1:master@Tremblay) Send a message to worker-0
> [ 28.206186] (5:worker@Ginette) Start execution...
> [ 28.206186] (6:worker@Bourassa) Mmh. Something went wrong. Nevermind. Let's keep going!
> [ 28.206186] (6:worker@Bourassa) Waiting a message on worker-4
> [ 28.216495] (1:master@Tremblay) Send to worker-0 completed
> [ 28.216495] (1:master@Tremblay) Send a message to worker-1
> [ 28.216495] (2:worker@Tremblay) Start execution...
> [ 29.247423] (1:master@Tremblay) Send to worker-1 completed
> [ 29.247423] (1:master@Tremblay) Send a message to worker-2
> [ 29.247423] (8:worker@Jupiter) Start execution...
> [ 30.206186] (5:worker@Ginette) Execution complete.
> [ 30.206186] (5:worker@Ginette) Waiting a message on worker-3
> [ 30.216495] (2:worker@Tremblay) Execution complete.
> [ 30.216495] (2:worker@Tremblay) Waiting a message on worker-0
> [ 31.247423] (8:worker@Jupiter) Execution complete.
> [ 31.247423] (8:worker@Jupiter) Waiting a message on worker-1
> [ 39.247423] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
> [ 39.247423] (1:master@Tremblay) Send a message to worker-3
> [ 40.278351] (1:master@Tremblay) Send to worker-3 completed
> [ 40.278351] (1:master@Tremblay) Send a message to worker-4
> [ 40.278351] (5:worker@Ginette) Start execution...
> [ 41.309278] (1:master@Tremblay) Send to worker-4 completed
> [ 41.309278] (1:master@Tremblay) All tasks have been dispatched. Let's tell everybody the computation is over.
> [ 41.309278] (2:worker@Tremblay) I'm done. See you!
> [ 41.309278] (6:worker@Bourassa) Start execution...
> [ 41.309278] (8:worker@Jupiter) I'm done. See you!
> [ 42.309278] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
> [ 43.309278] (0:maestro@) Simulation time 43.3093
> [ 43.309278] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-3'. Nevermind. Let's keep going!
> [ 43.309278] (1:master@Tremblay) Goodbye now!
> [ 43.309278] (6:worker@Bourassa) Execution complete.
> [ 43.309278] (6:worker@Bourassa) Waiting a message on worker-4
> [ 43.309278] (6:worker@Bourassa) I'm done. See you!
p Testing a simple master/worker example application handling failures. TCP crosstraffic ENABLED
! output sort 19
$ ${bindir:=.}/c-platform-failures --log=xbt_cfg.thres:critical --log=no_loc ${platfdir}/small_platform_failures.xml ${srcdir:=.}/../../cpp/platform-failures/s4u-platform-failures_d.xml "--log=root.fmt:[%10.6r]%e(%i:%a@%h)%e%m%n" --log=res_cpu.t:verbose
> [ 0.000000] (0:maestro@) Cannot launch actor 'worker' on failed host 'Fafard'
> [ 0.000000] (0:maestro@) Starting actor worker(Fafard) failed because its host is turned off.
> [ 0.000000] (1:master@Tremblay) Got 5 workers and 20 tasks to process
> [ 0.000000] (1:master@Tremblay) Send a message to worker-0
> [ 0.000000] (2:worker@Tremblay) Waiting a message on worker-0
> [ 0.000000] (3:worker@Jupiter) Waiting a message on worker-1
> [ 0.000000] (5:worker@Ginette) Waiting a message on worker-3
> [ 0.000000] (6:worker@Bourassa) Waiting a message on worker-4
> [ 0.010825] (2:worker@Tremblay) Start execution...
> [ 0.010825] (1:master@Tremblay) Send to worker-0 completed
> [ 0.010825] (1:master@Tremblay) Send a message to worker-1
> [ 1.000000] (0:maestro@) Restart actors on host Fafard
> [ 1.000000] (7:worker@Fafard) Waiting a message on worker-2
> [ 1.000000] (1:master@Tremblay) Mmh. The communication with 'worker-1' failed. Nevermind. Let's keep going!
> [ 1.000000] (1:master@Tremblay) Send a message to worker-2
> [ 2.000000] (0:maestro@) Restart actors on host Jupiter
> [ 2.000000] (8:worker@Jupiter) Waiting a message on worker-1
> [ 2.000000] (1:master@Tremblay) Mmh. The communication with 'worker-2' failed. Nevermind. Let's keep going!
> [ 2.000000] (1:master@Tremblay) Send a message to worker-3
> [ 2.010825] (2:worker@Tremblay) Execution complete.
> [ 2.010825] (2:worker@Tremblay) Waiting a message on worker-0
> [ 3.082474] (5:worker@Ginette) Start execution...
> [ 3.082474] (1:master@Tremblay) Send to worker-3 completed
> [ 3.082474] (1:master@Tremblay) Send a message to worker-4
> [ 4.164948] (6:worker@Bourassa) Start execution...
> [ 4.164948] (1:master@Tremblay) Send to worker-4 completed
> [ 4.164948] (1:master@Tremblay) Send a message to worker-0
> [ 4.175773] (2:worker@Tremblay) Start execution...
> [ 4.175773] (1:master@Tremblay) Send to worker-0 completed
> [ 4.175773] (1:master@Tremblay) Send a message to worker-1
> [ 5.082474] (5:worker@Ginette) Execution complete.
> [ 5.082474] (5:worker@Ginette) Waiting a message on worker-3
> [ 5.258247] (8:worker@Jupiter) Start execution...
> [ 5.258247] (1:master@Tremblay) Send to worker-1 completed
> [ 5.258247] (1:master@Tremblay) Send a message to worker-2
> [ 6.164948] (6:worker@Bourassa) Execution complete.
> [ 6.164948] (6:worker@Bourassa) Waiting a message on worker-4
> [ 6.175773] (2:worker@Tremblay) Execution complete.
> [ 6.175773] (2:worker@Tremblay) Waiting a message on worker-0
> [ 7.258247] (8:worker@Jupiter) Execution complete.
> [ 7.258247] (8:worker@Jupiter) Waiting a message on worker-1
> [ 15.258247] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
> [ 15.258247] (1:master@Tremblay) Send a message to worker-3
> [ 15.258247] (5:worker@Ginette) Mmh. Something went wrong. Nevermind. Let's keep going!
> [ 15.258247] (5:worker@Ginette) Waiting a message on worker-3
> [ 15.258247] (1:master@Tremblay) Mmh. The communication with 'worker-3' failed. Nevermind. Let's keep going!
> [ 15.258247] (1:master@Tremblay) Send a message to worker-4
> [ 16.340722] (6:worker@Bourassa) Start execution...
> [ 16.340722] (1:master@Tremblay) Send to worker-4 completed
> [ 16.340722] (1:master@Tremblay) Send a message to worker-0
> [ 16.351546] (2:worker@Tremblay) Start execution...
> [ 16.351546] (1:master@Tremblay) Send to worker-0 completed
> [ 16.351546] (1:master@Tremblay) Send a message to worker-1
> [ 17.434021] (8:worker@Jupiter) Start execution...
> [ 17.434021] (1:master@Tremblay) Send to worker-1 completed
> [ 17.434021] (1:master@Tremblay) Send a message to worker-2
> [ 18.340722] (6:worker@Bourassa) Execution complete.
> [ 18.340722] (6:worker@Bourassa) Waiting a message on worker-4
> [ 18.351546] (2:worker@Tremblay) Execution complete.
> [ 18.351546] (2:worker@Tremblay) Waiting a message on worker-0
> [ 19.434021] (8:worker@Jupiter) Execution complete.
> [ 19.434021] (8:worker@Jupiter) Waiting a message on worker-1
> [ 27.434021] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
> [ 27.434021] (1:master@Tremblay) Send a message to worker-3
> [ 28.516495] (5:worker@Ginette) Start execution...
> [ 28.516495] (1:master@Tremblay) Send to worker-3 completed
> [ 28.516495] (1:master@Tremblay) Send a message to worker-4
> [ 28.516495] (6:worker@Bourassa) Mmh. Something went wrong. Nevermind. Let's keep going!
> [ 28.516495] (6:worker@Bourassa) Waiting a message on worker-4
> [ 28.516495] (1:master@Tremblay) Mmh. The communication with 'worker-4' failed. Nevermind. Let's keep going!
> [ 28.516495] (1:master@Tremblay) Send a message to worker-0
> [ 28.527320] (2:worker@Tremblay) Start execution...
> [ 28.527320] (1:master@Tremblay) Send to worker-0 completed
> [ 28.527320] (1:master@Tremblay) Send a message to worker-1
> [ 29.609794] (8:worker@Jupiter) Start execution...
> [ 29.609794] (1:master@Tremblay) Send to worker-1 completed
> [ 29.609794] (1:master@Tremblay) Send a message to worker-2
> [ 30.516495] (5:worker@Ginette) Execution complete.
> [ 30.516495] (5:worker@Ginette) Waiting a message on worker-3
> [ 30.527320] (2:worker@Tremblay) Execution complete.
> [ 30.527320] (2:worker@Tremblay) Waiting a message on worker-0
> [ 31.609794] (8:worker@Jupiter) Execution complete.
> [ 31.609794] (8:worker@Jupiter) Waiting a message on worker-1
> [ 39.609794] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
> [ 39.609794] (1:master@Tremblay) Send a message to worker-3
> [ 40.692268] (5:worker@Ginette) Start execution...
> [ 40.692268] (1:master@Tremblay) Send to worker-3 completed
> [ 40.692268] (1:master@Tremblay) Send a message to worker-4
> [ 41.774742] (6:worker@Bourassa) Start execution...
> [ 41.774742] (1:master@Tremblay) Send to worker-4 completed
> [ 41.774742] (1:master@Tremblay) All tasks have been dispatched. Let's tell everybody the computation is over.
> [ 41.774742] (2:worker@Tremblay) I'm done. See you!
> [ 41.774742] (8:worker@Jupiter) I'm done. See you!
> [ 42.774742] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
> [ 43.774742] (6:worker@Bourassa) Execution complete.
> [ 43.774742] (6:worker@Bourassa) Waiting a message on worker-4
> [ 43.774742] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-3'. Nevermind. Let's keep going!
> [ 43.774742] (6:worker@Bourassa) I'm done. See you!
> [ 43.774742] (1:master@Tremblay) Goodbye now!
> [ 43.774742] (0:maestro@) Simulation time 43.7747
p NOT testing the mixture of failures and CpuTI:
p This test leads to a deadlock because of a bug somewhere in EngineImpl::solve.
p We should debug this instead of ignoring the issue, but it's utterly
p complex with such an integration test. One day, we will setup a set of
p unit tests for the model's solver, and such issues will be addressable again.
p For the time being, I just give up, sorry.
p $ ${bindir:=.}/c-platform-failures --log=xbt_cfg.thres:critical --log=no_loc ${platfdir}/small_platform_failures.xml ${srcdir:=.}/../../cpp/platform-failures/s4u-platform-failures_d.xml --cfg=cpu/optim:TI "--log=root.fmt:[%10.6r]%e(%i:%a@%h)%e%m%n" --log=res_cpu.t:verbose
|