File: state-machine.md

package info (click to toggle)
lava 2026.01-2
  • links: PTS, VCS
  • area: main
  • in suites:
  • size: 30,796 kB
  • sloc: python: 82,790; javascript: 16,658; sh: 1,364; makefile: 335
file content (109 lines) | stat: -rw-r--r-- 3,291 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
# State machine

The state machine describes and controls the state and health of Workers,
Devices and Test Jobs.

## Workers

For each worker, two variables describe the current status: **state** and **health**.

### State

The **state** is an internal variable, set by [lava-server-gunicorn](./services/lava-server-gunicorn.md) when receiving (or not) pings from each worker.

* *Online*: the worker is sending PING to the server
* *Offline*: the worker hasn't sent any messages for a while

### Health

When worker **health** is set to *Maintenance*, no jobs will be ran on the
attached devices.

The worker health can be:

* *Active*
* *Maintenance*
* *Retired*


!!! warning
    When a worker is *Offline*, none of the attached devices will be used to schedule new jobs.

## Devices


For each device, two variables describe the current status: **state** and **health**.

### State

The **state** is an internal variable, set by [lava-scheduler](./services/lava-scheduler.md) and [lava-server-gunicorn](./services/lava-server-gunicorn.md) when scheduling, starting, canceling and ending test jobs.

* *Idle*: not in use by any test job
* *Reserved*: has been reserved for a test job but the test job is not running yet
* *Running*: currently running a test job

### Health

The **health** can be used by admins to indicate if a device should be used by the scheduler or not.

Moreover, when ending an health-check, the device health will be set according to the test job health.

* *Good*: the device passed the health-check
* *Unknown*
* *Looping*: should run health-checks in a loop
* *Bad*: the device failed the health-check
* *Maintenance*
* *Retired*


## TestJobs

For each test job, two variables are describing the current status: **state** and **health**.

### State

* *Submitted*: waiting in the queue
* *Scheduling*: part of a multinode test job where some sub-jobs are still in *Submitted*
* *Scheduled*: has been scheduled. For multinode, it means that all sub-jobs are also scheduled
* *Running*: currently running on a device
* *Canceling*: has been canceled but not ended yet
* *Finished*

!!! note Multinode scheduling
    Only multinode test jobs use *Scheduling*. When all
    sub-jobs are in *Scheduling*, [lava-scheduler](./services/lava-scheduler.md) will transition all test
    jobs to *Scheduled*.

### Health

* *Unknown*: default value that will be overridden when the job is finished.
* *Complete*: the job was able to finish
* *Incomplete*: the job was not able to dinish
* *Canceled*: the test job was canceled.


# Scheduler

The scheduler is called by [lava-scheduler](./services/lava-scheduler.md)
approximately every 20 seconds or when receiving specific events.
The scheduler starts by scheduling health-checks. The remaining devices are
then considered for test jobs.

## Health-checks

To ensure that health-checks are always scheduled when needed, they will be
considered first by the scheduler before regular test jobs.

The scheduler will only consider devices where:

* `device state` is *Idle*
* `device health` is *Good*, *Unknown* or *Looping*
* `worker state` is *Online*

## Test jobs

The scheduler will only consider devices where:

* ``device state`` is *Idle*
* ``device health`` is *Good* or *Unknown*
* ``worker state`` is *Online*