# HangWatcher
HangWatcher is a mechanism for detecting hangs in Chrome, logging their
frequency and nature in UMA and uploading crash reports.
## Definition of a hang
In this document a hang is defined as any scope that does not complete
within a certain wall-time allowance. A scope is defined by the lifetime
of a `WatchHangsInScope` object. The time-out value can be different for
each individual scope.
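
To make the definition concrete, here is a minimal sketch of a scope with a wall-time allowance. `ScopedDeadline`, `IsHung`, `g_deadline` and `kNoScope` are hypothetical names invented for this illustration; this is not the `WatchHangsInScope` implementation. It also assumes that leaving a nested scope restores the enclosing scope's deadline.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Sentinel meaning "this thread is not inside any watched scope".
const Clock::time_point kNoScope = Clock::time_point::max();

// Deadline of the innermost live scope on the current thread.
thread_local Clock::time_point g_deadline = kNoScope;

// Hypothetical stand-in for WatchHangsInScope: the scope lasts as long as
// the object does, and each scope carries its own allowance.
class ScopedDeadline {
 public:
  explicit ScopedDeadline(Clock::duration allowance)
      : previous_(g_deadline) {
    assert(allowance > Clock::duration::zero());
    g_deadline = Clock::now() + allowance;
  }
  // Assumption: leaving the scope restores the enclosing scope's deadline.
  ~ScopedDeadline() { g_deadline = previous_; }

  ScopedDeadline(const ScopedDeadline&) = delete;
  ScopedDeadline& operator=(const ScopedDeadline&) = delete;

 private:
  Clock::time_point previous_;
};

// A hang, per the definition above: a scope is live and `now` is past
// its deadline.
bool IsHung(Clock::time_point now) {
  return g_deadline != kNoScope && now > g_deadline;
}
```

Outside of any scope `IsHung` always returns false, which mirrors the fact that only code inside a monitored scope can be reported as hung.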
### Example 1
A task on the IO thread encounters a lock on which it blocks for 20s.
There is absolutely no progress made as the OS is bound to deschedule
the thread while the contention on the lock remains. This is a hang.
### Example 2
A small function that should execute relatively quickly spends 30s
burning CPU without making any outwardly visible progress. In this
case there is progress made by the thread in a sense, since the
[program counter](https://en.wikipedia.org/wiki/Program_counter)
is not static for the duration of the time-out. However, as far as
Chrome, and critically its user, is concerned we are stuck and not
making progress. This is a hang.
### Example 3
A message pump is busy pumping millions of tasks and dispatches
them quickly. The task at the end of the queue has to wait for up
to 30s to get executed. This is not a hang. This is congestion.
See //content/scheduler/responsiveness for more details.
## Design
Hangs are monitored by one thread per process. This is a thread in
the OS sense. It is not based on `base::Thread` and does not use
the task posting APIs.
Other threads that want to be monitored register with this watcher
thread. This can be done at thread creation or at any other time.
Monitored threads do not have any responsibilities apart from
marking the entering and leaving of monitored scopes. This is
done using a `WatchHangsInScope` object that is instantiated
on the stack, at the beginning of the scope.
### Example:
```
void FooBar() {
  WatchHangsInScope scope(base::TimeDelta::FromSeconds(5));
  DoWork();
}
```
The HangWatcher thread periodically traverses the list of
registered threads and verifies that they are not hung
within a monitored scope.
```
+-------------+       +-----------------+                       +-----------------+
| HangWatcher |       | WatchedThread1  |                       | WatchedThread2  |
+-------------+       +-----------------+                       +-----------------+
       |                       |                                         |
       | Init()                |                                         |
       |-------                |                                         |
       |      |                |                                         |
       |<------                |                                         |
       |                       |                                         |
       |            Register() |                                         |
       |<----------------------|                                         |
       |                       |                                         |
       |                       |                              Register() |
       |<----------------------------------------------------------------|
       |                       |                                         |
       |                       |                                         | SetDeadline()
       |                       |                                         |--------------
       |                       |                                         |             |
       |                       |                                         |<-------------
       |                       |                                         |
       |                       |                                         | ClearDeadline()
       |                       |                                         |----------------
       |                       |                                         |               |
       |                       |                                         |<---------------
       |                       |                                         |
       | Monitor()             |                                         |
       |---------------------->|                                         |
       |                       | ------------------------\               |
       |                       |-| No deadline, no hang. |               |
       |                       | |-----------------------|               |
       |                       |                                         |
       | Monitor()             |                                         |
       |---------------------------------------------------------------->|
       |                       |                                         | ------------------------\
       |                       |                                         |-| No deadline, no hang. |
       |                       |                                         | |-----------------------|
       |                       |                                         |
       |                       | SetDeadline()                           |
       |                       |--------------                           |
       |                       |             |                           |
       |                       |<-------------                           |
       |                       |                                         |
       | Monitor()             |                                         |
       |---------------------->| -------------------------------\        |
       |                       |-| Live expired deadline. Hang! |        |
       |                       | |------------------------------|        |
       |                       |                                         |
       | RecordHang()          |                                         |
       |-------------          |                                         |
       |            |          |                                         |
       |<------------          |                                         |
       |                       |                                         |
```
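
The register-then-monitor flow in the diagram can be modeled in a few lines. This is a simplified sketch, not Chromium's code: `Watcher`, `WatchedThreadState` and `kUnset` are hypothetical names, time is a plain tick counter so the example stays deterministic, and the dedicated watcher thread that would call `Monitor()` periodically is left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <mutex>
#include <vector>

// Sentinel meaning "no deadline is currently set".
constexpr std::uint64_t kUnset = std::numeric_limits<std::uint64_t>::max();

// Hypothetical per-thread slot. The watched thread writes it and the
// watcher thread reads it, hence the atomic.
struct WatchedThreadState {
  std::atomic<std::uint64_t> deadline{kUnset};

  void SetDeadline(std::uint64_t d) {
    assert(d != kUnset);
    deadline.store(d, std::memory_order_relaxed);
  }
  void ClearDeadline() { deadline.store(kUnset, std::memory_order_relaxed); }
};

class Watcher {
 public:
  // Called by a thread that wants to be monitored.
  std::shared_ptr<WatchedThreadState> Register() {
    auto state = std::make_shared<WatchedThreadState>();
    std::lock_guard<std::mutex> lock(mutex_);
    states_.push_back(state);
    return state;
  }

  // One traversal of the registered threads. Returns how many of them have
  // a live deadline that has expired at time `now`.
  int Monitor(std::uint64_t now) {
    std::lock_guard<std::mutex> lock(mutex_);
    int hung = 0;
    for (const auto& state : states_) {
      const std::uint64_t d = state->deadline.load(std::memory_order_relaxed);
      if (d != kUnset && now > d) ++hung;  // Live expired deadline: hang.
    }
    return hung;
  }

 private:
  std::mutex mutex_;
  std::vector<std::shared_ptr<WatchedThreadState>> states_;
};
```

A thread that sets and then clears its deadline before the next traversal is never counted, which corresponds to the "No deadline, no hang." notes in the diagram.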
## Protections against non-actionable reports
### Ignoring normal long running code
There are cases where code is expected to take a long time to complete.
It's possible to keep such cases from triggering the detection of a hang.
Invoking `HangWatcher::InvalidateActiveExpectations()` from within a
scope will make sure that no hangs are logged while execution is within it.
### Example:
```
void RunTask(Task task) {
  // In general, tasks shouldn't hang.
  WatchHangsInScope scope(base::TimeDelta::FromSeconds(5));
  std::move(task.task).Run();  // Calls `TaskKnownToBeVeryLong`.
}

void TaskKnownToBeVeryLong() {
  // This particular function is known to take a long time. Never report it as a
  // hang.
  HangWatcher::InvalidateActiveExpectations();
  BlockWaitingForUserInput();
}
```
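
One way such an invalidation could be tracked is sketched below. Everything here is hypothetical (`ThreadHangState`, `ScopedExpectation`, `ShouldReportHang`, and the free function `InvalidateActiveExpectations`), and it rests on an assumption that this document does not state: every scope that is live at the time of the call is exempted, while scopes opened afterwards are watched normally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kNoExpectation = ~std::uint64_t{0};

// Hypothetical per-thread bookkeeping; not Chromium's data layout.
struct ThreadHangState {
  std::uint64_t deadline = kNoExpectation;  // Deadline of innermost scope.
  int depth = 0;                            // Number of live scopes.
  int invalidated_depth = 0;                // Scopes 1..N are exempt.
};

class ScopedExpectation {
 public:
  ScopedExpectation(ThreadHangState& state, std::uint64_t deadline)
      : state_(state), previous_deadline_(state.deadline) {
    state_.deadline = deadline;
    ++state_.depth;
  }
  ~ScopedExpectation() {
    assert(state_.depth > 0);
    --state_.depth;
    state_.deadline = previous_deadline_;
    // An exemption never outlives the scopes it was applied to.
    state_.invalidated_depth =
        std::min(state_.invalidated_depth, state_.depth);
  }

  ScopedExpectation(const ScopedExpectation&) = delete;
  ScopedExpectation& operator=(const ScopedExpectation&) = delete;

 private:
  ThreadHangState& state_;
  std::uint64_t previous_deadline_;
};

// Exempts every scope that is live right now (assumed semantics).
void InvalidateActiveExpectations(ThreadHangState& state) {
  state.invalidated_depth = state.depth;
}

// What a watcher would evaluate for this thread at time `now`.
bool ShouldReportHang(const ThreadHangState& state, std::uint64_t now) {
  const bool has_deadline =
      state.depth > 0 && state.deadline != kNoExpectation;
  const bool exempt = state.depth <= state.invalidated_depth;
  return has_deadline && !exempt && now > state.deadline;
}
```

With this bookkeeping the exemption ends once the invalidated scopes have exited, so a later task that opens a fresh scope is monitored again.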
### Protections against wrongfully blaming code
TODO
### Ignoring system suspend
TODO