File: ipc.md

# Inter-process Communication (IPC)

Firefox Desktop is a multi-process desktop application.
Code requiring instrumentation may run in any of its processes,
so FOG provides facilities for recording metrics from any of them.

## Design

The IPC Design of FOG was worked out in
[bug 1618253](https://bugzilla.mozilla.org/show_bug.cgi?id=1618253).

It centred around a few specific concepts:

### Forbidding Non-Commutative Operations

Because we cannot nicely impose a canonical ordering of metric operations across all processes,
FOG forbids non-[commutative](https://en.wikipedia.org/wiki/Commutative_property)
metric operations in some circumstances.

For example,
`Add()`-ing to a Counter metric works from multiple processes because the order doesn't matter.
However, given a String metric being `Set()` from multiple processes simultaneously,
which value should it take?

This ambiguity is not a good foundation to build trust on,
so we forbid setting a String metric from multiple processes.
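
To make the distinction concrete, here is a minimal, self-contained sketch.
It does not use FOG's API: `CounterAdd`, `StringSet`, `ApplyAdds`, and `ApplySets`
are hypothetical stand-ins that replay operations in the order they happen to arrive.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for operations arriving from child processes.
struct CounterAdd { int32_t amount; };
struct StringSet { std::string value; };

// Replays additions in arrival order; the total does not depend on order.
int32_t ApplyAdds(const std::vector<CounterAdd>& aOps) {
  int32_t total = 0;
  for (const auto& op : aOps) total += op.amount;
  return total;
}

// Replays sets in arrival order; whichever arrives last wins.
std::string ApplySets(const std::vector<StringSet>& aOps) {
  std::string value;
  for (const auto& op : aOps) value = op.value;
  return value;
}

// Checks both properties on a two-process example.
void CheckOrderSensitivity() {
  // Swapping the arrival order leaves the counter unchanged...
  assert(ApplyAdds({CounterAdd{1}, CounterAdd{2}}) ==
         ApplyAdds({CounterAdd{2}, CounterAdd{1}}));
  // ...but changes the final value of the string.
  assert(ApplySets({StringSet{"a"}, StringSet{"b"}}) !=
         ApplySets({StringSet{"b"}, StringSet{"a"}}));
}
```

Since the arrival order of IPC messages from different processes is not something we control,
only the first kind of operation gives a predictable result.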

#### List of Forbidden Operations

* Boolean's `set` (this is the metric type's only operation)
* Labeled Boolean's `set` (this is the metric type's only operation)
* String's `set` (this is the metric type's only operation)
* Labeled String's `set` (this is the metric type's only operation)
* String List's `set`
    * `add` is permitted (order and uniqueness are not guaranteed)
* Timespan's `start`, `stop`, and `cancel` (these are the metric type's only operations)
* UUID's `set` and `generateAndSet` (these are the metric type's only operations)
* Datetime's `set` (this is the metric type's only operation)
* Quantity's `set` (this is the metric type's only operation)
* Labeled Quantity's `set` (this is the metric type's only operation)

This list may grow over time as new metric types are added.

#### The Unsafety Valve: `permit_non_commutative_operations_over_ipc`

If you wish to forgo FOG's protections and guarantees around ordering,
and use non-commutative operations in child processes,
you may mark your metric definition with the
`permit_non_commutative_operations_over_ipc` metadata property,
like so:

```yaml
unordered_category:
  unordered_boolean_metric:
    type: boolean
    metadata:
      permit_non_commutative_operations_over_ipc: true
    ...
```

This presently only supports:
* Boolean metrics
* Labeled Boolean metrics

```{note}
If there's a metric type not on this list that you need to use in a non-parent process,
please reach out
[on the #glean channel](https://chat.mozilla.org/#/room/#glean:mozilla.org)
and we'll help you out.
```

### Process Agnosticism

For metric types that can be used cross-process,
FOG provides no facility for identifying which process the instrumentation is on.

What this means is that if you accumulate to a
[Timing Distribution](https://mozilla.github.io/glean/book/user/metrics/timing_distribution.html)
in multiple processes,
all the samples from all the processes will be combined in the same metric.

If you wish to distinguish samples from different process types,
you will need multiple metrics and inline code to select the proper one for the given process.
For example:

```C++
if (XRE_GetProcessType() == GeckoProcessType_Default) {
  mozilla::glean::performance::cache_size.Accumulate(numBytes / 1024);
} else {
  mozilla::glean::performance::non_main_process_cache_size.Accumulate(numBytes / 1024);
}
```

### Scheduling

FOG makes no guarantee about when non-main-process metric values are sent across IPC.
FOG will try its best to schedule opportunistically in idle moments,
and during orderly shutdowns.

There are a few cases where we provide more firm guarantees:

#### Tests

There are test-only APIs in Rust, C++, and JavaScript.
These do not await a flush of child process metric values.
You can use the test-only method `testFlushAllChildren` on the `FOG`
XPCOM component to await child data's arrival:
```js
await Services.fog.testFlushAllChildren();
```
See [the test documentation](testing.md) for more details on testing FOG.
For writing tests about instrumentation, see
[the instrumentation test documentation](../user/instrumentation_tests).

#### Pings

We do not guarantee that non-main-process data has made it into a specific ping.

[Built-in pings](https://mozilla.github.io/glean/book/user/pings/index.html)
are submitted by the Rust Glean SDK at times FOG doesn't directly control,
so there may be data not present in the parent process when a built-in ping is submitted.
We don't anticipate this causing a problem since child-process data that
"misses" a given ping will be included in the next one.

At this time,
[Custom Pings](https://mozilla.github.io/glean/book/user/pings/custom.html)
must be sent in the parent process and have no mechanism
to schedule their submission for after child-process data arrives in the parent process.
[bug 1732118](https://bugzilla.mozilla.org/show_bug.cgi?id=1732118)
tracks the addition of such a mechanism or guarantee.

#### Shutdown

We will make a best effort during an orderly shutdown to flush all pending data in child processes.
This means a disorderly shutdown (usually a crash)
may result in child process data being lost.

#### Size

We don't measure or keep an up-to-date calculation of the size of the IPC Payload.
We do, however, keep a count of the number of times the IPC Payload has been accessed.
This is used as a (very) conservative estimate of the size of the IPC Payload so we do not exceed the
[IPC message size limit](https://searchfox.org/mozilla-central/search?q=kMaximumMessageSize).

See [bug 1745660](https://bugzilla.mozilla.org/show_bug.cgi?id=1745660).
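
As a rough illustration of using an access count as a stand-in for size,
the sketch below signals a flush once a fixed number of accesses has been recorded.
The class name, its methods, and the threshold are all hypothetical;
the sketch also assumes each access adds a bounded amount of data,
and FOG's actual bookkeeping may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: counts payload accesses and reports when a flush is due.
class PayloadAccessCounter {
 public:
  // aThreshold is an assumed tuning value, not a number taken from FOG.
  explicit PayloadAccessCounter(uint32_t aThreshold) : mThreshold(aThreshold) {
    assert(aThreshold > 0);
  }

  // Call on every access to the pending payload.
  // Returns true once the caller should flush to the parent process.
  bool RecordAccess() {
    ++mAccesses;
    return mAccesses >= mThreshold;
  }

  // Call after the payload has been handed off over IPC.
  void Reset() { mAccesses = 0; }

  uint32_t Accesses() const { return mAccesses; }

 private:
  const uint32_t mThreshold;
  uint32_t mAccesses = 0;
};
```

Counting accesses is cheap compared to measuring the serialized payload,
at the cost of flushing earlier than strictly necessary.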

### Mechanics

The rough design is that the parent process can request an immediate flush of pending data,
and each child process can decide to flush its pending data whenever it wishes.
The former is via `FlushFOGData() returns (ByteBuf)` and the latter via `FOGData(ByteBuf)`.

Pending Data is a buffer of bytes generated by `bincode` in Rust in the Child,
handed off to C++, passed over IPC,
then given back to `bincode` in Rust on the Parent.

Rust is then responsible for turning the pending data into
[metrics API][glean-metrics] calls on the metrics in the parent process.
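
The round trip can be sketched in a single process as follows.
This is only a model: the byte format is hand-rolled rather than `bincode`'s,
`ByteBuf` is a plain `std::vector`, the pending data holds counters only,
and `Serialize` and `Replay` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using ByteBuf = std::vector<uint8_t>;  // stand-in for the IPC ByteBuf
using PendingCounters = std::map<std::string, int32_t>;

// Appends a 32-bit value to the buffer in little-endian order.
void PutU32(ByteBuf& aBuf, uint32_t aValue) {
  for (int i = 0; i < 4; ++i) aBuf.push_back(uint8_t(aValue >> (8 * i)));
}

// Reads a 32-bit little-endian value and advances the cursor.
uint32_t GetU32(const ByteBuf& aBuf, size_t& aPos) {
  assert(aPos + 4 <= aBuf.size());
  uint32_t value = 0;
  for (int i = 0; i < 4; ++i) value |= uint32_t(aBuf[aPos++]) << (8 * i);
  return value;
}

// Child side: turn pending data into bytes (assumed format, not bincode).
ByteBuf Serialize(const PendingCounters& aPending) {
  ByteBuf buf;
  PutU32(buf, uint32_t(aPending.size()));
  for (const auto& entry : aPending) {
    PutU32(buf, uint32_t(entry.first.size()));
    buf.insert(buf.end(), entry.first.begin(), entry.first.end());
    PutU32(buf, uint32_t(entry.second));
  }
  return buf;
}

// Parent side: decode the bytes and replay each entry as an add.
void Replay(const ByteBuf& aBuf, PendingCounters& aParentMetrics) {
  size_t pos = 0;
  const uint32_t count = GetU32(aBuf, pos);
  for (uint32_t i = 0; i < count; ++i) {
    const uint32_t len = GetU32(aBuf, pos);
    assert(pos + len <= aBuf.size());
    std::string name(aBuf.begin() + pos, aBuf.begin() + pos + len);
    pos += len;
    aParentMetrics[name] += int32_t(GetU32(aBuf, pos));
  }
}
```

Replaying the decoded entries as additions means the parent's existing values are preserved,
which is why only commutative operations fit this scheme comfortably.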

#### Supported Process Types

FOG supports messaging between the following types of child process and the parent process:
* content children (via `PContent`, for now.
  See [bug 1641989](https://bugzilla.mozilla.org/show_bug.cgi?id=1641989))
* gmp children (via `PGMP`)
* gpu children (via `PGPU`)
* rdd children (via `PRDD`)
* socket children (via `PSocketProcess`)
* utility children (via `PUtilityProcess`)

See
[the process model docs](/dom/ipc/process_model.rst)
for more information about what that means.

### Adding Support for a new Process Type

Adding support for a new process type is a matter of extending the two messages
mentioned above in "Mechanics" to another process type's protocol (ipdl file).

1. Add two messages to the appropriate sections in `P<ProcessType>.ipdl`
    * (( **Note:** `PGPU` _should_ be the only ipdl where `parent`
      means the non-parent/-main/-UI process,
      but double-check that you get this correct.))
    * Add `async FOGData(ByteBuf&& aBuf);` to the parent/main/UI process side of things
      (most often `parent:`).
    * Add `async FlushFOGData() returns (ByteBuf buf);` to the non-parent/-main/-UI side
      (most often `child:`).
2. Implement the protocol endpoints in `P<ProcessType>{Child|Parent}.{h|cpp}`
    * The message added to the `parent:` section goes in
      `P<ProcessType>Parent.{h|cpp}` and vice versa.
3. Add to `FOGIPC.cpp`'s `FlushAllChildData` code that
    1. Enumerates all processes of the newly-supported type (there may only be one),
    2. Calls `SendFlushFOGData` on each, and
    3. Adds the resulting promise to the array.
4. Add to `FOGIPC.cpp`'s `SendFOGData` the correct `GeckoProcessType_*`
   enum value, and appropriate code for getting the parent process singleton and calling
   `SendFOGData` on it.
5. Add to the fog crate's `register_process_shutdown` function
   handling for at-shutdown flushing of IPC data.
   If this isn't added, we will log (but not panic)
   on the first use of Glean APIs on an unsupported process type.
    * "Handling" might be an empty block with a comment explaining where to find it
      (like how `PROCESS_TYPE_DEFAULT` is handled)
    * Or it might be custom code
      (like `PROCESS_TYPE_CONTENT`'s)
6. Add to the documented [list of supported process types](#supported-process-types)
   the process type you added support for.
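
The shape of step 3 (enumerate, call, collect) can be sketched without any Gecko types.
In this model `FakeChild` is a hypothetical stand-in for a child-process actor,
and the promise returned by the real `SendFlushFOGData` is replaced by a
`std::function` that yields the child's bytes when invoked.
The actual code in `FOGIPC.cpp` will look different.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using ByteBuf = std::vector<uint8_t>;  // stand-in for the IPC ByteBuf

// Hypothetical child-process actor holding not-yet-flushed data.
struct FakeChild {
  ByteBuf pending;

  // Models the asynchronous reply as a deferred call (an assumption;
  // the IPDL-generated method returns a promise instead).
  std::function<ByteBuf()> SendFlushFOGData() {
    return [this] {
      ByteBuf out = std::move(pending);
      pending.clear();
      return out;
    };
  }
};

// Asks every child for its data and gathers the replies in order.
std::vector<ByteBuf> FlushAllChildData(std::vector<FakeChild>& aChildren) {
  std::vector<std::function<ByteBuf()>> replies;
  for (auto& child : aChildren) {                 // 1. enumerate the children
    replies.push_back(child.SendFlushFOGData());  // 2. call, 3. collect
  }
  std::vector<ByteBuf> results;
  for (auto& reply : replies) results.push_back(reply());  // resolve each
  assert(results.size() == aChildren.size());
  return results;
}
```

Collecting every reply before resolving any of them mirrors the idea of
waiting on the whole array of promises rather than on each child in turn.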

[glean-metrics]: https://mozilla.github.io/glean/book/reference/metrics/index.html