1. Tracepoint BPF contexts store a pointer to the pt_regs struct as the first
element in the context, yet BPF programs cannot read that value. That means
that DTrace uregs[] is not available for tracepoints.
2. BPF does not implement any functionality to read from userspace addresses.
We will need a bpf_probe_read_user() helper.
3. BPF has no concept of non-destructive vs destructive operations. Maybe there
should be an attribute that can be set to indicate whether a program is
allowed to use destructive operations. The bpf_probe_write_user() helper is
one that definitely shouldn't be allowed unless destructive code is allowed.
4. Hash map keys are initialized on the stack, and are therefore limited by the
BPF stack size (currently 512 bytes). It is not possible to use memory
obtained as the value of a scratch map. That might be an enhancement that
can be offered, although there may be reasons why only the stack is allowed.
One possible reason is that the stack ensures that there cannot be
uninitialized data in the key (since the verifier tracks stack location
usage).
Given that the default string size limit in D is set at 256 bytes, this may
not be much of a limitation for a single string key. However, dynamic
variable keys can be a tuple of multiple values, and a tuple of just two
256-byte strings already takes up the entire 512-byte stack.
5. There is currently no way to determine (in eBPF) whether the program is
running because an event happened in a running task or whether it is running
due to an event triggered from a hard IRQ. That is an issue for access to
TLS variables because we shouldn't consider hard IRQ triggered events part
of the execution thread of the underlying (interrupted) task.
6. I am pretty certain that the hashmap implementation is not safe under
potential race conditions between two BPF programs. The following scenario
seems possible:
    BPF prog 1              BPF prog 2
    ----------              ----------
    a[123] = 123;
                            valp = &a[123];
    a[123] = 0;
    a[456] = 456;
                            val = *valp;
If the hash slot freed by the deallocation of a[123] happens to get reused
for storing a[456], BPF prog 2 might end up reading the value 456, while the
only two valid values should be 123 and 0.
This problem has been acknowledged by BPF developers, and the use of
spinlocks seems to be the recommended solution (available at this point in
eBPF). It remains to be seen whether this is a good solution for DTrace.
We could also use some kind of state on the value to track references and
only actually delete the entry when there are no more references left (and
a program has indicated it should be deleted).
--
A. DIF-based DTrace generates a block of DIF code to be executed for every
statement, and in fact, it sometimes generates multiple blocks of code for
a single statement (e.g. for the arguments to printf). In BPF that won't
be the right way to do things, so the compiler needs to know how to create
more complex blocks of code, and userspace needs to have enough meta-data to
be able to process output data that contains multiple data items.