File: BPF-ISSUES

package info (click to toggle)
dtrace 2.0.5-1
links: PTS
area: main
in suites: sid
size: 24,408 kB
sloc: ansic: 61,247; sh: 17,997; asm: 1,717; lex: 947; awk: 754; yacc: 695; perl: 37; sed: 17; makefile: 15
file content (59 lines) | stat: -rw-r--r-- 3,121 bytes
1. Tracepoint BPF contexts store a pointer to the pt_regs struct as the first
   element in the context, yet BPF programs cannot read that value.  That means
   that DTrace uregs[] is not available for tracepoints.

2. BPF does not implement any functionality to read from userspace addresses.
   We will need a bpf_probe_read_user() helper.

3. BPF has no concept of non-destructive vs destructive operation.  Maybe there
   should be an attribute that can be set to indicate whether a program is
   allowed to use destructive operations.  The bpf_probe_write_user() is one
   that definitely shouldn't be allowed unless destructive code is allowed.

4. Hash map keys are initialized on the stack, and are therefore limited by the
   BPF stack size (currently 512 bytes).  It is not possible to use memory
   obtained as value of a scratch map.  That might be an enhancement that can
   be offered, although there may be reasons why only stack is allowed.  One
   possible reason is that stack ensures that there cannot be uninitialized
   data in the key (since the verifier tracks stack location usage).

   Given that the default string size limit in D is set at 256 bytes, this may
   not be as much of a limitation.  However, given that dynamic variable keys
   can be a tuple of multiple values, it could be multiple strings.

5. There is currently no way to determine (in eBPF) whether the program is
   running because an event happened in a runing task or whether it is running
   due to an event triggered from a hard IRQ.  That is an issue for access to
   TLS variables because we shouldn't consider hard IRQ triggered events part
   of the execution thread of the underlying (interrupted) task.

6. I am pretty certain that the hashmap implementation is not safe under
   potential race conditions between two BPF programs.  The following scenario
   seems possible:

		BPF prog 1			BPF prog 2
		----------			----------
		a[123] = 123;
						valp = &a[123];
		a[123] = 0;
		a[456] = 456;
						val = *valp;

   If the hash slot freed by the deallocation of a[123] happens to get reused
   for storing a[456], BPF prog 2 might end up reading the value 456, while the
   only two valid values should be 123 and 0.

   This problem has been acknowledged by BPF developers, and the use of
   spinlocks seems to be the recommended solution (available at this point in
   eBPF).  It remains to be seen whether this is a good solutiuon for DTrace.
   We could also use some kind of state on the value to track references and
   only actually delete the entry when there are no more references left (and
   a program has indicated it should be deleted.)

--
A. DIF-based DTrace generates a block of DIF code to be executed for every
   statement, and in fact, it sometimes generates multiple blocks of code for
   a single statement (e.g. for the arguments to printf).  In BPF that won't
   be the right way to do things, so the compiler needs to know how to create
   more complex blocks of code, and userspace needs to have enough meta-data to
   be able to process output data that contains multiple data items.