=================
DataFlowSanitizer
=================

.. toctree::
   :hidden:

   DataFlowSanitizerDesign

.. contents::
   :local:

Introduction
============

DataFlowSanitizer is a generalised dynamic data flow analysis.
Unlike other Sanitizer tools, this tool is not designed to detect a
specific class of bugs on its own.  Instead, it provides a generic
dynamic data flow analysis framework to be used by clients to help
detect application-specific issues within their own code.

How to build libc++ with DFSan
==============================

DFSan requires either all of your code to be instrumented or for uninstrumented
functions to be listed as ``uninstrumented`` in the `ABI list`_.

If you'd like to have instrumented libc++ functions, then you need to build it
with DFSan instrumentation from source. Here is an example of how to build
libc++ and the libc++ ABI with data flow sanitizer instrumentation.

.. code-block:: console

  cd libcxx-build

  # An example using ninja
  cmake -GNinja path/to/llvm-project/llvm \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DLLVM_USE_SANITIZER="DataFlow" \
    -DLLVM_ENABLE_LIBCXX=ON \
    -DLLVM_ENABLE_PROJECTS="libcxx;libcxxabi"
  ninja cxx cxxabi

Note: Ensure you are building with a sufficiently new version of Clang.

Usage
=====

With no program changes, applying DataFlowSanitizer to a program
will not alter its behavior.  To use DataFlowSanitizer, the program
uses API functions to apply tags to data to cause it to be tracked, and to
check the tag of a specific data item.  DataFlowSanitizer manages
the propagation of tags through the program according to its data flow.
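
As a rough mental model of tagging and checking, the sketch below keeps one
label per byte in a side table, and reading a range returns the union of the
labels found there.  It is a standalone illustration, not DFSan's
implementation: the ``ShadowModel`` class, the ``Label`` alias and the bitmask
representation of unions are assumptions made for the example, and the method
names merely echo ``dfsan_set_label`` and ``dfsan_read_label``.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Assumption: a label is modelled as a bitmask, so a union is a bitwise OR.
using Label = std::uint16_t;

// Hypothetical stand-in for shadow memory: one label per tracked byte.
class ShadowModel {
 public:
  // Tag every byte in [addr, addr + size) with `label`.
  void set_label(Label label, const void *addr, std::size_t size) {
    const auto base = reinterpret_cast<std::uintptr_t>(addr);
    for (std::size_t i = 0; i < size; ++i) shadow_[base + i] = label;
  }

  // Return the union of the labels of every byte in [addr, addr + size).
  // Bytes that were never tagged contribute the empty label 0.
  Label read_label(const void *addr, std::size_t size) const {
    const auto base = reinterpret_cast<std::uintptr_t>(addr);
    Label result = 0;
    for (std::size_t i = 0; i < size; ++i) {
      const auto it = shadow_.find(base + i);
      if (it != shadow_.end()) result |= it->second;
    }
    return result;
  }

 private:
  std::unordered_map<std::uintptr_t, Label> shadow_;
};

// Usage: only the variable that was tagged reports a label.
void shadow_model_demo() {
  int tagged = 0;
  int untagged = 0;
  ShadowModel shadow;
  shadow.set_label(0x1, &tagged, sizeof(tagged));
  assert(shadow.read_label(&tagged, sizeof(tagged)) == 0x1);
  assert(shadow.read_label(&untagged, sizeof(untagged)) == 0);
}
```

The model matches the first sentence of this section: if ``set_label`` is
never called, every read returns the empty label and nothing else changes.
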

The APIs are defined in the header file ``sanitizer/dfsan_interface.h``.
For further information about each function, please refer to the header
file.

.. _ABI list:

ABI List
--------

DataFlowSanitizer uses a list of functions known as an ABI list to decide
whether a call to a specific function should use the operating system's native
ABI or whether it should use a variant of this ABI that also propagates labels
through function parameters and return values.  The ABI list file also controls
how labels are propagated in the former case.  DataFlowSanitizer comes with a
default ABI list which is intended to eventually cover the glibc library on
Linux.  Users may still need to extend the ABI list when a particular library
or function cannot be instrumented (e.g. because it is implemented in assembly
or another language which DataFlowSanitizer does not support), or when a
function is called from a library or function which cannot be instrumented.

DataFlowSanitizer's ABI list file is a :doc:`SanitizerSpecialCaseList`.
The pass treats every function in the ``uninstrumented`` category in the
ABI list file as conforming to the native ABI.  Unless the ABI list contains
additional categories for those functions, a call to one of those functions
will produce a warning message, as the labelling behavior of the function
is unknown.  The other supported categories are ``discard``, ``functional``
and ``custom``.

* ``discard`` -- To the extent that this function writes to (user-accessible)
  memory, it also updates labels in shadow memory (this condition is trivially
  satisfied for functions which do not write to user-accessible memory).  Its
  return value is unlabelled.
* ``functional`` -- Like ``discard``, except that the label of its return value
  is the union of the labels of its arguments.
* ``custom`` -- Instead of calling the function, a custom wrapper ``__dfsw_F``
  is called, where ``F`` is the name of the function.  This function may wrap
  the original function or provide its own implementation.  This category is
  generally used for uninstrumentable functions which write to user-accessible
  memory or which have more complex label propagation behavior.  The signature
  of ``__dfsw_F`` is based on that of ``F`` with each argument having a
  label of type ``dfsan_label`` appended to the argument list.  If ``F``
  has a non-void return type, a final argument of type ``dfsan_label *``
  is appended, to which the custom function can store the label for the
  return value.  For example:

.. code-block:: c++

  void f(int x);
  void __dfsw_f(int x, dfsan_label x_label);

  void *memcpy(void *dest, const void *src, size_t n);
  void *__dfsw_memcpy(void *dest, const void *src, size_t n,
                      dfsan_label dest_label, dfsan_label src_label,
                      dfsan_label n_label, dfsan_label *ret_label);

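
To make the signature rule concrete, here is a minimal sketch of a wrapper
body for a hypothetical function ``scale``.  The function, its labelling
policy and the ``dfsan_label`` typedef are assumptions for illustration: the
typedef is a stand-in so the snippet compiles without DFSan, whereas real code
would take the type from ``sanitizer/dfsan_interface.h``.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in so the sketch compiles on its own; real code would include
// <sanitizer/dfsan_interface.h> instead. The width chosen here is an assumption.
typedef std::uint16_t dfsan_label;

// Hypothetical uninstrumented function F.
extern "C" int scale(int value, int factor) { return value * factor; }

// Custom wrapper for F: the original parameters, then one label per
// parameter, then a pointer through which the return label is stored.
extern "C" int __dfsw_scale(int value, int factor, dfsan_label value_label,
                            dfsan_label factor_label, dfsan_label *ret_label) {
  // Illustrative policy: the result is labelled like `value` only,
  // deliberately ignoring factor_label.
  (void)factor_label;
  *ret_label = value_label;
  return scale(value, factor);
}

// Calls the wrapper directly, with made-up label values 7 and 9.
void custom_wrapper_demo() {
  dfsan_label ret_label = 0;
  const int result = __dfsw_scale(3, 4, /*value_label=*/7,
                                  /*factor_label=*/9, &ret_label);
  assert(result == 12);
  assert(ret_label == 7);
}
```

For a wrapper like this to be used, ``scale`` would also need to be listed in
the ABI list under both ``uninstrumented`` and ``custom``.
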
If a function defined in the translation unit being compiled belongs to the
``uninstrumented`` category, it will be compiled so as to conform to the
native ABI.  Its arguments will be assumed to be unlabelled, but it will
propagate labels in shadow memory.

For example:

.. code-block:: none

  # main is called by the C runtime using the native ABI.
  fun:main=uninstrumented
  fun:main=discard

  # malloc only writes to its internal data structures, not user-accessible memory.
  fun:malloc=uninstrumented
  fun:malloc=discard

  # tolower is a pure function.
  fun:tolower=uninstrumented
  fun:tolower=functional

  # memcpy needs to copy the shadow from the source to the destination region.
  # This is done in a custom function.
  fun:memcpy=uninstrumented
  fun:memcpy=custom

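
The effect of ``discard`` and ``functional`` on a return value can be sketched
as a small model.  This is not how the pass is implemented: the ``Category``
enum, the ``model_return_label`` function and the use of a bitmask to
represent a union of labels are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: a label is modelled as a bitmask, so a union is a bitwise OR.
using Label = std::uint16_t;

// The two categories whose return label follows a fixed rule.
enum class Category { Discard, Functional };

// Model of the label given to the return value of an uninstrumented call.
Label model_return_label(Category category,
                         const std::vector<Label> &arg_labels) {
  // discard: the return value is unlabelled.
  if (category == Category::Discard) return 0;
  // functional: the return label is the union of the argument labels.
  Label result = 0;
  for (Label label : arg_labels) result |= label;
  return result;
}

// Usage: the same labelled arguments under each category.
void category_model_demo() {
  const std::vector<Label> args = {0x1, 0x2};
  assert(model_return_label(Category::Discard, args) == 0);       // like malloc
  assert(model_return_label(Category::Functional, args) == 0x3);  // like tolower
}
```

``custom`` is left out of the model because its return label is whatever the
``__dfsw_`` wrapper chooses to store.
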

Example
=======

The following program demonstrates label propagation by checking that
the correct labels are propagated.

.. code-block:: c++

  #include <sanitizer/dfsan_interface.h>
  #include <assert.h>

  int main(void) {
    // Create one label per variable and attach it to that variable's storage.
    int i = 1;
    dfsan_label i_label = dfsan_create_label("i", 0);
    dfsan_set_label(i_label, &i, sizeof(i));

    int j = 2;
    dfsan_label j_label = dfsan_create_label("j", 0);
    dfsan_set_label(j_label, &j, sizeof(j));

    int k = 3;
    dfsan_label k_label = dfsan_create_label("k", 0);
    dfsan_set_label(k_label, &k, sizeof(k));

    // i + j depends on i and j, but not on k.
    dfsan_label ij_label = dfsan_get_label(i + j);
    assert(dfsan_has_label(ij_label, i_label));
    assert(dfsan_has_label(ij_label, j_label));
    assert(!dfsan_has_label(ij_label, k_label));

    // i + j + k depends on all three variables.
    dfsan_label ijk_label = dfsan_get_label(i + j + k);
    assert(dfsan_has_label(ijk_label, i_label));
    assert(dfsan_has_label(ijk_label, j_label));
    assert(dfsan_has_label(ijk_label, k_label));

    return 0;
  }

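
The same checks can be reproduced without DFSan by carrying a label next to
each value by hand, which shows what the instrumentation tracks on the
program's behalf.  The ``Tracked`` struct, the ``has_label`` helper and the
bitmask labels below are hypothetical names chosen for this sketch; they are
not part of the DFSan API.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a label is modelled as a bitmask, so a union is a bitwise OR.
using Label = std::uint16_t;

// A value paired with the label describing which inputs it depends on.
struct Tracked {
  int value;
  Label label;
};

// Addition propagates data flow: the result depends on both operands.
inline Tracked operator+(Tracked a, Tracked b) {
  return Tracked{a.value + b.value, static_cast<Label>(a.label | b.label)};
}

// True if the non-empty label `elem` is contained in `label`.
inline bool has_label(Label label, Label elem) {
  return elem != 0 && (label & elem) == elem;
}

// Mirrors the assertions of the DFSan program above.
void propagation_model_demo() {
  const Tracked i{1, 0x1};
  const Tracked j{2, 0x2};
  const Tracked k{3, 0x4};

  const Tracked ij = i + j;
  assert(has_label(ij.label, i.label));
  assert(has_label(ij.label, j.label));
  assert(!has_label(ij.label, k.label));

  const Tracked ijk = i + j + k;
  assert(ijk.value == 6);
  assert(has_label(ijk.label, i.label));
  assert(has_label(ijk.label, j.label));
  assert(has_label(ijk.label, k.label));
}
```

With DFSan the program keeps using plain ``int`` values, and the compiler
pass and runtime maintain the labels alongside them.
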

Current status
==============

DataFlowSanitizer is a work in progress, currently under development for
x86\_64 Linux.

Design
======

Please refer to the :doc:`design document<DataFlowSanitizerDesign>`.