# Decision Forest Code Completion Model

## Decision Forest
A **decision forest** is a collection of many decision trees. A **decision tree** is a full binary tree that provides a quality prediction for an input (code completion item). Internal nodes represent a **binary decision** based on the input data, and leaf nodes represent a prediction.

In order to predict the relevance of a code completion item, we traverse each of the decision trees beginning with their roots until we reach a leaf. 

An input (code completion candidate) is characterized as a set of **features**, such as the *type of symbol* or the *number of existing references*.

At every non-leaf node, we evaluate a condition to decide whether to go left or right. The condition compares one **feature** of the input against a constant and can be of two types:
- **if_greater**: Checks whether a numerical feature is **>=** a **threshold**.
- **if_member**: Checks whether an **enum** feature is contained in the **set** defined in the node.

A leaf node contains a **score**.
To compute the overall **quality** score, we traverse each tree in this way and add up the scores of the leaves we reach.

## Model Input Format
The input model is represented in JSON format.

### Features
The file **features.json** defines the features available to the model.
It is a JSON list of features. A feature can be of one of the following two kinds.

#### Number
```
{
  "name": "a_numerical_feature",
  "kind": "NUMBER"
}
```
#### Enum
```
{
  "name": "an_enum_feature",
  "kind": "ENUM",
  "enum": "fully::qualified::enum",
  "header": "path/to/HeaderDeclaringEnum.h"
}
```
The field `enum` specifies the fully qualified name of the enum.
The cardinality of the enum must not exceed **32**.

The field `header` specifies the header containing the declaration of the enum.
This header is included by the inference runtime.
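The 32-value limit plausibly exists so that a node's member set can be encoded as a single 32-bit mask and tested with one bitwise operation at inference time. A hypothetical sketch (not the actual clangd implementation):

```python
def make_mask(member_values):
    """Encode a set of enum values (each in 0..31) as one 32-bit mask."""
    mask = 0
    for v in member_values:
        assert 0 <= v < 32, "enum cardinality must not exceed 32"
        mask |= 1 << v
    return mask

def is_member(value, mask):
    """An if_member check reduces to a single shift-and-AND test."""
    return (mask >> value) & 1 == 1
```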


### Decision Forest
The file `forest.json` defines the decision forest. It is a JSON list of **DecisionTree** objects.

**DecisionTree** is one of **IfGreaterNode**, **IfMemberNode**, **LeafNode**.
#### IfGreaterNode
```
{
  "operation": "if_greater",
  "feature": "a_numerical_feature",
  "threshold": A real number,
  "then": {A DecisionTree},
  "else": {A DecisionTree}
}
```
#### IfMemberNode
```
{
  "operation": "if_member",
  "feature": "an_enum_feature",
  "set": ["enum_value1", "enum_value2", ...],
  "then": {A DecisionTree},
  "else": {A DecisionTree}
}
```
#### LeafNode
```
{
  "operation": "boost",
  "score": A real number
}
```
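Putting the three node kinds together, the evaluation procedure can be sketched in Python, operating directly on the parsed JSON model (a hypothetical illustration, not the generated C++ inference code):

```python
def evaluate_tree(node, features):
    """Walk one DecisionTree from root to leaf; return the leaf's score."""
    op = node["operation"]
    if op == "boost":  # leaf node
        return node["score"]
    if op == "if_greater":
        taken = features[node["feature"]] >= node["threshold"]
    elif op == "if_member":
        taken = features[node["feature"]] in node["set"]
    else:
        raise ValueError(f"unknown operation: {op}")
    return evaluate_tree(node["then"] if taken else node["else"], features)

def evaluate_forest(forest, features):
    """The overall quality score is the sum of every tree's score."""
    return sum(evaluate_tree(tree, features) for tree in forest)
```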

## Code Generator for Inference
The implementation of the inference runtime is split across:

### Code generator
The code generator `CompletionModelCodegen.py` takes the `${model}` directory as input and generates the inference library:
- `${output_dir}/${filename}.h`
- `${output_dir}/${filename}.cpp`

Invocation
```
python3 CompletionModelCodegen.py \
        --model path/to/model/dir \
        --output_dir path/to/output/dir \
        --filename OutputFileName \
        --cpp_class clang::clangd::YourExampleClass
```
### Build System
`CompletionModel.cmake` provides the `gen_decision_forest` function.
Clients intending to use the CompletionModel for inference can use it to trigger the code generator and generate the inference library.
They can then use the generated API by including and depending on this library.

### Generated API for inference
The code generator defines the example `class` inside the relevant namespaces, as specified by the option `${cpp_class}`.

The members of this generated class comprise all the features mentioned in `features.json`.
Thus an instance of this class can represent a code completion candidate that needs to be scored.

The API also provides `float Evaluate(const MyClass&)`, which can be used to score a completion candidate.


## Example
### model/features.json
```
[
  {
    "name": "ANumber",
    "kind": "NUMBER"
  },
  {
    "name": "AFloat",
    "kind": "NUMBER"
  },
  {
    "name": "ACategorical",
    "kind": "ENUM",
    "enum": "ns1::ns2::TestEnum",
    "header": "model/CategoricalFeature.h"
  }
]
```
### model/forest.json
```
[
  {
    "operation": "if_greater",
    "feature": "ANumber",
    "threshold": 200.0,
    "then": {
      "operation": "if_greater",
      "feature": "AFloat",
      "threshold": -1,
      "then": {
        "operation": "boost",
        "score": 10.0
      },
      "else": {
        "operation": "boost",
        "score": -20.0
      }
    },
    "else": {
      "operation": "if_member",
      "feature": "ACategorical",
      "set": [
        "A",
        "C"
      ],
      "then": {
        "operation": "boost",
        "score": 3.0
      },
      "else": {
        "operation": "boost",
        "score": -4.0
      }
    }
  },
  {
    "operation": "if_member",
    "feature": "ACategorical",
    "set": [
      "A",
      "B"
    ],
    "then": {
      "operation": "boost",
      "score": 5.0
    },
    "else": {
      "operation": "boost",
      "score": -6.0
    }
  }
]
```
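To make the traversal concrete, here is a hand trace of this example forest for a hypothetical candidate (the feature values are made up for illustration):

```python
# Candidate: ANumber = 300, AFloat = 0, ACategorical = "B"
#
# Tree 1: ANumber (300) >= 200              -> "then"
#         AFloat  (0)   >= -1               -> "then" -> boost 10.0
# Tree 2: ACategorical ("B") in {"A", "B"}  -> "then" -> boost  5.0
score = 10.0 + 5.0  # overall quality score for this candidate
assert score == 15.0
```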
### DecisionForestRuntime.h
```
...
namespace ns1 {
namespace ns2 {
namespace test {
class Example {
public:
  void setANumber(float V) { ... }
  void setAFloat(float V) { ... }
  void setACategorical(unsigned V) { ... }

private:
  ...
};

float Evaluate(const Example&);
} // namespace test
} // namespace ns2
} // namespace ns1
```

### CMake Invocation
In order to use the inference runtime, one can use the `gen_decision_forest` function
described in `CompletionModel.cmake`, which invokes `CompletionModelCodegen.py` with the appropriate arguments.

For example, the following invocation reads the model present in `path/to/model` and creates 
`${CMAKE_CURRENT_BINARY_DIR}/myfilename.h` and `${CMAKE_CURRENT_BINARY_DIR}/myfilename.cpp` 
describing a `class` named `MyClass` in namespace `fully::qualified`.



```
gen_decision_forest(path/to/model
  myfilename
  ::fully::qualified::MyClass)
```