File: DesignOverview.md

# Current workflow

## Step 1: Input from the user

The user constructs a kernel from tensor expressions, like:
```
    Buffer a_buf("a", kFloat32, {M, N});
    Buffer b_buf("b", kFloat32, {N, K});
    Buffer c_buf("c", kFloat32, {M, N});
    Buffer d_buf("d", kFloat32, {M, K});

    Tensor* x = Compute(
        "x",
        {{M, "m1"}, {N, "n1"}, {K, "k1"}},
        [&](const VarHandle& m, const VarHandle& n, const VarHandle& k) {
          return a_buf(m, n) * b_buf(n, k);
        });
    Tensor* y = ...;
    Tensor* z = ...;
    std::vector<Tensor*> tensors_to_compute = {x, z}; // If y is used by x or z, it will be computed as well.
```

## Step 2: Lower to a LoopNest:
```
   LoopNest l(tensors_to_compute);
```
LoopNest consists of a root statement (`Stmt`) and some metadata. The root statement of a loop nest is a block statement containing other statements.

A statement can be one of the following:
 - `Store` statement: such statements represent access to tensor elements. They specify the base variable (`Var`), an expression for the index, an expression for the stored value, and the mask.
 - `LetStmt` statement: 'let' statements are used for binding variables to given expressions. Such statements consist of the variable to bind (`Var`), the expression to bind to, and the body statement in which the binding should be performed.
 - `For` statement: these statements represent a loop. They specify the index variable (`Var`), expressions for the beginning and the end of the iteration space, a `Block` statement for the body, and some metadata.
 - `Cond` statement: these statements represent conditionals: they consist of a condition expression and two `Block` statements for the true and false branches (both are allowed to be null).
 - `Block` statement: these statements represent a linear sequence of other statements.
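The statement kinds above form a small class hierarchy. The following is a minimal standalone sketch of that hierarchy (field names follow the text; the real classes live in `torch/csrc/jit/tensorexpr` and carry additional state, such as the `Store` mask and `For` metadata, which are omitted here):

```cpp
#include <memory>
#include <string>
#include <vector>

struct Expr { virtual ~Expr() = default; };    // stand-in for expression nodes

struct Stmt { virtual ~Stmt() = default; };

struct Block : Stmt {                          // linear sequence of statements
  std::vector<std::unique_ptr<Stmt>> stmts;
};

struct Store : Stmt {                          // base[index] = value
  std::string base;                            // the base variable (Var)
  std::unique_ptr<Expr> index, value;          // mask omitted for brevity
};

struct LetStmt : Stmt {                        // bind var = value within body
  std::string var;
  std::unique_ptr<Expr> value;
  std::unique_ptr<Stmt> body;
};

struct For : Stmt {                            // for (var = start; var < stop; var++)
  std::string var;                             // the index variable (Var)
  std::unique_ptr<Expr> start, stop;
  std::unique_ptr<Block> body;
};

struct Cond : Stmt {                           // if (condition) { ... } else { ... }
  std::unique_ptr<Expr> condition;
  std::unique_ptr<Block> true_branch, false_branch;  // either may be null
};
```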

An example of a root statement:
```
for (int m = 0; m < 100; m++) {
  for (int n = 0; n < 200; n++) {
    c[m * 200 + n] = a[m * 200 + n + 1] + a[m * 200 + n];
  }
}
for (int i = 0; i < W; i++) {
  q[i] = i + 1;
}
```

## Step 3: Apply loop transformations
One can apply various loop transformations to a loop nest. The transformations mutate the statements in the loop nest, and the loop nest can record the history of applied transformations.
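One common transformation is loop splitting, where a single loop over `[0, N)` becomes an outer/inner loop pair plus a tail loop for the remainder. The sketch below shows only the effect of such a transformation on a concrete computation; it does not use the real `LoopNest` API, which applies the rewrite to the IR statements themselves:

```cpp
#include <vector>

// Reference loop: q[i] = i + 1 over [0, N).
std::vector<int> original(int N) {
  std::vector<int> q(N);
  for (int i = 0; i < N; i++) q[i] = i + 1;
  return q;
}

// The same computation after splitting the loop by a factor of 4.
std::vector<int> split_by_4(int N) {
  std::vector<int> q(N);
  // Main part: i = i_outer * 4 + i_inner covers the first (N / 4) * 4 iterations.
  for (int i_outer = 0; i_outer < N / 4; i_outer++) {
    for (int i_inner = 0; i_inner < 4; i_inner++) {
      int i = i_outer * 4 + i_inner;
      q[i] = i + 1;
    }
  }
  // Tail loop: the remaining N % 4 iterations.
  for (int i = (N / 4) * 4; i < N; i++) q[i] = i + 1;
  return q;
}
```

Both versions produce identical results; the transformation only changes the iteration structure, which is what makes it safe to apply during this phase.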

## Step 4: Prepare loop nest for codegen
After all desired loop transformations have been applied, a final transformation is carried out on the loop nest's root statement. The result of this transformation is also a statement, but it can now include lower-level statements like `Allocate` and `Free`, which are not allowed to exist during the loop transformation phase.
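Schematically, if an intermediate tensor `y` from Step 1 is consumed by `z` but not returned to the user, the prepared statement could materialize its backing storage with an `Allocate`/`Free` pair (illustrative pseudo-IR, not actual output of this codebase):

```
Allocate(y, float, {M, N});   // backing storage for intermediate tensor y
for (...) {
  y[...] = ...;               // compute y
}
for (...) {
  z[...] = ... y[...] ...;    // consume y
}
Free(y);                      // y is no longer needed past this point
```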

## Step 5: Pass the final statement for codegen (LLVM/CUDA/IREval)
Codegen is implemented as an IR visitor over the statements produced in the previous step.
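The visitor-based structure can be sketched as follows. This is a toy evaluator over a two-node expression language, standing in for an `IREval`-style backend; the real codegen visitors handle the full set of expression and statement nodes:

```cpp
#include <memory>

struct IntImm;
struct Add;

// The visitor interface: one visit method per IR node kind.
struct Visitor {
  virtual void visit(const IntImm&) = 0;
  virtual void visit(const Add&) = 0;
  virtual ~Visitor() = default;
};

// Every IR node accepts a visitor and dispatches on its own type.
struct Expr {
  virtual void accept(Visitor&) const = 0;
  virtual ~Expr() = default;
};

struct IntImm : Expr {
  int value;
  explicit IntImm(int v) : value(v) {}
  void accept(Visitor& v) const override { v.visit(*this); }
};

struct Add : Expr {
  std::unique_ptr<Expr> lhs, rhs;
  Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
      : lhs(std::move(l)), rhs(std::move(r)) {}
  void accept(Visitor& v) const override { v.visit(*this); }
};

// An interpreter backend: walks the tree and computes the value directly.
// An LLVM or CUDA backend would instead emit code at each visit.
struct Evaluator : Visitor {
  int result = 0;
  void visit(const IntImm& n) override { result = n.value; }
  void visit(const Add& n) override {
    Evaluator l, r;
    n.lhs->accept(l);
    n.rhs->accept(r);
    result = l.result + r.result;
  }
};
```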

# Tensor Expressions Language
There are several core concepts in the Tensor Expression engine; this section defines them and shows how they connect to each other.

## Expr
`Expr` represents a node in the abstract syntax tree of a tensor expression. Leaf nodes in such a tree are either a symbolic variable (`Var`), a constant (`IntImm` or `FloatImm`), a `Buffer`, or a `Tensor`. Non-leaf nodes refer to other expressions and represent various operations. E.g. `Add` has two operands: `lhs` and `rhs`, both of which are also `Expr`.

## Tensor
`Tensor` is a bundle of
1) a variable `Var` defining which tensor this `Tensor` expression is describing
2) a list of indices `args` (each of them is `Var`)
3) a list of expressions for dimensions `dims` (each of them is `Expr`)
4) a computational expression `body` (of `Expr` type)

## Buffer
`Buffer`s are essentially `Tensor`s without a `body` - they represent indexed accesses to "tensors" that live outside of the tensor-expression system.
`Buffer` is a bundle of
1) a `Var` defining which buffer this `Buffer` expression refers to
2) a list of indices `args` (each of them is `Var`)
3) a list of expressions for dimensions `dims` (each of them is `Expr`)

## Example
Suppose we'd like to represent the following expression:
```
A[i,j] = B[i,j] + 7
```
where both `A` and `B` are 100x100 tensors.
On the top level we would have a single `Tensor` expression with:
1) a variable referring to "A"
2) list of two indices referring to "i" and "j"
3) list of two `IntImm` constants describing sizes (both of them would carry the value of 100)
4) a body expression which is an `Add` with two operands: `Buffer` describing `B[i,j]` access and an `IntImm` constant `7`.

The buffer expression describing `B[i,j]` would have similar properties:
1) a variable referring to "B"
2) list of two indices referring to "i" and "j"
3) list of two `IntImm` constants describing sizes (both of them would carry the value of 100)

In contrast to the tensor expression, the buffer expression would not have a body - it represents a symbolic access.

The code for constructing such an expression could look like this:

```
    Buffer B("B", kFloat32, {100, 100});
    Tensor* A = Compute(
        "A",
        {{100, "i"}, {100, "j"}},
        [&](const VarHandle& i, const VarHandle& j) {
          return B(i, j) + 7;
        });
```

## Function
`Function` represents several tensor computations bundled together. In fact, `Tensor`s are implemented via `Function`s. A function allows us to specify that several different tensor expressions operate over the same set of indices and dimensions.

# Memory model
TBD

# Integration with PyTorch JIT
TBD