File: constant_tensor_cache.md

package info (click to toggle)

onednn 3.7.2%2Bds-2

links: PTS, VCS
area: main
in suites: trixie
size: 87,776 kB
sloc: cpp: 1,016,013; lisp: 16,134; ansic: 12,564; python: 7,219; asm: 831; sh: 69; makefile: 41; javascript: 39

file content (59 lines) | stat: -rw-r--r-- 2,833 bytes

parent folder | download | duplicates (2)

Constant Tensor Cache {#dev_guide_constant_tensor_cache}
========================================================

The oneDNN Graph component supports the constant tensor cache feature, which is
used to cache processed constant tensors such as reordered constant weights
and folded constant scales to reduce redundant computation and improve
performance. The feature is disabled by default. Users can use the graph API or
environment variable to set or get specific cache capacity for different engine
kinds (CPU and GPU).

## Build-Time Controls

Build-time controls to enable or disable the constant tensor cache feature are
not supported. Only run-time controls through the graph API or environment
variables are supported. Refer to the following section.

## Run-Time Controls

### Constant Tensor Cache Capacity Control API

oneDNN Graph provides users with a pair of APIs to control the constant tensor
cache feature. To enable the constant tensor cache and set the capacity to a
specific engine kind, call the `setter` API. The unit of `setter` capacity API
is megabytes (MB). New tensors won't be cached when capacity is reached. To
query the current capacity for a specific engine kind, call the `getter` API.

~~~cpp
// setter API
@ref dnnl_graph_set_constant_tensor_cache_capacity

// getter API
@ref dnnl_graph_get_constant_tensor_cache_capacity
~~~

### Environment Variable

In addition to a programmable API, oneDNN Graph also provides users with an
environment variable named `ONEDNN_GRAPH_CONSTANT_TENSOR_CACHE_CAPACITY` to
control the capacity. It accepts values in the form `engine_kind:size` or
`engine_kind1:size1;engine_kind2:size2`. The first example below means the user
can set capacity for one engine kind (`cpu`). The second example is that the
capacity of `cpu` and `gpu` are set to 1024 MB and 2048 MB separately.

| Environment variable                        | Value(string)         | Description                                                    |
| :------------------------------------------ | :-------------------- | :------------------------------------------------------------- |
| ONEDNN_GRAPH_CONSTANT_TENSOR_CACHE_CAPACITY | "cpu:size1;gpu:size2" | Set cpu constant cache capacity size to size1 and gpu to size2 |

~~~bash
export ONEDNN_GRAPH_CONSTANT_TENSOR_CACHE_CAPACITY="cpu:1024"
export ONEDNN_GRAPH_CONSTANT_TENSOR_CACHE_CAPACITY="cpu:1024;gpu:2048"
~~~

@note
The environment variable API should be set only once before the application
starts; the library will read the variable and cache it inside to reduce string
parsing overhead. Re-setting the environment variable at runtime will not take
effect. Functional APIs have higher priority than environment variables. If
users call the functional APIs, it will overwrite the capacity values specified
through the environment variable.