File: int2.md

package info (click to toggle)
onnx 1.20.0-1
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 62,536 kB
  • sloc: python: 77,643; cpp: 60,445; sh: 52; makefile: 50; javascript: 1
file content (43 lines) | stat: -rw-r--r-- 1,554 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
<!--
Copyright (c) ONNX Project Contributors

SPDX-License-Identifier: Apache-2.0
-->
(onnx-detail-int2) =

# 2 bit integer types

## Papers

[T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge](https://arxiv.org/abs/2407.00088)

T-MAC, an innovative lookup table(LUT)-based method designed for efficient low-bit LLM (i.e., weight-quantized LLM) inference on CPUs. T-MAC directly supports mpGEMM without dequantization, while simultaneously eliminating multiplications and reducing additions required. Specifically, T-MAC transforms the traditional data-type-centric multiplication to bit-wise table lookup, and enables a unified and scalable mpGEMM solution.

## Cast

Cast from 2 bit to any higher precision type is exact.
Cast to a 2 bit type is done by rounding to the nearest-integer (with ties to even)
nearest-even integer and truncating.


## Packing and Unpacking (2-bit)
All 2-bit types are stored as 4×2-bit values in a single byte. The elements are packed from least significant bits (LSB) to most significant bits (MSB). That is, for consecutive elements x0, x1, x2, x3 in the array:

Packing:
```
pack(x0, x1, x2, x3):
    (x0 & 0x03) |
    ((x1 & 0x03) << 2) |
    ((x2 & 0x03) << 4) |
    ((x3 & 0x03) << 6)
```

Unpacking:
```
x0 = z & 0x03
x1 = (z >> 2) & 0x03
x2 = (z >> 4) & 0x03
x3 = (z >> 6) & 0x03
```
In case the total number of elements is not divisible by 4, zero-padding will be applied in the remaining higher bits of the final byte.
The storage size of a 2-bit tensor of size N is: ceil(N / 4) bytes