File: exp-golomb.rst

package info (click to toggle)
python-bitstring 4.3.1-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 1,312 kB
  • sloc: python: 11,397; makefile: 8; sh: 7
file content (122 lines) | stat: -rw-r--r-- 4,173 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
.. currentmodule:: bitstring

.. _exp-golomb:

Exponential-Golomb Codes
========================

As this type of representation of integers isn't as well known as the standard base-2 representation I thought that a short explanation of them might be welcome.

Exponential-Golomb codes represent integers using bit patterns that get longer for larger numbers. For unsigned and signed numbers (the bitstring properties :attr:`~Bits.ue` and :attr:`~Bits.se` respectively) the patterns start like this:

=============  ===========  ===========
Bit pattern    Unsigned     Signed 
=============  ===========  ===========
``1``          0            0
``010``        1            1
``011``        2            -1
``00100``      3            2
``00101``      4            -2
``00110``      5            3
``00111``      6            -3
``0001000``    7            4
``0001001``    8            -4
``0001010``    9            5
``0001011``    10           -5
``0001100``    11           6
``...``        ...          ...
=============  ===========  ===========

They consist of a sequence of n '0' bits, followed by a '1' bit, followed by n more bits. The bits after the first '1' bit count upwards as ordinary base-2 binary numbers until they run out of space and an extra '0' bit needs to get included at the start.

The advantage of this method of representing integers over many other methods is that it can be quite efficient at representing small numbers without imposing a limit on the maximum number that can be represented.


ue
^^

The :attr:`~Bits.ue` property interprets the bitstring as a single unsigned exponential-Golomb code and returns an integer. If the bitstring is not exactly one code then an :exc:`InterpretError` is raised instead. If you instead wish to read the next bits in the stream and interpret them as a code use the read function or unpack with a ``ue`` format string.  ::

    >>> s = BitStream(ue=12)
    >>> s.bin
    '0001101'
    >>> s.append('ue=3')
    >>> print(s.unpack('2*ue'))
    [12, 3]

se
^^

The :attr:`~Bits.se` property does much the same as ``ue`` and the provisos there all apply. The obvious difference is that it interprets the bitstring as a signed exponential-Golomb rather than unsigned. ::

    >>> s = BitStream('0x164b')
    >>> s.se
    InterpretError: Bitstring is not a single exponential-Golomb code.
    >>> while s.pos < len(s):
    ...     print(s.read('se'))
    -5
    2
    0
    -1


Exercise
^^^^^^^^

Using the table above decode this sequence of unsigned Exponential Golomb codes:

``001001101101101011000100100101``

The answer is that it decodes to 3, 0, 0, 2, 2, 1, 0, 0, 8, 4. Note how you don’t need to know how many bits are used for each code in advance - there’s only one way to decode it. To create this bitstring you could have written something like::

    >>> a = Bits().join(f'ue={i}' for i in [3,0,0,2,2,1,0,0,8,4])

and to unpack it again::

    >>> a.unpack('10*ue')
    [3, 0, 0, 2, 2, 1, 0, 0, 8, 4]


The notation ``ue`` and ``se`` for the exponential-Golomb code properties comes from the H.264 video standard, which uses these types of code a lot. There are other ways to map the bitstrings to integers:

Interleaved codes
^^^^^^^^^^^^^^^^^

This type of code is used in the Dirac video standard, and is represented by the attributes :attr:`~Bits.uie` and :attr:`~Bits.sie`. For the interleaved codes the pattern is very similar to before for the unsigned case:

=============  ===========
Bit pattern    Unsigned
=============  ===========
``1``          0
``001``        1
``011``        2
``00001``      3
``00011``      4
``01001``      5
``01011``      6
``0000001``    7
``0000011``    8
``0001001``    9
``...``        ...
=============  ===========

For the signed code it looks a little different:

=============  ===========
Bit pattern    Signed
=============  ===========
``1``          0
``0010``       1
``0011``       -1
``0110``       2
``0111``       -2
``000010``     3
``000011``     -3
``000110``     4
``000111``     -4
``010010``     5
``010011``     -5
``...``        ...
=============  ===========

I'm sure you can work out the pattern yourself from here!