File: compressing.rst

package info (click to toggle)
pydicom 2.4.3-3
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 11,700 kB
  • sloc: python: 129,337; makefile: 198; sh: 121
file content (205 lines) | stat: -rw-r--r-- 7,747 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
========================
Compressing *Pixel Data*
========================

.. currentmodule:: pydicom

This tutorial is about compressing a dataset's *Pixel Data* and covers

* An introduction to compression
* Using data compressed by third-party packages
* Compressing data using *pydicom*

It's assumed that you're already familiar with the :doc:`dataset basics
<../dataset_basics>`.


**Prerequisites**

This tutorial uses packages in addition to *pydicom* that are not installed
by default, but are required for *RLE Lossless* compression of *Pixel Data*.
For more information on what packages are available to compress a given
transfer syntax see the :ref:`image compression guide
<guide_compression_supported>`.

Installing using pip:

.. code-block:: bash

    python -m pip install -U pydicom>=2.2 numpy pylibjpeg

Installing on conda:

.. code-block:: bash

    conda install numpy
    conda install -c conda-forge pydicom>=2.2
    pip install pylibjpeg


Introduction
------------

DICOM conformant applications are usually required to support the
*Little Endian Implicit VR* transfer syntax, which is an uncompressed (native)
transfer syntax. This means that datasets using *Little Endian Implicit VR* have
no compression of their *Pixel Data*. So if applications are required to
support it, why do we need *Pixel Data* compression?

The answer, of course, is file size. A *CT Image* instance
typically consists of 1024 x 1024 16-bit pixels, and a CT scan may have
hundreds of instances, giving a total series size of hundreds of megabytes.
When you factor in other SOP Classes such as *Whole Slide Microscopy* which
uses even larger full color images, the size of the uncompressed *Pixel Data*
may get into the gigabyte territory. Being able to compress these images can
result in significantly reduced file sizes.

However, with the exception of *RLE Lossless*, *pydicom* doesn't currently
offer any native support for compression of *Pixel Data*. This means that it's
entirely up to you to compress the *Pixel Data* in a manner conformant to
the :dcm:`requirements of the DICOM Standard<part05/sect_8.2.html>`.

.. note::

    We recommend that you use `GDCM
    <http://gdcm.sourceforge.net/wiki/index.php/Main_Page>`_ for *Pixel Data*
    compression as it provides support for all the most commonly used
    *Transfer Syntaxes* and being another DICOM library, should do so in
    a conformant manner.

The general requirements for compressed *Pixel Data* in the DICOM Standard are:

* Each frame of pixel data must be encoded separately
* All the encoded frames must then be :dcm:`encapsulated
  <part05/sect_A.4.html>`.
* When the amount of encoded frame data is very large
  then it's recommended (but not required) that an :dcm:`extended offset table
  <part03/sect_C.7.6.3.html>` also be included with the dataset

Each *Transfer Syntax* has it's own specific requirements, found
in :dcm:`Part 5 of the DICOM Standard<part05/PS3.5.html>`.


Encapsulating data compressed by third-party packages
-----------------------------------------------------

Once you've used a third-party package to compress the *Pixel Data*,
*pydicom* can be used to encapsulate and add it to the
dataset, with either the :func:`~pydicom.encaps.encapsulate` or
:func:`~pydicom.encaps.encapsulate_extended` functions:

.. code-block:: python

    from typing import List, Tuple

    from pydicom import dcmread
    from pydicom.data import get_testdata_file
    from pydicom.encaps import encapsulate, encapsulate_extended
    from pydicom.uid import JPEG2000Lossless

    path = get_testdata_file("CT_small.dcm")
    ds = dcmread(path)

    # Use third-party package to compress
    # Let's assume it compresses to JPEG 2000 (lossless)
    frames: List[bytes] = third_party_compression_func(...)

    # Set the *Transfer Syntax UID* appropriately
    ds.file_meta.TransferSyntaxUID = JPEG2000Lossless
    # For *Samples per Pixel* 1 the *Photometric Interpretation* is unchanged

    # Basic encapsulation
    ds.PixelData = encapsulate(frames)
    ds.save_as("CT_small_compressed_basic.dcm")

    # Extended encapsulation
    result: Tuple[bytes, bytes, bytes] = encapsulate_extended(frames)
    ds.PixelData = result[0]
    ds.ExtendedOffsetTable = result[1]
    ds.ExtendedOffsetTableLength = result[2]
    ds.save_as("CT_small_compressed_ext.dcm")


Compressing using pydicom
-------------------------

Currently, only the *RLE Lossless* transfer syntax is supported for
compressing *Pixel Data* natively using *pydicom*. The easiest method is to
pass the UID for *RLE Lossless* to :func:`Dataset.compress()
<pydicom.dataset.Dataset.compress>`:

.. code-block:: python

    >>> from pydicom import dcmread
    >>> from pydicom.data import get_testdata_file
    >>> from pydicom.uid import RLELossless
    >>> path = get_testdata_file("CT_small.dcm")
    >>> ds = dcmread(path)
    >>> ds.compress(RLELossless)
    >>> ds.save_as("CT_small_rle.dcm")

This will compress the existing *Pixel Data* and update the *Transfer Syntax
UID* before saving the dataset to file as  ``CT_small_rle.dcm``.

If you're creating a dataset from scratch you can instead pass a
:class:`~numpy.ndarray` to be compressed and used as the *Pixel Data*:

.. code-block:: python

    >>> import numpy as np
    >>> arr = np.zeros((ds.Rows, ds.Columns), dtype='<i2')
    >>> ds.compress(RLELossless, arr)

Note that the :attr:`~numpy.ndarray.shape`, :class:`~numpy.dtype` and contents
of `arr` must match the corresponding elements in the dataset, such as *Rows*,
*Columns*, *Samples per Pixel*, etc. If they don't match you'll get an
exception:

.. code-block:: python

    >>> arr = np.zeros((ds.Rows, ds.Columns + 1), dtype='<i2')
    >>> ds.compress(RLELossless, arr)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../pydicom/dataset.py", line 1697, in compress
        encoded = [f for f in frame_iterator]
      File ".../pydicom/dataset.py", line 1697, in <listcomp>
        encoded = [f for f in frame_iterator]
      File ".../pydicom/encoders/base.py", line 382, in iter_encode
        yield self._encode_array(src, idx, encoding_plugin, **kwargs)
      File ".../pydicom/encoders/base.py", line 209, in _encode_array
        src = self._preprocess(arr, **kwargs)
      File ".../pydicom/encoders/base.py", line 533, in _preprocess
        raise ValueError(
    ValueError: Unable to encode as the shape of the ndarray (128, 129) doesn't match the values for the rows, columns and samples per pixel

A specific encoding plugin can be used by passing the plugin name via the
`encoding_plugin` argument:

.. code-block:: python

    >>> ds.compress(RLELossless, encoding_plugin='pylibjpeg')

The plugins available for each encoder are listed in the
:mod:`API reference<pydicom.encoders>` for the encoder type.

Implicitly changing the compression on an already compressed dataset is not
currently supported, however it can still be done explicitly by decompressing
prior to calling :meth:`~pydicom.dataset.Dataset.compress`. In the example
below, a matching :doc:`image data handler</old/image_data_handlers>` for the
original transfer syntax - *JPEG 2000 Lossless* - is required.

.. code-block:: python

    >>> ds = get_testdata_file("US1_J2KR.dcm", read=True)
    >>> ds.SamplesPerPixel
    3
    >>> ds.PhotometricInterpretation
    'YBR_RCT'
    >>> ds.PhotometricInterpretation = "RGB"
    >>> ds.compress(RLELossless)

Note that in this case we also needed to change the *Photometric
Interpretation*, from the original value of ``'YBR_RCT'`` when the dataset
was using *JPEG 2000 Lossless* compression to ``'RGB'``, which for this dataset
will be the correct value after recompressing using *RLE Lossless*.