.. _decompressing:

Decompressing
=============

If you have a compressed file that is not an archive (zip or tar), you can use
:class:`pooch.Decompress` to decompress it after download.

For example, large binary files can be compressed with ``gzip`` to reduce
download times, but they then need to be decompressed before loading, which
can be slow if it is repeated on every load.
You can trade storage space for speed by keeping a decompressed copy of the
file:

.. code:: python

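    import numpy
    from pooch import Decompress

    # GOODBOY is assumed to be a pooch.Pooch instance defined elsewhere
    # (for example, created with pooch.create)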

    def fetch_compressed_file():
        """
        Load a large binary file that has been gzip compressed.
        """
        # Pass in the processor to decompress the file on download
        fname = GOODBOY.fetch("large-binary-file.npy.gz", processor=Decompress())
        # The file returned is the decompressed version which can be loaded by
        # numpy
        data = numpy.load(fname)
        return data
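
Conceptually, the trade-off described above means writing the decompressed
bytes to a second file next to the download and reusing that file on later
loads.
The following is a minimal standard-library sketch of that idea, not Pooch's
actual implementation.
The helper name ``decompress_gzip`` is hypothetical, and the sketch only
checks whether the decompressed copy exists, without handling updated
downloads:

.. code:: python

    import gzip
    import shutil
    from pathlib import Path


    def decompress_gzip(fname, extension=".decomp"):
        """
        Decompress a gzip file into a sibling file and return the new path.
        """
        decompressed = Path(str(fname) + extension)
        # Only pay the decompression cost if the copy does not exist yet
        if not decompressed.exists():
            with gzip.open(fname, "rb") as source:
                with open(decompressed, "wb") as target:
                    shutil.copyfileobj(source, target)
        return str(decompressed)

Calling this helper twice on the same file decompresses only once; the second
call just returns the path of the existing copy.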

By default, :class:`pooch.Decompress` appends ``.decomp`` to the name of the
downloaded file, so the decompressed file here is
``"large-binary-file.npy.gz.decomp"``.
You can choose a different file name by passing the ``name`` argument:

.. code:: python

    import numpy
    from pooch import Decompress

    def fetch_compressed_file():
        """
        Load a large binary file that has been gzip compressed.
        """
        # Pass in the processor with a custom name for the decompressed file
        fname = GOODBOY.fetch(
            "large-binary-file.npy.gz",
            processor=Decompress(name="a-different-file-name.npy"),
        )
        # The file returned is now named "a-different-file-name.npy"
        data = numpy.load(fname)
        return data

.. warning::

    Passing in ``name`` can cause existing data to be lost!
    If a file with the specified name already exists, it will be overwritten
    with the newly decompressed content.
    **Use this option with caution.**
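
One way to guard against an accidental overwrite is to check the target folder
before the first fetch.
The sketch below uses only the standard library.
``check_name_is_free`` is a hypothetical helper, not part of Pooch, and it
assumes that the decompressed file is written to the same folder as the
downloaded file:

.. code:: python

    from pathlib import Path


    def check_name_is_free(folder, name):
        """
        Raise an error if a file called ``name`` already exists in ``folder``.
        """
        target = Path(folder) / name
        if target.exists():
            raise FileExistsError(
                f"'{target}' already exists and would be overwritten."
            )
        return target

Because later runs would also find the file that the processor itself created,
a check like this is only suitable as a one-off sanity check before the first
download.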