.. _api_documentation:

*****************
API Documentation
*****************

:mod:`py7zr` --- 7-Zip archive library
======================================

.. module:: py7zr
   :synopsis: Read and write 7Z-format archive files.

.. moduleauthor:: Hiroshi Miura <miurahr@linux.com>


The module builds on the development effort and knowledge behind the `pylzma` module
and its `py7zlib.py` program by Joachim Bauch. Great appreciation for Joachim!

The module defines the following items:

.. exception:: Bad7zFile

   The error raised for bad 7z files.


.. class:: SevenZipFile
   :noindex:

   The class for reading and writing 7z files.  See section sevenzipfile-object_.


.. class:: FileInfo

   The class used to represent information about a member of an archive file.


.. function:: is_7zfile(filename)

   Returns ``True`` if *filename* is a valid 7z file based on its magic number,
   otherwise returns ``False``.  *filename* may be a file or file-like object too.
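
As an illustration of what a magic-number check involves, the sketch below compares the first six
bytes of a stream with the 7z signature (``37 7A BC AF 27 1C``). The helper name ``looks_like_7z``
is hypothetical and not part of py7zr; :func:`is_7zfile` remains the supported way to perform this check.

.. code-block:: python

    import io

    # 7z archives start with this six-byte signature ('7z' followed by BC AF 27 1C).
    SEVENZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"


    def looks_like_7z(source):
        """Return True if *source* (a path or a binary file object) starts with the 7z signature.

        Hypothetical helper for illustration only; it is not py7zr's implementation.
        """
        if hasattr(source, "read"):
            # File-like object: remember the position, peek, then restore it.
            position = source.tell()
            header = source.read(len(SEVENZIP_MAGIC))
            source.seek(position)
        else:
            with open(source, "rb") as fp:
                header = fp.read(len(SEVENZIP_MAGIC))
        return header == SEVENZIP_MAGIC


    print(looks_like_7z(io.BytesIO(SEVENZIP_MAGIC + b"\x00\x04")))
    print(looks_like_7z(io.BytesIO(b"PK\x03\x04")))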


.. function:: unpack_7zarchive(archive, path, extra=None)

   Helper function intended for use with the :mod:`shutil` module, which offers a number of high-level operations on files
   and collections of files. Because :mod:`shutil` lets you register an unpacker for an archive format, you can register
   this helper function and then extract an archive by calling :func:`shutil.unpack_archive`.

.. code-block:: python

    import shutil
    from py7zr import unpack_7zarchive
    shutil.register_unpack_format('7zip', ['.7z'], unpack_7zarchive)
    shutil.unpack_archive(filename, extract_dir)


.. function:: pack_7zarchive(archive, path, extra=None)

   Helper function intended for use with the :mod:`shutil` module, which offers a number of high-level operations on files
   and collections of files. Because :mod:`shutil` lets you register a maker for an archive format, you can register
   this helper function and then produce an archive by calling :func:`shutil.make_archive`.

.. code-block:: python

    import shutil
    from py7zr import pack_7zarchive
    shutil.register_archive_format('7zip', pack_7zarchive, description='7zip archive')
    shutil.make_archive(base_name, '7zip', base_dir)


.. seealso::

   (external link) `shutil`_  :mod:`shutil` module offers a number of high-level operations on files and collections of files.

.. _shutil: https://docs.python.org/3/library/shutil.html


Class description
=================

.. _archiveinfo-object:

ArchiveInfo Object
------------------

.. py:class:: ArchiveInfo(filename, stat, header_size, method_names, solid, blocks, uncompressed)

   Data-only Python object that holds information about an archive.
   The object can be retrieved with the ``archiveinfo()`` method of a ``SevenZipFile`` object.

.. py:attribute:: filename
   :type: str

   File name of the 7zip archive. If the SevenZipFile object was created from a BinaryIO object,
   it is ``None``.

.. py:attribute:: stat
   :type: stat_result

   fstat object of the 7zip archive. If the SevenZipFile object was created from a BinaryIO object,
   it is ``None``.

.. py:attribute:: header_size
   :type: int

   header size of 7zip archive.

.. py:attribute:: method_names
   :type: List[str]

   List of the method names used in the 7zip archive. If a method is not supported by py7zr,
   its name carries a trailing asterisk (``*``).

.. py:attribute:: solid
   :type: bool

   Whether the 7zip archive uses solid compression.

.. py:attribute:: blocks
   :type: int

   number of compression block(s)

.. py:attribute:: uncompressed
   :type: int

   total uncompressed size of files in 7zip archive
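
A short sketch of inspecting these attributes. It first writes a throwaway archive into a temporary
directory so that it is self-contained; the file names and the payload are placeholders chosen for
this illustration.

.. code-block:: python

    import os
    import tempfile

    import py7zr

    payload = b"hello py7zr"

    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "hello.txt")
        with open(source, "wb") as fp:
            fp.write(payload)

        archive_path = os.path.join(workdir, "sample.7z")
        with py7zr.SevenZipFile(archive_path, "w") as archive:
            archive.write(source, arcname="hello.txt")

        with py7zr.SevenZipFile(archive_path, "r") as archive:
            info = archive.archiveinfo()

    print("file:", os.path.basename(info.filename))
    print("header size:", info.header_size)
    print("methods:", info.method_names)
    print("solid:", info.solid)
    print("blocks:", info.blocks)
    print("uncompressed bytes:", info.uncompressed)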


.. _sevenzipfile-object:

SevenZipFile Object
-------------------


.. py:class:: SevenZipFile(file, mode='r', filters=None, dereference=False, password=None)

   Open a 7z file, where *file* can be a path to a file (a string), a
   file-like object or a :term:`path-like object`.

   The *mode* parameter should be ``'r'`` to read an existing
   file, ``'w'`` to truncate and write a new file, ``'a'`` to append to an
   existing file, or ``'x'`` to exclusively create and write a new file.
   If *mode* is ``'x'`` and *file* refers to an existing file,
   a :exc:`FileExistsError` will be raised.
   If *mode* is ``'r'`` or ``'a'``, the file should be seekable.

   The *filters* parameter controls the compression algorithms to use when
   writing files to the archive.

   The SevenZipFile class can be used as a context manager, so it supports
   the ``with`` statement.

   If *dereference* is ``False``, symbolic and hard links are added to the archive as links.
   If it is ``True``, the content of the target files is added to the archive instead.
   This has no effect on systems that do not support symbolic links.

   When a *password* is given, py7zr handles the archive as an encrypted one.
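
A minimal sketch of the context-manager usage described above: it writes one file and then reads the
member list back. The paths are placeholders created in a temporary directory.

.. code-block:: python

    import os
    import tempfile

    from py7zr import SevenZipFile

    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "notes.txt")
        with open(source, "w") as fp:
            fp.write("some text\n")

        archive_path = os.path.join(workdir, "notes.7z")

        # Mode 'w' creates (or truncates) the archive; leaving the block closes it.
        with SevenZipFile(archive_path, mode="w") as archive:
            archive.write(source, arcname="notes.txt")

        # Mode 'r' opens the existing archive for reading.
        with SevenZipFile(archive_path, mode="r") as archive:
            names = archive.getnames()

    print(names)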

.. py:method:: SevenZipFile.close()

   Close the archive file and release internal buffers.  You must
   call :meth:`close` before exiting your program or most records will
   not be written.


.. py:method:: SevenZipFile.getnames()

   Return a list of archive files by name.


.. py:method:: SevenZipFile.needs_password()

   Return ``True`` if the archive is encrypted, or if the archive being created
   will be encrypted. Otherwise return ``False``.


.. py:method:: SevenZipFile.extractall(path=None)

   Extract all members from the archive to current working directory.  *path*
   specifies a different directory to extract to.


.. py:method:: SevenZipFile.extract(path=None, targets=None)

   Extract the specified archived files to the current working directory.
   *path* specifies a different directory to extract to.

   *targets* is a collection of archived file names to be extracted.
   py7zr looks for files and directories whose names are the same as the
   elements of *targets*.

   When the method is given a ``str`` object, or any other object that is not a collection
   such as a list or set, it raises :exc:`TypeError`.

   Once :meth:`extract` has been called, the ``SevenZipFile`` object is exhausted
   and in an EOF state.
   If you want to call :meth:`read`, :meth:`readall`, :meth:`extract` or :meth:`extractall`
   again, you should call :meth:`reset` first.

   **CAUTION** When you specify files without also specifying their parent directory,
   py7zr fails with a "no such directory" error. To extract the file
   'somedir/somefile', pass a list that names the directory as well: ['somedir', 'somedir/somefile']
   as the *targets* argument.
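
The following sketch shows both points in practice: the parent directory is listed next to the file,
and :meth:`reset` is called before the archive is read a second time. The directory layout is a
placeholder built in a temporary directory for this illustration.

.. code-block:: python

    import os
    import tempfile

    from py7zr import SevenZipFile

    with tempfile.TemporaryDirectory() as workdir:
        # Build a small archive containing somedir/somefile.
        os.makedirs(os.path.join(workdir, "src", "somedir"))
        with open(os.path.join(workdir, "src", "somedir", "somefile"), "w") as fp:
            fp.write("content\n")
        archive_path = os.path.join(workdir, "sample.7z")
        with SevenZipFile(archive_path, "w") as archive:
            archive.writeall(os.path.join(workdir, "src", "somedir"), arcname="somedir")

        first = os.path.join(workdir, "first")
        second = os.path.join(workdir, "second")
        with SevenZipFile(archive_path, "r") as archive:
            # Name the parent directory as well as the file itself.
            archive.extract(path=first, targets=["somedir", "somedir/somefile"])
            # The object is now exhausted; rewind it before extracting again.
            archive.reset()
            archive.extractall(path=second)

        extracted_first = os.path.isfile(os.path.join(first, "somedir", "somefile"))
        extracted_second = os.path.isfile(os.path.join(second, "somedir", "somefile"))

    print(extracted_first, extracted_second)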


.. py:method:: SevenZipFile.extract(path=None, targets=None, recursive=False)

   *recursive* is a boolean which, when set to ``True``, simplifies the extraction
   of directory contents.

   Instead of listing every file and directory under a parent
   directory in *targets*, you can specify only the parent directory
   and set *recursive* to ``True``, which extracts all of its
   subdirectories and their contents recursively.

   If *recursive* is not set, it defaults to ``False``, and the extraction proceeds as
   if the parameter did not exist.

   Please see 'tests/test_basic.py: test_py7zr_extract_and_getnames()' for
   example code.

.. code-block:: python

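   import re
   from py7zr import SevenZipFile

   filter_pattern = re.compile(r'scripts.*')
   with SevenZipFile('archive.7z', 'r') as zip:
       allfiles = zip.getnames()
       # keep only the members whose names match the pattern
       targets = [f for f in allfiles if filter_pattern.match(f)]
   # extract only the listed members
   with SevenZipFile('archive.7z', 'r') as zip:
       zip.extract(targets=targets)
   # extract the listed members together with everything beneath them
   with SevenZipFile('archive.7z', 'r') as zip:
       zip.extract(targets=targets, recursive=True)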
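
To picture what ``recursive=True`` saves the caller from doing by hand, the following sketch expands
a list of parent directories into every member beneath them. ``expand_targets`` is a hypothetical
helper written for this illustration; it only manipulates name strings and does not reflect how py7zr
implements the option.

.. code-block:: python

    def expand_targets(all_names, targets):
        """Return the members of *all_names* that equal a target or lie beneath one."""
        selected = []
        for name in all_names:
            for target in targets:
                # Archive member names use '/' as the separator.
                if name == target or name.startswith(target.rstrip("/") + "/"):
                    selected.append(name)
                    break
        return selected


    names = ["docs", "docs/readme.txt", "scripts", "scripts/run.py", "scripts/lib/util.py"]
    print(expand_targets(names, ["scripts"]))

With an open archive, a list such as the one returned by ``getnames()`` could be expanded this way and
passed as *targets* when *recursive* is left at its default.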


.. py:method:: SevenZipFile.readall()

   Extract all members from the archive to memory and return them as a dictionary.
   The returned dictionary has the form Dict[filename: str, BinaryIO: io.BytesIO object].
   Once readall() has been called, the SevenZipFile object is exhausted and in an EOF state.
   If you want to call read(), readall(), extract() or extractall() again,
   you should call reset() first.
   You can get the extracted data from the dictionary values like this:

.. code-block:: python

   from py7zr import SevenZipFile

   with SevenZipFile('archive.7z', 'r') as zip:
       for fname, bio in zip.readall().items():
           print(f'{fname}: {bio.read(10)}...')


.. py:method:: SevenZipFile.read(targets=None)

   Extract the specified target archived files into a dictionary object.

   *targets* is a collection of archived file names to be extracted.
   py7zr looks for files and directories whose names are the same as the
   elements of *targets*.

   When the method is given a ``str`` object, or any other object that is not a collection
   such as a list or set, it raises :exc:`TypeError`.

   When *targets* is ``None``, it behaves the same as readall().
   Once read() has been called, the SevenZipFile object is exhausted and in an EOF state.
   If you want to call read(), readall(), extract() or extractall() again,
   you should call reset() first.

.. code-block:: python

   import re
   from py7zr import SevenZipFile

   filter_pattern = re.compile(r'scripts.*')
   with SevenZipFile('archive.7z', 'r') as zip:
       allfiles = zip.getnames()
       targets = [f for f in allfiles if filter_pattern.match(f)]
   with SevenZipFile('archive.7z', 'r') as zip:
       for fname, bio in zip.read(targets).items():
           print(f'{fname}: {bio.read(10)}...')


.. py:method:: SevenZipFile.list()

    Return a List[FileInfo].


.. py:method:: SevenZipFile.archiveinfo()

    Return an ArchiveInfo object.

.. py:method:: SevenZipFile.namelist()

    Return a list of archive members by name.

.. py:method:: SevenZipFile.test()

   Read the whole archive file and check the packed CRC.
   Return ``True`` if the CRC check passes, ``False`` if a defect is detected,
   or ``None`` if the archive does not have a CRC record.


.. py:method:: SevenZipFile.testzip()

    Read all the files in the archive and check their CRCs.
    Return the name of the first bad file, or else return ``None``.
    When the archive does not have a CRC record, it returns ``None``.
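
A brief sketch of running both checks against a freshly written archive. The archive is reopened for
each check as a precaution, which is an assumption of this example rather than a documented
requirement; the file name and contents are placeholders.

.. code-block:: python

    import os
    import tempfile

    from py7zr import SevenZipFile

    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "data.bin")
        with open(source, "wb") as fp:
            fp.write(b"\x00" * 1024)
        archive_path = os.path.join(workdir, "data.7z")
        with SevenZipFile(archive_path, "w") as archive:
            archive.write(source, arcname="data.bin")

        with SevenZipFile(archive_path, "r") as archive:
            # True, False, or None when the archive carries no packed CRC.
            packed_ok = archive.test()

        with SevenZipFile(archive_path, "r") as archive:
            # Name of the first member that fails its CRC check, or None.
            first_bad = archive.testzip()

    if first_bad is None:
        print("no corrupted member found; test() returned", packed_ok)
    else:
        print("first corrupted member:", first_bad)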


.. py:method:: SevenZipFile.write(filename, arcname=None)

   Write the file named *filename* to the archive, giving it the archive name
   *arcname* (by default, this will be the same as *filename*, but without a drive
   letter and with leading path separators removed).
   The archive must be open with mode ``'w'``.


.. py:method:: SevenZipFile.writeall(filename, arcname=None)

   Write the directory and its sub-items recursively into the archive, giving
   it the archive name *arcname* (by default, this will be the same as *filename*,
   but without a drive letter and with leading path separators removed).

   If you want to store directories and files, passing *arcname* is a good idea.
   When *filename* is 'C:/a/b/c' and *arcname* is 'c', and a file exists at 'C:/a/b/c/d.txt',
   the archive lists ['c', 'c/d.txt'], the former being a directory.
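
The *arcname* behaviour described above can be tried with a small directory tree. The layout below
mirrors the ``a/b/c/d.txt`` example but lives in a temporary directory, so the concrete paths are
placeholders.

.. code-block:: python

    import os
    import tempfile

    from py7zr import SevenZipFile

    with tempfile.TemporaryDirectory() as workdir:
        # Layout on disk: <workdir>/a/b/c/d.txt
        source_dir = os.path.join(workdir, "a", "b", "c")
        os.makedirs(source_dir)
        with open(os.path.join(source_dir, "d.txt"), "w") as fp:
            fp.write("payload\n")

        archive_path = os.path.join(workdir, "tree.7z")
        with SevenZipFile(archive_path, "w") as archive:
            # Store the tree under the short name 'c' instead of its full path.
            archive.writeall(source_dir, arcname="c")

        with SevenZipFile(archive_path, "r") as archive:
            names = archive.getnames()

    print(names)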


.. py:method:: SevenZipFile.set_encrypted_header(mode)

   Set the header encryption mode. To encrypt the header, set *mode* to ``True``; otherwise set it to ``False``.
   The default is ``False``.


.. py:method:: SevenZipFile.set_encoded_header_mode(mode)

   Set the header encoding mode. To encode the header data, set *mode* to ``True``; otherwise set it to ``False``.
   The default is ``True``.
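
A minimal sketch of creating an archive with an encrypted header, assuming the encryption dependency
listed in the table of compression methods is installed. The password and file names are placeholders.

.. code-block:: python

    import os
    import tempfile

    from py7zr import SevenZipFile

    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "secret.txt")
        with open(source, "w") as fp:
            fp.write("confidential\n")

        archive_path = os.path.join(workdir, "secret.7z")
        with SevenZipFile(archive_path, "w", password="example-password") as archive:
            # Request header encryption in addition to encrypting the file data.
            archive.set_encrypted_header(True)
            archive.write(source, arcname="secret.txt")

        # The same password is supplied again to open the archive.
        with SevenZipFile(archive_path, "r", password="example-password") as archive:
            encrypted = archive.needs_password()
            names = archive.getnames()

    print(encrypted, names)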


.. py:attribute:: SevenZipFile.filename

   Name of the 7z file.


Compression Methods
===================

`py7zr` supports the algorithms and filters that the `lzma module`_ and `liblzma`_ support.
It also supports BZip2 and Deflate, which are implemented in the Python core libraries,
and ZStandard, PPMd and Brotli through third-party libraries.
The Python 3 core `lzma module`_ and `liblzma`_ do not support some algorithms
such as PPMd, BCJ2 and Deflate64; of these, BCJ2 and Deflate64 are also unsupported by `py7zr`.

.. _`lzma module`: https://docs.python.org/3/library/lzma.html
.. _`liblzma`: https://tukaani.org/xz/

Here is a table of algorithms.

+---+----------------------+------------+-----------------------------+
|  #|   Category           | Algorithm  | Note                        |
+===+======================+============+=============================+
|  1| - Compression        | LZMA2      |  default (LZMA2+BCJ)        |
+---+ - Decompression      +------------+-----------------------------+
|  2|                      | LZMA       |                             |
+---+                      +------------+-----------------------------+
|  3|                      | Bzip2      |                             |
+---+                      +------------+-----------------------------+
|  4|                      | Deflate    |                             |
+---+                      +------------+-----------------------------+
|  5|                      | COPY       |                             |
+---+                      +------------+-----------------------------+
|  6|                      | PPMd       | depend on pyppmd            |
+---+                      +------------+-----------------------------+
|  7|                      | ZStandard  | depend on pyzstd            |
+---+                      +------------+-----------------------------+
|  8|                      | Brotli     | depend on brotli,brotliCFFI |
+---+----------------------+------------+-----------------------------+
|  9| - Filter             | BCJ        |(X86, ARM, PPC, ARMT, SPARC, |
|   |                      |            | IA64)  depend on bcj-cffi   |
+---+                      +------------+-----------------------------+
| 10|                      | Delta      |                             |
+---+----------------------+------------+-----------------------------+
| 11| - Encryption         | 7zAES      | depend on pycryptodomex     |
|   | - Decryption         |            |                             |
+---+----------------------+------------+-----------------------------+
| 12| - Unsupported        | BCJ2       |                             |
+---+                      +------------+-----------------------------+
| 13|                      | Deflate64  |                             |
+---+----------------------+------------+-----------------------------+

- Symbolic link handling is basically compatible with the 'p7zip' implementation,
  but does not work with the original 7-zip, because the original does not implement the feature.


Possible filters value
======================

Here is a list of example values for *filters*.
You can pass them when creating a SevenZipFile object.

.. code-block:: python

    from py7zr import FILTER_LZMA, SevenZipFile

    filters = [{'id': FILTER_LZMA}]
    archive = SevenZipFile('target.7z', mode='w', filters=filters)


LZMA2 + Delta
    ``[{'id': FILTER_DELTA}, {'id': FILTER_LZMA2, 'preset': PRESET_DEFAULT}]``

LZMA2 + BCJ
    ``[{'id': FILTER_X86}, {'id': FILTER_LZMA2, 'preset': PRESET_DEFAULT}]``

LZMA2 + ARM
    ``[{'id': FILTER_ARM}, {'id': FILTER_LZMA2, 'preset': PRESET_DEFAULT}]``

LZMA + BCJ
    ``[{'id': FILTER_X86}, {'id': FILTER_LZMA}]``

LZMA2
    ``[{'id': FILTER_LZMA2, 'preset': PRESET_DEFAULT}]``

LZMA
    ``[{'id': FILTER_LZMA}]``

BZip2
    ``[{'id': FILTER_BZIP2}]``

Deflate
    ``[{'id': FILTER_DEFLATE}]``

ZStandard
    ``[{'id': FILTER_ZSTD, 'level': 3}]``

PPMd
    ``[{'id': FILTER_PPMD, 'order': 6, 'mem': 24}]``

    ``[{'id': FILTER_PPMD, 'order': 6, 'mem': "16m"}]``

Brotli
    ``[{'id': FILTER_BROTLI, 'level': 11}]``

7zAES + LZMA2 + Delta
    ``[{'id': FILTER_DELTA}, {'id': FILTER_LZMA2, 'preset': PRESET_DEFAULT}, {'id': FILTER_CRYPTO_AES256_SHA256}]``

7zAES + LZMA2 + BCJ
    ``[{'id': FILTER_X86}, {'id': FILTER_LZMA2, 'preset': PRESET_DEFAULT}, {'id': FILTER_CRYPTO_AES256_SHA256}]``

7zAES + LZMA
    ``[{'id': FILTER_LZMA}, {'id': FILTER_CRYPTO_AES256_SHA256}]``

7zAES + Deflate
    ``[{'id': FILTER_DEFLATE}, {'id': FILTER_CRYPTO_AES256_SHA256}]``

7zAES + BZip2
    ``[{'id': FILTER_BZIP2}, {'id': FILTER_CRYPTO_AES256_SHA256}]``

7zAES + ZStandard
    ``[{'id': FILTER_ZSTD}, {'id': FILTER_CRYPTO_AES256_SHA256}]``
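
As a sketch of how one of these values is used end to end, the example below writes an archive with
the BZip2 filter from the list and extracts it again. The payload and file names are placeholders,
and the temporary directory only keeps the example self-contained.

.. code-block:: python

    import os
    import tempfile

    from py7zr import FILTER_BZIP2, SevenZipFile

    # BZip2, taken from the list of filter values.
    filters = [{"id": FILTER_BZIP2}]
    payload = bytes(range(256)) * 16

    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "ramp.bin")
        with open(source, "wb") as fp:
            fp.write(payload)

        archive_path = os.path.join(workdir, "ramp.7z")
        with SevenZipFile(archive_path, mode="w", filters=filters) as archive:
            archive.write(source, arcname="ramp.bin")

        out_dir = os.path.join(workdir, "out")
        with SevenZipFile(archive_path, mode="r") as archive:
            # The filters do not need to be repeated when reading.
            methods = archive.archiveinfo().method_names
            archive.extractall(path=out_dir)

        with open(os.path.join(out_dir, "ramp.bin"), "rb") as fp:
            restored = fp.read()

    print(methods, restored == payload)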


.. rubric:: Footnotes