.. include:: header.rst

Installation
=============

PyMuPDF should be installed using pip with::

  python -m pip install --upgrade pip
  python -m pip install --upgrade pymupdf

This will install from a Python wheel if one is available for your platform.
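
To confirm that the installation worked, one can try importing the package. The following is a minimal sketch, assuming the package is importable as ``fitz`` (the import name used elsewhere in this documentation); the helper name ``pymupdf_info`` is hypothetical.

```python
def pymupdf_info():
    """Return PyMuPDF's module docstring, or None if it cannot be imported."""
    try:
        import fitz  # import name used by PyMuPDF
    except ImportError:
        return None
    # The module docstring typically carries version information.
    return getattr(fitz, "__doc__", None)


if __name__ == "__main__":
    info = pymupdf_info()
    print(info if info else "PyMuPDF is not importable in this environment.")
```

If the function returns ``None``, the install most likely did not complete for the Python interpreter being used.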


Installation when a suitable wheel is not available
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a suitable Python wheel is not available, pip will automatically build from
source using a Python sdist.

**This requires C/C++ development tools and SWIG to be installed**:

* On Unix-style systems such as Linux, OpenBSD and FreeBSD,
  use the system package manager to install SWIG.

  * For example on Debian Linux, do: ``sudo apt install swig``

* On Windows:

  * Install Visual Studio 2019. If it is not installed in a standard location,
    set the environment variable ``PYMUPDF_SETUP_DEVENV`` to the location of the
    ``devenv.com`` binary.

  * Install SWIG by following the instructions at:
    https://swig.org/Doc4.0/Windows.html#Windows_installation

* On macOS, install MacPorts using the instructions at:
  https://www.macports.org/install.php

  * Then install SWIG with: ``sudo port install swig``
  * You may also need: ``sudo port install swig-python``

As of ``PyMuPDF-1.20.0``, the required MuPDF source code is already in the
sdist and is automatically built into PyMuPDF.
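
Before starting a source build, it can help to check that the required tools are on ``PATH``. The sketch below is one possible way to do this and is not part of PyMuPDF; the helper name ``missing_build_tools`` is hypothetical, and it only looks for executables by name (it does not verify a working C/C++ toolchain or a Visual Studio installation).

```python
import shutil


def missing_build_tools(tools=("swig",)):
    """Return the names in `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All checked build tools were found.")
```

Additional executable names, such as the compiler used on your platform, can be passed in the ``tools`` argument.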


Notes
~~~~~

Wheels are available for Python 3.7 and later on Windows (32-bit Intel, 64-bit Intel), Linux (64-bit Intel, 64-bit ARM) and macOS (64-bit Intel, 64-bit ARM).

Wheels are not available for Python installed with `Chocolatey
<https://chocolatey.org/>`_ on Windows. Instead, install Python
using the Windows installer from the python.org website; see:
http://www.python.org/downloads

PyMuPDF does not support Python versions prior to 3.7. Older wheels can be found in `this <https://github.com/pymupdf/PyMuPDF-Optional-Material/tree/master/wheels-upto-Py3.5>`_ repository and on `PyPI <https://pypi.org/project/PyMuPDF/>`_.
Please note that we generally follow the official Python release schedules. This means that we stop generating wheels for Python versions once they drop out of official support.

There are no **mandatory** external dependencies. However, some optional features are available only if additional components are installed:

* `Pillow <https://pypi.org/project/Pillow/>`_ is required for :meth:`Pixmap.pil_save` and :meth:`Pixmap.pil_tobytes`.
* `fontTools <https://pypi.org/project/fonttools/>`_ is required for :meth:`Document.subset_fonts`.
* `pymupdf-fonts <https://pypi.org/project/pymupdf-fonts/>`_ is a collection of nice fonts to be used for text output methods.
* `Tesseract-OCR <https://github.com/tesseract-ocr/tesseract>`_ is required for optical character recognition in images and document pages. Tesseract is separate software, not a Python package. To enable OCR functions in PyMuPDF, Tesseract must be installed and the system environment variable ``TESSDATA_PREFIX`` must be set to the ``tessdata`` folder of the Tesseract installation. See below.

.. note:: You can install these additional components at any time -- before or after installing PyMuPDF. PyMuPDF will detect their presence during import or when the respective functions are being used.
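
To see which of the optional Python packages are present in the current environment, something like the following sketch can be used. It is an illustration only, not PyMuPDF's own detection logic. The helper name is hypothetical, and the mapping from package names to importable module names (``PIL``, ``fontTools``, ``pymupdf_fonts``) is an assumption stated in the code. Tesseract is not covered because it is not a Python package.

```python
import importlib.util

# Assumed mapping from PyPI package name to importable module name.
OPTIONAL_MODULES = {
    "Pillow": "PIL",
    "fontTools": "fontTools",
    "pymupdf-fonts": "pymupdf_fonts",
}


def available_optional_components(modules=OPTIONAL_MODULES):
    """Map each package name to True if its module can be located."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }


if __name__ == "__main__":
    for package, found in available_optional_components().items():
        print(f"{package}: {'installed' if found else 'not installed'}")
```

``importlib.util.find_spec`` only locates the module without importing it, so the check has no side effects.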


Install from source without using an sdist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* First get a PyMuPDF source tree:

  * Clone the git repository at https://github.com/pymupdf/PyMuPDF,
    for example::

      git clone https://github.com/pymupdf/PyMuPDF.git

  * Or download and extract a ``.zip`` or ``.tar.gz`` source release from
    https://github.com/pymupdf/PyMuPDF/releases.

* Install C/C++ development tools and SWIG as described above.

* Build and install PyMuPDF::

    cd PyMuPDF && python setup.py install

  This will automatically download a specific hard-coded MuPDF source release,
  and build it into PyMuPDF.
  
  One can build with a non-default MuPDF (for example one installed on the
  system, or a local checkout) by setting environment variables. See the
  comments at the start of ``PyMuPDF/setup.py`` for more information.

.. note:: When running Python scripts that use PyMuPDF, make sure that the
  current directory is not the ``PyMuPDF/`` directory.

  Otherwise, confusingly, Python will attempt to import ``fitz`` from the local
  ``fitz/`` directory, which will fail because it only contains source files.
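
A script can guard against this situation with a small check. The following is a minimal sketch, assuming the problem is signalled simply by a ``fitz/`` subdirectory in the current directory; the helper name ``shadows_fitz`` is hypothetical.

```python
import os


def shadows_fitz(directory):
    """Return True if `directory` contains a ``fitz/`` subdirectory that
    Python could pick up instead of the installed package."""
    return os.path.isdir(os.path.join(directory, "fitz"))


if __name__ == "__main__":
    if shadows_fitz(os.getcwd()):
        print("Warning: ./fitz exists; run your script from another directory.")
```

The check does not import anything, so it can run before ``import fitz`` is attempted.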


Running tests
~~~~~~~~~~~~~

PyMuPDF has a set of ``pytest`` scripts within the ``tests/`` directory.

Run tests with::

    pip install pytest fontTools
    pytest PyMuPDF/tests

If PyMuPDF has been built with a non-default build of MuPDF (using the
environment variable ``PYMUPDF_SETUP_MUPDF_BUILD``), it is possible that
``tests/test_textbox.py:test_textbox3()`` will fail, because it relies on MuPDF
having been built with PyMuPDF's customized configuration, ``fitz/_config.h``.

One can skip this particular test by adding ``-k 'not test_textbox3'`` to the
``pytest`` command line.
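
When the tests are launched from a script, the command line described above can be assembled programmatically. This is a minimal sketch; the helper name ``pytest_command`` and its parameters are hypothetical, and the caller decides whether the test needs to be skipped.

```python
def pytest_command(tests_dir="PyMuPDF/tests", skip_textbox3=False):
    """Build the pytest argument list described above."""
    command = ["pytest", tests_dir]
    if skip_textbox3:
        # Deselect the test that depends on PyMuPDF's customized MuPDF config.
        command += ["-k", "not test_textbox3"]
    return command


if __name__ == "__main__":
    print(" ".join(pytest_command(skip_textbox3=True)))
```

The resulting list can be passed to ``subprocess.run`` to start the test run.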


Enabling Integrated OCR Support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you do not intend to use this feature, skip this step. Otherwise, it is required for both installation paths: **from wheels and from sources.**

PyMuPDF already contains all the logic needed to support OCR functions. However, it also needs Tesseract's language support data, so Tesseract-OCR must still be installed.

The location of the language support folder must currently [#f1]_ be provided in the environment variable ``TESSDATA_PREFIX``.

So for a working OCR functionality, make sure to complete this checklist:

1. Install Tesseract.

2. Locate Tesseract's language support folder. Typically you will find it here:

   - Windows: ``C:\Program Files\Tesseract-OCR\tessdata``
   - Unix systems: ``/usr/share/tesseract-ocr/4.00/tessdata``

3. Set the environment variable ``TESSDATA_PREFIX``:

   - Windows: ``set TESSDATA_PREFIX=C:\Program Files\Tesseract-OCR\tessdata``
   - Unix systems: ``export TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/tessdata``

.. note:: This must happen outside Python -- before starting your script. Just manipulating ``os.environ`` will not work!
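
A script can still check, at startup, that the variable was set correctly before it calls any OCR function. The sketch below only reads the environment and does not modify it; the helper name ``tessdata_dir`` is hypothetical and not part of PyMuPDF.

```python
import os


def tessdata_dir(environ=os.environ):
    """Return the TESSDATA_PREFIX directory if it is set and exists, else None."""
    path = environ.get("TESSDATA_PREFIX")
    if path and os.path.isdir(path):
        return path
    return None


if __name__ == "__main__":
    folder = tessdata_dir()
    if folder is None:
        print("TESSDATA_PREFIX is unset or does not point to a directory.")
    else:
        print("Using Tesseract language data in:", folder)
```

If the function returns ``None``, set the variable in the shell as described in the checklist and restart the script.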

.. rubric:: Footnotes

.. [#f1] In the next MuPDF version, it will be possible to pass this value as a parameter -- directly in the OCR invocations.

.. include:: footer.rst