.. include:: header.rst
.. _HowToOpenAFile:
==============================
Opening Files
==============================
.. _Supported_File_Types:
Supported File Types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|PyMuPDF| can open files other than just |PDF|.
The following file types are supported:
.. include:: supported-files-table.rst
How to Open a File
~~~~~~~~~~~~~~~~~~~~~
To open a file, do the following:
.. code-block:: python

    import pymupdf

    doc = pymupdf.open("a.pdf")

.. note:: The above creates a :ref:`Document`. The instruction `doc = pymupdf.Document("a.pdf")` does exactly the same: `open` is just a convenient alias, and its full API is documented in the :ref:`Document` chapter.
Opening with :index:`a Wrong File Extension <pair: wrong; file extension>`
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
If you have a document with a wrong file extension for its type, you can still correctly open it.
Assume that *"some.file"* is actually an **XPS**. Open it like so:
.. code-block:: python

    doc = pymupdf.open("some.file", filetype="xps")

.. note::

    |PyMuPDF| itself does not try to determine the file type from the file contents. **You** are responsible for supplying the file type information in some way -- either implicitly, via the file extension, or explicitly, as shown with the `filetype` parameter. There are pure :title:`Python` packages like `filetype <https://pypi.org/project/filetype/>`_ that help you do this. Also consult the :ref:`Document` chapter for a full description.

If |PyMuPDF| encounters a file with an unknown or missing extension, it will try to open it as a |PDF|, so these cases need no additional precautions. Similarly, for memory documents, you can just specify `doc = pymupdf.open(stream=mem_area)` to open the data as a |PDF| document.
If you attempt to open an unsupported file, |PyMuPDF| will raise a file data error.
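
Because the file type has to come from the caller, one option is to inspect the first bytes of a file before opening it. The following is only a minimal sketch: the helper name `sniff_filetype` and the small signature table are illustrative assumptions, not part of |PyMuPDF|, and the table covers just two well-known signatures. Packages such as `filetype` handle far more formats.

```python
# Hypothetical helper: the name and the signature table are illustrative
# and not part of PyMuPDF.
SIGNATURES = (
    (b"%PDF-", "pdf"),              # PDF header
    (b"\x89PNG\r\n\x1a\n", "png"),  # PNG signature
)


def sniff_filetype(data):
    """Return a filetype string for a known signature, or None if unknown."""
    for magic, filetype in SIGNATURES:
        if data.startswith(magic):
            return filetype
    return None


# Usage sketch (requires PyMuPDF and an existing file):
#   with open("some.file", "rb") as f:
#       kind = sniff_filetype(f.read(16))
#   doc = pymupdf.open("some.file", filetype=kind)
```

If the helper returns `None`, passing `filetype=None` leaves the decision to the file extension, as described above.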
----------
Opening Remote Files
~~~~~~~~~~~~~~~~~~~~~~~~~~
For remote files on a server (i.e. non-local files), you will need to *stream* the file data to |PyMuPDF|.
For example, use the `requests <https://requests.readthedocs.io/en/latest/>`_ library as follows:
.. code-block:: python

    import pymupdf
    import requests

    r = requests.get('https://mupdf.com/docs/mupdf_explored.pdf')
    data = r.content
    doc = pymupdf.Document(stream=data)

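
The same streaming approach can be wrapped in a small helper that sets a timeout and fails on HTTP errors. This is a sketch under stated assumptions, not the recommended way: it swaps `requests` for the standard library's `urllib.request`, and the function names `fetch_bytes` and `open_remote_pdf` as well as the 30 second timeout are made up for illustration.

```python
import urllib.request


def fetch_bytes(url, timeout=30):
    """Download url and return the response body as bytes.

    urlopen raises urllib.error.HTTPError for 4xx/5xx HTTP responses,
    rather than returning an error page as document data.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def open_remote_pdf(url):
    """Fetch url and open the downloaded data as an in-memory PDF document."""
    import pymupdf  # imported lazily so fetch_bytes works without PyMuPDF

    data = fetch_bytes(url)
    # The stream has no file extension, so the file type is stated explicitly.
    return pymupdf.open(stream=data, filetype="pdf")
```

For a remote file that is not a |PDF|, the `filetype` argument would need to name the actual type.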
Opening Files from Cloud Services
""""""""""""""""""""""""""""""""""""""
For further examples that deal with files held on typical cloud services, see these `Cloud Interactions code snippets <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/cloud-interactions>`_.
----------
Opening Files as Text
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|PyMuPDF| can open any plain text file as a document. To do this, pass `filetype="txt"` to the `pymupdf.open` function.
.. code-block:: python

    doc = pymupdf.open("my_program.py", filetype="txt")

In this way you can open a variety of file types and use the typical features that are not specific to |PDF|, such as text search, text extraction and page rendering. Once you have rendered your `txt` content, saving it as |PDF| or merging it with other |PDF| files is straightforward.
Examples
""""""""""""""""""
Opening a `C#` file
...........................
.. code-block:: python

    doc = pymupdf.open("MyClass.cs", filetype="txt")

Opening an `XML` file
...........................
.. code-block:: python

    doc = pymupdf.open("my_data.xml", filetype="txt")

Opening a `JSON` file
...........................
.. code-block:: python

    doc = pymupdf.open("more_of_my_data.json", filetype="txt")

And so on!
As you can imagine, many text-based file formats can be *very simply opened* and *interpreted* by |PyMuPDF|. This can make data analysis and extraction possible for a wide range of previously unavailable files.
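
When many such files need to be handled, the choice of `filetype` can be derived from the file extension. The following is a minimal sketch only: the helper name `open_kwargs` and the extension set `TEXT_EXTENSIONS` are hypothetical and would need to be adapted to the files at hand.

```python
from pathlib import Path

# Hypothetical, deliberately small set of extensions to treat as plain text.
TEXT_EXTENSIONS = {".txt", ".py", ".cs", ".xml", ".json"}


def open_kwargs(path):
    """Return keyword arguments for pymupdf.open() for the given path."""
    if Path(path).suffix.lower() in TEXT_EXTENSIONS:
        return {"filetype": "txt"}
    # Otherwise let PyMuPDF decide from the file extension.
    return {}


# Usage sketch (requires PyMuPDF and an existing file):
#   doc = pymupdf.open("MyClass.cs", **open_kwargs("MyClass.cs"))
```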
.. include:: footer.rst