Support models
**************
Support models are abstractions over "raw" objects within a Pdf. For example, a page
in a PDF is a Dictionary with ``/Type`` set to ``/Page``. The Dictionary in
that case is the "raw" object. Upon establishing what type of object it is, we
can wrap it with a support model that adds features to ensure consistency with
the PDF specification.
In version 2.x, pikepdf did not apply support models to "raw" objects automatically.
Version 3.x automatically applies support models to ``/Page`` objects.
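
As a minimal sketch of the relationship between a support model and its "raw"
object, assuming pikepdf 3.x or later (where ``Pdf.pages`` yields
``pikepdf.Page`` objects) and a freshly created blank page rather than a real
document:

.. code-block:: python

    import pikepdf

    # Create an empty PDF with one US Letter page, purely for illustration.
    pdf = pikepdf.new()
    pdf.add_blank_page(page_size=(612, 792))

    page = pdf.pages[0]   # the support model, a pikepdf.Page
    raw = page.obj        # the "raw" Dictionary that the support model wraps

    page_type = raw.Type  # the /Type entry of the raw dictionary, /Page
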
.. autoclass:: pikepdf.ObjectHelper
:members:
.. autoclass:: pikepdf.Page
:members:
:inherited-members:
Support model wrapper around a page dictionary object.
.. autoclass:: pikepdf.PdfMatrix
:members:
.. attribute:: a
.. attribute:: b
.. attribute:: c
.. attribute:: d
.. attribute:: e
.. attribute:: f
Return one of the six "active values" of the affine matrix. ``e`` and ``f``
correspond to x- and y-axis translation respectively. The other four
letters are a 2×2 matrix that can express rotation, scaling and skewing;
``a=1 b=0 c=0 d=1`` is the identity matrix.
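
To illustrate how the six values act on a point, here is a small pure-Python
sketch that does not use pikepdf. The helper name ``apply_matrix`` is
hypothetical, and the formula assumes the PDF convention of multiplying a row
vector ``[x y 1]`` by the matrix:

.. code-block:: python

    from typing import Tuple

    # The six active values, in the order (a, b, c, d, e, f).
    Matrix = Tuple[float, float, float, float, float, float]


    def apply_matrix(m: Matrix, x: float, y: float) -> Tuple[float, float]:
        """Transform the point (x, y) by the affine matrix (a, b, c, d, e, f)."""
        a, b, c, d, e, f = m
        return (a * x + c * y + e, b * x + d * y + f)


    identity = (1, 0, 0, 1, 0, 0)
    translate = (1, 0, 0, 1, 10, 20)  # e and f shift the point along x and y
    scale = (2, 0, 0, 3, 0, 0)        # a and d scale x and y

    unchanged = apply_matrix(identity, 5, 5)  # (5, 5)
    moved = apply_matrix(translate, 5, 5)     # (15, 25)
    scaled = apply_matrix(scale, 5, 5)        # (10, 15)
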
.. autoclass:: pikepdf.PdfImage
:inherited-members:
.. autoclass:: pikepdf.PdfInlineImage
.. autoclass:: pikepdf.models.PdfMetadata
:members:
.. autoclass:: pikepdf.models.Encryption
:members:
.. autoclass:: pikepdf.models.Outline
:members:
.. autoclass:: pikepdf.models.OutlineItem
:members:
.. autoclass:: pikepdf.Permissions
:members:
.. class:: pikepdf.models.EncryptionMethod
Describes which encryption method was used on a particular part of a
PDF. These values are returned by :class:`pikepdf.EncryptionInfo` but
are not currently used to specify how encryption is requested.
.. attribute:: none
Data was not encrypted.
.. attribute:: unknown
An unknown algorithm was used.
.. attribute:: rc4
The RC4 encryption algorithm was used (obsolete).
.. attribute:: aes
The AES-based algorithm was used as described in the |pdfrm|.
.. attribute:: aesv3
An improved version of the AES-based algorithm was used as described in the
:doc:`Adobe Supplement to the ISO 32000 </references/resources>`, requiring
PDF 1.7 extension level 3. This algorithm still uses AES, but allows both
AES-128 and AES-256, and improves how the key is derived from the password.
.. autoclass:: pikepdf.models.EncryptionInfo
:members:
.. autoclass:: pikepdf.Annotation
:members:
Describes an annotation in a PDF, such as a comment, underline, copy editing mark,
interactive widget, redaction, 3D object, or sound or video clip.
See the |pdfrm| section 12.5.6 for the full list of annotation types
and definition of terminology.
.. versionadded:: 2.12
.. autoclass:: pikepdf._qpdf.Attachments
:members:
This interface provides access to any files that are attached to this PDF,
exposed as a Python :class:`collections.abc.MutableMapping` interface.
The keys (virtual filenames) are always ``str``, and values are always
:class:`pikepdf.AttachedFileSpec`.
Use this interface through :attr:`pikepdf.Pdf.attachments`.
.. versionadded:: 3.0
.. autoclass:: pikepdf.AttachedFileSpec
:members:
:inherited-members:
:special-members: __init__
In a PDF, a file specification provides name and metadata for a target file.
Most file specifications are *simple* file specifications, and contain only
one attached file. Call :meth:`get_file` to get the attached file:

.. code-block:: python

    from pikepdf import Pdf

    pdf = Pdf.open(...)
    fs = pdf.attachments['example.txt']
    stream = fs.get_file()

To attach a new file to a PDF, you may construct an ``AttachedFileSpec``:

.. code-block:: python

    from pathlib import Path

    from pikepdf import AttachedFileSpec, Pdf

    pdf = Pdf.open(...)
    fs = AttachedFileSpec.from_filepath(pdf, Path('somewhere/spreadsheet.xlsx'))
    pdf.attachments['spreadsheet.xlsx'] = fs

PDF supports the concept of having multiple, platform-specialized versions of the
attached file (similar to resource forks on some operating systems). In theory,
these versions ought to be the same file, but
encoded in different ways. For example, perhaps a PDF includes a text file encoded
with Windows line endings (``\r\n``) and a different one with POSIX line endings
(``\n``). Similarly, PDF allows for the possibility that you need to encode
platform-specific filenames. pikepdf cannot directly create these, because they
are arguably obsolete; it can provide access to them, however.
If you have to deal with platform-specialized versions,
use :meth:`get_all_filenames` to enumerate those available.
Described in the |pdfrm| section 7.11.3.
.. versionadded:: 3.0
.. autoclass:: pikepdf._qpdf.AttachedFile
:members:
:inherited-members:
An object that contains an actual attached file. These objects do not need
to be created manually; they are normally part of an AttachedFileSpec.
.. versionadded:: 3.0
.. autoclass:: pikepdf.NameTree
:members:
An object for managing *name tree* data structures in PDFs.
A name tree is a key-value data structure. The keys are any binary strings
(that is, Python ``bytes``). If a ``str`` is provided as a key,
the UTF-8 encoding of that string is used for the lookup. Name trees are (confusingly)
not indexed by ``pikepdf.Name`` objects. They behave like
``Dict[bytes, pikepdf.Object]``.
The keys are sorted; pikepdf will ensure that the order is preserved.
The value may be any PDF object. Typically it will be a dictionary or array.
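
The key handling described above can be pictured with a pure-Python sketch.
It does not use pikepdf: ``normalize_key`` is a hypothetical helper that only
mirrors the documented rule that ``str`` keys are treated as their UTF-8
encoding, the values are placeholders rather than PDF objects, and bytewise
sorting is an assumption about how the ordering works:

.. code-block:: python

    from typing import Dict, Union


    def normalize_key(key: Union[str, bytes]) -> bytes:
        """Return the bytes form of a key; a str becomes its UTF-8 encoding."""
        return key.encode('utf-8') if isinstance(key, str) else bytes(key)


    entries: Dict[bytes, str] = {}
    for key in ['résumé.pdf', b'appendix.txt', 'notes.txt']:
        entries[normalize_key(key)] = 'placeholder value'

    # Keys are kept in sorted order (compared byte by byte in this sketch).
    ordered_keys = sorted(entries)
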
Internally in the PDF, a name tree can be a fairly complex tree data structure
implemented with many dictionaries and arrays. pikepdf (using libqpdf)
will automatically read, repair and maintain this tree for you. There should not
be any reason to access the internal nodes of a name tree; use this
interface instead.
NameTrees are used to store certain objects like file attachments in a PDF.
Where a more specific interface exists, use that instead, and it will
manipulate the name tree in a semantically correct manner for you.
Do not modify the internal structure of a name tree while you have a
``NameTree`` referencing it. Access it only through the ``NameTree`` object.
Name trees are described in the |pdfrm| section 7.9.6. See section 7.7.4
for a list of PDF objects that are stored in name trees.
.. versionadded:: 3.0
.. autoclass:: pikepdf.NumberTree
:members:
An object for managing *number tree* data structures in PDFs.
A number tree is a key-value data structure, like name trees, except that the
key is an integer. It behaves like ``Dict[int, pikepdf.Object]``.
The keys can be sparse: not all integer positions will be populated. Keys
are also always sorted; pikepdf will ensure that the order is preserved.
The value may be any PDF object. Typically it will be a dictionary or array.
Internally in the PDF, a number tree can be a fairly complex tree data structure
implemented with many dictionaries and arrays. pikepdf (using libqpdf)
will automatically read, repair and maintain this tree for you. There should not
be any reason to access the internal nodes of a number tree; use this
interface instead.
NumberTrees are not used much in PDF. The main thing they provide is a mapping
between 0-based page numbers and user-facing page numbers (which pikepdf
also exposes as ``Page.label``). The ``/PageLabels`` number tree is where the
page numbering rules are defined.
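
To show how sparse keys turn into per-page labels, here is a simplified
pure-Python sketch that does not use pikepdf. The helpers ``to_roman`` and
``page_label`` are hypothetical, the rules are modelled as a plain mapping from
0-based page index to a style letter, and only the ``r`` (lowercase Roman) and
``D`` (decimal) styles are handled. Prefixes and custom start values, which the
PDF specification also allows, are ignored:

.. code-block:: python

    from bisect import bisect_right
    from typing import Mapping


    def to_roman(n: int) -> str:
        """Convert a positive integer to a lowercase Roman numeral."""
        numerals = [
            (1000, 'm'), (900, 'cm'), (500, 'd'), (400, 'cd'),
            (100, 'c'), (90, 'xc'), (50, 'l'), (40, 'xl'),
            (10, 'x'), (9, 'ix'), (5, 'v'), (4, 'iv'), (1, 'i'),
        ]
        parts = []
        for value, symbol in numerals:
            while n >= value:
                parts.append(symbol)
                n -= value
        return ''.join(parts)


    def page_label(rules: Mapping[int, str], index: int) -> str:
        """Label a 0-based page index using the nearest rule at or before it."""
        starts = sorted(rules)
        pos = bisect_right(starts, index) - 1
        if pos < 0:
            raise KeyError(f'no labelling rule covers page index {index}')
        start = starts[pos]
        ordinal = index - start + 1  # numbering restarts at 1 in each range
        return to_roman(ordinal) if rules[start] == 'r' else str(ordinal)


    # Roman numerals from page index 0, decimal numbers from page index 6.
    rules = {0: 'r', 6: 'D'}
    labels = [page_label(rules, i) for i in range(9)]
    # labels == ['i', 'ii', 'iii', 'iv', 'v', 'vi', '1', '2', '3']

Only two keys are stored, yet every page receives a label, which is why the
keys of such a tree can be sparse.
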
Number trees are described in the |pdfrm| section 7.9.7. See section 12.4.2
for a description of the page labels number tree. Here is an example of modifying
an existing page labels number tree:

.. code-block:: python

    from pikepdf import Dictionary, Name, NumberTree

    # Assumes pdf is an open Pdf that already has a /PageLabels entry
    pagelabels = NumberTree(pdf.Root.PageLabels)
    # Label pages starting at index 0 with lowercase Roman numerals
    pagelabels[0] = Dictionary(S=Name.r)
    # Label pages starting at index 6 with decimal numbers
    pagelabels[6] = Dictionary(S=Name.D)

    # Page labels will now be:
    # i, ii, iii, iv, v, vi, 1, 2, 3, ...

Do not modify the internal structure of a number tree while you have a
``NumberTree`` referencing it. Access it only through the ``NumberTree`` object.
.. versionadded:: 5.4