Content streams
***************

In PDF, drawing operations are all performed in content streams that describe
the positioning and drawing order of all graphics (including text, images and
vector drawing).

.. seealso:: :ref:`working_with_content_streams`

pikepdf (and libqpdf) provide two tools for interpreting content streams:
a parser and a token filter. The parser returns higher-level information,
conveniently grouping each operator with its operands. It is useful for
retrieving information from a content stream, such as determining the position
of an element. The parser should not be used to edit or reconstruct a content
stream, because some subtleties (such as comments and the exact whitespace) are
lost in parsing.

The token filter works at a lower level: it sees every token, including
comments, and distinguishes different kinds of whitespace. This makes it
suitable for modifying content streams. ``TokenFilter`` must be subclassed;
the subclass describes how the stream of tokens should be transformed.


Content stream parsers
----------------------

.. autofunction:: pikepdf.parse_content_stream

.. autofunction:: pikepdf.unparse_content_stream

Content stream token filters
----------------------------

.. autoclass:: pikepdf.Token
    :members:

.. class:: pikepdf.TokenType

    When filtering content streams, each token is labeled according to the role
    it plays.

    **Standard tokens**

    .. attribute:: array_open
    .. attribute:: array_close
    .. attribute:: brace_open
    .. attribute:: brace_close
    .. attribute:: dict_open
    .. attribute:: dict_close

        These tokens mark the start and end of an array (``[`` and ``]``), a
        brace-delimited group (``{`` and ``}``), and a dictionary (``<<`` and
        ``>>``), respectively.

    .. attribute:: integer
    .. attribute:: real
    .. attribute:: null
    .. attribute:: bool

        The token data represents an integer, real number, null or boolean,
        respectively.

    .. attribute:: name_

        The token is the name (:class:`pikepdf.Name`) of an object. In
        practice, these are among the most interesting tokens.

        .. versionchanged:: 3.0
            In versions older than 3.0, this attribute was called ``.name``.
            That interfered with the semantics of the ``Enum`` object, so it
            was renamed to ``name_``.

    .. attribute:: inline_image

        An inline image in the content stream. The whole inline image is
        represented by a single token.

    **Lexical tokens**

    .. attribute:: comment

        Signifies a comment that appears in the content stream.

    .. attribute:: word

        Otherwise uncategorized bytes are returned as ``word`` tokens. PDF
        operators are words.

    .. attribute:: bad

        An invalid token.

    .. attribute:: space

        Whitespace within the content stream.

    .. attribute:: eof

        Denotes the end of the tokens in this content stream.


.. autoclass:: pikepdf.TokenFilter
    :members: