Content streams
***************

In PDF, drawing operations are all performed in content streams that describe
the positioning and drawing order of all graphics (including text, images and
vector drawing).

.. seealso::
    :ref:`working_with_content_streams`

pikepdf (and libqpdf) provide two tools for interpreting content streams:
a parser and a token filter. The parser returns higher-level information,
conveniently grouping each command with its operands. The parser is useful when
one wants to retrieve information from a content stream, such as determining the
position of an element. The parser should not be used to edit or reconstruct the
content stream because some subtleties are lost in parsing.

The token filter works at a lower level, considering each token including
comments, and distinguishing different types of spaces. This allows modifying
content streams. A TokenFilter must be subclassed; the specialized version
describes how it should transform the stream of tokens.

Content stream parsers
----------------------

.. autofunction:: pikepdf.parse_content_stream

.. autofunction:: pikepdf.unparse_content_stream


Content stream token filters
----------------------------

.. autoclass:: pikepdf.Token
    :members:

.. class:: pikepdf.TokenType

    When filtering content streams, each token is labeled according to the role
    it plays.

    **Standard tokens**

    .. attribute:: array_open

    .. attribute:: array_close

    .. attribute:: brace_open

    .. attribute:: brace_close

    .. attribute:: dict_open

    .. attribute:: dict_close

        These tokens mark the start and end of an array (``[`` and ``]``), a
        brace-delimited group (``{`` and ``}``), and a dictionary (``<<`` and
        ``>>``), respectively.

    .. attribute:: integer

    .. attribute:: real

    .. attribute:: null

    .. attribute:: bool

        The token data represents an integer, real number, null or boolean,
        respectively.

    .. attribute:: name_

        The token is the name (pikepdf.Name) of an object. In practice, these
        are among the most interesting tokens.

        .. versionchanged:: 3.0
            Before version 3.0, this attribute was called ``name``. That
            interfered with the semantics of the ``Enum`` object, so it was
            renamed to ``name_``.

    .. attribute:: inline_image

        An inline image in the content stream. The whole inline image is
        represented by the single token.

    **Lexical tokens**

    .. attribute:: comment

        Signifies a comment that appears in the content stream.

    .. attribute:: word

        Otherwise uncategorized bytes are returned as ``word`` tokens. PDF
        operators are words.

    .. attribute:: bad

        An invalid token.

    .. attribute:: space

        Whitespace within the content stream.

    .. attribute:: eof

        Denotes the end of the tokens in this content stream.

.. autoclass:: pikepdf.TokenFilter
    :members: