.. -*- mode: rst -*-

========================
Write your own formatter
========================

As well as creating :doc:`your own lexer <lexerdevelopment>`, writing a new
formatter for Pygments is easy and straightforward.

A formatter is a class that is initialized with some keyword arguments (the
formatter options) and that must provide a `format()` method.
Additionally, a formatter should provide a `get_style_defs()` method that
returns the style definitions from the style in a form usable for the
formatter's output format.


Quickstart
==========

The most basic formatter shipped with Pygments is the `NullFormatter`. It just
sends the value of a token to the output stream:

.. sourcecode:: python

    from pygments.formatter import Formatter

    class NullFormatter(Formatter):
        def format(self, tokensource, outfile):
            for ttype, value in tokensource:
                outfile.write(value)

As you can see, the `format()` method is passed two parameters: `tokensource`
and `outfile`. The first is an iterable of ``(token_type, value)`` tuples,
the latter a file-like object with a `write()` method.

Because the formatter is that basic, it doesn't override the
`get_style_defs()` method.
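
As a minimal usage sketch, the formatter above can be passed to
`pygments.highlight()` like any built-in formatter. The sample input string
and the choice of `PythonLexer` are assumptions made for illustration only:

.. sourcecode:: python

    from pygments import highlight
    from pygments.formatter import Formatter
    from pygments.lexers import PythonLexer

    class NullFormatter(Formatter):
        def format(self, tokensource, outfile):
            for ttype, value in tokensource:
                outfile.write(value)

    # sample input, chosen only for this illustration
    code = 'print("hello")\n'

    # when no outfile is given, highlight() returns the output as a string
    result = highlight(code, PythonLexer(), NullFormatter())

Since the formatter writes every token value unchanged, `result` contains the
source text without any markup.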


Styles
======

Styles aren't instantiated but their metaclass provides some class functions
so that you can access the style definitions easily.

Styles are iterable and yield tuples in the form ``(ttype, d)`` where `ttype`
is a token and `d` is a dict with the following keys:

``'color'``
    Hexadecimal color value (eg: ``'ff0000'`` for red) or `None` if not
    defined.

``'bold'``
    `True` if the value should be bold

``'italic'``
    `True` if the value should be italic

``'underline'``
    `True` if the value should be underlined

``'bgcolor'``
    Hexadecimal color value for the background (eg: ``'eeeeee'`` for light
    gray) or `None` if not defined.

``'border'``
    Hexadecimal color value for the border (eg: ``'0000aa'`` for a dark
    blue) or `None` for no border.

Additional keys might appear in the future; formatters should ignore all keys
they don't support.
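
The following sketch shows one way to inspect these dicts. It assumes the
``'default'`` style shipped with Pygments; the variable names are
illustrative:

.. sourcecode:: python

    from pygments.styles import get_style_by_name
    from pygments.token import Keyword

    # styles are classes; they are used directly, not instantiated
    style = get_style_by_name('default')

    # iterating over the style class yields (ttype, d) tuples
    bold_types = [ttype for ttype, d in style if d['bold']]

    # the style definition for a single token type can be looked up too
    keyword_style = style.style_for_token(Keyword)
    supported = ('color', 'bold', 'italic', 'underline', 'bgcolor', 'border')
    # pick only the keys this code understands and ignore any others
    known = dict((key, keyword_style[key]) for key in supported)

Picking out only the known keys, as in the last line, keeps the code working
if further keys are added to the dict later.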


HTML 3.2 Formatter
==================

For a more complex example, let's implement an HTML 3.2 formatter. We don't
use CSS but inline markup (``<u>``, ``<font>``, etc). Because this isn't good
style, this formatter isn't in the standard library ;-)

.. sourcecode:: python

    from pygments.formatter import Formatter

    class OldHtmlFormatter(Formatter):

        def __init__(self, **options):
            Formatter.__init__(self, **options)

            # create a dict of (start, end) tuples that wrap the
            # value of a token so that we can use it in the format
            # method later
            self.styles = {}

            # iterating over the style yields (token, style) tuples,
            # where `style` is a dict of the parsed style values
            for token, style in self.style:
                start = end = ''
                # colors are readily specified in hex: 'RRGGBB'
                if style['color']:
                    start += '<font color="#%s">' % style['color']
                    end = '</font>' + end
                if style['bold']:
                    start += '<b>'
                    end = '</b>' + end
                if style['italic']:
                    start += '<i>'
                    end = '</i>' + end
                if style['underline']:
                    start += '<u>'
                    end = '</u>' + end
                self.styles[token] = (start, end)

        def format(self, tokensource, outfile):
            # lastval is a string we use for caching
            # because it's possible that a lexer yields a number
            # of consecutive tokens with the same token type.
            # to minimize the size of the generated html markup we
            # try to join the values of same-type tokens here
            lastval = ''
            lasttype = None

            # wrap the whole output with <pre>
            outfile.write('<pre>')

            for ttype, value in tokensource:
                # if the token type doesn't exist in the stylemap
                # we try it with the parent of the token type
                # eg: parent of Token.Literal.String.Double is
                # Token.Literal.String
                while ttype not in self.styles:
                    ttype = ttype.parent
                if ttype == lasttype:
                    # the current token type is the same as in the
                    # last iteration. append the value to the cache
                    lastval += value
                else:
                    # not the same token as last iteration, but we
                    # have some data in the buffer. wrap it with the
                    # defined style and write it to the output file
                    if lastval:
                        stylebegin, styleend = self.styles[lasttype]
                        outfile.write(stylebegin + lastval + styleend)
                    # set lastval/lasttype to current values
                    lastval = value
                    lasttype = ttype

            # if something is left in the buffer, write it to the
            # output file, then close the opened <pre> tag
            if lastval:
                stylebegin, styleend = self.styles[lasttype]
                outfile.write(stylebegin + lastval + styleend)
            outfile.write('</pre>\n')

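The comments should explain how it works. Note that, for brevity, the token
values are written without escaping; real HTML output would need ``<``,
``>`` and ``&`` escaped. Again, this formatter doesn't override the
`get_style_defs()` method. If we had used CSS classes instead of inline HTML
markup, we would need to generate the CSS first. The `get_style_defs()`
method exists for that purpose.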


Generating Style Definitions
============================

Some formatters like the `LatexFormatter` and the `HtmlFormatter` don't
output inline markup but reference either macros or CSS classes. Because
the definitions of those are not part of the output, the `get_style_defs()`
method exists. It is passed one parameter (whether and how it is used
is up to the formatter) and has to return a string or ``None``.
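
As a rough sketch of what such a method could look like, the hypothetical
`CssSketchFormatter` below turns the style into CSS rules. The class name,
the interpretation of `arg` as a CSS selector prefix and the use of the short
names from `pygments.token.STANDARD_TYPES` as CSS class names are assumptions
of this example, not a description of how `HtmlFormatter` is implemented:

.. sourcecode:: python

    from pygments.formatter import Formatter
    from pygments.token import STANDARD_TYPES

    class CssSketchFormatter(Formatter):
        # hypothetical formatter; only get_style_defs() is sketched

        def get_style_defs(self, arg=''):
            # in this sketch `arg` is an optional CSS selector prefix
            prefix = arg + ' ' if arg else ''
            lines = []
            for ttype, d in self.style:
                # short class name such as 'k' for Token.Keyword;
                # skip token types without one
                cls = STANDARD_TYPES.get(ttype)
                if not cls:
                    continue
                rules = []
                if d['color']:
                    rules.append('color: #%s' % d['color'])
                if d['bgcolor']:
                    rules.append('background-color: #%s' % d['bgcolor'])
                if d['bold']:
                    rules.append('font-weight: bold')
                if d['italic']:
                    rules.append('font-style: italic')
                if d['underline']:
                    rules.append('text-decoration: underline')
                if rules:
                    lines.append('%s.%s { %s }'
                                 % (prefix, cls, '; '.join(rules)))
            return '\n'.join(lines)

    css = CssSketchFormatter(style='default').get_style_defs('.highlight')

A matching `format()` method would then wrap each token value in an element
carrying the same class name, so that the generated rules apply to it.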