<html>
<head>
<title>mistletoe | version 0.5.2</title>
<meta charset="UTF-8" />
<meta name="description" content="A fast, extensible Markdown parser in Python." />
<meta name="keywords" content="Markdown,Python,LaTeX,HTML" />
<meta name="author" content="Mi Yu" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" href="style.css" type="text/css" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
</head>
<body><h1>mistletoe<img src='https://cdn.rawgit.com/miyuchina/mistletoe/master/resources/logo.svg' align='right'></h1>
<p><a href="https://travis-ci.org/miyuchina/mistletoe"><img src="https://img.shields.io/travis/miyuchina/mistletoe.svg?style=flat-square" title="" alt="Build Status"></a>
<a href="https://coveralls.io/github/miyuchina/mistletoe?branch=master"><img src="https://img.shields.io/coveralls/miyuchina/mistletoe.svg?style=flat-square" title="" alt="Coverage Status"></a>
<a href="https://pypi.python.org/pypi/mistletoe"><img src="https://img.shields.io/pypi/v/mistletoe.svg?style=flat-square" title="" alt="PyPI"></a>
<a href="https://pypi.python.org/pypi/mistletoe"><img src="https://img.shields.io/pypi/wheel/mistletoe.svg?style=flat-square" title="" alt="is wheel"></a>
</p>
<p>mistletoe is a Markdown parser in pure Python, designed to be fast, modular
and fully customizable.
</p>
<p>mistletoe is not simply a Markdown-to-HTML transpiler. It is designed, from
the start, to parse Markdown into an abstract syntax tree. You can swap out
renderers for different output formats without touching any of the core
components.
</p>
<p>Remember to spell mistletoe in lowercase!
</p>
<h2>Features
</h2>
<ul>
<li><p><strong>Fast</strong>: mistletoe is as fast as the <a href="https://github.com/lepture/mistune">fastest implementation</a>
currently available: that is, over 4 times faster than
<a href="https://github.com/waylan/Python-Markdown">Python-Markdown</a>, and much faster than
<a href="https://github.com/trentm/python-markdown2">Python-Markdown2</a>.
See the <a href="#performance">performance</a> section for details.
</p>
</li>
<li><p><strong>Modular</strong>: mistletoe is designed with modularity in mind. Its initial
goal is to provide a clear API that is easy to extend.
</p>
</li>
<li><strong>Customizable</strong>: as of now, mistletoe can render Markdown documents to LaTeX, HTML and an abstract syntax tree out of the box. Writing a new renderer for mistletoe is a relatively trivial task.</li>
</ul>
<h2>Installation
</h2>
<p>mistletoe requires Python 3.3 and above, including Python 3.7, the current
development branch. It is also tested on PyPy 5.8.0. Install mistletoe with
pip:
</p>
<pre><code class="lang-sh">pip3 install mistletoe
</code></pre>
<p>Alternatively, clone the repo:
</p>
<pre><code class="lang-sh">git clone https://github.com/miyuchina/mistletoe.git
cd mistletoe
pip3 install -e .
</code></pre>
<p>See the <a href="contributing.html">contributing</a> doc for how to contribute to mistletoe.
</p>
<h2>Usage
</h2>
<h3>Basic usage</h3>
<p>Here's how you can use mistletoe in a Python script:
</p>
<pre><code class="lang-python">import mistletoe

with open('foo.md', 'r') as fin:
    rendered = mistletoe.markdown(fin)
</code></pre>
<p><code>mistletoe.markdown()</code> uses mistletoe's default settings: allowing HTML mixins
and rendering to HTML. The function also accepts an additional argument
<code>renderer</code>. To produce LaTeX output:
</p>
<pre><code class="lang-python">import mistletoe
from mistletoe.latex_renderer import LaTeXRenderer

with open('foo.md', 'r') as fin:
    rendered = mistletoe.markdown(fin, LaTeXRenderer)
</code></pre>
<p>Finally, here's how you would manually specify extra tokens and a renderer
for mistletoe. In the following example, we use <code>HtmlRenderer</code> to render
the AST; <code>HtmlRenderer</code> adds <code>HtmlBlock</code> and <code>HtmlSpan</code> to the normal parsing
process.
</p>
<pre><code class="lang-python">from mistletoe import Document, HtmlRenderer

with open('foo.md', 'r') as fin:
    with HtmlRenderer() as renderer:
        rendered = renderer.render(Document(fin))
</code></pre>
<h3>From the command-line</h3>
<p>pip installation enables mistletoe's command-line utility. Type the following
directly into your shell:
</p>
<pre><code class="lang-sh">mistletoe foo.md
</code></pre>
<p>This will transpile <code>foo.md</code> into HTML, and dump the output to stdout. To save
the HTML, direct the output into a file:
</p>
<pre><code class="lang-sh">mistletoe foo.md > out.html
</code></pre>
<p>You can pass in custom renderers by including the full path to your renderer
class after a <code>-r</code> or <code>--renderer</code> flag:
</p>
<pre><code class="lang-sh">mistletoe foo.md --renderer custom_renderer.CustomRenderer
</code></pre>
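<p>The <code>--renderer</code> value is a dotted path: a module name followed by a class name. As a rough sketch of how such a path can be turned into a class, here is one approach using the standard <code>importlib</code> module. <code>load_renderer</code> is a hypothetical helper, not necessarily how mistletoe's command-line tool does it.
</p>
<pre><code class="lang-python">import importlib


def load_renderer(dotted_path):
    """Resolve a 'module.ClassName' string to the class it names.

    Hypothetical helper: a sketch of turning a --renderer value
    into a class object.
    """
    # Split on the last dot: everything before it is the module.
    module_name, _, class_name = dotted_path.rpartition('.')
    if not module_name:
        raise ValueError('expected "module.ClassName", got ' + repr(dotted_path))
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
</code></pre>
<p>With the command above, <code>load_renderer('custom_renderer.CustomRenderer')</code> would import <code>custom_renderer</code>, which must be importable from <code>sys.path</code>, and return its <code>CustomRenderer</code> attribute.
</p>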
<p>Running <code>mistletoe</code> without specifying a file will land you in interactive
mode. Like Python's REPL, interactive mode allows you to test how your
Markdown will be interpreted by mistletoe:
</p>
<pre><code>mistletoe [version 0.5.2] (interactive)
Type Ctrl-D to complete input, or Ctrl-C to exit.
>>> some **bold text**
... and some *italics*
... ^D
&lt;html&gt;
&lt;body&gt;
&lt;p&gt;some &lt;strong&gt;bold text&lt;/strong&gt; and some &lt;em&gt;italics&lt;/em&gt;&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
>>>
</code></pre>
<p>The interactive mode also accepts the <code>--renderer</code> flag.
</p>
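<p>On Unix-like terminals, Ctrl-D and Ctrl-C correspond to two standard Python exceptions: <code>input()</code> raises <code>EOFError</code> at end of input and <code>KeyboardInterrupt</code> on an interrupt. The minimal sketch below collects one document this way. <code>read_document</code> is a hypothetical helper, and its <code>read_line</code> parameter exists only so the loop can be exercised without a terminal.
</p>
<pre><code class="lang-python">def read_document(read_line=input):
    """Collect lines until EOF (Ctrl-D), mimicking an interactive prompt.

    Sketch only: KeyboardInterrupt (Ctrl-C) is deliberately not caught,
    so it propagates and ends the session.
    """
    lines = []
    prompt = '>>> '
    while True:
        try:
            line = read_line(prompt)
        except EOFError:
            # Ctrl-D: the current document is complete.
            return lines
        # Keep line endings so the result looks like lines read from a file.
        lines.append(line + '\n')
        prompt = '... '
</code></pre>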
<h2>Performance
</h2>
<p>mistletoe is among the fastest Markdown parser implementations available in
pure Python, on par with <a href="https://github.com/lepture/mistune">mistune</a>. Try the benchmarks yourself by
running:
</p>
<pre><code class="lang-sh">python3 test/benchmark.py
</code></pre>
<p>One of mistletoe's significant bottlenecks compared to mistune, however,
is function-call overhead. Because mistletoe, unlike mistune, splits its
functionality into modules, function lookups can take significantly longer
than in mistune.
</p>
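<p>The cost described above is general Python behavior: a dotted name such as <code>module.function</code> is resolved by an attribute lookup every time it runs. The micro-benchmark below uses only the standard library and does not involve mistletoe; it compares calling <code>math.sqrt</code> through the module with calling a local alias bound once. Actual timings depend on the interpreter and machine.
</p>
<pre><code class="lang-python">import math
import timeit


def attribute_lookup(n=100000):
    # Looks up math.sqrt on the module object at every iteration.
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)
    return total


def local_alias(n=100000):
    # Binds the function to a local name once, skipping the repeated
    # module attribute lookup inside the loop.
    sqrt = math.sqrt
    total = 0.0
    for i in range(n):
        total += sqrt(i)
    return total


def compare(number=20):
    """Return the seconds each variant takes over `number` runs."""
    return {
        'attribute lookup': timeit.timeit(attribute_lookup, number=number),
        'local alias': timeit.timeit(local_alias, number=number),
    }


if __name__ == '__main__':
    for name, seconds in compare().items():
        print('{}: {:.4f}'.format(name, seconds))
</code></pre>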
<p>To boost performance further, we suggest running mistletoe on PyPy.
Benchmark results show that on PyPy, mistletoe is about <strong>twice as fast</strong> as
mistune:
</p>
<pre><code class="lang-sh">$ pypy3 test/benchmark.py mistune mistletoe
Test document: test/samples/syntax.md
Test iterations: 1000
Running tests with mistune, mistletoe...
========================================
mistune: 13.524028996936977
mistletoe: 6.477352762129158
</code></pre>
<p>The above result was achieved on PyPy 5.8.0-beta0, on a 13-inch Retina MacBook
Pro (Early 2015).
</p>
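<p>If you want to time parsers on your own documents, a stand-alone harness along these lines may help. It is a hypothetical sketch, not the project's <code>test/benchmark.py</code>. It assumes only that each parser is a callable taking the document as its single argument.
</p>
<pre><code class="lang-python">import time


def benchmark(parsers, document, iterations=1000):
    """Time each parser on the same document.

    parsers maps a display name to a callable that takes the document
    as its only argument. Returns the total seconds spent per parser.
    """
    results = {}
    for name, parse in parsers.items():
        start = time.perf_counter()
        for _ in range(iterations):
            parse(document)
        results[name] = time.perf_counter() - start
    return results


def report(results):
    # Prints one "name: seconds" line per parser, like the output above.
    for name, seconds in results.items():
        print('{}: {}'.format(name, seconds))


if __name__ == '__main__':
    sample = '# Title\n\nsome **bold text** and some *italics*\n'
    # Stand-in callables; replace them with real parsers to compare.
    report(benchmark({'upper': str.upper, 'lower': str.lower}, sample))
</code></pre>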
<h2>Developer's Guide
</h2>
<p>Here's an example of adding GitHub-style wiki links to the parsing process
and providing a renderer for this new token.
</p>
<h3>A new token</h3>
<p>GitHub wiki links are span-level tokens, meaning that they reside inline,
and don't really look like chunky paragraphs. To write a new span-level
token, all we need to do is make a subclass of <code>SpanToken</code>:
</p>
<pre><code class="lang-python">from mistletoe.span_token import SpanToken

class GithubWiki(SpanToken):
    pass
</code></pre>
<p>mistletoe uses regular expressions to search for span-level tokens in the
parsing process. As a refresher, a GitHub wiki link looks something like this:
<code>[[alternative text | target]]</code>. We define a class variable, <code>pattern</code>,
that stores the compiled regex:
</p>
<pre><code class="lang-python">import re
from mistletoe.span_token import SpanToken

class GithubWiki(SpanToken):
    pattern = re.compile(r"\[\[ *(.+?) *\| *(.+?) *\]\]")
    def __init__(self, match_obj):
        pass
</code></pre>
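<p>To check what the two groups capture, the pattern can be tried on its own with Python's <code>re</code> module. This snippet only exercises the regex and does not use mistletoe.
</p>
<pre><code class="lang-python">import re

# The same pattern as in the GithubWiki class above.
pattern = re.compile(r"\[\[ *(.+?) *\| *(.+?) *\]\]")

match = pattern.search('See [[alternative text | target]] for details.')
print(match.group(1))  # alternative text
print(match.group(2))  # target

# Spaces around the pipe are optional, and the text may contain markup.
match = pattern.search('[[*alt*|link]]')
print(match.group(1))  # *alt*
print(match.group(2))  # link
</code></pre>
<p>Because the groups are lazy (<code>.+?</code>) and each is surrounded by <code> *</code>, spaces next to the brackets and the pipe stay out of the captured text.
</p>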
<p>For spiritual guidance on regexes, refer to <a href="https://xkcd.com/208/">xkcd</a> classics. For an
actual representation of this author parsing Markdown with regexes, refer
to this brilliant <a href="http://www.greghendershott.com/img/grumpy-regexp-parser.png">meme</a> by <a href="http://www.greghendershott.com/2013/11/markdown-parser-redesign.html">Greg Hendershott</a>.
</p>
<p>mistletoe's span-level tokenizer will search for our pattern. When it finds
a match, it will pass the match object as an argument to our constructor.
We have defined our regex so that the first match group is the alternative
text, and the second one is the link target.
</p>
<p>Note that alternative text can also contain other span-level tokens. For
example, <code>[[*alt*|link]]</code> is a GitHub wiki link with an <code>Emphasis</code> token as its
child. To parse child tokens, simply pass <code>match_obj</code> to the <code>super</code>
constructor (which assumes children to be in <code>match_obj.group(1)</code>),
and save off all the additional attributes we need:
</p>
<pre><code class="lang-python">import re
from mistletoe.span_token import SpanToken

class GithubWiki(SpanToken):
    pattern = re.compile(r"\[\[ *(.+?) *\| *(.+?) *\]\]")

    def __init__(self, match_obj):
        super().__init__(match_obj)
        self.target = match_obj.group(2)
</code></pre>
<p>There you go: a new token in 7 lines of code.
</p>
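<p>The steps described above form a loop: find the earliest pattern match, build a token from the match object, and let the base class tokenize <code>match_obj.group(1)</code> again to produce children. The stand-alone toy below models that loop in plain Python so it runs without mistletoe. <code>RawText</code>, <code>SpanToken</code> and <code>Emphasis</code> here are simplified toy versions, not mistletoe's classes, and <code>TOKEN_TYPES</code> and <code>tokenize</code> are hypothetical. mistletoe's real span tokenizer and emphasis rules are more involved.
</p>
<pre><code class="lang-python">import re


class RawText:
    """Plain text between recognized tokens."""
    def __init__(self, content):
        self.content = content


class SpanToken:
    """Toy base class: tokenizes match_obj.group(1) into children."""
    pattern = None

    def __init__(self, match_obj):
        self.children = tokenize(match_obj.group(1))


class Emphasis(SpanToken):
    pattern = re.compile(r"\*(.+?)\*")


class GithubWiki(SpanToken):
    pattern = re.compile(r"\[\[ *(.+?) *\| *(.+?) *\]\]")

    def __init__(self, match_obj):
        super().__init__(match_obj)
        self.target = match_obj.group(2)


# Earlier entries win when two patterns match at the same position.
TOKEN_TYPES = [GithubWiki, Emphasis]


def tokenize(text):
    """Split text into span tokens, always taking the earliest match."""
    tokens = []
    while text:
        candidates = []
        for priority, token_type in enumerate(TOKEN_TYPES):
            match = token_type.pattern.search(text)
            if match:
                candidates.append((match.start(), priority, match, token_type))
        if not candidates:
            tokens.append(RawText(text))
            break
        # Tuples compare by start position, then by priority.
        start, _, match, token_type = min(candidates)
        if start:
            tokens.append(RawText(text[:start]))
        # The tokenizer hands the match object to the token's constructor.
        tokens.append(token_type(match))
        text = text[match.end():]
    return tokens
</code></pre>
<p>Calling <code>tokenize('see [[*alt*|link]] now')</code> yields raw text, then a <code>GithubWiki</code> token whose <code>target</code> is <code>'link'</code> and whose single child is an <code>Emphasis</code> token, then more raw text.
</p>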
<h3>A new renderer</h3>
<p>Adding a custom token to the parsing process usually involves a lot
of nasty implementation details. Fortunately, mistletoe takes care
of most of them for you. Simply passing your custom token class to
<code>super().__init__()</code> does the trick:
</p>
<pre><code class="lang-python">from mistletoe.html_renderer import HtmlRenderer

class GithubWikiRenderer(HtmlRenderer):
    def __init__(self):
        super().__init__(GithubWiki)
</code></pre>
<p>We then only need to tell mistletoe how to render our new token:
</p>
<pre><code class="lang-python">def render_github_wiki(self, token):
    template = '&lt;a href="{target}"&gt;{inner}&lt;/a&gt;'
    target = token.target
    inner = self.render_inner(token)
    return template.format(target=target, inner=inner)
</code></pre>
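<p>The method name mirrors the token's class name in snake_case (<code>GithubWiki</code> becomes <code>render_github_wiki</code>), which suggests name-based dispatch. A minimal sketch of that idea follows. <code>render_method_name</code>, <code>ToyRenderer</code> and the toy token classes are hypothetical, and the renderer emits plain text rather than HTML to keep the example short.
</p>
<pre><code class="lang-python">import re


def render_method_name(token_class_name):
    """Convert a CamelCase class name into a render_snake_case name."""
    words = re.findall(r'[A-Z][^A-Z]*', token_class_name)
    return 'render_' + '_'.join(word.lower() for word in words)


class ToyRenderer:
    """Dispatches each token to the method named after its class."""

    def render(self, token):
        method = getattr(self, render_method_name(type(token).__name__))
        return method(token)

    def render_inner(self, token):
        # Render every child and concatenate the results.
        return ''.join(self.render(child) for child in token.children)

    def render_raw_text(self, token):
        return token.content

    def render_github_wiki(self, token):
        return '{} ({})'.format(self.render_inner(token), token.target)


class RawText:
    def __init__(self, content):
        self.content = content


class GithubWiki:
    def __init__(self, children, target):
        self.children = children
        self.target = target
</code></pre>
<p>This naive conversion splits on every capital letter, so a name like <code>HTMLBlock</code> would become <code>render_h_t_m_l_block</code>; a fuller implementation would need a rule for such names.
</p>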
<p>Cleaning up, we have our new renderer class:
</p>
<pre><code class="lang-python">from mistletoe.html_renderer import HtmlRenderer, escape_url

class GithubWikiRenderer(HtmlRenderer):
    def __init__(self):
        super().__init__(GithubWiki)

    def render_github_wiki(self, token):
        template = '&lt;a href="{target}"&gt;{inner}&lt;/a&gt;'
        target = escape_url(token.target)
        inner = self.render_inner(token)
        return template.format(target=target, inner=inner)
</code></pre>
<h3>Take it for a spin?</h3>
<p>We recommend using all of mistletoe's renderers as context managers.
This is to ensure that your custom tokens are cleaned up properly, so that
you can parse other Markdown documents with different token types in the
same program.
</p>
<pre><code class="lang-python">from mistletoe import Document
from contrib.github_wiki import GithubWikiRenderer

with open('foo.md', 'r') as fin:
    with GithubWikiRenderer() as renderer:
        rendered = renderer.render(Document(fin))
</code></pre>
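<p>Why does the context manager matter? Suppose, as the paragraph above suggests, that creating a renderer with extra tokens registers them in some shared list the parser consults. Then <code>__exit__</code> is the natural place to undo that registration. The sketch below shows the pattern under that assumption. <code>TokenRegistry</code>, <code>ToyRenderer</code> and their methods are hypothetical stand-ins, not the API of mistletoe's <code>base_renderer</code> module.
</p>
<pre><code class="lang-python">class TokenRegistry:
    """Hypothetical shared list of token types the parser searches for."""

    def __init__(self, defaults):
        self.defaults = list(defaults)
        self.active = list(defaults)

    def add(self, token_type):
        self.active.append(token_type)

    def reset(self):
        # Drop every extra token and restore the defaults.
        self.active = list(self.defaults)


class ToyRenderer:
    """Registers extra tokens when created and removes them on exit."""

    def __init__(self, registry, *extra_tokens):
        self.registry = registry
        for token_type in extra_tokens:
            registry.add(token_type)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs even if rendering raised, so extra tokens do not leak into
        # the next document parsed in the same program.
        self.registry.reset()
        return False  # do not suppress exceptions


if __name__ == '__main__':
    registry = TokenRegistry(['Emphasis', 'Strong'])
    with ToyRenderer(registry, 'GithubWiki'):
        print(registry.active)  # ['Emphasis', 'Strong', 'GithubWiki']
    print(registry.active)      # ['Emphasis', 'Strong']
</code></pre>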
<p>For more info, take a look at the <code>base_renderer</code> module in mistletoe.
The docstrings might give you a more granular idea of customizing mistletoe
to your needs.
</p>
<h2>Why mistletoe?
</h2>
<p>For me, the question becomes: why not <a href="https://github.com/lepture/mistune">mistune</a>? My original
motivation really has nothing to do with starting a competition. Here's a list
of reasons I created mistletoe in the first place:
</p>
<ul>
<li>I am interested in a Markdown-to-LaTeX transpiler in Python.</li>
<li>I want to write more Python.</li>
<li>"How hard could it be?"</li>
<li>"For fun," says David Beazley.</li>
</ul>
<p>Here are two things mistune inspired mistletoe to do:
</p>
<ul>
<li>Markdown parsers should be fast, and other parser implementations in Python leave much to be desired.</li>
<li>A parser implementation for Markdown does not need to restrict itself to one flavor of Markdown.</li>
</ul>
<p>Here are three things mistletoe does differently from mistune:
</p>
<ul>
<li>Per its <a href="https://github.com/lepture/mistune">readme</a>, mistune will always be a single-file script. mistletoe breaks its functionality into modules.</li>
<li>mistune, as of now, can only render Markdown into HTML. It is relatively trivial to write a new renderer for mistletoe.</li>
<li>Unlike mistune, mistletoe is pushing for some extent of spec compliance with CommonMark.</li>
</ul>
<p>The implications of these are quite profound, and there's no definite
this-is-better-than-that answer. Mistune is near perfect if one wants what
it provides: I have used mistune extensively in the past, and had a great
experience. If you want more control, however, give mistletoe a try.
</p>
<h2>Copyright & License
</h2>
<ul>
<li>mistletoe's logo uses artwork by Daniele De Santis, under <a href="https://creativecommons.org/licenses/by/3.0/us/">CC BY 3.0</a>.</li>
<li>mistletoe is released under <a href="LICENSE">MIT</a>.</li>
</ul>
</body></html>