1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<!-- $Id: ptools.html,v 6.3 2012-01-09 14:22:20 deraugla Exp $ -->
<!-- Copyright (c) INRIA 2007-2012 -->
<title>parsing and printing tools</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<link rel="stylesheet" type="text/css" href="styles/base.css"
title="Normal" />
</head>
<body>
<div id="menu">
</div>
<div id="content">
<h1 class="top">Parsing and Printing tools</h1>
<p>Camlp5 provides two parsing tools:</p>
<ul>
<li>stream parsers</li>
<li>extensible grammars</li>
</ul>
<p>The first parsing tool, the stream parsers, is the elementary
system. It is pure syntactic sugar, i.e. the code is directly
converted into basic OCaml statements: essentially functions,
pattern matchings, try. A stream parser is just a function. But the
system does not manage associativity, nor parsing level. Left
recursion results on infinite loops, just like functions whose first
action would be a call to itself.</p>
<p>The second parsing tool, the extensible grammars, are more
sophisticated. A grammar written with them is more readable, and
look like grammars written with tools like "yacc". They take care of
associativity, left recursion, and level of parsing. They are
dynamically extensible, what allows the syntax extensions what
Camlp5 provides for OCaml syntax.</p>
<p>In both cases, the input data are streams.</p>
<p>Camlp5 also provides:</p>
<ul>
<li>a pretty printing module</li>
<li>extensible printers</li>
</ul>
<p>The next sections give an overview of the parsing and printing
tools.</p>
<div id="tableofcontents">
</div>
<h2>Stream parsers</h2>
<p>The stream parsers is a system of recursive descendant
parsing. Streams are actually lazy lists. At each step, the head of
the list is compared against a <em>stream pattern</em>. There are
three kinds of streams parsers:</p>
<ul>
<li>The imperative <a href="parsers.html">streams parsers</a>, where
the elements are removed from the stream as long as they are
parsed. Parsers return either:
<ul>
<li>A value, in case of success,</li>
<li>The exception "<tt>Stream.Failure</tt>" when the parser does
not apply and no elements have been removed from the stream,
indicating that, possibly, other parsers may apply,</li>
<li>The exception "<tt>Stream.Error</tt>" when the parser does
not apply, but one or several elements have been removed from
the stream, indicating that nothing can to be done to make up
the error.</li>
</ul>
</li>
<li>The <a href="fparsers.html">functional stream parsers</a>
where the elements are not removed from the stream during the
parsing. These parsers return a value of type "option", i.e
either:
<ul>
<li>"Some" a value and the remaining stream, in case of
success,</li>
<li>"None", in case of failure.</li>
</ul>
</li>
<li>The <a href="bparsers.html">backtracking stream parsers</a>
which are like the functional stream parsers but with a backtracking
algorithm, testing all possibilities. These parsers also return a
value of type "option" different from the functional stream parsers,
i.e either:
<ul>
<li>"Some" a value, the remaining stream and a continuation, in
case of success,</li>
<li>"None", in case of failure.</li>
</ul>
</li>
</ul>
<p>The differences are about:</p>
<ul>
<li><span class="u">Syntax errors</span>: in the imperative version,
the location of the error is clear, it is at the current position
of the stream, and the system provides a specific error message
(typically, that some "element" was "expected"). On the other
hand, in the functional and backtracking version, the position is
not clear since it returns nothing and the initial stream is
unaffected. The only solution to know where the error happened is
to analyze that stream to see how many elements have be
unfrozen. No clear error message is available, just "syntax error"
(but this could be improved in a future version).</li>
<li><span class="u">Power</span>: in the imperative version, when a
rule raises the exception "<tt>Stream.Error</tt>", the parsing
cannot continue. In the functional version, the parsing can
continue by analyzing the next rule with the initial unaffected
stream: this is <em>limited backtrack</em>. In the backtracking
version, more powerful, the parsing continues by analyzing the
next case of the previous symbol of the rule; moreover it is
possible to get the list of all possible solutions.</li>
<li><span class="u">Neatness</span>: functional streams are neater,
just like functional programming is neater than imperative
programming.</li>
</ul>
<p>The imperative parsers implement what is called "predictive
parsing", i.e. recursive descendant parsing without backtrack.</p>
<p>In the imperative version, there also exist
<a href="lexers.html">lexers</a>, a shorter syntax when the stream
elements are of the specific type '<tt>char</tt>'.</p>
<h2>Extensible grammars</h2>
<p>Extensible grammars manipulate <em>grammar entries</em>. Grammar
entries are abstract values internally containing mutable stream
parsers. When a grammar entry is created, its internal parser is
empty, i.e. it always fails when used. A specific syntactic
construction, with the keyword "<tt>EXTEND</tt>" allows one to
extend grammar entries with new grammar rules.</p>
<p>In opposition to stream parsers, grammar entries manage
associativity, left factorization, and levels. Moreover, these
grammars allow optional calls, lists and lists with separators. They
are not however functions and hence cannot have parameters.</p>
<p>Since the internal system is stream parsers, extensible grammars
use recursive descendant parsing.</p>
<p>The parsers of the OCaml language in Camlp5 are written with
extensible grammars.</p>
<h2>Pretty module</h2>
<p>The "<tt>Pretty</tt>" module is an original tool allowing control
over the displaying of lines. The user must specify two functions
where:</p>
<ul>
<li>the data is printed on a single line</li>
<li>the data is printed on several lines</li>
</ul>
<p>The system first tries to print on a single line. At any time, if
the line overflows, i.e. if its size is greater than some "line
length" specified in the module interface, or if it contains
newlines, the function is aborted and control is given to the second
function, to print on several lines.</p>
<p>This is a basic, but powerful, system. It supposes that the programmer
takes care of the current indentation, and the beginning and the end of
its lines.</p>
<p>The module will be extended in the future to hide the management of
indendations and line continuations, and by the supply of functions
combinating the two cases above, in which the programmer can specify
the possible places where newlines can be inserted.</p>
<h2>Extensible printers</h2>
<p>The extensible printers are symmetric to the extensible grammars.
The extensible grammars take syntax rules and return syntax trees.
The extensible printers are actually extensible functions taking
syntax trees as parameters and returning the pretty printed
statements in strings.</p>
<p>The extensible printers can have printing levels, just like
grammars have parsing levels, and it is possible to take the
associativity into account by provided functions to call either the
current level or the next level.</p>
<p>The printers of the OCaml language are written with extensible
printers.</p>
<div class="trailer">
</div>
</div>
</body>
</html>
|