<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<meta http-equiv="CONTENT-TYPE"
content="text/html; charset=windows-1252">
<title>Processing Result Trees</title>
<meta name="GENERATOR" content="OpenOffice.org 641 (Win32)">
<meta name="AUTHOR" content="Mike Fletcher">
<meta name="CREATED" content="20020705;13585200">
<meta name="CHANGEDBY" content="Mike Fletcher">
<meta name="CHANGED" content="20020706;20023400">
<link rel="stylesheet" type="text/css" href="sitestyle.css">
</head>
<body lang="en-US">
<h1>Processing Result Trees</h1>
<p>SimpleParse parsers generate tree structures describing the structure
of your parsed content. This document briefly describes the structures, a
simple mechanism for processing the structures, and ways to alter the structures
as they are generated to accomplish specific goals.</p>
<p>Prerequisites:</p>
<ul>
<li> Python 2.x programming </li>
<li> Familiarity with creating SimpleParse 2.0 Parsers
(see: <a href="scanning_with_simpleparse.html">Scanning with SimpleParse</a>)
</li>
</ul>
<h2>Standard Result Trees</h2>
<p>SimpleParse uses the same result format as is used for the underlying
mx.TextTools engine. The engine returns a three-item tuple from the parsing
of the top-level (root) production like so:</p>
<pre>success, resultTrees, nextCharacter = myParser.parse( someText, processor=None)</pre>
<p> Success is a Boolean value indicating whether the production (by default
the root production) matched (was satisfied) at all. If success is
true, nextCharacter is an integer giving the index of the next character
to be parsed in the text (i.e. someText[ startCharacter:nextCharacter ]
was parsed, where startCharacter is the index at which parsing began).<br>
</p>
<p><i>[New in 2.0.0b2]</i> Note: If success is false, then nextCharacter is
set to the (very ill-defined) "error position", which is the position reached
by the last TextTools command in the top-level production before the entire
table failed. This is a lower-level value than is usefully predictable within
SimpleParse (for instance, negative results which cause a failure will actually
report the position after the positive version of the element token succeeds).
You might, I suppose, use it as a hint to your users of where the error
occurred, but using error-on-fail SyntaxErrors is <b>by far</b> the preferred
method. Basically, if success is false, consider nextCharacter to contain
garbage data.<br>
</p>
<p>When the processor argument to parse is false (or a non-callable object),
the system does not attempt to use the default processing mechanism,
and returns the result trees directly. The standard format for result-tree
nodes is as follows:</p>
<pre>(production_name, start, stop, children_trees)</pre>
<p> Where start and stop are indexes into the source text such that
sourcetext[ start:stop ] is the text which matched this production. The <b>list
of children</b> is either a list of the result-trees for the child
productions within the production, or <b>None</b> (Note: that last point is
important; you can't blindly do a "for" over children_trees).<br>
</p>
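<p>As a minimal sketch of walking such a tree, the function below recurses
over (production_name, start, stop, children_trees) nodes and guards against
a None child list. The tree is hand-built for illustration, and the production
names ("assignment", "name", "value") are hypothetical rather than taken from
any real grammar.</p>

```python
def walk(tree, text, depth=0):
    """Yield (depth, production_name, matched_text) for every node."""
    name, start, stop, children = tree
    yield depth, name, text[start:stop]
    # children may be None rather than an empty list, so guard the loop
    for child in children or []:
        for item in walk(child, text, depth + 1):
            yield item


# Hand-built stand-in for parser output (hypothetical production names)
text = "a=1"
tree = ("assignment", 0, 3, [
    ("name", 0, 1, None),
    ("value", 2, 3, []),
])
nodes = list(walk(tree, text))
```

<p>The "children or []" idiom treats None and an empty list alike, which is
usually what a tree walker wants.</p>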
<p>Expanded productions, unreported productions (and the children
of unreported productions), and the root production itself do not appear
in the result trees. See <a href="simpleparse_grammars.html">Understanding
SimpleParse Grammars</a> for details. However, a LookAhead production whose
non-lookahead form would normally report results still returns those
results, at the position where the LookAhead appears in the grammar.</p>
<p>If the processor argument to parse is true and callable, the processor
object will be called with (success, resultTrees, nextCharacter) on completion
of parsing. The processor can then take whatever processing steps are desired;
the value it returns is passed back directly to the caller of parse.<br>
</p>
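<p>A hedged sketch of a hand-written processor follows. It assumes the
processor receives the three-item result tuple as its first argument; the
optional second argument (the source text) is an assumption of this sketch,
as are the class name CountingProcessor and the sample data. The call at the
bottom imitates what the parser would do rather than invoking SimpleParse.</p>

```python
class CountingProcessor(object):
    """Hypothetical processor that summarises a parse result."""

    def __call__(self, value, buffer=None):
        success, result_trees, next_character = value
        if not success:
            # nextCharacter is not meaningful on failure, so do not report it
            return {"success": False}
        return {
            "success": True,
            "top_level": [node[0] for node in (result_trees or [])],
            "consumed": next_character,
        }


# Imitate the call made on completion of parsing (hand-built result tuple)
processor = CountingProcessor()
summary = processor((1, [("name", 0, 1, None), ("value", 2, 3, None)], 3), "a=1")
failed = processor((0, [], 2), "a=1")
```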
<h2>DispatchProcessor</h2>
<p>SimpleParse 2.0 provides a simple mechanism for processing result trees,
a recursive series of calls to attributes of a Processor object with
functions to automate the call-by-name dispatching. This processor implementation
is available for examination in the simpleparse.dispatchprocessor module.
The main functions are:</p>
<pre>def dispatch( source, tag, buffer ):
    """Dispatch on source for tag with buffer

    Find the attribute or key "tag-object" (tag[0]) of source,
    then call it with (tag, buffer)
    """

def dispatchList( source, taglist, buffer ):
    """Dispatch on source for each tag in taglist with buffer"""

def multiMap( taglist, source=None, buffer=None ):
    """Convert a taglist to a mapping from tag-object:[list-of-tags]

    For instance, if you have items of 3 different types, in any order,
    you can retrieve them all sorted by type with multiMap( childlist)
    then access them by tagobject key.

    If source and buffer are specified, call dispatch on all items.
    """

def singleMap( taglist, source=None, buffer=None ):
    """Convert a taglist to a mapping from tag-object:tag,
    overwriting early with late tags. If source and buffer
    are specified, call dispatch on all items."""

def getString( (tag, left, right, sublist), buffer):
    """Return the string value of the tag passed"""

def lines( start=None, end=None, buffer=None ):
    """Return number of lines in buffer[start:end]"""
</pre>
<p>The module also provides a class <b>DispatchProcessor</b>, whose __call__
implementation triggers dispatching in both the "called as root processor" and
"called to process an individual result element" cases.<br>
</p>
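<p>To make the helper docstrings concrete, here is a minimal pure-Python
sketch of the behaviour they describe. These are illustrative stand-ins
(get_string, multi_map, single_map), not the source of the
simpleparse.dispatchprocessor functions, and they omit the optional
dispatching step. The sample tags are hand-built.</p>

```python
def get_string(tag, buffer):
    """Return the text matched by a (name, start, stop, children) tag."""
    name, start, stop, children = tag
    return buffer[start:stop]


def multi_map(taglist):
    """Group tags by name, keeping every tag: {name: [tag, ...]}."""
    grouped = {}
    for tag in taglist or []:
        grouped.setdefault(tag[0], []).append(tag)
    return grouped


def single_map(taglist):
    """Map name -> tag; a later tag with the same name replaces an earlier one."""
    return dict((tag[0], tag) for tag in taglist or [])


buffer = "x=1,y=2"
children = [
    ("name", 0, 1, None), ("value", 2, 3, None),
    ("name", 4, 5, None), ("value", 6, 7, None),
]
names = [get_string(tag, buffer) for tag in multi_map(children)["name"]]
last_value = get_string(single_map(children)["value"], buffer)
```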
<p>You define a DispatchProcessor sub-class with methods named for each production
that will be processed by the processor, with signatures of:<br>
</p>
<pre>from simpleparse.dispatchprocessor import *
class MyProcessorClass( DispatchProcessor ):
    def production_name( self, (tag,start,stop,subtags), buffer ):
        """Process the given production and its children"""
</pre>
<p>Within those production-handling methods, you can call the dispatch functions
to process the sub-tags of the current production (keep in mind that the
sub-tags "list" may be a None object). You can see examples of this processing
methodology in simpleparse.simpleparsegrammar, simpleparse.common.iso_date
and simpleparse.common.strings (among others).<br>
</p>
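<p>The call-by-name idea can be sketched without SimpleParse at all. The
MiniDispatcher class below is a simplified stand-in written for this
illustration, not the real DispatchProcessor; the production names and the
hand-built tree are likewise hypothetical. Each handler receives the tag and
the source text, and recurses into the sub-tags through dispatch_list.</p>

```python
class MiniDispatcher(object):
    """Stand-in dispatcher: routes each tag to the method named after it."""

    def dispatch(self, tag, buffer):
        return getattr(self, tag[0])(tag, buffer)

    def dispatch_list(self, taglist, buffer):
        # the sub-tag "list" may be None, so substitute an empty list
        return [self.dispatch(tag, buffer) for tag in taglist or []]


class AssignmentProcessor(MiniDispatcher):
    def assignment(self, tag, buffer):
        # process the children, then combine their results
        key, number = self.dispatch_list(tag[3], buffer)
        return (key, number)

    def name(self, tag, buffer):
        return buffer[tag[1]:tag[2]]

    def value(self, tag, buffer):
        return int(buffer[tag[1]:tag[2]])


buffer = "a=1"
tree = ("assignment", 0, 3, [("name", 0, 1, None), ("value", 2, 3, None)])
result = AssignmentProcessor().dispatch(tree, buffer)
```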
<p>For real-world Parsers, where you normally use the same processing class
for all runs of the parser, you can define a default Processor class like
so:<br>
</p>
<pre>from simpleparse.parser import Parser
class MyParser( Parser ):
    def buildProcessor( self ):
        # Used whenever parse() is called without an explicit processor
        return MyProcessorClass()</pre>
<p>so that if no processor is explicitly specified in the parse call, your
"MyProcessorClass" instance will be used for processing the results.<br>
</p>
<h2><a name="nonstandardresulttrees"></a>Non-standard Result Trees (AppendMatch,
AppendToTagobj, AppendTagobj, CallTag)</h2>
<p>SimpleParse 2.0 introduced features which expose certain of the mx.TextTools
library's features for producing non-standard result trees. Although
not generally recommended for use in normal parsers, these features
are useful for certain types of text processing, and their exposure was
requested. Each flag has a different effect on the result tree; the particular
effects are discussed below.</p>
<p>The exposure is through the Processor (or more precisely, a super-class
of Processor called MethodSource) object. To specify the use of one
of the flags, you set an attribute in your MethodSource object (your
Processor object) with the name _m_productionname. Its value is the method to
use: either an actual callable object (for use with CallTag) or one of the
other mx.TextTools flag constants named above. In the case
of AppendTagobj, you will likely want to specify a particular tagobj
object to be appended; you do that by setting an attribute named _o_productionname
in your MethodSource. For AppendToTagobj, you <b>must</b><span
style=""> specify an _o_productionname object with an append method.<br>
</span></p>
<p><span style="">Note: you can use MethodSource as your direct ancestor
if you want to define a non-standard result tree, but don't want to do any
processing of the results (this is the reason for having seperate classes).
MethodSource does not define a __call__ method.<br>
</span></p>
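<p>The naming convention can be illustrated with plain attribute lookup.
This sketch assumes the attributes are found with getattr on the source
object; APPEND_MATCH is a placeholder string standing in for the real
mx.TextTools constant, and the class, the lookup helper and the production
names "word" and "marker" are all hypothetical.</p>

```python
APPEND_MATCH = "AppendMatch"  # placeholder, NOT the real mx.TextTools constant


class TextOnlySource(object):
    """Stand-in for a MethodSource subclass with per-production settings."""
    _m_word = APPEND_MATCH      # method/flag for the "word" production
    _o_marker = "marker-seen"   # tag object for the "marker" production


def lookup(source, production):
    """Return (method, tagobject) configured for a production, None where unset."""
    return (
        getattr(source, "_m_" + production, None),
        getattr(source, "_o_" + production, None),
    )


source = TextOnlySource()
word_settings = lookup(source, "word")
marker_settings = lookup(source, "marker")
other_settings = lookup(source, "other")
```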
<h3>CallTag</h3>
<pre>_m_productionname = callableObject(
    taglist,
    text,
    left,
    right,
    subtags
)</pre>
<p> The given object/method is called on a successful match with the values
shown. The text argument is the entire text buffer being parsed; the
rest of the values are what you're accustomed to seeing in result tuples.</p>
<p>Notes:</p>
<ul>
<li> Nothing is (necessarily) added to the results
list when CallTag is specified! If you want something added, call taglist.append(
item ). </li>
<li> Raising an error in the CallTag method will
halt parsing. </li>
<li> The callableObject is accessed from
the MethodSource object using standard getattr, so if you are using
a function, it will need to define a self parameter for the first
position. </li>
</ul>
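<p>A minimal sketch of a CallTag-style callback, following the notes above:
it takes self first, and appends to taglist itself because nothing is added
otherwise. The loop imitates the engine calling the method on each match; it
does not run a real parse, and the class name and the "word" production are
hypothetical.</p>

```python
class UppercaseSource(object):
    """Stand-in for a MethodSource subclass (hypothetical production "word")."""

    def _m_word(self, taglist, text, left, right, subtags):
        # Nothing is recorded unless we append to taglist ourselves
        taglist.append(text[left:right].upper())


source = UppercaseSource()
taglist = []
text = "hello world"
# Imitate the engine reporting two successful matches of "word"
for left, right in [(0, 5), (6, 11)]:
    getattr(source, "_m_word")(taglist, text, left, right, None)
```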
<h3>AppendToTagobj</h3>
<pre>_m_productionname = AppendToTagobj<br>_o_productionname = objectWithAppendMethod</pre>
<p> On a successful match, the system will call
_o_productionname.append((None,l,r,subtags)). For some processing tasks, you might want
to use this method to pull out all instances of a production from a larger
(already-written) grammar, where walking the whole results tree
to find the deeply nested productions would be too involved.</p>
<p>Notes:</p>
<ul>
<li> Nothing is added to the results list when AppendToTagobj
is specified! </li>
<li> Raising an error in the object's append method will
halt parsing. </li>
</ul>
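<p>Any object with an append method can serve as the target. The sketch below
uses a small hypothetical Collector class and imitates the
append((None,l,r,subtags)) calls by hand, so it shows only the shape of the
data the target receives, not a real parse.</p>

```python
class Collector(object):
    """Object with an append method; records the span of each match."""

    def __init__(self):
        self.spans = []

    def append(self, item):
        tagobj, left, right, subtags = item
        self.spans.append((left, right))


collector = Collector()
text = "2002-07-05 and 2002-07-06"
# Imitate the engine reporting two matches of a deeply nested production
for left, right in [(0, 10), (15, 25)]:
    collector.append((None, left, right, None))

dates = [text[left:right] for left, right in collector.spans]
```

<p>A plain list works as well, since list.append accepts the tuple directly.</p>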
<h3>AppendMatch</h3>
<pre>_m_productionname = AppendMatch</pre>
<p> On a successful match, the system will append the matched text to the
result tree, rather than a tuple of results. In situations where you
just want to extract the text, this can be useful. The downside is that
your results tree has a non-standard format that you need to explicitly
watch out for while processing the results.</p>
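<p>One way to cope with the mixed format is to test each child before
unpacking it. This is a sketch under the assumption that standard nodes are
tuples and AppendMatch entries are plain strings; the function name and the
sample children are hypothetical.</p>

```python
def matched_texts(children, text):
    """Return the matched text of every child, standard or not."""
    texts = []
    for child in children or []:
        if isinstance(child, tuple):
            # standard (production_name, start, stop, children_trees) node
            name, start, stop, subtags = child
            texts.append(text[start:stop])
        else:
            # non-standard entry: already the matched text itself
            texts.append(child)
    return texts


text = "key = value"
children = ["key", ("operator", 4, 5, None), "value"]
result = matched_texts(children, text)
```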
<h3>AppendTagobj</h3>
<pre>_m_productionname = AppendTagobj<br>_o_productionname = any object<br># object is optional, if omitted, the production name string is used</pre>
<p> On a successful match, the system will append the tagobject to the result
tree, rather than a tuple of results. In situations where you just
want notification that the production has matched (and it doesn't matter
what it matched), this can be useful. The downside, again, is that your
results tree has a non-standard format that you need to explicitly watch
out for while processing the results.</p>
<a href="index.html">Up to index...</a><br>
<br>
<br>
<br>
<br>
</body>
</html>