<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="language" content="fr">
<meta name="author" content="Logilab">
<meta name="organization" content="Logilab S.A.">
<meta name="generator" content="Logilab Powerful Stylesheets (v3)">
<title>XmlDiff API</title>
<meta name="keywords" content="logilab">
<link rel="stylesheet" href="http://www.logilab.fr/lglb-publi-content.css" type="text/css">
<link rel="stylesheet" href="http://www.logilab.fr/lglb-publi-structure.css" type="text/css">
</head>
<body>
<table class="header" cellspacing="0"><tbody><tr>
<td class="logo"><a href="http://www.logilab.fr/"><img src="http://www.logilab.fr/images/logilab.png" alt="Logilab" height="75"></a></td>
<td class="text"><div class="header-title">XmlDiff API</div></td>
</tr></tbody></table>
<div class="header-sep"></div>
<table class="main" cellspacing="0"><tbody><tr>
<td class="left-margin"></td>
<td class="body">
<div class="component-title-block"><div class="component-title">XmlDiff API</div></div>
<div class="sect1-title">
<a name="id2321560"></a>1. Contents</div>
<ul class="list">
<li class="listitem">
<div class="para">
<a href="#id2321641">mydifflib.py</a></div>
</li>
<li class="listitem">
<div class="para">
<a href="#id2289679">input.py</a></div>
</li>
<li class="listitem">
<div class="para">
<a href="#id2289732">fmes.py</a></div>
</li>
<li class="listitem">
<div class="para">
<a href="#id2289792">ezs.py ** DEPRECATED **</a></div>
</li>
<li class="listitem">
<div class="para">
<a href="#id2289840">format.py</a></div>
</li>
</ul>
<div class="para">To use this package as a library, you need the Python modules
described below.</div>
<div class="sect1-title">
<a name="id2321641"></a>2. mydifflib.py</div>
<div class="para">Provides functions for Longest Common Subsequence (LCS) calculation.</div>
<div class="variablelist">
<div class="varlistentry">
<div class="varterm">
<span class="varname">lcs2(X, Y, equal):</span>
</div>
<div class="varlistitem">
<div class="para">Apply the greedy LCS/SES algorithm to the sequences X and Y
(which may be any Python sequence). equal is a function that compares an
element of X with an element of Y; it must return 0 (or another Python
false value) if the elements differ, and 1 (or another Python true value)
if they are identical. Returns a list of matched pairs, as tuples.</div>
</div>
</div>
<div class="varlistentry">
<div class="varterm">
<span class="varname">lcsl(X, Y, equal):</span>
</div>
<div class="varlistitem">
<div class="para">Same as above, but returns the length of the LCS.</div>
</div>
</div>
<div class="varlistentry">
<div class="varterm">
<span class="varname">quick_ratio(a,b):</span>
</div>
<div class="varlistitem">
<div class="para">Optimized version of the standard difflib quick_ratio (without
junk handling and without the SequenceMatcher class). Returns an upper
bound on ratio() relatively quickly.</div>
</div>
</div>
</div>
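<div class="para">As a minimal sketch, assuming that the "greedy lcs/ses algorithm" is Myers's greedy O(ND) difference algorithm, the behavior described for lcs2 and lcsl can be reproduced with the standard library alone; the length that lcsl returns is then simply the length of the pair list. quick_ratio_sketch computes the same multiset-overlap bound as difflib's SequenceMatcher.quick_ratio(), but as a plain function. The names lcs2_sketch and quick_ratio_sketch are hypothetical; this is not the code of mydifflib.py.</div>
<pre class="programlisting">
```python
import difflib
import operator
from collections import Counter


def lcs2_sketch(X, Y, equal):
    """Greedy (Myers-style) LCS/SES: return matched (x_item, y_item) pairs."""
    n, m = len(X), len(Y)
    if n == 0 or m == 0:
        return []
    max_d = n + m
    off = max_d                      # shift so diagonal k = x - y is a valid index
    v = [0] * (2 * max_d + 2)        # furthest x reached on each diagonal
    paths = [[] for _ in v]          # matched index pairs along that furthest path
    for d in range(max_d + 1):       # d = number of non-matching steps so far
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k + 1 + off] > v[k - 1 + off]):
                x = v[k + 1 + off]           # step down: skip an element of Y
                path = list(paths[k + 1 + off])
            else:
                x = v[k - 1 + off] + 1       # step right: skip an element of X
                path = list(paths[k - 1 + off])
            y = x - k
            # follow the diagonal "snake" while the elements compare equal
            while n > x and m > y and equal(X[x], Y[y]):
                path.append((x, y))
                x, y = x + 1, y + 1
            v[k + off], paths[k + off] = x, path
            if x >= n and y >= m:            # reached the end of both sequences
                return [(X[i], Y[j]) for i, j in path]
    return []


def quick_ratio_sketch(a, b):
    """Upper bound on difflib's ratio(): 2 * multiset overlap / total length."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    count_a, count_b = Counter(a), Counter(b)
    matches = sum(min(c, count_b[elt]) for elt, c in count_a.items())
    return 2.0 * matches / total


pairs = lcs2_sketch("abcbdab", "bdcaba", operator.eq)
print(len(pairs))  # 4, the LCS length that lcsl is described as returning
print(quick_ratio_sketch("abcd", "bcde"))                           # 0.75
print(difflib.SequenceMatcher(None, "abcd", "bcde").quick_ratio())  # 0.75
```
</pre>
<div class="para">Copying the pair list of each diagonal keeps the sketch short at the cost of memory; a version tuned for long sequences would instead keep the frontier of each step and backtrack to recover the pairs.</div>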
<div class="sect1-title">
<a name="id2289679"></a>3. input.py</div>
<div class="para">Provides functions for converting a DOM tree or an XML file into the
internal tree that the xmldiff functions process.</div>
<div class="variablelist">
<div class="varlistentry">
<div class="varterm">
<span class="varname">tree_from_stream(stream, norm_sp=1, ext_ges=0, ext_pes=0, include_comment=1, encoding='UTF-8'):</span>
</div>
<div class="varlistitem">
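<div class="para">Create and return an internal tree from an XML stream (an open
file or a StringIO object). The options are:</div>
<ul class="list">
<li class="listitem">
<div class="para">norm_sp: if 1, normalize spaces and newlines;</div>
</li>
<li class="listitem">
<div class="para">ext_ges: if 1, include all external general (text) entities;</div>
</li>
<li class="listitem">
<div class="para">ext_pes: if 1, include all external parameter entities, including the external DTD subset;</div>
</li>
<li class="listitem">
<div class="para">include_comment: if 1, include comment nodes;</div>
</li>
<li class="listitem">
<div class="para">encoding: the encoding to use.</div>
</li>
</ul>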
</div>
</div>
<div class="varlistentry">
<div class="varterm">
<span class="varname">tree_from_dom(root):</span>
</div>
<div class="varlistitem">
<div class="para">Create and return an internal tree from a DOM subtree.</div>
</div>
</div>
</div>
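<div class="para">To illustrate what converting a DOM subtree into an internal tree involves, here is a minimal sketch built on the standard xml.dom.minidom module. The tuple layout (kind, name, value, children), the function names and the whitespace rule used for norm_sp are assumptions, not xmldiff's actual internal tree format; ext_ges, ext_pes and encoding are not modelled.</div>
<pre class="programlisting">
```python
import re
import xml.dom.minidom
from xml.dom import Node


def tree_from_dom_sketch(node, norm_sp=1, include_comment=1):
    """Convert a DOM element into a hypothetical (kind, name, value, children) tuple."""
    children = []
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            children.append(tree_from_dom_sketch(child, norm_sp, include_comment))
        elif child.nodeType == Node.TEXT_NODE:
            text = child.data
            if norm_sp:
                # assumed rule: collapse runs of spaces/newlines, drop blank text nodes
                text = re.sub(r"\s+", " ", text).strip()
                if not text:
                    continue
            children.append(("text", None, text, []))
        elif child.nodeType == Node.COMMENT_NODE and include_comment:
            children.append(("comment", None, child.data, []))
    return ("element", node.tagName, sorted(node.attributes.items()), children)


def tree_from_stream_sketch(stream, norm_sp=1, include_comment=1):
    """Parse an open file (or BytesIO) first, then convert its document element."""
    dom = xml.dom.minidom.parse(stream)
    return tree_from_dom_sketch(dom.documentElement, norm_sp, include_comment)


# Build a small DOM by hand: element a (x="1") holding text, a comment and an empty b
doc = xml.dom.minidom.Document()
root = doc.createElement("a")
root.setAttribute("x", "1")
root.appendChild(doc.createTextNode("\n  hello   world\n  "))
root.appendChild(doc.createComment(" note "))
root.appendChild(doc.createElement("b"))
doc.appendChild(root)
print(tree_from_dom_sketch(root))
```
</pre>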
<div class="sect1-title">
<a name="id2289732"></a>4. fmes.py</div>
<div class="para">Fast Match / Edit Script algorithm (not guaranteed to find the minimum
edit cost, but able to handle large documents).</div>
<div class="para">Warning: the process(oldtree, newtree) function has a side effect:
after it has been called, oldtree == newtree.</div>
<div class="variablelist">
<div class="varlistentry">
<div class="varterm">
<span class="varname">class FmesCorrector(self, formatter, f=0.6, t=0.5):</span>
</div>
<div class="varlistitem">
<div class="para">Class containing the FMES algorithm. formatter is a class
instance that handles the formatting of the edit script (see format.py).
f and t are algorithm parameters, with 0 &lt; f &lt; 1 and 0.5 &le; t &lt; 1;
in xmldiff, f = 0.59 and t = 0.5.</div>
</div>
</div>
<div class="varlistentry">
<div class="varterm">
<span class="varname">FmesCorrector.process_trees(self, tree1, tree2):</span>
</div>
<div class="varlistitem">
<div class="para">Run the diff between the internal trees tree1 (old XML tree)
and tree2 (new XML tree). Returns a list of actions.</div>
</div>
</div>
</div>
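<div class="para">The Fast Match / Edit Script name points to the FastMatch and EditScript algorithms of Chawathe et al. (SIGMOD 1996). In that paper, t is the threshold for matching internal nodes: two nodes with the same label match when the fraction of their leaves paired by the leaf matching exceeds t (f is the corresponding threshold for comparing leaf values). Assuming fmes.py applies the same internal-node criterion, it can be sketched as follows; Node, leaves and internal_match are hypothetical names, not code from fmes.py.</div>
<pre class="programlisting">
```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can be dict keys
class Node:
    """Hypothetical tree node: a label, an optional value and child nodes."""
    label: str
    value: str = ""
    children: list = field(default_factory=list)


def leaves(node):
    """Return the leaf descendants of node, left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def internal_match(x, y, matched_leaves, t=0.5):
    """FastMatch-style test: same label and common(x, y) / max(|x|, |y|) > t.

    matched_leaves maps leaves of the old tree to leaves of the new tree.
    """
    if x.label != y.label:
        return False
    leaves_x, leaves_y = leaves(x), leaves(y)
    targets = set(leaves_y)
    common = sum(1 for leaf in leaves_x if matched_leaves.get(leaf) in targets)
    return common / max(len(leaves_x), len(leaves_y)) > t


# Two paragraphs sharing two of their three text leaves
old = Node("p", children=[Node("text", "a"), Node("text", "b"), Node("text", "c")])
new = Node("p", children=[Node("text", "a"), Node("text", "b"), Node("text", "z")])
matched = {old.children[0]: new.children[0], old.children[1]: new.children[1]}
print(internal_match(old, new, matched, t=0.5))  # True: 2/3 > 0.5
print(internal_match(old, new, matched, t=0.7))  # False: 2/3 is not above 0.7
```
</pre>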
<div class="sect1-title">
<a name="id2289792"></a>5. ezs.py ** DEPRECATED **</div>
<div class="para">Extended Zhang and Shasha algorithm (provides the minimum edit cost,
but is too expensive to be used on large documents).</div>
<div class="variablelist">
<div class="varlistentry">
<div class="varterm">
<span class="varname">class EzsCorrector(self):</span>
</div>
<div class="varlistitem">
<div class="para">Class containing the EZS algorithm.</div>
</div>
</div>
<div class="varlistentry">
<div class="varterm">
<span class="varname">EzsCorrector.process_trees(self, tree1, tree2):</span>
</div>
<div class="varlistitem">
<div class="para">Run the diff between the internal trees tree1 (old XML tree)
and tree2 (new XML tree). Returns a list of actions.</div>
</div>
</div>
</div>
<div class="sect1-title">
<a name="id2289840"></a>6. format.py</div>
<div class="para">Provides classes for converting the output of the xmldiff algorithms
to a DOM tree, or for printing it in the native format or in the XML
XUpdate format. The formatter interface is as follows:</div>
<div class="variablelist">
<div class="varlistentry">
<div class="varterm">
<span class="varname">class AbstractFormatter:</span>
</div>
<div class="varlistitem">
<div class="para">Abstract class designed to be subclassed by concrete
formatters.</div>
</div>
</div>
<div class="varlistentry">
<div class="varterm">
<span class="varname">AbstractFormatter.init(self):</span>
</div>
<div class="varlistitem">
<div class="para">Method called before the tree-to-tree correction
begins.</div>
</div>
</div>
<div class="varlistentry">
<div class="varterm">
<span class="varname">AbstractFormatter.add_action(self, action):</span>
</div>
<div class="varlistitem">
<div class="para">Method called when an action is added to the edit script.</div>
</div>
</div>
<div class="varlistentry">
<div class="varterm">
<span class="varname">AbstractFormatter.format_action(self, action):</span>
</div>
<div class="varlistitem">
<div class="para">Method called by end() to format each action in the edit
script. At a minimum, concrete formatters should override this method.</div>
</div>
</div>
<div class="varlistentry">
<div class="varterm">
<span class="varname">AbstractFormatter.end(self):</span>
</div>
<div class="varlistitem">
<div class="para">Method called at the end of the tree-to-tree correction.</div>
</div>
</div>
</div>
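<div class="para">The call sequence described above (init() first, add_action() for each action, then end(), which calls format_action() on every action of the edit script) can be sketched as follows. AbstractFormatterSketch, ListPrinter, the edit_script attribute and the tuple shape of the actions are all hypothetical; the actual classes and action objects in format.py may differ.</div>
<pre class="programlisting">
```python
class AbstractFormatterSketch:
    """Hypothetical stand-in following the documented formatter interface."""

    def init(self):
        # called once, before the tree-to-tree correction starts
        self.edit_script = []

    def add_action(self, action):
        # called each time an action is added to the edit script
        self.edit_script.append(action)

    def format_action(self, action):
        # concrete formatters must override at least this method
        raise NotImplementedError

    def end(self):
        # called once the correction is over: format every recorded action
        for action in self.edit_script:
            self.format_action(action)


class ListPrinter(AbstractFormatterSketch):
    """Hypothetical concrete formatter collecting one text line per action."""

    def init(self):
        super().init()
        self.lines = []

    def format_action(self, action):
        self.lines.append(" ".join(str(part) for part in action))


# Simulate by hand the call order described above, with made-up actions
fmt = ListPrinter()
fmt.init()
fmt.add_action(("rename", "/a[1]", "b"))
fmt.add_action(("remove", "/a[1]/c[1]"))
fmt.end()
print(fmt.lines)  # ['rename /a[1] b', 'remove /a[1]/c[1]']
```
</pre>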
<div class="para">The concrete classes are InternalPrinter, XUpdatePrinter and
DOMXUpdateFormatter.</div>
<div class="para">See xmldiff.py for a usage example.</div>
</td>
</tr></tbody></table>
<div class="footer">All rights reserved, Logilab S.A. - 10, Rue Louis Vicat - F-75015 PARIS.</div>
</body>
</html>