<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<page name="tutorial_getting_started">
<title>Getting started</title>
<left>
<boxes-toc/>
<p>
You can cut and paste the code on this page and
test it on the <a href="http://reglisse.ens.fr/cgi-bin/cduce">online interpreter</a>.
</p>
</left>
<box title="Key concepts" link="concepts">
<p>
CDuce is a strongly-typed functional programming language adapted
to the manipulation of XML documents. Its syntax is reminiscent
of the ML family, but CDuce has a completely different type system.
</p>
<p>
Let us directly introduce some key concepts:
</p>
<ul>
<li><b>Values</b> are the objects manipulated by
CDuce programs; we can distinguish several kinds of values:
<ul>
<li>Basic values: integers, characters.</li>
<li>XML documents and fragments: elements, tag names, strings.</li>
<li>Constructed values: pairs, records, sequences.</li>
<li>Functional values.</li>
</ul>
</li>
<li><b>Types</b> denote sets of values that share common
structural and/or behavioral properties. For instance,
<code>Int</code> denotes the set of all integers,
and <code>&lt;a href=String&gt;[]</code> denotes XML elements
with tag <code>a</code> that have an attribute <code>href</code>
(whose content is a string) and no sub-elements.
</li>
<li><b>Expressions</b> are fragments of CDuce programs
that <em>produce</em> values. For instance, the expression <code>1 + 3</code>
evaluates to the value <code>4</code>. Note that values can
be seen either as special cases of expressions, or as
the result of evaluating expressions.</li>
<li><b>Patterns</b> are ``types + capture variables''. They allow
the programmer to extract from an input value some sub-values, which can then be
used in the rest of the program. For instance, the pattern
<code>&lt;a href=x&gt;[]</code> extracts the value of the
<code>href</code> attribute and binds it to the <em>value
identifier</em> <code>x</code>.
</li>
</ul>
<section title="A first example">
<sample><![CDATA[
let x = "Hello, " in
let y = "world!" in
x @ y
]]></sample>
<p>
The expression binds two strings to value identifiers <code>x</code>
and <code>y</code>, and then concatenates them. The general form
of the local binding is:
</p>
<sample><![CDATA[
let %%p%% = %%e%% in %%e'%%
]]></sample>
</section>
<p>
where <code>%%p%%</code> is a pattern and <code>%%e%%</code>,
<code>%%e'%%</code> are expressions.
</p>
<note>
A small aside about the examples in this tutorial and their usage. The
first program, which produces "Hello, world!", can be tried directly on the on-line
prototype: select and copy it, click on the link to the on-line
interpreter in the side bar (we suggest you open it in a new window), paste it in the execution window and run it. The
second example, instead, cannot be run. This is visually signaled by the fact
that it contains text in italics. We use italics for meta notation:
<code>%%e%%</code> and <code>%%e'%%</code> stand for generic expressions, so it is pointless to run
this code (you would just obtain an error signaling that <code>e</code> is
not bound or that the quote in <code>e'</code> is not closed). The same holds in the rest of the tutorial: code without
italicized text can be copied and pasted into the on-line prototype as it is
(of course, you must first paste the declarations of the types it uses),
whereas this is not possible whenever the code contains italicized text.
</note>
<p>
Patterns are much more than simple variables. They can be used to decompose
values. For instance, if the words <tt>Hello</tt> and <tt>world</tt> are the two components of a pair, we can capture each of them and concatenate them as follows:
</p>
<sample><![CDATA[
let (x,y) = ("Hello, " , "world!") in x @ y
]]></sample>
<p>
Patterns can also check types. So for instance
</p>
<sample><![CDATA[
let (x & String, y) = %%e%% in x
]]></sample>
<p>
would return a (static) type error if the first projection of <code>%%e%%</code> does not have the static type <code>String</code>.
</p>
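<p>
As a concrete sketch (the values below are chosen only for illustration and are
not part of the running example), the following binding is accepted, since the
first component of the pair is a string:
</p>
<sample><![CDATA[
let (x & String, y) = ("Hello, ", 42) in x
]]></sample>
<p>
Swapping the two components, as in <code>(42, "Hello, ")</code>, should instead
make the type checker reject the binding.
</p>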
<p>
The form <code>let x&amp;%%t%% = %%e%% in %%e'%%</code> is used so often that we introduced a special syntax for it:
</p>
<sample><![CDATA[
let x : %%t%% = %%e%% in %%e'%%
]]></sample>
Note the blank spaces around the colon
<footnote>
Actually only the first blank is necessary: CDuce accepts <code>let x :%%t%% = %%e%% in %%e'%%</code>
as well.
</footnote>.
This is because the XML recommendation allows colons to occur in identifiers: see the User's Manual section on <a href="namespaces.html">namespaces</a>. The same holds for the functional arrow symbol <code>-></code>, which must be surrounded by blanks, and for colons in the formal parameters of a function: see <a
href="manual_expressions.html#bnote1">this paragraph</a> of the User's Manual.
</box>
<box title="XML documents" link="xmldoc">
<p>
CDuce uses its own notation to denote XML documents. In the next table we
present an XML document on the left and the same document in CDuce notation on
the right (in the rest of this tutorial we visually distinguish XML code from CDuce code by putting the former in light yellow boxes):
</p>
<two-columns>
<left>
<xmlsample><![CDATA[
<?xml version="1.0"?>
<parentbook>
  <person gender="F">
    <name>Clara</name>
    <children>
      <person gender="M">
        <name>Pål André</name>
        <children/>
      </person>
    </children>
    <email>clara@lri.fr</email>
    <tel>314-1592654</tel>
  </person>
  <person gender="M">
    <name> Bob </name>
    <children>
      <person gender="F">
        <name>Alice</name>
        <children/>
      </person>
      <person gender="M">
        <name>Anne</name>
        <children>
          <person gender="M">
            <name>Charlie</name>
            <children/>
          </person>
        </children>
      </person>
    </children>
    <tel kind="work">271828</tel>
    <tel kind="home">66260</tel>
  </person>
</parentbook>
]]></xmlsample>
</left>
<right>
<sample><![CDATA[
let parents : ParentBook =
<parentbook>[
  <person gender="F">[
    <name>"Clara"
    <children>[
      <person gender="M">[
        <name>['Pål ' 'André']
        <children>[]
      ]
    ]
    <email>['clara@lri.fr']
    <tel>"314-1592654"
  ]
  <person gender="M">[
    <name>"Bob"
    <children>[
      <person gender="F">[
        <name>"Alice"
        <children>[]
      ]
      <person gender="M">[
        <name>"Anne"
        <children>[
          <person gender="M">[
            <name>"Charlie"
            <children>[]
          ]
        ]
      ]
    ]
    <tel kind="work">"271828"
    <tel kind="home">"66260"
  ]
]
]]></sample>
</right>
</two-columns>
<p> Note the straightforward correspondence between the two notations:
instead of using a closing tag, we enclose the content of each
element in square brackets. In CDuce square brackets denote sequences,
that is, heterogeneous (ordered) lists of blank-separated elements. In
CDuce strings are not a primitive data-type but are sequences of
characters.</p>
<p>For the purposes of the example we used different notations to
denote strings: in CDuce, <code>"xyz"</code>, <code>['xyz']</code>,
<code>['x' 'y' 'z']</code>, <code>[ 'xy' 'z' ]</code>, and <code>[
'x' 'yz' ]</code> all define the same string literal. Note also that the
<code>"Pål André"</code> string is accepted, as CDuce supports Unicode
characters.</p>
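<p>
A small sketch of this equivalence, mixing two of the notations above in a
single concatenation (the strings are chosen only for illustration):
</p>
<sample><![CDATA[
let x = "Hello, " in
let y = ['wor' 'ld!'] in
x @ y
]]></sample>
<p>
Under the equivalences just described, this should denote the same string as
the first example of this tutorial.
</p>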
</box>
<box title="Loading XML files" link="loading">
<p> The program on the right-hand side in the previous section starts
by binding the variable <code>parents</code> to the XML document. It
also specifies that <code>parents</code> has the type <a
href="#type_decl"><code>ParentBook</code></a>: this is optional but it
usually allows earlier detection of type errors.
</p>
<p>
If the XML document on the left-hand side is stored in a file, say,
<tt>parents.xml</tt>, then it can be loaded from the file by <code>%%load_xml%%
"parents.xml"</code>, as the built-in function <code>load_xml</code> converts an
XML document stored in a file into the CDuce expression representing it. However,
<code>load_xml</code> has type <code>String->Any</code>, where
<code>Any</code> is the type of all values. Therefore, if we try to reproduce the
same binding as above by writing the following declaration
</p>
<sample><![CDATA[
let parents : ParentBook = {{load_xml}} "parents.xml"
]]></sample>
<p>
we would obtain a type error, since we would be trying to use an expression of type
<code>Any</code> where an expression of type <code>ParentBook</code> is expected.
The right way to reproduce the binding above is:
</p>
<sample><![CDATA[
let parents : ParentBook =
  match load_xml "parents.xml" with
    x & ParentBook -> x
  | _ -> raise "parents.xml is not a document of type ParentBook"
]]></sample>
<p>
Before binding the result of the <code>load_xml</code> expression to the
variable <code>parents</code>, this expression matches it against the type
<code>ParentBook</code>. If the match succeeds (i.e., if the XML document in the file has
type <code>ParentBook</code>), then it performs the binding (the variable
<code>x</code> is bound to the result of the <code>load_xml</code> expression by the pattern
<code>x&amp;ParentBook</code>); otherwise it raises an exception.
</p>
<p>
Of course, an exception such as "parents.xml is not a document of type
ParentBook" is not very informative about why the document failed the match
and where the error might be. In CDuce it is possible to ask the program to
perform this check and raise an informative exception (a string that describes
and localizes the problem) by using the dynamic type check construction
<code>(%%e%%:?%%t%%)</code>, which checks whether the expression
<code>%%e%%</code> has type <code>%%t%%</code> and either returns the
result of <code>%%e%%</code> or raises an informative exception. For instance:
</p>
<sample><![CDATA[
let parents = load_xml "parents.xml" :? ParentBook
]]></sample>
<p>
This performs the same test as the previous program but, in case of failure, gives
information to the programmer on the reasons why the type check failed.
The dynamic type check can also be used in a let construction, as follows:
</p>
<sample><![CDATA[
let parents :? ParentBook = load_xml "parents.xml"
]]></sample>
<p>
which is completely equivalent to the previous one.
</p>
<p>
The command <code>load_xml "parents.xml"</code> is just an abbreviated form for
<code>load_xml "{{file://}}parents.xml"</code>. If CDuce is compiled with
netclient or curl support, then it is also possible to use other URI schemes such as
<code>http://</code> or <code>ftp://</code>. A special scheme, <code>string:</code>, is always supported: the string
following the scheme is parsed as it is.
<footnote>
All these schemes are available for <code>load_html</code> and <code>load_file</code> as well.
</footnote>
So, for instance, <code>load_xml
"string:%%exp%%"</code>
parses the literal XML code <code>%%exp%%</code> (it corresponds to XQuery's <code>{ %%exp%% }</code>), while <code>load_xml
("string:" @ x)</code> parses the XML code associated with the string variable <code>x</code>. Thus the following definition of <code>x</code>
</p>
<sample><![CDATA[
let x : Any = <person>[ <name>"Alice" <children>[] ]
]]></sample>
<p>
is completely equivalent to this one:
</p>
<sample><![CDATA[
let x = load_xml "string:<person><name>Alice</name> <children/></person>"
]]></sample>
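<p>
The second form, in which the XML text is held in a string variable, can be
sketched as follows (the identifier <code>s</code> is introduced here only for
illustration):
</p>
<sample><![CDATA[
let s = "<person><name>Alice</name> <children/></person>" in
load_xml ("string:" @ s)
]]></sample>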
</box>
<box title="Type declarations" link="type_decl">
<p>
First, we declare some types:
</p>
<sample><![CDATA[
type ParentBook = <parentbook>[Person*]
type Person = FPerson | MPerson
type FPerson = <person gender="F">[ Name Children (Tel | Email)*]
type MPerson = <person gender="M">[ Name Children (Tel | Email)*]
type Name = <name>[ PCDATA ]
type Children = <children>[Person*]
type Tel = <tel kind=?"home"|"work">['0'--'9'+ '-'? '0'--'9'+]
type Echar = 'a'--'z' | 'A'--'Z' | '_' | '0'--'9'
type Email = <email>[ Echar+ ('.' Echar+)* '@' Echar+ ('.' Echar+)+ ]
]]></sample>
<p> The type <code>ParentBook</code> describes XML documents that store information
about persons. A tag <code>&lt;tag attr1=... attr2=... ...&gt;</code>
followed by a sequence type denotes an XML document type. Sequence
types classify ordered lists of heterogeneous elements and are
denoted by square brackets that enclose regular expressions over types
(note that a regular expression over types <i>is not</i> a type: it
just describes the content of a sequence type, and is therefore meaningless if it is not
enclosed in square brackets). The definitions above
state that a <code>ParentBook</code> element is formed by a possibly empty sequence
of persons. A person is either of type <code>FPerson</code> or
<code>MPerson</code> according to the value of the <code>gender</code> attribute.
An equivalent definition for <code>Person</code> would thus be:
</p>
<sample><![CDATA[
<person gender={{"F"|"M"}}>[ Name Children (Tel | Email)*]
]]></sample>
<p> A person element is composed of a name
element, a children element, and zero or more telephone and e-mail
elements, in this order. </p>
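<p>
For instance, assuming the type declarations above have been entered, a value
of type <code>Person</code> could be written as follows (the identifier
<code>alice</code> and the e-mail address are made up for illustration):
</p>
<sample><![CDATA[
let alice : Person =
  <person gender="F">[
    <name>"Alice"
    <children>[]
    <tel kind="home">"66260"
    <email>"alice@lri.fr"
  ]
]]></sample>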
<p> Name elements contain strings. These are encoded as sequences of
characters. The <code>PCDATA</code> keyword is equivalent to the
regexp <code>Char*</code>; hence <code>String</code>,
<code>[Char*]</code>, <code>[PCDATA]</code>, <code>[PCDATA*
PCDATA]</code>, ..., are all equivalent notations. Children are
composed of zero or more Person elements. Telephone elements have an
optional (as indicated by <code>=?</code>) string attribute whose
value is either ``home'' or ``work'', and their content is a single
string consisting of two non-empty sequences of numeric characters separated by
an optional dash character. Had we wanted to state that a phone number
is an integer with at least, say, 5 digits (of course this is
meaningful only if no phone number starts with 0) we would have used an
interval type such as <code>&lt;tel kind=?"home"|"work"&gt;[10000--*]</code>,
where <code>*</code> denotes plus infinity; on the left-hand side of <code>--</code> (as in <code>*--100</code>) it denotes minus infinity. </p>
<p>
<code>Echar</code> is the type of characters in e-mail
addresses. It is used in the regular expression defining <code>Email</code> to
precisely constrain the form of the addresses. An XML document satisfying
these constraints is shown
</p>
</box>
</page>