File: getting_started.xml

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<page name="tutorial_getting_started">

<title>Getting started</title>

<left>
<boxes-toc/>
<p>
You can cut and paste the code on this page and 
test it on the <a href="http://reglisse.ens.fr/cgi-bin/cduce">online interpreter</a>.
</p>
</left>

<box title="Key concepts" link="concepts">

<p>
CDuce is a strongly-typed functional programming language adapted
to the manipulation of XML documents. Its syntax is reminiscent
of the ML family, but CDuce has a completely different type system.
</p>

<p>
Let us directly introduce some key concepts:
</p>

<ul>
<li><b>Values</b> are the objects manipulated by
CDuce programs; we can distinguish several kinds of values:
 <ul>
 <li>Basic values: integers, characters.</li>
 <li>XML documents and fragments: elements, tag names, strings.</li>
 <li>Constructed values: pairs, records, sequences.</li>
 <li>Functional values.</li>
 </ul>
</li>

<li><b>Types</b> denote sets of values that share common
structural and/or behavioral properties. For instance,
<code>Int</code> denotes the set of all integers,
and <code>&lt;a href=String>[]</code> denotes XML elements
with tag <code>a</code> that have an attribute <code>href</code>
(whose content is a string), and with no sub-element.
</li>

<li><b>Expressions</b> are fragments of CDuce programs
that <em>produce</em> values. For instance, the expression <code>1 + 3</code>
evaluates to the value <code>4</code>. Note that values can 
be seen either as special cases of expressions, or as
the result of evaluating expressions.</li>

<li><b>Patterns</b> are "types + capture variables". They make it
possible to extract sub-values from an input value, which can then be
used in the rest of the program. For instance, the pattern
<code>&lt;a href=x>[]</code> extracts the value of the
<code>href</code> attribute and binds it to the <em>value
identifier</em> <code>x</code>.
</li>
</ul>

<section title="A first example">
<sample><![CDATA[
let x = "Hello, " in
let y = "world!" in
x @ y
]]></sample>

<p>
The expression binds two strings to value identifiers <code>x</code>
and <code>y</code>, and then concatenates them. The general form
of the local binding is:
</p>

<sample><![CDATA[
let %%p%% = %%e%% in %%e'%%
]]></sample>
</section>

<p>
where <code>%%p%%</code> is a pattern and <code>%%e%%</code>, 
<code>%%e'%%</code> are expressions.
</p>
<note>
  A small aside about the examples in this tutorial and their usage. The
  first program, which produces the string "Hello, world!", can be tried directly on the online
  prototype: select and copy it, follow the link to the online
  interpreter in the sidebar (we suggest opening it in a new window), paste it into the execution window, and run it. The
  second example, instead, cannot be run. This is signaled visually by the fact
  that it contains text in italics. We use italics for meta-notation:
  <code>%%e%%</code> and <code>%%e'%%</code> stand for generic expressions, so it is pointless to run
  this code (you would just obtain an error saying that <code>e</code> is
  not bound or that the quote in <code>e'</code> is not closed). The same holds in what follows: code without
  italicized text can be copied and pasted into the online prototype as is
  (of course, you must first paste the declarations of the types it uses);
  this is not possible whenever the code contains italicized text.
</note>
<p>
Patterns are much more than simple variables. They can be used to decompose 
values. For instance, if the words <tt>Hello</tt> and <tt>world</tt> are in the two elements of a pair, we can capture each of them and concatenate them as follows:
</p>
<sample><![CDATA[
let (x,y) = ("Hello, " , "world!") in x @ y
]]></sample>
<p>
Patterns can also check types. For instance,
</p>
<sample><![CDATA[
let (x & String, y) = %%e%% in x 
]]></sample>
<p>
would return a (static) type error if the first projection of <code>%%e%%</code> does not have the static type <code>String</code>.
</p>
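<p>
As a minimal sketch of how type-checking patterns combine with the
<code>match</code> construct (used again later on this page), the function
below branches on the type of its argument. The name <code>describe</code> is
hypothetical and chosen only for illustration; <code>string_of</code> is the
builtin that converts a value to a string.
</p>
<sample><![CDATA[
let describe (v : Int | String) : String =
  match v with
       x & Int -> "an integer: " @ (string_of x)
    |  x & String -> "a string: " @ x
]]></sample>
<p>
Under these assumptions, <code>describe 42</code> should select the first
branch and <code>describe "abc"</code> the second.
</p>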
<p>
The form <code>let x&amp;%%t%% = %%e%% in %%e'%%</code> is used so often that we introduced a special syntax for it:
</p>
<sample><![CDATA[
let x : %%t%% = %%e%% in %%e'%%
]]></sample>
Note the blank spaces around the colon
<footnote>
Actually, only the first blank is necessary: CDuce also accepts <code>let x :%%t%% = %%e%% in %%e'%%</code>.
</footnote>.
 This is because the XML recommendation allows colons to occur in identifiers: see the User's Manual section on <a href="namespaces.html">namespaces</a>. The same holds for the functional arrow symbol <code>-></code>, which must be surrounded by blanks, and for colons in the formal parameters of a function: see <a
href="manual_expressions.html#bnote1">this paragraph</a> of the User's Manual.
</box>

<box title="XML documents" link="xmldoc">
<p>
CDuce uses its own notation to denote XML documents. The next table
presents an XML document on the left and the same document in CDuce notation on
the right (in the rest of this tutorial we visually distinguish XML code from CDuce code by putting the former in light yellow boxes):
</p>

<two-columns>

<left>

<xmlsample><![CDATA[
<?xml version="1.0"?>
<parentbook>
  <person gender="F">
    <name>Clara</name>
    <children>
      <person gender="M">
        <name>Pål André</name>
        <children/>
      </person>
    </children>
    <email>clara@lri.fr</email>
    <tel>314-1592654</tel>
  </person>
  <person gender="M">
    <name> Bob </name>
    <children>
      <person gender="F">
        <name>Alice</name>
        <children/>
      </person>
      <person gender="M">
        <name>Anne</name>
        <children>
          <person gender="M">
            <name>Charlie</name>
            <children/>
          </person>
        </children>
      </person>
    </children>
    <tel kind="work">271828</tel>
    <tel kind="home">66260</tel>
  </person>
</parentbook>
]]></xmlsample>

</left>

<right>

<sample><![CDATA[
let parents : ParentBook =
<parentbook>[
  <person gender="F">[
    <name>"Clara"
    <children>[
      <person gender="M">[
        <name>['Pål ' 'André'] 
        <children>[]
      ]
    ]
    <email>['clara@lri.fr']
    <tel>"314-1592654"
  ] 
  <person gender="M">[
    <name>"Bob"
    <children>[
      <person gender="F">[
        <name>"Alice" 
        <children>[]
      ]
      <person gender="M">[
        <name>"Anne"
        <children>[
          <person gender="M">[
            <name>"Charlie"
            <children>[]
          ] 
        ] 
      ] 
    ] 
    <tel kind="work">"271828"
    <tel kind="home">"66260"
  ] 
] 
]]></sample>

</right>
</two-columns>

<p> Note the straightforward correspondence between the two notations:
instead of using a closing tag, we enclose the content of each
element in square brackets. In CDuce, square brackets denote sequences,
that is, heterogeneous (ordered) lists of blank-separated elements.
Strings are not a primitive data type in CDuce: they are sequences of
characters.</p>

<p>For the purpose of the example we used different notations to
denote strings: in CDuce, <code>"xyz"</code>, <code>['xyz']</code>,
<code>['x' 'y' 'z']</code>, <code>[ 'xy' 'z' ]</code>, and <code>[
'x' 'yz' ]</code> all define the same string literal. Note also that the
<code>"Pål André"</code> string is accepted because CDuce supports Unicode
characters.</p>
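<p>
Because a string is just a sequence of characters, these notations can be
mixed freely. As a small sketch, the expression below concatenates a
double-quoted string with a sequence written character by character (the
identifier <code>greeting</code> is hypothetical):
</p>
<sample><![CDATA[
let greeting = "Hello, " @ ['w' 'o' 'r' 'l' 'd' '!'] in greeting
]]></sample>
<p>
It should evaluate to the same string as the first example of this tutorial.
</p>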
</box>


<box title="Loading XML files" link="loading">

<p> The program on the right-hand side in the previous section starts
by binding the variable <code>parents</code> to the XML document. It
also specifies that <code>parents</code> has the type <a
href="#type_decl"><code>ParentBook</code></a>: this is optional, but it
usually allows earlier detection of type errors. 
</p>
<p>
If the XML document on the left-hand side is stored in a file, say,
<tt>parents.xml</tt>, then it can be loaded from the file by <code>load_xml
"parents.xml"</code>, as the builtin function <code>load_xml</code> converts an
XML document stored in a file into the CDuce expression representing it. However,
<code>load_xml</code> has type <code>String->Any</code>, where
<code>Any</code> is the type of all values. Therefore, if we try to reproduce the
same binding as above by writing the following declaration
</p>
<sample><![CDATA[
let parents : ParentBook = {{load_xml}} "parents.xml" 
]]></sample> 
<p>
we would obtain a type error, since we would be trying to use an expression of type 
<code>Any</code> where an expression of type <code>ParentBook</code> is expected. 
The right way to reproduce the binding above is: 
</p> 
<sample><![CDATA[
let parents : ParentBook =
     match load_xml "parents.xml" with
          x & ParentBook -> x
       |  _ -> raise "parents.xml is not a document of type ParentBook"
]]></sample> 
<p>
Before binding the result of the <code>load_xml</code> expression to the
variable <code>parents</code>, this expression matches it against the type
<code>ParentBook</code>. If the match succeeds (i.e., if the XML document in the file has
type <code>ParentBook</code>), then it performs the binding (the variable
<code>x</code> is bound to the result of the <code>load_xml</code> expression by the pattern
<code>x&amp;ParentBook</code>); otherwise it raises an exception.
</p>
<p>
Of course, an exception such as "parents.xml is not a document of type
ParentBook" is not very informative about why the document failed the match
and where the error might be. In CDuce it is possible to ask the program to
perform this check and raise an informative exception (a string that describes
and localizes the problem) by using the dynamic type check construction
<code>(%%e%%:?%%t%%)</code>, which checks whether the expression
<code>%%e%%</code> has type <code>%%t%%</code> and either returns the
result of <code>%%e%%</code> or raises an informative exception. We can thus write
</p>
<sample><![CDATA[
let parents  = load_xml "parents.xml" :? ParentBook
]]></sample>
<p>
which performs the same test as the previous program but, in case of failure, gives
the programmer information on why the type check failed.
The dynamic type check can also be used in a let construction, as follows
</p>
<sample><![CDATA[
let parents :? ParentBook = load_xml "parents.xml"
]]></sample>
<p>
which is completely equivalent to the previous one.
</p>
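<p>
The dynamic check can also be wrapped in a function so that several files are
loaded the same way. This is only a sketch: the name
<code>load_parents</code> is hypothetical, the parameter is given the type
<code>Latin1</code> (a subtype of <code>String</code>) on the assumption that
<code>load_xml</code> accepts it, and the declaration of
<code>ParentBook</code> must be in scope.
</p>
<sample><![CDATA[
let load_parents (file : Latin1) : ParentBook =
  (load_xml file :? ParentBook)
]]></sample>
<p>
With this definition, <code>load_parents "parents.xml"</code> would play the
same role as the two declarations above.
</p>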
<p>
The command <code>load_xml "parents.xml"</code> is just an abbreviated form of
<code>load_xml "{{file://}}parents.xml"</code>. If CDuce is compiled with
netclient or curl support, then it is also possible to use other URI schemes such as
<code>http://</code> or <code>ftp://</code>. A special scheme, <code>string:</code>, is always supported: the string
following the scheme is parsed as is.
<footnote>
All these schemes are available for <code>load_html</code> and <code>load_file</code> as well.
</footnote>
 So, for instance, <code>load_xml
"string:%%exp%%"</code> 
parses the literal XML code <code>%%exp%%</code> (it corresponds to XQuery's <code>{ %%exp%% }</code>), while <code>load_xml
("string:" @ x)</code> parses the XML code contained in the string variable <code>x</code>. Thus the following definition of <code>x</code>
</p>
<sample><![CDATA[
let x : Any = <person>[ <name>"Alice" <children>[] ]
]]></sample> 
<p>
is completely equivalent to this one
</p>
<sample><![CDATA[
let x = load_xml "string:<person><name>Alice</name> <children/></person>"
]]></sample> 


</box>

<box title="Type declarations" link="type_decl">
<p>
First, we declare some types:
</p>

<sample><![CDATA[
type ParentBook = <parentbook>[Person*]
type Person = FPerson | MPerson 
type FPerson = <person gender="F">[ Name Children (Tel | Email)*] 
type MPerson = <person gender="M">[ Name Children (Tel | Email)*] 
type Name = <name>[ PCDATA ]
type Children = <children>[Person*] 
type Tel = <tel kind=?"home"|"work">['0'--'9'+ '-'? '0'--'9'+]
type Echar = 'a'--'z' | 'A'--'Z' | '_' | '0'--'9'
type Email= <email>[ Echar+ ('.' Echar+)* '@' Echar+ ('.' Echar+)+ ]
]]></sample>

<p> The type <code>ParentBook</code> describes XML documents that store information
about persons. A tag <code>&lt;tag attr1=... attr2=... ...&gt;</code>
followed by a sequence type denotes an XML document type. Sequence
types classify ordered lists of heterogeneous elements and they are
denoted by square brackets that enclose regular expressions over types
(note that a regular expression over types <i>is not</i> a type, it
just describes the content of a sequence type, therefore if it is not
enclosed in square brackets it is meaningless). The definitions above
state that a ParentBook element is formed by a possibly empty sequence
of persons. A person is either of type <code>FPerson</code> or
<code>MPerson</code> according to the value of the gender attribute.
An equivalent definition for Person would thus be:

</p>

<sample><![CDATA[
<person gender={{"F"|"M"}}>[ Name Children (Tel | Email)*] 
]]></sample>

<p> A person element consists of a sequence formed of a name
element, a children element, and zero or more telephone and e-mail
elements, in this order.  </p>

<p> Name elements contain strings. These are encoded as sequences of
characters. The <code>PCDATA</code> keyword is equivalent to the
regexp <code>Char*</code>, so <code>String</code>,
<code>[Char*]</code>, <code>[PCDATA]</code>, <code>[PCDATA*
PCDATA]</code>, ..., are all equivalent notations. Children are
composed of zero or more Person elements. Telephone elements have an
optional (as indicated by <code>=?</code>) string attribute whose
value is either "home" or "work", and their content is a single
string made of two non-empty sequences of numeric characters separated by
an optional dash character. Had we wanted to state that a phone number
is an integer with at least, say, 5 digits (of course this is
meaningful only if no phone number starts with 0), we would have used an
interval type such as <code>&lt;tel kind=?"home"|"work"&gt;[10000--*]</code>,
where <code>*</code> denotes plus infinity, while on the left-hand side of <code>--</code> (as in <code>*--100</code>) it denotes minus infinity.  </p>

<p>
<code>Echar</code> is the type of the characters allowed in e-mail
addresses. It is used in the regular expression defining <code>Email</code> to
precisely constrain the form of the addresses. An XML document satisfying
these constraints is shown in the XML documents section above.
</p>
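<p>
To make the constraints on <code>Tel</code> and <code>Email</code> concrete,
here is a small sketch of values that should be accepted by the declarations
above. The identifiers <code>t1</code>, <code>t2</code>, and <code>m</code>
are hypothetical, and the type declarations must be pasted first.
</p>
<sample><![CDATA[
let t1 : Tel = <tel kind="home">"314-1592654"
let t2 : Tel = <tel>"66260"
let m : Email = <email>"clara@lri.fr"
]]></sample>
<p>
The second declaration omits the optional <code>kind</code> attribute and the
optional dash. By contrast, a telephone content such as <code>"314-"</code>
should be rejected, since the dash must be followed by at least one digit.
</p>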

</box>
</page>