File: goodies.tex

package info (click to toggle)
haskell98-tutorial 200006-2-1
  • links: PTS
  • area: main
  • in suites: etch, etch-m68k
  • size: 864 kB
  • ctags: 59
  • sloc: haskell: 2,125; makefile: 66; sh: 9
file content (515 lines) | stat: -rw-r--r-- 26,538 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
%**<title>A Gentle Introduction to Haskell: Values and Types</title>

%**~header

\section{Values, Types, and Other Goodies}
\label{tut-values-etc}

Because Haskell is a purely functional language, all computations are
done via the evaluation of {\em expressions} (syntactic terms) to
yield {\em values} (abstract entities that we regard as answers).
Every value has an associated 
{\em type}.  (Intuitively, we can think of types as sets of values.)
Examples of expressions include atomic values such as the integer \mbox{\tt 5},
the character \mbox{\tt 'a'}, and the function \mbox{\tt {\char'134}x\ ->\ x+1}, as well as
structured values such as the list \mbox{\tt [1,2,3]} and the pair \mbox{\tt ('b',4)}.

Just as expressions denote values, type expressions are
syntactic terms that denote type values (or just {\em types}).
Examples of type expressions include the atomic types \mbox{\tt Integer}
(infinite-precision integers), \mbox{\tt Char} (characters), \mbox{\tt Integer->Integer}
(functions mapping \mbox{\tt Integer} to \mbox{\tt Integer}), as well as the structured types
\mbox{\tt [Integer]} (homogeneous lists of integers) and \mbox{\tt (Char,Integer)}
(character, integer pairs).

All Haskell values are ``first-class''---they may be passed as
arguments to functions, returned as results, placed in data
structures, etc.  Haskell types, on the other hand, are {\em not}
first-class.  Types in a sense describe values, and the
association of a value with its type is called a {\em typing}.  Using
the examples of values and types above, we write typings as follows:
\bprog
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 5\ \ ::\ Integer}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 'a'\ ::\ Char}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ inc\ ::\ Integer\ ->\ Integer}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ [1,2,3]\ ::\ [Integer]}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ('b',4)\ ::\ (Char,Integer)}
\eprog
The ``\mbox{\tt ::}'' can be read ``has type.''

Functions in Haskell are normally defined by a series of {\em
equations}.  For example, the function \mbox{\tt inc} can be
defined by the single equation:
\bprog
\mbox{\tt inc\ n\ \ \ \ \ \ \ \ \ \ =\ n+1}
\eprog 
An equation is an example of a {\em declaration}.  Another kind of
declaration is a {\em type signature declaration}
(\see{type-signatures}), with which we can declare an explicit typing
for \mbox{\tt inc}:
\bprog
\mbox{\tt inc\ \ \ \ \ \ \ \ \ \ \ \ ::\ Integer\ ->\ Integer}
\eprog 
We will have much more to say about function definitions in Section
\ref{tut-functions}.

For pedagogical purposes, when we wish to indicate that an expression
$e_1$ evaluates, or ``reduces,'' to another expression or value $e_2$,
we will write:
\[ e_1\ \ \ \ \red\ \ \ \ e_2 \]
For example, note that:
\[ \mbox{\tt inc\ (inc\ 3)}\ \ \ \ \red\ \ \ \ \mbox{\tt 5} \]

Haskell's static type system defines the formal relationship
between types and values (\see{type-semantics}).  The static type
system ensures that Haskell programs are {\em type safe}; that is,
that the programmer has not mismatched types in some way.  For
example, we cannot generally add together two characters, so the
expression \mbox{\tt 'a'+'b'} is ill-typed.  The main advantage of statically
typed languages is well-known: All type errors are detected at
compile-time.  Not all errors are caught by the type system; an
expression such as \mbox{\tt 1/0} is typable but its evaluation will result in
an error at execution time.  Still, the type system finds many
program errors at compile time, aids the user in reasoning about
programs, and also permits a compiler to generate more efficient code
(for example, no run-time type tags or tests are required).

The type system also ensures that user-supplied type signatures are
correct.  In fact, Haskell's type system is powerful enough to allow
us to avoid writing any type signatures at all;\footnote{With a few
exceptions to be described later.} we say that the type
system {\em infers} the correct types for us.  Nevertheless, judicious
placement of type signatures such as that we gave for \mbox{\tt inc} is a good idea,
since type signatures are a very effective form of documentation and
help bring programming errors to light.

\syn{The reader will note that we have capitalized identifiers that
denote specific types, such as \mbox{\tt Integer} and \mbox{\tt Char}, but not identifiers
that denote values, such as \mbox{\tt inc}.  This is not just a convention: it
is enforced by Haskell's lexical syntax.  In fact, the case of the
other characters matters, too: \mbox{\tt foo}, \mbox{\tt fOo}, and \mbox{\tt fOO} are all
distinct identifiers.}

\subsection{Polymorphic Types}
\label{tut-polymorphism}

Haskell also incorporates {\em polymorphic} types---types that are
universally quantified in some way over all types.  Polymorphic
type expressions essentially describe families of types.  For
example, $(\forall$\mbox{\tt a}$)$\mbox{\tt [a]} is the family of types consisting of,
for every type \mbox{\tt a}, the type of lists of \mbox{\tt a}.  Lists of integers
(e.g.~\mbox{\tt [1,2,3]}), lists of characters (\mbox{\tt ['a','b','c']}), even lists of
lists of integers, etc., are all members of this family.  (Note,
however, that \mbox{\tt [2,'b']} is {\em not} a valid example, since there is
no single type that contains both \mbox{\tt 2} and \mbox{\tt 'b'}.)

\syn{Identifiers such as \mbox{\tt a} above are called {\em type variables},
and are uncapitalized to distinguish them from specific types such as
\mbox{\tt Int}.  Furthermore, since Haskell has only universally quantified
types, there is no need to explicitly write out the symbol for
universal quantification, and thus we simply write \mbox{\tt [a]} in the
example above.  In other words, all type variables are implicitly
universally quantified.}

Lists are a commonly used data structure in functional languages, and
are a good vehicle for explaining the principles of polymorphism.  The
list \mbox{\tt [1,2,3]} in Haskell is actually shorthand for the list
\mbox{\tt 1:(2:(3:[]))}, where \mbox{\tt []} is the empty list and \mbox{\tt :} is the infix
operator that adds its first argument to the front of its second
argument (a list).\footnote{\mbox{\tt :} and \mbox{\tt []} are like Lisp's \mbox{\tt cons} and
\mbox{\tt nil}, respectively.} Since \mbox{\tt :} is right associative, we can also
write this list as \mbox{\tt 1:2:3:[]}.

As an example of a user-defined function that operates on lists,
consider the problem of counting the number of elements in a list:
\bprog
\mbox{\tt length\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ::\ [a]\ ->\ Integer}\\
\mbox{\tt length\ []\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\ \ 0}\\
\mbox{\tt length\ (x:xs)\ \ \ \ \ \ \ \ \ \ \ =\ \ 1\ +\ length\ xs}
\eprog 
This definition is almost self-explanatory.  We can read the equations
as saying: ``The length of the empty list is 0, and the length of a
list whose first element is \mbox{\tt x} and remainder is \mbox{\tt xs} is 1 plus the
length of \mbox{\tt xs}.'' (Note the naming convention used here; \mbox{\tt xs} is the
plural of \mbox{\tt x}, and should be read that way.)

Although intuitive, this example highlights an important aspect of
Haskell that is yet to be explained: {\em pattern matching}.  The
left-hand sides of the equations contain patterns such as \mbox{\tt []}
and \mbox{\tt x:xs}.  In a function application these patterns are 
matched against actual parameters in a fairly intuitive way (\mbox{\tt []}
only matches the empty list, and \mbox{\tt x:xs} will successfully match any
list with at least one element, binding \mbox{\tt x} to the first element and
\mbox{\tt xs} to the rest of the list).  If the match succeeds, the right-hand
side is evaluated and returned as the result of the application.  If
it fails, the next equation is tried, and if all equations fail, an
error results.

Defining functions by pattern matching is quite common in Haskell, and
the user should become familiar with the various kinds of patterns
that are allowed; we will return to this issue in
Section~\ref{tut-pattern-matching}. 

The \mbox{\tt length} function is also an example of a polymorphic
function.  It can 
be applied to a list containing elements of any type, for example
\mbox{\tt [Integer]}, \mbox{\tt [Char]}, or \mbox{\tt [[Integer]]}.
\[\ba{lcl}
  \mbox{\tt length\ [1,2,3]}      &\ \ \ \ \red\ \ \ \ & 3\\
  \mbox{\tt length\ ['a','b','c']}&\ \ \ \ \red\ \ \ \ & 3\\
  \mbox{\tt length\ [[1],[2],[3]]}   &\ \ \ \ \red\ \ \ \ & 3
\ea\]

Here are two other useful polymorphic functions on lists that will be
used later.  Function \mbox{\tt head} returns the first element of a list,
function \mbox{\tt tail} returns all but the first.
\bprog
\mbox{\tt head\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ::\ [a]\ ->\ a}\\
\mbox{\tt head\ (x:xs)\ \ \ \ \ \ \ \ \ \ \ \ \ =\ \ x}\\
\mbox{\tt }\\[-8pt]
\mbox{\tt tail\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ::\ [a]\ ->\ [a]}\\
\mbox{\tt tail\ (x:xs)\ \ \ \ \ \ \ \ \ \ \ \ \ =\ \ xs}
\eprog 
Unlike \mbox{\tt length}, these functions are not defined for all possible
values of their argument.  A runtime error occurs when these functions
are applied to an empty list.  

With polymorphic types, we find that some types are in a sense
strictly more general than others in the sense that the set of
values they define is larger.  For example, the type \mbox{\tt [a]}
is more general than \mbox{\tt [Char]}.  In other words, the latter type can be
derived from the former by a suitable substitution for \mbox{\tt a}.  With
regard to this generalization ordering, Haskell's type system
possesses two important properties: First, every well-typed expression
is guaranteed to have a unique principal type (explained below),
and second, the principal type can be inferred automatically
(\see{type-semantics}).  In comparison to a monomorphically
typed language such as C, the reader will find that 
polymorphism improves expressiveness, and type inference
lessens the burden of types on the programmer.

An expression's or function's principal type is the least general type
that, intuitively, ``contains all instances of the expression''.  For
example, the principal type of \mbox{\tt head} is \mbox{\tt [a]->a}; \mbox{\tt [b]->a},
\mbox{\tt a->a}, or even \mbox{\tt a} are correct types, but too general, whereas something
like \mbox{\tt [Integer]->Integer} is too specific.  The existence of unique principal
types is the hallmark feature of the {\em Hindley-Milner type system},
which forms the basis of the type systems of Haskell, ML,
Miranda,\footnote{``Miranda'' is a trademark of Research Software,
Ltd.} and several other (mostly functional) languages.

\subsection{User-Defined Types}
\label{tut-user-types}

We can define our own types in Haskell using a \mbox{\tt data} declaration,
which we introduce via a series of examples (\see{datatype-decls}).

An important predefined type in Haskell is that of truth values:
\bprog
\mbox{\tt data\ Bool\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\ False\ |\ True}
\eprog
The type being defined here is \mbox{\tt Bool}, and it has exactly two values:
\mbox{\tt True} and \mbox{\tt False}.  Type \mbox{\tt Bool} is an example of a (nullary) {\em type
constructor}, and \mbox{\tt True} and \mbox{\tt False} are (also nullary) {\em data
constructors} (or just {\em constructors}, for short).

Similarly, we might wish to define a color type:
\bprog
\mbox{\tt data\ Color\ \ \ \ \ \ \ \ \ \ \ \ \ \ =\ Red\ |\ Green\ |\ Blue\ |\ Indigo\ |\ Violet}
\eprog
Both \mbox{\tt Bool} and \mbox{\tt Color} are examples of enumerated types, since
they consist of a finite number of nullary data constructors.

Here is an example of a type with just one data constructor:
\bprog
\mbox{\tt data\ Point\ a\ \ \ \ \ \ \ \ \ \ \ \ =\ Pt\ a\ a}
\eprog
Because of the single constructor, a type like \mbox{\tt Point} is often
called a {\em tuple type}, since it is essentially just a cartesian
product (in this case binary) of other types.\footnote{Tuples are
somewhat like records in other languages.}
In contrast, multi-constructor types, such as \mbox{\tt Bool} and
\mbox{\tt Color}, are called (disjoint) union or sum types.

More importantly, however, \mbox{\tt Point} is an example of a 
polymorphic type: for any type $t$, it defines the type of cartesian
points that use $t$ as the coordinate type.  The \mbox{\tt Point} type can now be seen
clearly as a unary type constructor, since from the type $t$ it
constructs a new type \mbox{\tt Point\ }$t$.  (In the same sense, using the list
example given earlier, \mbox{\tt []} is also a type constructor.  Given any type $t$
we can ``apply'' \mbox{\tt []} to yield a new type \mbox{\tt [}$t$\mbox{\tt ]}.  The Haskell
syntax allows \mbox{\tt []\ }$t$ to be written as \mbox{\tt [}$t$\mbox{\tt ]}.  Similarly,
\mbox{\tt ->} is a type constructor: given two types $t$ and $u$,
$t$\mbox{\tt ->}$u$ is the type of functions mapping elements of type $t$ to
elements of type $u$.)

Note that the type of the binary data constructor \mbox{\tt Pt} is \mbox{\tt a\ ->\ a\ ->\ Point\ a}, 
and thus the following typings are valid:
\bprog
\mbox{\tt Pt\ \ 2.0\ \ 3.0\ \ \ \ \ \ \ \ \ \ \ \ ::\ Point\ Float}\\
\mbox{\tt Pt\ \ 'a'\ \ 'b'\ \ \ \ \ \ \ \ \ \ \ \ ::\ Point\ Char}\\
\mbox{\tt Pt\ True\ False\ \ \ \ \ \ \ \ \ \ \ ::\ Point\ Bool}
\eprog
On the other hand, an expression such as \mbox{\tt Pt\ 'a'\ 1} is ill-typed
because \mbox{\tt 'a'} and \mbox{\tt 1} are of different types.

It is important to distinguish between applying a {\em data constructor} to
yield a {\em value}, and applying a {\em type constructor} to yield a
{\em type}; the former happens at run-time and is how we compute
things in Haskell, whereas the latter happens at compile-time and is
part of the type system's process of ensuring type safety.

\syn{Type constructors such as \mbox{\tt Point} and data constructors such as
\mbox{\tt Pt} are in separate namespaces.  This allows the same name to be used
for both a type constructor and data constructor, as in the following:
\bprog
\mbox{\tt data\ Point\ a\ =\ Point\ a\ a}
\eprog
While this may seem a little confusing at first, it serves to make the
link between a type and its data constructor more obvious.}

\subsubsection{Recursive Types}
\label{tut-recursive-types}

Types can also be recursive, as in the type of binary trees:
\bprog
\mbox{\tt data\ Tree\ a\ \ \ \ \ \ \ \ \ \ \ \ \ =\ Leaf\ a\ |\ Branch\ (Tree\ a)\ (Tree\ a)\ }
\eprog
Here we have defined a polymorphic binary tree type whose elements
are either leaf nodes containing a value of type \mbox{\tt a}, or internal
nodes (``branches'') containing (recursively) two sub-trees.

When reading data declarations such as this, remember again that \mbox{\tt Tree} is a
type constructor, whereas \mbox{\tt Branch} and \mbox{\tt Leaf} are data constructors.
Aside from establishing a connection between these constructors, the
above declaration is essentially defining the following types for
\mbox{\tt Branch} and \mbox{\tt Leaf}:
\bprog
\mbox{\tt Branch\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ::\ Tree\ a\ ->\ Tree\ a\ ->\ Tree\ a}\\
\mbox{\tt Leaf\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ::\ a\ ->\ Tree\ a}
\eprog 

With this example we have defined a type sufficiently rich to
allow defining some interesting (recursive) functions that use it.
For example, suppose we wish to define a function \mbox{\tt fringe} that
returns a list of all the elements in the leaves of a tree from left
to right.  It's usually helpful to write down the type of new
functions first; in this case we see that the type should be 
\mbox{\tt Tree\ a\ ->\ [a]}.  That is, \mbox{\tt fringe} is a polymorphic function that,
for any type \mbox{\tt a}, maps trees of \mbox{\tt a} into lists of \mbox{\tt a}.  A suitable
definition follows:
\bprog
\mbox{\tt fringe\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ::\ Tree\ a\ ->\ [a]}\\
\mbox{\tt fringe\ (Leaf\ x)\ \ \ \ \ \ \ \ \ \ \ \ =\ \ [x]}\\
\mbox{\tt fringe\ (Branch\ left\ right)\ =\ \ fringe\ left\ ++\ fringe\ right}
\eprog
Here \mbox{\tt ++} is the infix operator that concatenates two lists (its full
definition will be given in Section \ref{tut-monadic-classes}).  As
with the \mbox{\tt length} example given earlier, the \mbox{\tt fringe} function is
defined using pattern matching, except that here we see patterns involving
user-defined constructors: \mbox{\tt Leaf} and \mbox{\tt Branch}.  \syn{Note that the
formal parameters are easily identified as the ones beginning with
lower-case letters.}


\subsection{Type Synonyms}
\label{tut-type-synonyms}

For convenience, Haskell provides a way to define {\em type synonyms};
i.e.~names for commonly used types.  Type synonyms are created using a
\mbox{\tt type} declaration (\see{type-synonym-decls}).  Here are several
examples:
\bprog
\mbox{\tt type\ String\ \ \ \ \ \ \ \ \ \ \ \ \ =\ [Char]}\\
\mbox{\tt type\ Person\ \ \ \ \ \ \ \ \ \ \ \ \ =\ (Name,Address)}\\
\mbox{\tt type\ Name\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\ String}\\
\mbox{\tt data\ Address\ \ \ \ \ \ \ \ \ \ \ \ =\ None\ |\ Addr\ String}
\eprog

Type synonyms do not define new types, but simply give new names
for existing types.  For example, the type \mbox{\tt Person\ ->\ Name} is
precisely equivalent to \mbox{\tt (String,Address)\ ->\ String}.  The new names
are often shorter than the types they are synonymous with, but this is
not the only purpose of type synonyms: they can also improve
readability of programs by being more mnemonic; indeed, the above
examples highlight this.  We can even give new names to polymorphic
types:
\bprog
\mbox{\tt type\ AssocList\ a\ b\ \ \ \ \ \ \ \ \ \ \ \ \ \ =\ [(a,b)]}
\eprog 
This is the type of ``association lists'' which associate values of
type \mbox{\tt a} with those of type \mbox{\tt b}.

\subsection{Built-in Types Are Not Special}
\label{tut-built-ins}

Earlier we introduced several ``built-in'' types such as lists,
tuples, integers, and characters.  We have also shown how new
user-defined types can be defined.  Aside from special syntax, are
the built-in types in any way more special than the user-defined ones?
The answer is {\em no}.  The special syntax is for convenience and for
consistency with historical convention, but has no semantic
consequences.

We can emphasize this point by considering what the type
declarations would look like for these built-in types if in fact we
were allowed to use the special syntax in defining them.  For example,
the \mbox{\tt Char} type might be written as:
\bprog
\mbox{\tt data\ Char\ \ \ \ \ \ \ =\ 'a'\ |\ 'b'\ |\ 'c'\ |\ ...\ \ \ \ \ \ \ \ \ --\ This\ is\ not\ valid}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |\ 'A'\ |\ 'B'\ |\ 'C'\ |\ ...\ \ \ \ \ \ \ \ \ --\ Haskell\ code!}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |\ '1'\ |\ '2'\ |\ '3'\ |\ ...}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ...}
\eprog
These constructor names are not syntactically valid; to fix them we
would have to write something like:
\bprog
\mbox{\tt data\ Char\ \ \ \ \ \ \ =\ Ca\ |\ Cb\ |\ Cc\ |\ ...}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |\ CA\ |\ CB\ |\ CC\ |\ ...}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |\ C1\ |\ C2\ |\ C3\ |\ ...}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ...}
\eprog
Even though these constructors are more concise, they are quite
unconventional for representing characters.

In any case, writing ``pseudo-Haskell'' code in this way helps us to
see through the special syntax.  We see now that \mbox{\tt Char} is just an
enumerated type consisting of a large number of nullary constructors.
Thinking of \mbox{\tt Char} in this way makes it clear that we can
pattern-match against characters in function definitions, just as we
would expect to be able to do so for any of a type's constructors.

\syn{This example also demonstrates the use of {\em comments} in
Haskell; the characters \mbox{\tt --} and all subsequent characters to the end
of the line are ignored.  Haskell also permits {\em nested} comments
which have the form \mbox{\tt {\char'173}-}$\ldots$\mbox{\tt -{\char'175}} and can appear anywhere
(\see{lexemes}).}

Similarly, we could define \mbox{\tt Int} (fixed precision integers) and
\mbox{\tt Integer} by: 
\bprog
\mbox{\tt data\ Int\ \ \ \ \ =\ -65532\ |\ ...\ |\ -1\ |\ 0\ |\ 1\ |\ ...\ |\ 65532\ \ --\ more\ pseudo-code}\\
\mbox{\tt data\ Integer\ =\ \ \ \ \ \ \ ...\ -2\ |\ -1\ |\ 0\ |\ 1\ |\ 2\ ...}
\eprog
where \mbox{\tt -65532} and \mbox{\tt 65532}, say, are the maximum and minimum fixed
precision integers for a given implementation.  \mbox{\tt Int} is a much larger
enumeration than \mbox{\tt Char}, but it's still finite!  In contrast, the
pseudo-code for \mbox{\tt Integer} 
is intended to convey an {\em infinite} enumeration.

Tuples are also easy to define playing this game:
\bprog
\mbox{\tt data\ (a,b)\ \ \ \ \ \ \ \ \ \ \ \ \ \ =\ (a,b)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ --\ more\ pseudo-code}\\
\mbox{\tt data\ (a,b,c)\ \ \ \ \ \ \ \ \ \ \ \ =\ (a,b,c)}\\
\mbox{\tt data\ (a,b,c,d)\ \ \ \ \ \ \ \ \ \ =\ (a,b,c,d)}\\
\mbox{\tt \ .\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ .}\\
\mbox{\tt \ .\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ .}\\
\mbox{\tt \ .\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ .}
\eprog
Each declaration above defines a tuple type of a particular length,
with \mbox{\tt (...)} playing a role in both the expression syntax (as data
constructor) and type-expression syntax (as type constructor).  The
vertical dots after the last declaration are intended to convey an
infinite number of such declarations, reflecting the fact that tuples
of all lengths are allowed in Haskell.

Lists are also easily handled, and more interestingly, they are recursive:
\bprog
\mbox{\tt \ data\ [a]\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\ []\ |\ a\ :\ [a]\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ --\ more\ pseudo-code}
\eprog
We can now see clearly what we described about lists earlier: \mbox{\tt []} is
the empty list, and \mbox{\tt :} is the infix list constructor; thus \mbox{\tt [1,2,3]}
must be equivalent to the list \mbox{\tt 1:2:3:[]}.  (\mbox{\tt :} is right
associative.)  The type of \mbox{\tt []} is \mbox{\tt [a]}, and the type of \mbox{\tt :} is
\mbox{\tt a->[a]->[a]}.

\syn{The way ``\mbox{\tt :}'' is defined here is actually legal syntax---infix
constructors are permitted in \mbox{\tt data} declarations, and are
distinguished from infix operators (for pattern-matching purposes) by
the fact that they must begin with a ``\mbox{\tt :}'' (a property trivially
satisfied by ``\mbox{\tt :}'').}

At this point the reader should note carefully the differences between
tuples and lists, which the above definitions make abundantly clear.
In particular, note the recursive nature of the list type whose
elements are homogeneous and of arbitrary length, and the
non-recursive nature of a (particular) tuple type whose elements are
heterogeneous and of fixed length.  The typing rules for tuples and
lists should now also be clear:

For $\mbox{\tt (}e_1\mbox{\tt ,}e_2\mbox{\tt ,}\ldots\mbox{\tt ,}e_n\mbox{\tt )},\ n\geq2$, if $t_i$ is the
type of $e_i$, then the type of the tuple is 
$\mbox{\tt (}t_1\mbox{\tt ,}t_2\mbox{\tt ,}\ldots\mbox{\tt ,}t_n\mbox{\tt )}$.

For $\mbox{\tt [}e_1\mbox{\tt ,}e_2\mbox{\tt ,}\ldots\mbox{\tt ,}e_n\mbox{\tt ]},\ n\geq0$, each $e_i$ must have
the same type $t$, and the type of the list is \mbox{\tt [}$t$\mbox{\tt ]}.

\subsubsection{List Comprehensions and Arithmetic Sequences}
\label{tut-list-comps}

As with Lisp dialects, lists are pervasive in Haskell, and as with
other functional languages, there is yet more syntactic sugar to aid
in their creation.  Aside from the constructors for lists just
discussed, Haskell provides an expression known as a {\em list
comprehension} that is best explained by example:
\bprog
\mbox{\tt [\ f\ x\ |\ x\ <-\ xs\ ]}
\eprog 
This expression can intuitively be read as ``the list of all \mbox{\tt f\ x} 
such that \mbox{\tt x} is drawn from \mbox{\tt xs}.''  The similarity to set notation is
not a coincidence.  The phrase \mbox{\tt x\ <-\ xs} is called a {\em generator},
of which more than one is allowed, as in:
\bprog
\mbox{\tt [\ (x,y)\ |\ x\ <-\ xs,\ y\ <-\ ys\ ]}
\eprog 
This list comprehension forms the cartesian product of the two lists
\mbox{\tt xs} and \mbox{\tt ys}.  The elements are selected as if the generators were
``nested'' from left to right (with the rightmost generator varying
fastest); thus, if \mbox{\tt xs} is \mbox{\tt [1,2]} and \mbox{\tt ys} is \mbox{\tt [3,4]}, the result is
\mbox{\tt [(1,3),(1,4),(2,3),(2,4)]}.

Besides generators, boolean expressions called {\em guards} are
permitted.  Guards place constraints on the elements generated.  For
example, here is a concise definition of everybody's favorite sorting
algorithm:
\bprog
\mbox{\tt quicksort\ \ []\ \ \ \ \ \ \ \ \ \ \ =\ \ []}\\
\mbox{\tt quicksort\ (x:xs)\ \ \ \ \ \ \ \ =\ \ quicksort\ [y\ |\ y\ <-\ xs,\ y<x\ ]}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ++\ [x]}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ++\ quicksort\ [y\ |\ y\ <-\ xs,\ y>=x]}
\eprog

To further support the use of lists, Haskell has special syntax for
{\em arithmetic sequences}, which are best explained by a series of
examples:
\[\ba{lcl}
\mbox{\tt [1..10]}  \ \ \ &\red&\ \ \ \mbox{\tt [1,2,3,4,5,6,7,8,9,10]}\\
\mbox{\tt [1,3..10]}\ \ \ &\red&\ \ \ \mbox{\tt [1,3,5,7,9]}\\
% \mbox{\tt [1..]}    \ \ \ &\red&\ \ \ \mbox{\tt [1,2,3,4,5,\ ...}\ \ \ \ \ 
%                              \mbox{(infinite sequence)}\\
\mbox{\tt [1,3..]}  \ \ \ &\red&\ \ \ \mbox{\tt [1,3,5,7,9,\ ...}\ \ \ \ \ 
                             \mbox{(infinite sequence)}\\
\ea\]
More will be said about arithmetic sequences in Section
\ref{tut-enum-classes}, and ``infinite lists'' in Section
\ref{tut-infinite}.

\subsubsection{Strings}
\label{tut-strings}

As another example of syntactic sugar for built-in types, we
note that the literal string \mbox{\tt "hello"} is actually shorthand for the
list of characters \mbox{\tt ['h','e','l','l','o']}.  Indeed, the type of
\mbox{\tt "hello"} is \mbox{\tt String}, where \mbox{\tt String} is a predefined type synonym
(that we gave as an earlier example):
\bprog
\mbox{\tt type\ String\ \ \ \ \ \ \ \ \ \ \ \ \ =\ [Char]}
\eprog 
This means we can use predefined polymorphic list functions to operate
on strings.  For example:
\[
\mbox{\tt "hello"\ ++\ "\ world"}\ \ \ \ \red\ \ \ \ \mbox{\tt "hello\ world"}
\]

%**~footer