1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515
|
%**<title>A Gentle Introduction to Haskell: Values and Types</title>
%**~header
\section{Values, Types, and Other Goodies}
\label{tut-values-etc}
Because Haskell is a purely functional language, all computations are
done via the evaluation of {\em expressions} (syntactic terms) to
yield {\em values} (abstract entities that we regard as answers).
Every value has an associated
{\em type}. (Intuitively, we can think of types as sets of values.)
Examples of expressions include atomic values such as the integer \mbox{\tt 5},
the character \mbox{\tt 'a'}, and the function \mbox{\tt {\char'134}x\ ->\ x+1}, as well as
structured values such as the list \mbox{\tt [1,2,3]} and the pair \mbox{\tt ('b',4)}.
Just as expressions denote values, type expressions are
syntactic terms that denote type values (or just {\em types}).
Examples of type expressions include the atomic types \mbox{\tt Integer}
(infinite-precision integers), \mbox{\tt Char} (characters), \mbox{\tt Integer->Integer}
(functions mapping \mbox{\tt Integer} to \mbox{\tt Integer}), as well as the structured types
\mbox{\tt [Integer]} (homogeneous lists of integers) and \mbox{\tt (Char,Integer)}
(character, integer pairs).
All Haskell values are ``first-class''---they may be passed as
arguments to functions, returned as results, placed in data
structures, etc. Haskell types, on the other hand, are {\em not}
first-class. Types in a sense describe values, and the
association of a value with its type is called a {\em typing}. Using
the examples of values and types above, we write typings as follows:
\bprog
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 5\ \ ::\ Integer}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 'a'\ ::\ Char}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ inc\ ::\ Integer\ ->\ Integer}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ [1,2,3]\ ::\ [Integer]}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ('b',4)\ ::\ (Char,Integer)}
\eprog
The ``\mbox{\tt ::}'' can be read ``has type.''
Functions in Haskell are normally defined by a series of {\em
equations}. For example, the function \mbox{\tt inc} can be
defined by the single equation:
\bprog
\mbox{\tt inc\ n\ \ \ \ \ \ \ \ \ \ =\ n+1}
\eprog
An equation is an example of a {\em declaration}. Another kind of
declaration is a {\em type signature declaration}
(\see{type-signatures}), with which we can declare an explicit typing
for \mbox{\tt inc}:
\bprog
\mbox{\tt inc\ \ \ \ \ \ \ \ \ \ \ \ ::\ Integer\ ->\ Integer}
\eprog
We will have much more to say about function definitions in Section
\ref{tut-functions}.
For pedagogical purposes, when we wish to indicate that an expression
$e_1$ evaluates, or ``reduces,'' to another expression or value $e_2$,
we will write:
\[ e_1\ \ \ \ \red\ \ \ \ e_2 \]
For example, note that:
\[ \mbox{\tt inc\ (inc\ 3)}\ \ \ \ \red\ \ \ \ \mbox{\tt 5} \]
Haskell's static type system defines the formal relationship
between types and values (\see{type-semantics}). The static type
system ensures that Haskell programs are {\em type safe}; that is,
that the programmer has not mismatched types in some way. For
example, we cannot generally add together two characters, so the
expression \mbox{\tt 'a'+'b'} is ill-typed. The main advantage of statically
typed languages is well-known: All type errors are detected at
compile-time. Not all errors are caught by the type system; an
expression such as \mbox{\tt 1/0} is typable but its evaluation will result in
an error at execution time. Still, the type system finds many
program errors at compile time, aids the user in reasoning about
programs, and also permits a compiler to generate more efficient code
(for example, no run-time type tags or tests are required).
The type system also ensures that user-supplied type signatures are
correct. In fact, Haskell's type system is powerful enough to allow
us to avoid writing any type signatures at all;\footnote{With a few
exceptions to be described later.} we say that the type
system {\em infers} the correct types for us. Nevertheless, judicious
placement of type signatures such as that we gave for \mbox{\tt inc} is a good idea,
since type signatures are a very effective form of documentation and
help bring programming errors to light.
\syn{The reader will note that we have capitalized identifiers that
denote specific types, such as \mbox{\tt Integer} and \mbox{\tt Char}, but not identifiers
that denote values, such as \mbox{\tt inc}. This is not just a convention: it
is enforced by Haskell's lexical syntax. In fact, the case of the
other characters matters, too: \mbox{\tt foo}, \mbox{\tt fOo}, and \mbox{\tt fOO} are all
distinct identifiers.}
\subsection{Polymorphic Types}
\label{tut-polymorphism}
Haskell also incorporates {\em polymorphic} types---types that are
universally quantified in some way over all types. Polymorphic
type expressions essentially describe families of types. For
example, $(\forall$\mbox{\tt a}$)$\mbox{\tt [a]} is the family of types consisting of,
for every type \mbox{\tt a}, the type of lists of \mbox{\tt a}. Lists of integers
(e.g.~\mbox{\tt [1,2,3]}), lists of characters (\mbox{\tt ['a','b','c']}), even lists of
lists of integers, etc., are all members of this family. (Note,
however, that \mbox{\tt [2,'b']} is {\em not} a valid example, since there is
no single type that contains both \mbox{\tt 2} and \mbox{\tt 'b'}.)
\syn{Identifiers such as \mbox{\tt a} above are called {\em type variables},
and are uncapitalized to distinguish them from specific types such as
\mbox{\tt Int}. Furthermore, since Haskell has only universally quantified
types, there is no need to explicitly write out the symbol for
universal quantification, and thus we simply write \mbox{\tt [a]} in the
example above. In other words, all type variables are implicitly
universally quantified.}
Lists are a commonly used data structure in functional languages, and
are a good vehicle for explaining the principles of polymorphism. The
list \mbox{\tt [1,2,3]} in Haskell is actually shorthand for the list
\mbox{\tt 1:(2:(3:[]))}, where \mbox{\tt []} is the empty list and \mbox{\tt :} is the infix
operator that adds its first argument to the front of its second
argument (a list).\footnote{\mbox{\tt :} and \mbox{\tt []} are like Lisp's \mbox{\tt cons} and
\mbox{\tt nil}, respectively.} Since \mbox{\tt :} is right associative, we can also
write this list as \mbox{\tt 1:2:3:[]}.
As an example of a user-defined function that operates on lists,
consider the problem of counting the number of elements in a list:
\bprog
\mbox{\tt length\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ::\ [a]\ ->\ Integer}\\
\mbox{\tt length\ []\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\ \ 0}\\
\mbox{\tt length\ (x:xs)\ \ \ \ \ \ \ \ \ \ \ =\ \ 1\ +\ length\ xs}
\eprog
This definition is almost self-explanatory. We can read the equations
as saying: ``The length of the empty list is 0, and the length of a
list whose first element is \mbox{\tt x} and remainder is \mbox{\tt xs} is 1 plus the
length of \mbox{\tt xs}.'' (Note the naming convention used here; \mbox{\tt xs} is the
plural of \mbox{\tt x}, and should be read that way.)
Although intuitive, this example highlights an important aspect of
Haskell that is yet to be explained: {\em pattern matching}. The
left-hand sides of the equations contain patterns such as \mbox{\tt []}
and \mbox{\tt x:xs}. In a function application these patterns are
matched against actual parameters in a fairly intuitive way (\mbox{\tt []}
only matches the empty list, and \mbox{\tt x:xs} will successfully match any
list with at least one element, binding \mbox{\tt x} to the first element and
\mbox{\tt xs} to the rest of the list). If the match succeeds, the right-hand
side is evaluated and returned as the result of the application. If
it fails, the next equation is tried, and if all equations fail, an
error results.
Defining functions by pattern matching is quite common in Haskell, and
the user should become familiar with the various kinds of patterns
that are allowed; we will return to this issue in
Section~\ref{tut-pattern-matching}.
The \mbox{\tt length} function is also an example of a polymorphic
function. It can
be applied to a list containing elements of any type, for example
\mbox{\tt [Integer]}, \mbox{\tt [Char]}, or \mbox{\tt [[Integer]]}.
\[\ba{lcl}
\mbox{\tt length\ [1,2,3]} &\ \ \ \ \red\ \ \ \ & 3\\
\mbox{\tt length\ ['a','b','c']}&\ \ \ \ \red\ \ \ \ & 3\\
\mbox{\tt length\ [[1],[2],[3]]} &\ \ \ \ \red\ \ \ \ & 3
\ea\]
Here are two other useful polymorphic functions on lists that will be
used later. Function \mbox{\tt head} returns the first element of a list,
function \mbox{\tt tail} returns all but the first.
\bprog
\mbox{\tt head\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ::\ [a]\ ->\ a}\\
\mbox{\tt head\ (x:xs)\ \ \ \ \ \ \ \ \ \ \ \ \ =\ \ x}\\
\mbox{\tt }\\[-8pt]
\mbox{\tt tail\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ::\ [a]\ ->\ [a]}\\
\mbox{\tt tail\ (x:xs)\ \ \ \ \ \ \ \ \ \ \ \ \ =\ \ xs}
\eprog
Unlike \mbox{\tt length}, these functions are not defined for all possible
values of their argument. A runtime error occurs when these functions
are applied to an empty list.
With polymorphic types, we find that some types are in a sense
strictly more general than others in the sense that the set of
values they define is larger. For example, the type \mbox{\tt [a]}
is more general than \mbox{\tt [Char]}. In other words, the latter type can be
derived from the former by a suitable substitution for \mbox{\tt a}. With
regard to this generalization ordering, Haskell's type system
possesses two important properties: First, every well-typed expression
is guaranteed to have a unique principal type (explained below),
and second, the principal type can be inferred automatically
(\see{type-semantics}). In comparison to a monomorphically
typed language such as C, the reader will find that
polymorphism improves expressiveness, and type inference
lessens the burden of types on the programmer.
An expression's or function's principal type is the least general type
that, intuitively, ``contains all instances of the expression''. For
example, the principal type of \mbox{\tt head} is \mbox{\tt [a]->a}; \mbox{\tt [b]->a},
\mbox{\tt a->a}, or even \mbox{\tt a} are correct types, but too general, whereas something
like \mbox{\tt [Integer]->Integer} is too specific. The existence of unique principal
types is the hallmark feature of the {\em Hindley-Milner type system},
which forms the basis of the type systems of Haskell, ML,
Miranda,\footnote{``Miranda'' is a trademark of Research Software,
Ltd.} and several other (mostly functional) languages.
\subsection{User-Defined Types}
\label{tut-user-types}
We can define our own types in Haskell using a \mbox{\tt data} declaration,
which we introduce via a series of examples (\see{datatype-decls}).
An important predefined type in Haskell is that of truth values:
\bprog
\mbox{\tt data\ Bool\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\ False\ |\ True}
\eprog
The type being defined here is \mbox{\tt Bool}, and it has exactly two values:
\mbox{\tt True} and \mbox{\tt False}. Type \mbox{\tt Bool} is an example of a (nullary) {\em type
constructor}, and \mbox{\tt True} and \mbox{\tt False} are (also nullary) {\em data
constructors} (or just {\em constructors}, for short).
Similarly, we might wish to define a color type:
\bprog
\mbox{\tt data\ Color\ \ \ \ \ \ \ \ \ \ \ \ \ \ =\ Red\ |\ Green\ |\ Blue\ |\ Indigo\ |\ Violet}
\eprog
Both \mbox{\tt Bool} and \mbox{\tt Color} are examples of enumerated types, since
they consist of a finite number of nullary data constructors.
Here is an example of a type with just one data constructor:
\bprog
\mbox{\tt data\ Point\ a\ \ \ \ \ \ \ \ \ \ \ \ =\ Pt\ a\ a}
\eprog
Because of the single constructor, a type like \mbox{\tt Point} is often
called a {\em tuple type}, since it is essentially just a cartesian
product (in this case binary) of other types.\footnote{Tuples are
somewhat like records in other languages.}
In contrast, multi-constructor types, such as \mbox{\tt Bool} and
\mbox{\tt Color}, are called (disjoint) union or sum types.
More importantly, however, \mbox{\tt Point} is an example of a
polymorphic type: for any type $t$, it defines the type of cartesian
points that use $t$ as the coordinate type. The \mbox{\tt Point} type can now be seen
clearly as a unary type constructor, since from the type $t$ it
constructs a new type \mbox{\tt Point\ }$t$. (In the same sense, using the list
example given earlier, \mbox{\tt []} is also a type constructor. Given any type $t$
we can ``apply'' \mbox{\tt []} to yield a new type \mbox{\tt [}$t$\mbox{\tt ]}. The Haskell
syntax allows \mbox{\tt []\ }$t$ to be written as \mbox{\tt [}$t$\mbox{\tt ]}. Similarly,
\mbox{\tt ->} is a type constructor: given two types $t$ and $u$,
$t$\mbox{\tt ->}$u$ is the type of functions mapping elements of type $t$ to
elements of type $u$.)
Note that the type of the binary data constructor \mbox{\tt Pt} is \mbox{\tt a\ ->\ a\ ->\ Point\ a},
and thus the following typings are valid:
\bprog
\mbox{\tt Pt\ \ 2.0\ \ 3.0\ \ \ \ \ \ \ \ \ \ \ \ ::\ Point\ Float}\\
\mbox{\tt Pt\ \ 'a'\ \ 'b'\ \ \ \ \ \ \ \ \ \ \ \ ::\ Point\ Char}\\
\mbox{\tt Pt\ True\ False\ \ \ \ \ \ \ \ \ \ \ ::\ Point\ Bool}
\eprog
On the other hand, an expression such as \mbox{\tt Pt\ 'a'\ 1} is ill-typed
because \mbox{\tt 'a'} and \mbox{\tt 1} are of different types.
It is important to distinguish between applying a {\em data constructor} to
yield a {\em value}, and applying a {\em type constructor} to yield a
{\em type}; the former happens at run-time and is how we compute
things in Haskell, whereas the latter happens at compile-time and is
part of the type system's process of ensuring type safety.
\syn{Type constructors such as \mbox{\tt Point} and data constructors such as
\mbox{\tt Pt} are in separate namespaces. This allows the same name to be used
for both a type constructor and data constructor, as in the following:
\bprog
\mbox{\tt data\ Point\ a\ =\ Point\ a\ a}
\eprog
While this may seem a little confusing at first, it serves to make the
link between a type and its data constructor more obvious.}
\subsubsection{Recursive Types}
\label{tut-recursive-types}
Types can also be recursive, as in the type of binary trees:
\bprog
\mbox{\tt data\ Tree\ a\ \ \ \ \ \ \ \ \ \ \ \ \ =\ Leaf\ a\ |\ Branch\ (Tree\ a)\ (Tree\ a)\ }
\eprog
Here we have defined a polymorphic binary tree type whose elements
are either leaf nodes containing a value of type \mbox{\tt a}, or internal
nodes (``branches'') containing (recursively) two sub-trees.
When reading data declarations such as this, remember again that \mbox{\tt Tree} is a
type constructor, whereas \mbox{\tt Branch} and \mbox{\tt Leaf} are data constructors.
Aside from establishing a connection between these constructors, the
above declaration is essentially defining the following types for
\mbox{\tt Branch} and \mbox{\tt Leaf}:
\bprog
\mbox{\tt Branch\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ::\ Tree\ a\ ->\ Tree\ a\ ->\ Tree\ a}\\
\mbox{\tt Leaf\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ::\ a\ ->\ Tree\ a}
\eprog
With this example we have defined a type sufficiently rich to
allow defining some interesting (recursive) functions that use it.
For example, suppose we wish to define a function \mbox{\tt fringe} that
returns a list of all the elements in the leaves of a tree from left
to right. It's usually helpful to write down the type of new
functions first; in this case we see that the type should be
\mbox{\tt Tree\ a\ ->\ [a]}. That is, \mbox{\tt fringe} is a polymorphic function that,
for any type \mbox{\tt a}, maps trees of \mbox{\tt a} into lists of \mbox{\tt a}. A suitable
definition follows:
\bprog
\mbox{\tt fringe\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ::\ Tree\ a\ ->\ [a]}\\
\mbox{\tt fringe\ (Leaf\ x)\ \ \ \ \ \ \ \ \ \ \ \ =\ \ [x]}\\
\mbox{\tt fringe\ (Branch\ left\ right)\ =\ \ fringe\ left\ ++\ fringe\ right}
\eprog
Here \mbox{\tt ++} is the infix operator that concatenates two lists (its full
definition will be given in Section \ref{tut-monadic-classes}). As
with the \mbox{\tt length} example given earlier, the \mbox{\tt fringe} function is
defined using pattern matching, except that here we see patterns involving
user-defined constructors: \mbox{\tt Leaf} and \mbox{\tt Branch}. \syn{Note that the
formal parameters are easily identified as the ones beginning with
lower-case letters.}
\subsection{Type Synonyms}
\label{tut-type-synonyms}
For convenience, Haskell provides a way to define {\em type synonyms};
i.e.~names for commonly used types. Type synonyms are created using a
\mbox{\tt type} declaration (\see{type-synonym-decls}). Here are several
examples:
\bprog
\mbox{\tt type\ String\ \ \ \ \ \ \ \ \ \ \ \ \ =\ [Char]}\\
\mbox{\tt type\ Person\ \ \ \ \ \ \ \ \ \ \ \ \ =\ (Name,Address)}\\
\mbox{\tt type\ Name\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\ String}\\
\mbox{\tt data\ Address\ \ \ \ \ \ \ \ \ \ \ \ =\ None\ |\ Addr\ String}
\eprog
Type synonyms do not define new types, but simply give new names
for existing types. For example, the type \mbox{\tt Person\ ->\ Name} is
precisely equivalent to \mbox{\tt (String,Address)\ ->\ String}. The new names
are often shorter than the types they are synonymous with, but this is
not the only purpose of type synonyms: they can also improve
readability of programs by being more mnemonic; indeed, the above
examples highlight this. We can even give new names to polymorphic
types:
\bprog
\mbox{\tt type\ AssocList\ a\ b\ \ \ \ \ \ \ \ \ \ \ \ \ \ =\ [(a,b)]}
\eprog
This is the type of ``association lists'' which associate values of
type \mbox{\tt a} with those of type \mbox{\tt b}.
\subsection{Built-in Types Are Not Special}
\label{tut-built-ins}
Earlier we introduced several ``built-in'' types such as lists,
tuples, integers, and characters. We have also shown how new
user-defined types can be defined. Aside from special syntax, are
the built-in types in any way more special than the user-defined ones?
The answer is {\em no}. The special syntax is for convenience and for
consistency with historical convention, but has no semantic
consequences.
We can emphasize this point by considering what the type
declarations would look like for these built-in types if in fact we
were allowed to use the special syntax in defining them. For example,
the \mbox{\tt Char} type might be written as:
\bprog
\mbox{\tt data\ Char\ \ \ \ \ \ \ =\ 'a'\ |\ 'b'\ |\ 'c'\ |\ ...\ \ \ \ \ \ \ \ \ --\ This\ is\ not\ valid}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |\ 'A'\ |\ 'B'\ |\ 'C'\ |\ ...\ \ \ \ \ \ \ \ \ --\ Haskell\ code!}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |\ '1'\ |\ '2'\ |\ '3'\ |\ ...}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ...}
\eprog
These constructor names are not syntactically valid; to fix them we
would have to write something like:
\bprog
\mbox{\tt data\ Char\ \ \ \ \ \ \ =\ Ca\ |\ Cb\ |\ Cc\ |\ ...}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |\ CA\ |\ CB\ |\ CC\ |\ ...}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |\ C1\ |\ C2\ |\ C3\ |\ ...}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ...}
\eprog
Even though these constructors are more concise, they are quite
unconventional for representing characters.
In any case, writing ``pseudo-Haskell'' code in this way helps us to
see through the special syntax. We see now that \mbox{\tt Char} is just an
enumerated type consisting of a large number of nullary constructors.
Thinking of \mbox{\tt Char} in this way makes it clear that we can
pattern-match against characters in function definitions, just as we
would expect to be able to do so for any of a type's constructors.
\syn{This example also demonstrates the use of {\em comments} in
Haskell; the characters \mbox{\tt --} and all subsequent characters to the end
of the line are ignored. Haskell also permits {\em nested} comments
which have the form \mbox{\tt {\char'173}-}$\ldots$\mbox{\tt -{\char'175}} and can appear anywhere
(\see{lexemes}).}
Similarly, we could define \mbox{\tt Int} (fixed precision integers) and
\mbox{\tt Integer} by:
\bprog
\mbox{\tt data\ Int\ \ \ \ \ =\ -65532\ |\ ...\ |\ -1\ |\ 0\ |\ 1\ |\ ...\ |\ 65532\ \ --\ more\ pseudo-code}\\
\mbox{\tt data\ Integer\ =\ \ \ \ \ \ \ ...\ -2\ |\ -1\ |\ 0\ |\ 1\ |\ 2\ ...}
\eprog
where \mbox{\tt -65532} and \mbox{\tt 65532}, say, are the maximum and minimum fixed
precision integers for a given implementation. \mbox{\tt Int} is a much larger
enumeration than \mbox{\tt Char}, but it's still finite! In contrast, the
pseudo-code for \mbox{\tt Integer}
is intended to convey an {\em infinite} enumeration.
Tuples are also easy to define playing this game:
\bprog
\mbox{\tt data\ (a,b)\ \ \ \ \ \ \ \ \ \ \ \ \ \ =\ (a,b)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ --\ more\ pseudo-code}\\
\mbox{\tt data\ (a,b,c)\ \ \ \ \ \ \ \ \ \ \ \ =\ (a,b,c)}\\
\mbox{\tt data\ (a,b,c,d)\ \ \ \ \ \ \ \ \ \ =\ (a,b,c,d)}\\
\mbox{\tt \ .\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ .}\\
\mbox{\tt \ .\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ .}\\
\mbox{\tt \ .\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ .}
\eprog
Each declaration above defines a tuple type of a particular length,
with \mbox{\tt (...)} playing a role in both the expression syntax (as data
constructor) and type-expression syntax (as type constructor). The
vertical dots after the last declaration are intended to convey an
infinite number of such declarations, reflecting the fact that tuples
of all lengths are allowed in Haskell.
Lists are also easily handled, and more interestingly, they are recursive:
\bprog
\mbox{\tt \ data\ [a]\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\ []\ |\ a\ :\ [a]\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ --\ more\ pseudo-code}
\eprog
We can now see clearly what we described about lists earlier: \mbox{\tt []} is
the empty list, and \mbox{\tt :} is the infix list constructor; thus \mbox{\tt [1,2,3]}
must be equivalent to the list \mbox{\tt 1:2:3:[]}. (\mbox{\tt :} is right
associative.) The type of \mbox{\tt []} is \mbox{\tt [a]}, and the type of \mbox{\tt :} is
\mbox{\tt a->[a]->[a]}.
\syn{The way ``\mbox{\tt :}'' is defined here is actually legal syntax---infix
constructors are permitted in \mbox{\tt data} declarations, and are
distinguished from infix operators (for pattern-matching purposes) by
the fact that they must begin with a ``\mbox{\tt :}'' (a property trivially
satisfied by ``\mbox{\tt :}'').}
At this point the reader should note carefully the differences between
tuples and lists, which the above definitions make abundantly clear.
In particular, note the recursive nature of the list type whose
elements are homogeneous and of arbitrary length, and the
non-recursive nature of a (particular) tuple type whose elements are
heterogeneous and of fixed length. The typing rules for tuples and
lists should now also be clear:
For $\mbox{\tt (}e_1\mbox{\tt ,}e_2\mbox{\tt ,}\ldots\mbox{\tt ,}e_n\mbox{\tt )},\ n\geq2$, if $t_i$ is the
type of $e_i$, then the type of the tuple is
$\mbox{\tt (}t_1\mbox{\tt ,}t_2\mbox{\tt ,}\ldots\mbox{\tt ,}t_n\mbox{\tt )}$.
For $\mbox{\tt [}e_1\mbox{\tt ,}e_2\mbox{\tt ,}\ldots\mbox{\tt ,}e_n\mbox{\tt ]},\ n\geq0$, each $e_i$ must have
the same type $t$, and the type of the list is \mbox{\tt [}$t$\mbox{\tt ]}.
\subsubsection{List Comprehensions and Arithmetic Sequences}
\label{tut-list-comps}
As with Lisp dialects, lists are pervasive in Haskell, and as with
other functional languages, there is yet more syntactic sugar to aid
in their creation. Aside from the constructors for lists just
discussed, Haskell provides an expression known as a {\em list
comprehension} that is best explained by example:
\bprog
\mbox{\tt [\ f\ x\ |\ x\ <-\ xs\ ]}
\eprog
This expression can intuitively be read as ``the list of all \mbox{\tt f\ x}
such that \mbox{\tt x} is drawn from \mbox{\tt xs}.'' The similarity to set notation is
not a coincidence. The phrase \mbox{\tt x\ <-\ xs} is called a {\em generator},
of which more than one is allowed, as in:
\bprog
\mbox{\tt [\ (x,y)\ |\ x\ <-\ xs,\ y\ <-\ ys\ ]}
\eprog
This list comprehension forms the cartesian product of the two lists
\mbox{\tt xs} and \mbox{\tt ys}. The elements are selected as if the generators were
``nested'' from left to right (with the rightmost generator varying
fastest); thus, if \mbox{\tt xs} is \mbox{\tt [1,2]} and \mbox{\tt ys} is \mbox{\tt [3,4]}, the result is
\mbox{\tt [(1,3),(1,4),(2,3),(2,4)]}.
Besides generators, boolean expressions called {\em guards} are
permitted. Guards place constraints on the elements generated. For
example, here is a concise definition of everybody's favorite sorting
algorithm:
\bprog
\mbox{\tt quicksort\ \ []\ \ \ \ \ \ \ \ \ \ \ =\ \ []}\\
\mbox{\tt quicksort\ (x:xs)\ \ \ \ \ \ \ \ =\ \ quicksort\ [y\ |\ y\ <-\ xs,\ y<x\ ]}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ++\ [x]}\\
\mbox{\tt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ++\ quicksort\ [y\ |\ y\ <-\ xs,\ y>=x]}
\eprog
To further support the use of lists, Haskell has special syntax for
{\em arithmetic sequences}, which are best explained by a series of
examples:
\[\ba{lcl}
\mbox{\tt [1..10]} \ \ \ &\red&\ \ \ \mbox{\tt [1,2,3,4,5,6,7,8,9,10]}\\
\mbox{\tt [1,3..10]}\ \ \ &\red&\ \ \ \mbox{\tt [1,3,5,7,9]}\\
% \mbox{\tt [1..]} \ \ \ &\red&\ \ \ \mbox{\tt [1,2,3,4,5,\ ...}\ \ \ \ \
% \mbox{(infinite sequence)}\\
\mbox{\tt [1,3..]} \ \ \ &\red&\ \ \ \mbox{\tt [1,3,5,7,9,\ ...}\ \ \ \ \
\mbox{(infinite sequence)}\\
\ea\]
More will be said about arithmetic sequences in Section
\ref{tut-enum-classes}, and ``infinite lists'' in Section
\ref{tut-infinite}.
\subsubsection{Strings}
\label{tut-strings}
As another example of syntactic sugar for built-in types, we
note that the literal string \mbox{\tt "hello"} is actually shorthand for the
list of characters \mbox{\tt ['h','e','l','l','o']}. Indeed, the type of
\mbox{\tt "hello"} is \mbox{\tt String}, where \mbox{\tt String} is a predefined type synonym
(that we gave as an earlier example):
\bprog
\mbox{\tt type\ String\ \ \ \ \ \ \ \ \ \ \ \ \ =\ [Char]}
\eprog
This means we can use predefined polymorphic list functions to operate
on strings. For example:
\[
\mbox{\tt "hello"\ ++\ "\ world"}\ \ \ \ \red\ \ \ \ \mbox{\tt "hello\ world"}
\]
%**~footer
|