File: io.verb

package info (click to toggle)
haskell98-tutorial 200006-2-3
  • links: PTS
  • area: main
  • in suites: bullseye
  • size: 972 kB
  • sloc: haskell: 2,125; makefile: 68; sh: 9
file content (448 lines) | stat: -rw-r--r-- 17,157 bytes parent folder | download | duplicates (5)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
%**<title>A Gentle Introduction to Haskell: IO</title>

%**~header

\section{Input/Output}
\label{tut-io}

The I/O system in Haskell is purely functional, yet has all of the
expressive power found in conventional programming languages.  In
imperative languages, programs proceed via "actions" which examine and
modify the current state of the world.  Typical actions include
reading and setting global variables, writing files, reading input,
and opening windows.  Such actions are also a part of Haskell but are
cleanly separated from the purely functional core of the language.

Haskell's I/O system is built around a somewhat daunting mathematical
foundation: the "monad".  However, understanding of the underlying
monad theory is not necessary to program using the I/O system.
Rather, monads are a conceptual structure into which I/O happens to fit.
It is no more necessary to understand monad theory to perform Haskell
I/O than it is to understand group theory to do simple arithmetic.  A
detailed explanation of monads is found in Section \ref{tut-monads}.

The monadic operators that the I/O system 
is built upon are also used for other purposes; we will look
more deeply into monads later.  For now, we will avoid the term monad
and concentrate on the use of the I/O system.  It's best to think of
the I/O monad as simply an abstract data type.   

Actions are defined rather than invoked within the expression
language of Haskell.
Evaluating the definition of an action doesn't actually
cause the action to happen.  Rather, the invocation of actions takes
place outside of the expression evaluation we have considered up to
this point.

Actions are either atomic, as defined in system primitives, or
are a sequential composition of other actions.  
The I/O monad contains primitives which
build composite actions, a process similar to joining
statements in sequential order using `@;@' in other languages.  Thus
the monad serves as the glue which binds together the actions in a
program. 

\subsection{Basic I/O Operations}

Every I/O action returns a value.  In the type system, the return value is
`tagged' with @IO@ type, distinguishing actions from other
values.  For example, the type 
of the function @getChar@ is:
\bprog
@
getChar                 ::   IO Char
@
\eprog
The @IO Char@ indicates that @getChar@, when invoked, performs
some action which returns a character.  Actions which return no
interesting values use the unit type, @()@.  For example, the
@putChar@ function: 
\bprog
@
putChar                 ::    Char -> IO ()
@
\eprog
takes a character as an argument but returns nothing useful. 
The unit type is similar to @void@ in other languages.

Actions are sequenced using an operator that has a
rather cryptic name: @>>=@ (or `bind').   Instead of using this
operator directly, we choose some syntactic sugar, the @do@
notation,  to hide these sequencing operators under a syntax resembling
more conventional languages.
The @do@ notation can be trivially expanded to @>>=@, 
as described in \see{do-expressions}.

The keyword @do@ introduces a sequence of statements
which are executed in order.  A statement is either an action,
a pattern bound to the result of an action using @<-@, or
a set of local definitions introduced using @let@.  The @do@ notation
uses layout in the same manner as @let@ or @where@ so we
can omit braces and semicolons with proper indentation.  Here is a
simple program to read and then print a character:
\bprog
@
main                    :: IO ()
main                    =  do c <- getChar
                              putChar c
@
\eprog
The use of the name @main@ is important: @main@ 
is defined to be the entry point of a Haskell program (similar
to the @main@ function in C), and
must have an @IO@ type, usually @IO ()@.  (The name @main@ is special
only in the module @Main@; we will have more to say about modules
later.)  This 
program performs two actions in 
sequence: first it reads in a character, binding the result to the
variable c, and then prints the character.  Unlike a @let@ expression
where variables are scoped over all definitions, the
variables defined by @<-@ are only in scope in the following statements.

There is still one missing piece.  We can invoke actions and examine
their results using @do@, but how do we return a value from a sequence
of actions?  For example, consider the @ready@ function that reads a
character and returns @True@ if the character was a `y':
\bprog
@
ready                   :: IO Bool
ready                   =  do c <- getChar
                              c == 'y'  -- Bad!!!
@
\eprog
This doesn't work because the second statement in the `do' is just a
boolean value, not an action.  We need to take this boolean and create
an action that does nothing but return the boolean as its result.
The @return@ function does just that:
\bprog
@
return                  ::   a -> IO a
@
\eprog
The @return@ function completes the set of sequencing primitives.  The
last line of @ready@ should read @return (c == 'y')@.

We are now ready to look at more complicated I/O functions.  First,
the function @getLine@:
\bprog
@
getLine     :: IO String
getLine     =  do c <- getChar
                  if c == '\n'
                       then return ""
                       else do l <- getLine
                               return (c:l)
@
\eprog
Note the second @do@ in the else clause.  Each @do@ introduces a single
chain of statements.   Any intervening
construct, such as the @if@, must use a new @do@ to initiate further
sequences of actions.

The @return@ function admits an ordinary value such as a boolean to
the realm of I/O actions. 
What about the other direction?  Can we invoke some I/O actions within an
ordinary expression?  For example, how can we say @x + print y@ 
in an expression so that @y@ is printed out as the
expression evaluates?  The answer is that we can't!  It is "not" possible to
sneak into the imperative universe while in the midst of purely
functional code.  Any value `infected' by the imperative world must be
tagged as such.  A function such as 
\bprog
@
f    ::  Int -> Int -> Int
@
\eprog
absolutely cannot do any I/O since @IO@ does not
appear in the returned type.
This fact is often quite distressing to
programmers used to placing print statements liberally throughout
their code during debugging.  There are, in fact, some unsafe
functions available to get around this problem but these are
better left to advanced programmers.  Debugging packages (like @Trace@)
often make liberal use of these `forbidden functions' in an entirely safe
manner.  

\subsection{Programming With Actions}
I/O actions are ordinary Haskell
values: they may be passed to functions, placed in structures, and
used as any other Haskell value.  Consider this list of actions:
\bprog
@
todoList :: [IO ()]

todoList = [putChar 'a',
            do putChar 'b'
               putChar 'c',
            do c <- getChar
               putChar c]
@
\eprog
This list doesn't actually invoke any actions---it simply holds them.
To join these actions into a single action, a function such as
@sequence_@ is needed:
\bprog
@
sequence_        :: [IO ()] -> IO ()
sequence_ []     =  return ()
sequence_ (a:as) =  do a
                       sequence as
@
\eprog
This can be simplified by noting that @do x;y@ is expanded to
@x >> y@ (see Section \ref{tut-monadic-classes}).  This pattern of
recursion is captured by the @foldr@ function (see the Prelude for a
definition of @foldr@); a better definition of @sequence_@ is:
\bprog
@
sequence_        :: [IO ()] -> IO ()
sequence_        =  foldr (>>) (return ())
@
\eprog
The @do@ notation is a useful tool but in this case the underlying
monadic operator, @>>@, is more appropriate.  An understanding of the
operators upon which @do@ is built is quite useful to the Haskell
programmer. 

The @sequence_@ function can be used to construct @putStr@ from
@putChar@: 
\bprog
@
putStr                  :: String -> IO ()
putStr s                =  sequence_ (map putChar s)
@
\eprog
One of the differences between Haskell and conventional
imperative programming can be seen in @putStr@.  In an imperative
language, mapping an imperative version of @putChar@ over the string
would be sufficient to print it.  In Haskell, however, the @map@
function does not perform any action.  Instead it creates a list of
actions, one for each character in the string.  The folding operation
in @sequence_@
uses the @>>@ function to combine all of the individual actions into a
single action.  The @return ()@ used here is  
quite necessary -- @foldr@ needs a null action at the end of the chain
of actions it creates (especially if there are no characters in the
string!). 

The Prelude and the libraries  contains many functions which are
useful for sequencing I/O actions.  These are usually generalized to
arbitrary monads; any function with a context including @Monad m =>@
works with the @IO@ type. 

\subsection{Exception Handling}

So far, we have avoided the issue of exceptions during I/O operations.
What would happen if @getChar@ encounters an end of
file?\footnote{We use the term "error" for "\bot": a condition which
cannot be recovered from such as non-termination or pattern match
failure.  Exceptions, on the other hand, can be caught and handled
within the I/O monad.}
To deal with exceptional conditions such as `file not found' within
the I/O monad, a handling mechanism is used, similar in functionality
to the one in standard ML. 
No special syntax or semantics are used; exception handling is part of
the definition of the I/O sequencing operations.

Errors are encoded using a special data type, @IOError@.  This type
represents all possible exceptions that may occur within the I/O monad.
This is an abstract type: no constructors for @IOError@ are available
to the user.  Predicates allow @IOError@ values to be
queried.  For example, the function
\bprog
@
isEOFError       :: IOError -> Bool
@
\eprog
determines whether an error was caused by an end-of-file condition.
By making @IOError@ abstract, new sorts of errors may be added to the
system without a noticeable change to the data type.  The function
@isEOFError@ is defined in a separate library, @IO@, and must be
explicitly imported into a program.

An {\em exception handler} has type @IOError -> IO a@.
The @catch@ function associates an exception handler with an action or
set of actions:
\bprog
@
catch                     :: IO a -> (IOError -> IO a) -> IO a
@
\eprog
The arguments to @catch@ are an action and a handler.  If the action
succeeds, its result is returned without invoking the handler.  If an
error occurs, it is passed to the handler as a value of type
@IOError@ and the action associated with the handler is then invoked.
For example, this version of @getChar@ returns a newline when an error
is encountered:
\bprog
@
getChar'                :: IO Char
getChar'                =  getChar `catch` (\e -> return '\n')
@
\eprog
This is rather crude since it treats all errors in the same manner.  If
only end-of-file is to be recognized, the error value must be queried:
\bprog
@
getChar'                :: IO Char
getChar'                =  getChar `catch` eofHandler where
    eofHandler e = if isEofError e then return '\n' else ioError e
@
\eprog
The @ioError@ function used here throws an exception on to the next
exception handler.  The type of @ioError@ is
\bprog
@
ioError                 :: IOError -> IO a
@
\eprog
It is similar to
@return@ except that it transfers control to the exception handler
instead of proceeding to the next 
I/O action.  Nested calls to @catch@ are
permitted, and produce nested exception handlers.

Using @getChar'@, we can redefine @getLine@ to demonstrate the use of
nested handlers:
\bprog
@
getLine'        :: IO String
getLine'        = catch getLine'' (\err -> return ("Error: " ++ show err))
        where
                   getLine'' = do c <- getChar'
 	                       if c == '\n' then return ""
                                            else do l <- getLine'
                                                    return (c:l)
@
\eprog

The nested error handlers allow @getChar'@ to catch end of file
while any other error results in a string starting with @"Error: "@
from @getLine'@.

For convenience, Haskell provides a default exception handler at the
topmost level of a program that prints out the
exception and terminates the program.

\subsection{Files, Channels, and Handles}

Aside from the I/O monad and the exception handling mechanism it
provides, I/O facilities in Haskell are for the most part quite
similar to those in other languages.  Many of these functions are in
the @IO@ library instead of the Prelude and thus must be explicitly
imported to be in scope (modules and importing are discussed in
Section~\ref{tut-modules}).  Also, many of these functions are
discussed in the Library Report instead of the main report.

Opening a file creates a "handle" (of type @Handle@) for use in I/O
transactions.  Closing the handle closes the associated file:
\bprog
@
type FilePath         =  String  -- path names in the file system
openFile              :: FilePath -> IOMode -> IO Handle
hClose                :: Handle -> IO () 
data IOMode           =  ReadMode | WriteMode | AppendMode | ReadWriteMode
@
\eprog
Handles can also be associated with "channels": communication ports
not directly attached to files.  A few channel handles are predefined,
including @stdin@ (standard input), @stdout@ (standard output), and
@stderr@ (standard error).  Character level I/O operations include
@hGetChar@ and @hPutChar@, which take a handle as an argument.  The
@getChar@ function used previously can be defined as:
\bprog
@
getChar                = hGetChar stdin
@
\eprog
Haskell also allows the entire contents of a file or channel to be
returned as a single string:
\bprog
@
getContents            :: Handle -> IO String
@
\eprog
Pragmatically, it may seem that @getContents@ must immediately read an
entire file or channel, resulting in poor space and time performance
under certain conditions.  However, this is not the case.  The key
point is that @getContents@ returns a ``lazy'' (i.e. non-strict) list
of characters (recall that strings are just lists of characters in
Haskell), whose elements are read ``by demand'' just like any other
list.  An implementation can be expected to implement this
demand-driven behavior by reading one character at a time from the
file as they are required by the computation.

In this example, a Haskell program copies one file to another:
\bprog
@
main = do fromHandle <- getAndOpenFile "Copy from: " ReadMode
          toHandle   <- getAndOpenFile "Copy to: " WriteMode 
          contents   <- hGetContents fromHandle
          hPutStr toHandle contents
          hClose toHandle
          putStr "Done."

getAndOpenFile          :: String -> IOMode -> IO Handle

getAndOpenFile prompt mode =
    do putStr prompt
       name <- getLine
       catch (openFile name mode)
             (\_ -> do putStrLn ("Cannot open "++ name ++ "\n")
                       getAndOpenFile prompt mode)
         
@
\eprog
By using the lazy @getContents@ function, the entire contents of the
file need not be read into memory all at once.  If @hPutStr@ chooses
to buffer the output by writing the string in fixed sized blocks of
characters, only one block of the input file needs to be in memory at
once.  The input file is closed implicitly when the last character has
been read.

\subsection{Haskell and Imperative Programming}

As a final note, I/O programming raises an important issue: this
style looks suspiciously like ordinary imperative programming.  For
example, the @getLine@ function:
\bprog
@
getLine         = do c <- getChar
                     if c == '\n'
                          then return ""
                          else do l <- getLine
                                  return (c:l)
@
\eprog
bears a striking similarity to imperative code (not in any real language) :
\bprog
@

function getLine() {
  c := getChar();
  if c == `\n` then return ""
               else {l := getLine();
                     return c:l}}
@
\eprog
So, in the end, has Haskell simply re-invented the imperative wheel?

In some sense, yes.  The I/O monad constitutes a small imperative
sub-language inside Haskell, and thus the I/O component of a program
may appear similar to ordinary imperative code.  But there is one
important difference: There is no special semantics that the user
needs to deal with.  In particular, equational reasoning in Haskell is
not compromised.  The imperative feel of the monadic code in a program
does not detract from the functional aspect of Haskell.  An
experienced functional programmer should be able to minimize the
imperative component of the program, only using the I/O monad for a
minimal amount of top-level sequencing.  The
monad cleanly separates the functional and imperative
program components.  In contrast, imperative languages with functional
subsets do not generally have any well-defined barrier between the
purely functional and imperative worlds.


%**~footer