1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586
|
\chapter{Overview of Scheme}
\label{semanticchapter}
This chapter gives an overview of Scheme's semantics.
The purpose of this overview is to explain
enough about the basic concepts of the language to facilitate
understanding of the subsequent chapters of the report, which are
organized as a reference manual. Consequently, this overview is
not a complete introduction to the language, nor is it precise
in all respects or normative in any way.
\vest Following Algol, Scheme is a statically scoped programming
language. Each use of a variable is associated with a lexically
apparent binding of that variable.
\vest Scheme has latent as opposed to manifest types
\cite{WaiteGoos}. Types
are associated with objects\mainindex{object} (also called values) rather than
with variables. (Some authors refer to languages with latent types as
untyped, weakly typed or dynamically typed languages.) Other languages with
latent types are Python, Ruby, Smalltalk, and other dialects of Lisp. Languages
with manifest types (sometimes referred to as strongly typed or
statically typed languages) include Algol 60, C, C\#, Java, Haskell, and ML.
\vest All objects created in the course of a Scheme computation, including
procedures and continuations, have unlimited extent.
No Scheme object is ever destroyed. The reason that
implementations of Scheme do not (usually!)\ run out of storage is that
they are permitted to reclaim the storage occupied by an object if
they can prove that the object cannot possibly matter to any future
computation. Other languages in which most objects have unlimited
extent include C\#, Java, Haskell, most Lisp dialects, ML, Python,
Ruby, and Smalltalk.
Implementations of Scheme must be properly tail-recursive.
This allows the execution of an iterative computation in constant space,
even if the iterative computation is described by a syntactically
recursive procedure. Thus with a properly tail-recursive implementation,
iteration can be expressed using the ordinary procedure-call
mechanics, so that special iteration constructs are useful only as
syntactic sugar.
\vest Scheme was one of the first languages to support procedures as
objects in their own right. Procedures can be created dynamically,
stored in data structures, returned as results of procedures, and so
on. Other languages with these properties include Common Lisp,
Haskell, ML, Ruby, and Smalltalk.
\vest One distinguishing feature of Scheme is that continuations, which
in most other languages only operate behind the scenes, also have
``first-class'' status. First-class continuations are useful for implementing a
wide variety of advanced control constructs, including non-local exits,
backtracking, and coroutines.
In Scheme, the argument expressions of a procedure call are evaluated
before the procedure gains control, whether the procedure needs the
result of the evaluation or not. C, C\#, Common Lisp, Python,
Ruby, and Smalltalk are other languages that always evaluate argument
expressions before invoking a procedure. This is distinct from the
lazy-evaluation semantics of Haskell, or the call-by-name semantics of
Algol 60, where an argument expression is not evaluated unless its
value is needed by the procedure.
Scheme's model of arithmetic provides a rich set of numerical types
and operations on them. Furthermore, it distinguishes \textit{exact}
and \textit{inexact} number objects: Essentially, an exact number
object corresponds to a number exactly, and an inexact number object
is the result of a computation that involved rounding or other errors.
\section{Basic types}
Scheme programs manipulate \textit{objects}, which are also referred
to as \textit{values}.
Scheme objects are organized into sets of values called \textit{types}.
This section gives an overview of the fundamentally important types of the
Scheme language. More types are described in later chapters.
\begin{note}
As Scheme is latently typed, the use of the term \textit{type} in
this report differs from the use of the term in the context of other
languages, particularly those with manifest typing.
\end{note}
\paragraph{Booleans}
\mainindex{boolean}A boolean is a truth value, and can be either
true or false. In Scheme, the object for ``false'' is written
\schfalse{}. The object for ``true'' is written \schtrue{}. In
most places where a truth value is expected, however, any object different from
\schfalse{} counts as true.
\paragraph{Numbers}
\mainindex{number}Scheme supports a rich variety of numerical data types, including
objects representing integers of arbitrary precision, rational numbers, complex numbers, and
inexact numbers of various kinds. Chapter~\ref{numbertypeschapter} gives an
overview of the structure of Scheme's numerical tower.
\paragraph{Characters}
\mainindex{character}Scheme characters mostly correspond to textual characters.
More precisely, they are isomorphic to the \textit{scalar values} of
the Unicode standard.
\paragraph{Strings}
\mainindex{string}Strings are finite sequences of characters with fixed length and thus
represent arbitrary Unicode texts.
\paragraph{Symbols}
\mainindex{symbol}A symbol is an object representing a string,
the symbol's \textit{name}.
Unlike strings, two symbols whose names are spelled the same
way are never distinguishable. Symbols are useful for many applications;
for instance, they may be used the way enumerated values are used in
other languages.
\paragraph{Pairs and lists}
\mainindex{pair}\mainindex{list}
A pair is a data structure with two components. The most common use
of pairs is to represent (singly linked) lists, where the first
component (the ``car'') represents the first element of the list, and
the second component (the ``cdr'') the rest of the list. Scheme also
has a distinguished empty list, which is the last cdr in a chain of
pairs that form a list.
\paragraph{Vectors}
\mainindex{vector}Vectors, like lists, are linear data structures
representing finite sequences of arbitrary objects.
Whereas the elements of a list are accessed
sequentially through the chain of pairs representing it,
the elements of a vector are addressed by integer indices.
Thus, vectors are more appropriate than
lists for random access to elements.
\paragraph{Procedures}
\mainindex{procedure}Procedures are values in Scheme.
\section{Expressions}
The most important elements of Scheme code are
\mainindex{expression}\textit{expressions}. Expressions can be
\textit{evaluated}, producing a \textit{value}. (Actually, any number
of values---see section~\ref{multiplereturnvaluessection}.) The most
fundamental expressions are literal expressions:
\begin{scheme}
\schtrue{} \ev \schtrue
23 \ev 23%
\end{scheme}
This notation means that the expression \schtrue{} evaluates to
\schtrue{}, that is, the value for ``true'', and that the expression
{\cf 23} evaluates to a number object representing the number 23.
Compound expressions are formed by placing parentheses around their
subexpressions. The first subexpression identifies an operation; the
remaining subexpressions are operands to the operation:
%
\begin{scheme}
(+ 23 42) \ev 65
(+ 14 (* 23 42)) \ev 980%
\end{scheme}
%
In the first of these examples, {\cf +} is the name of
the built-in operation for addition, and {\cf 23} and {\cf 42} are the
operands. The expression {\cf (+ 23 42)} reads as ``the sum of 23 and
42''. Compound expressions can be nested---the second example reads
as ``the sum of 14 and the product of 23 and 42''.
As these examples indicate, compound expressions in Scheme are always
written using the same prefix notation\mainindex{prefix notation}. As
a consequence, the parentheses are needed to indicate structure.
Consequently, ``superfluous'' parentheses, which are often permissible in
mathematical notation and also in many programming languages, are not
allowed in Scheme.
As in many other languages, whitespace (including line endings) is not
significant when it separates subexpressions of an expression, and
can be used to indicate structure.
\section{Variables and binding}
\mainindex{variable}\mainindex{binding}\mainindex{identifier}Scheme
allows identifiers to stand for locations containing values.
These identifiers are called variables. In many cases, specifically
when the location's value is never modified after its creation, it is
useful to think of the variable as standing for the value directly.
\begin{scheme}
(let ((x 23)
(y 42))
(+ x y)) \ev 65%
\end{scheme}
In this case, the expression starting with {\cf let} is a binding
construct. The parenthesized structure following the {\cf let} lists
variables alongside expressions: the variable {\cf x} alongside {\cf
23}, and the variable {\cf y} alongside {\cf 42}. The {\cf let}
expression binds {\cf x} to 23, and {\cf y} to 42. These bindings are
available in the \textit{body} of the {\cf let} expression, {\cf (+ x
y)}, and only there.
\section{Definitions}
\index{definition}The variables bound by a {\cf let} expression
are \textit{local}, because their bindings are visible only in the
{\cf let}'s body. Scheme also allows creating top-level bindings for
identifiers as follows:
\begin{scheme}
(define x 23)
(define y 42)
(+ x y) \ev 65%
\end{scheme}
(These are actually ``top-level'' in the body of a top-level program or library;
see section~\ref{librariesintrosection} below.)
The first two parenthesized structures are \textit{definitions}; they
create top-level bindings, binding {\cf x} to 23 and {\cf y} to 42.
Definitions are not expressions, and cannot appear in all places
where an expression can occur. Moreover, a definition has no value.
Bindings follow the lexical structure of the program: When several
bindings with the same name exist, a variable refers to the binding
that is closest to it, starting with its occurrence in the program
and going from inside to outside, and referring to a top-level
binding if no
local binding can be found along the way:
\begin{scheme}
(define x 23)
(define y 42)
(let ((y 43))
(+ x y)) \ev 66
(let ((y 43))
(let ((y 44))
(+ x y))) \ev 67%
\end{scheme}
\section{Forms}
While definitions are not expressions, compound expressions and
definitions exhibit similar syntactic structure:
%
\begin{scheme}
(define x 23)
(* x 2)%
\end{scheme}
%
While the first line contains a definition, and the second an
expression, this distinction depends on the bindings for {\cf define}
and {\cf *}. At the purely syntactical level, both are
\textit{forms}\index{form}, and \textit{form} is the general name for
a syntactic part of a Scheme program. In particular, {\cf 23} is a
\textit{subform}\index{subform} of the form {\cf (define x 23)}.
\section{Procedures}
\label{proceduressection}
\index{procedure}Definitions can also be used to define
procedures:
\begin{scheme}
(define (f x)
(+ x 42))
(f 23) \ev 65%
\end{scheme}
A procedure is, slightly simplified, an abstraction of an
expression over objects. In the example, the first definition defines a procedure
called {\cf f}. (Note the parentheses around {\cf f x}, which
indicate that this is a procedure definition.) The expression {\cf (f
23)} is a \index{procedure call}procedure call, meaning,
roughly, ``evaluate {\cf (+ x 42)} (the body of the procedure) with
{\cf x} bound to 23''.
As procedures are objects, they can be passed to other
procedures:
%
\begin{scheme}
(define (f x)
(+ x 42))
(define (g p x)
(p x))
(g f 23) \ev 65%
\end{scheme}
In this example, the body of {\cf g} is evaluated with {\cf p}
bound to {\cf f} and {\cf x} bound to 23, which is equivalent
to {\cf (f 23)}, which evaluates to 65.
In fact, many predefined operations of Scheme are provided not by
syntax, but by variables whose values are procedures.
The {\cf +} operation, for example, which receives
special syntactic treatment in many other languages, is just a regular
identifier in Scheme, bound to a procedure that adds number objects. The
same holds for {\cf *} and many others:
\begin{scheme}
(define (h op x y)
(op x y))
(h + 23 42) \ev 65
(h * 23 42) \ev 966%
\end{scheme}
Procedure definitions are not the only way to create procedures. A
{\cf lambda} expression creates a new procedure as an object, with no
need to specify a name:
\begin{scheme}
((lambda (x) (+ x 42)) 23) \ev 65%
\end{scheme}
The entire expression in this example is a procedure call; {\cf
(lambda (x) (+ x 42))}, evaluates to a procedure that takes a single
number object and adds 42 to it.
\section{Procedure calls and syntactic keywords}
Whereas {\cf (+ 23 42)}, {\cf (f 23)}, and {\cf ((lambda (x) (+ x 42))
23)} are all examples of procedure calls, {\cf lambda} and {\cf
let} expressions are not. This is because {\cf let}, even though
it is an identifier, is not a variable, but is instead a \textit{syntactic
keyword}\index{syntactic keyword}. A form that has a
syntactic keyword as its first subexpression obeys special rules determined by
the keyword. The {\cf define} identifier in a definition is also a
syntactic keyword. Hence, definitions are also not procedure calls.
The rules for the {\cf lambda} keyword specify that the first
subform is a list of parameters, and the remaining subforms are the body of
the procedure. In {\cf let} expressions, the first subform is a list
of binding specifications, and the remaining subforms constitute a body of
expressions.
Procedure calls can generally be distinguished from these
\textit{special forms}\mainindex{special form} by
looking for a syntactic keyword in the first position of an
form: if the first position does not contain a syntactic keyword, the expression
is a procedure call.
(So-called \textit{identifier macros} allow creating other kinds of
special forms, but are comparatively rare.)
The set of syntactic keywords of Scheme is
fairly small, which usually makes this task fairly simple.
It is possible, however, to create new bindings for syntactic keywords; see
section~\ref{macrosintrosection} below.
\section{Assignment}
Scheme variables bound by definitions or {\cf let} or {\cf lambda}
expressions are not actually bound directly to the objects specified in the
respective bindings, but to locations containing these objects. The
contents of these locations can subsequently be modified destructively
via \textit{assignment}\index{assignment}:
%
\begin{scheme}
(let ((x 23))
(set! x 42)
x) \ev 42%
\end{scheme}
In this case, the body of the {\cf let} expression consists of two
expressions which are evaluated sequentially, with the value of the
final expression becoming the value of the entire {\cf let}
expression. The expression {\cf (set! x 42)} is an assignment, saying
``replace the object in the location referenced by {\cf x} with 42''.
Thus, the previous value of {\cf x}, 23, is replaced by 42.
\section{Derived forms and macros}
\label{macrosintrosection}
Many of the special forms specified in this report
can be translated into more basic special forms.
For example, a {\cf let} expression can be translated
into a procedure call and a {\cf lambda} expression. The following two
expressions are equivalent:
%
\begin{scheme}
(let ((x 23)
(y 42))
(+ x y)) \ev 65
((lambda (x y) (+ x y)) 23 42) \lev 65%
\end{scheme}
Special forms like {\cf let} expressions are called \textit{derived
forms}\index{derived form} because their semantics can be
derived from that of other kinds of forms by a syntactic
transformation. Some procedure definitions are also derived forms. The
following two definitions are equivalent:
\begin{scheme}
(define (f x)
(+ x 42))
(define f
(lambda (x)
(+ x 42)))%
\end{scheme}
In Scheme, it is possible for a program to create its own derived
forms by binding syntactic keywords to macros\index{macro}:
\begin{scheme}
(define-syntax def
(syntax-rules ()
((def f (p ...) body)
(define (f p ...)
body))))
(def f (x)
(+ x 42))%
\end{scheme}
The {\cf define-syntax} construct specifies that a parenthesized
structure matching the pattern {\cf (def f (p ...) body)}, where {\cf
f}, {\cf p}, and {\cf body} are pattern variables, is translated to
{\cf (define (f p ...) body)}. Thus, the {\cf def} form appearing in
the example gets translated to:
\begin{scheme}
(define (f x)
(+ x 42))%
\end{scheme}
The ability to create new syntactic keywords makes Scheme extremely
flexible and expressive, allowing many of the features
built into other languages to be derived forms in Scheme.
\section{Syntactic data and datum values}
A subset of the Scheme objects is called \textit{datum
values}\index{datum value}.
These include booleans, number objects, characters, symbols,
and strings as well as lists and vectors whose elements are data. Each
datum value may be represented in textual form as a
\textit{syntactic datum}\index{syntactic datum}, which can be written out
and read back in without loss of information.
A datum value may be represented by several different syntactic data.
Moreover, each datum value
can be trivially translated to a literal expression in a program by
prepending a {\cf\singlequote} to a corresponding syntactic datum:
\begin{scheme}
'23 \ev 23
'\schtrue{} \ev \schtrue{}
'foo \ev foo
'(1 2 3) \ev (1 2 3)
'\#(1 2 3) \ev \#(1 2 3)%
\end{scheme}
The {\cf\singlequote} shown in the previous examples
is not needed for representations of number objects or booleans.
The syntactic datum {\cf foo} represents a
symbol with name ``foo'', and {\cf 'foo} is a literal expression with
that symbol as its value. {\cf (1 2 3)} is a syntactic datum that
represents a list with elements 1, 2, and 3, and {\cf '(1 2 3)} is a literal
expression with this list as its value. Likewise, {\cf \#(1 2 3)}
is a syntactic datum that represents a vector with elements 1, 2 and 3, and
{\cf '\#(1 2 3)} is the corresponding literal.
The syntactic data are a superset of the Scheme forms. Thus, data
can be used to represent Scheme forms as data objects. In
particular, symbols can be used to represent identifiers.
\begin{scheme}
'(+ 23 42) \ev (+ 23 42)
'(define (f x) (+ x 42)) \lev (define (f x) (+ x 42))%
\end{scheme}
This facilitates writing programs that operate on Scheme source code,
in particular interpreters and program transformers.
\section{Continuations}
Whenever a Scheme expression is evaluated there is a
\textit{continuation}\index{continuation} wanting the result of the
expression. The continuation represents an entire (default) future
for the computation. For example, informally the continuation of {\cf 3}
in the expression
%
\begin{scheme}
(+ 1 3)%
\end{scheme}
%
adds 1 to it. Normally these ubiquitous continuations are hidden
behind the scenes and programmers do not think much about them. On
rare occasions, however, a programmer may need to deal with
continuations explicitly. The {\cf call-with-current-continuation}
procedure (see section~\ref{call-with-current-continuation}) allows
Scheme programmers to do that by creating a procedure that reinstates
the current continuation. The {\cf call-with-current-continuation}
procedure accepts a procedure, calls it immediately with an argument
that is an \textit{escape procedure}\index{escape procedure}. This
escape procedure can then be called with an argument that becomes the
result of the call to {\cf call-with-current-continuation}. That is,
the escape procedure abandons its own continuation, and reinstates the
continuation of the call to {\cf call-with-current-continuation}.
In the following example, an escape procedure representing the
continuation that adds 1 to its argument is bound to {\cf escape}, and
then called with 3 as an argument. The continuation of the call to
{\cf escape} is abandoned, and instead the 3 is passed to the
continuation that adds 1:
%
\begin{scheme}
(+ 1 (call-with-current-continuation
(lambda (escape)
(+ 2 (escape 3))))) \lev 4%
\end{scheme}
%
An escape procedure has unlimited extent: It can be called after the
continuation it captured has been invoked, and it can be called
multiple times. This makes {\cf call-with-current-continuation}
significantly more powerful than typical non-local control constructs
such as exceptions in other languages.
\section{Libraries}
\label{librariesintrosection}
Scheme code can be organized in components called
\textit{libraries}\index{library}. Each library contains
definitions and expressions. It can import definitions
from other libraries and export definitions to other libraries.
The following library called {\cf (hello)} exports a definition called
{\cf hello-world}, and imports the base library (see
chapter~\ref{baselibrarychapter}) and the simple I/O library (see
library section~\extref{lib:simpleiosection}{Simple I/O}). The {\cf
hello-world} export is a procedure that displays {\cf Hello World}
on a separate line:
%
\begin{scheme}
(library (hello)
(export hello-world)
(import (rnrs base)
(rnrs io simple))
(define (hello-world)
(display "Hello World")
(newline)))%
\end{scheme}
\section{Top-level programs}
A Scheme program is invoked via a \textit{top-level
program}\index{top-level program}. Like a library, a top-level
program contains imports, definitions and expressions, and specifies
an entry point for execution. Thus a top-level program defines, via
the transitive closure of the libraries it imports, a Scheme program.
The following top-level program obtains the first argument from the command line
via the {\cf command-line} procedure from the \rsixlibrary{programs}
library (see library chapter~\extref{lib:programlibchapter}{Command-line
access and exit values}). It then opens the file using {\cf
open-file-input-port} (see library section~\extref{lib:portsiosection}),
yielding a \textit{port}, i.e.\ a connection to the file as a data
source, and calls the {\cf get-bytes-all} procedure to obtain the
contents of the file as binary data. It then uses {\cf put-bytes} to
output the contents of the file to standard output:
%
\begin{scheme}
\#!r6rs
(import (rnrs base)
(rnrs io ports)
(rnrs programs))
(put-bytes (standard-output-port)
(call-with-port
(open-file-input-port
(cadr (command-line)))
get-bytes-all))%
\end{scheme}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "r6rs"
%%% End:
|