File: struct.tex

package info (click to toggle)
r6rs-doc 1.0-1
links: PTS, VCS
area: main
in suites: squeeze
size: 6,868 kB
ctags: 2,046
sloc: lisp: 5,409; makefile: 190
file content (586 lines) | stat: -rw-r--r-- 22,376 bytes
\chapter{Overview of Scheme}
\label{semanticchapter}

This chapter gives an overview of Scheme's semantics.
The purpose of this overview is to explain
enough about the basic concepts of the language to facilitate
understanding of the subsequent chapters of the report, which are
organized as a reference manual.  Consequently, this overview is
not a complete introduction to the language, nor is it precise
in all respects or normative in any way.

\vest Following Algol, Scheme is a statically scoped programming
language.  Each use of a variable is associated with a lexically
apparent binding of that variable.

\vest Scheme has latent as opposed to manifest types
\cite{WaiteGoos}.  Types
are associated with objects\mainindex{object} (also called values) rather than
with variables.  (Some authors refer to languages with latent types as
untyped, weakly typed or dynamically typed languages.)  Other languages with
latent types are Python, Ruby, Smalltalk, and other dialects of Lisp.  Languages
with manifest types (sometimes referred to as strongly typed or
statically typed languages) include Algol 60, C, C\#, Java, Haskell, and ML.

\vest All objects created in the course of a Scheme computation, including
procedures and continuations, have unlimited extent.
No Scheme object is ever destroyed.  The reason that
implementations of Scheme do not (usually!)\ run out of storage is that
they are permitted to reclaim the storage occupied by an object if
they can prove that the object cannot possibly matter to any future
computation.  Other languages in which most objects have unlimited
extent include C\#, Java, Haskell, most Lisp dialects, ML, Python,
Ruby, and Smalltalk.

Implementations of Scheme must be properly tail-recursive.
This allows the execution of an iterative computation in constant space,
even if the iterative computation is described by a syntactically
recursive procedure.  Thus with a properly tail-recursive implementation,
iteration can be expressed using the ordinary procedure-call
mechanics, so that special iteration constructs are useful only as
syntactic sugar.

\vest Scheme was one of the first languages to support procedures as
objects in their own right.  Procedures can be created dynamically,
stored in data structures, returned as results of procedures, and so
on.  Other languages with these properties include Common Lisp,
Haskell, ML, Ruby, and Smalltalk.

\vest One distinguishing feature of Scheme is that continuations, which
in most other languages only operate behind the scenes, also have
``first-class'' status.  First-class continuations are useful for implementing a
wide variety of advanced control constructs, including non-local exits,
backtracking, and coroutines.

In Scheme, the argument expressions of a procedure call are evaluated
before the procedure gains control, whether the procedure needs the
result of the evaluation or not.  C, C\#, Common Lisp, Python,
Ruby, and Smalltalk are other languages that always evaluate argument
expressions before invoking a procedure.  This is distinct from the
lazy-evaluation semantics of Haskell, or the call-by-name semantics of
Algol 60, where an argument expression is not evaluated unless its
value is needed by the procedure.

Scheme's model of arithmetic provides a rich set of numerical types
and operations on them.  Furthermore, it distinguishes \textit{exact}
and \textit{inexact} number objects: Essentially, an exact number
object corresponds to a number exactly, and an inexact number object
is the result of a computation that involved rounding or other errors.

\section{Basic types}

Scheme programs manipulate \textit{objects}, which are also referred
to as \textit{values}.
Scheme objects are organized into sets of values called \textit{types}.
This section gives an overview of the fundamentally important types of the
Scheme language.  More types are described in later chapters.

\begin{note}
  As Scheme is latently typed, the use of the term \textit{type} in
  this report differs from the use of the term in the context of other
  languages, particularly those with manifest typing.
\end{note}

\paragraph{Booleans}

\mainindex{boolean}A boolean is a truth value, and can be either
true or false.  In Scheme, the object for ``false'' is written
\schfalse{}.  The object for ``true'' is written \schtrue{}.  In
most places where a truth value is expected, however, any object different from
\schfalse{} counts as true.

\paragraph{Numbers}

\mainindex{number}Scheme supports a rich variety of numerical data types, including
objects representing integers of arbitrary precision, rational numbers, complex numbers, and
inexact numbers of various kinds.  Chapter~\ref{numbertypeschapter} gives an
overview of the structure of Scheme's numerical tower.

\paragraph{Characters}

\mainindex{character}Scheme characters mostly correspond to textual characters.
More precisely, they are isomorphic to the \textit{scalar values} of
the Unicode standard.

\paragraph{Strings}

\mainindex{string}Strings are finite sequences of characters with fixed length and thus
represent arbitrary Unicode texts.

\paragraph{Symbols}

\mainindex{symbol}A symbol is an object representing a string,
the symbol's \textit{name}.
Unlike strings, two symbols whose names are spelled the same
way are never distinguishable.  Symbols are useful for many applications;
for instance, they may be used the way enumerated values are used in
other languages.

\paragraph{Pairs and lists}

\mainindex{pair}\mainindex{list}
A pair is a data structure with two components.  The most common use
of pairs is to represent (singly linked) lists, where the first
component (the ``car'') represents the first element of the list, and
the second component (the ``cdr'') the rest of the list.  Scheme also
has a distinguished empty list, which is the last cdr in a chain of
pairs that form a list.

\paragraph{Vectors}

\mainindex{vector}Vectors, like lists, are linear data structures
representing finite sequences of arbitrary objects.
Whereas the elements of a list are accessed
sequentially through the chain of pairs representing it,
the elements of a vector are addressed by integer indices.
Thus, vectors are more appropriate than
lists for random access to elements.

\paragraph{Procedures}

\mainindex{procedure}Procedures are values in Scheme.

\section{Expressions}

The most important elements of Scheme code are
\mainindex{expression}\textit{expressions}.  Expressions can be
\textit{evaluated}, producing a \textit{value}.  (Actually, any number
of values---see section~\ref{multiplereturnvaluessection}.)  The most
fundamental expressions are literal expressions:

\begin{scheme}
\schtrue{} \ev \schtrue
23 \ev 23%
\end{scheme}

This notation means that the expression \schtrue{} evaluates to
\schtrue{}, that is, the value for ``true'',  and that the expression
{\cf 23} evaluates to a number object representing the number 23.

Compound expressions are formed by placing parentheses around their
subexpressions.  The first subexpression identifies an operation; the
remaining subexpressions are operands to the operation:
%
\begin{scheme}
(+ 23 42) \ev 65
(+ 14 (* 23 42)) \ev 980%
\end{scheme}
%
In the first of these examples, {\cf +} is the name of
the built-in operation for addition, and {\cf 23} and {\cf 42} are the
operands.  The expression {\cf (+ 23 42)} reads as ``the sum of 23 and
42''.  Compound expressions can be nested---the second example reads
as ``the sum of 14 and the product of 23 and 42''.

As these examples indicate, compound expressions in Scheme are always
written using the same prefix notation\mainindex{prefix notation}.  As
a consequence, the parentheses are needed to indicate structure.
Consequently, ``superfluous'' parentheses, which are often permissible in
mathematical notation and also in many programming languages, are not
allowed in Scheme.

As in many other languages, whitespace (including line endings) is not
significant when it separates subexpressions of an expression, and
can be used to indicate structure.

\section{Variables and binding}

\mainindex{variable}\mainindex{binding}\mainindex{identifier}Scheme
allows identifiers to stand for locations containing values.
These identifiers are called variables.  In many cases, specifically
when the location's value is never modified after its creation, it is
useful to think of the variable as standing for the value directly.

\begin{scheme}
(let ((x 23)
      (y 42))
  (+ x y)) \ev 65%
\end{scheme}

In this case, the expression starting with {\cf let} is a binding
construct.  The parenthesized structure following the {\cf let} lists
variables alongside expressions: the variable {\cf x} alongside {\cf
  23}, and the variable {\cf y} alongside {\cf 42}.  The {\cf let}
expression binds {\cf x} to 23, and {\cf y} to 42.  These bindings are
available in the \textit{body} of the {\cf let} expression, {\cf (+ x
  y)}, and only there.

\section{Definitions}

\index{definition}The variables bound by a {\cf let} expression
are \textit{local}, because their bindings are visible only in the
{\cf let}'s body.  Scheme also allows creating top-level bindings for
identifiers as follows:

\begin{scheme}
(define x 23)
(define y 42)
(+ x y) \ev 65%
\end{scheme}

(These are actually ``top-level'' in the body of a top-level program or library;
see section~\ref{librariesintrosection} below.)

The first two parenthesized structures are \textit{definitions}; they
create top-level bindings, binding {\cf x} to 23 and {\cf y} to 42.
Definitions are not expressions, and cannot appear in all places
where an expression can occur.  Moreover, a definition has no value.

Bindings follow the lexical structure of the program:  When several
bindings with the same name exist, a variable refers to the binding
that is closest to it, starting with its occurrence in the program
and going from inside to outside, and referring to a top-level
binding if no
local binding can be found along the way:

\begin{scheme}
(define x 23)
(define y 42)
(let ((y 43))
  (+ x y)) \ev 66

(let ((y 43))
  (let ((y 44))
    (+ x y))) \ev 67%
\end{scheme}

\section{Forms}

While definitions are not expressions, compound expressions and
definitions exhibit similar syntactic structure:
%
\begin{scheme}
(define x 23)
(* x 2)%
\end{scheme}
%
While the first line contains a definition, and the second an
expression, this distinction depends on the bindings for {\cf define}
and {\cf *}.  At the purely syntactical level, both are
\textit{forms}\index{form}, and \textit{form} is the general name for
a syntactic part of a Scheme program.  In particular, {\cf 23} is a
\textit{subform}\index{subform} of the form {\cf (define x 23)}.

\section{Procedures}
\label{proceduressection}

\index{procedure}Definitions can also be used to define
procedures:

\begin{scheme}
(define (f x)
  (+ x 42))

(f 23) \ev 65%
\end{scheme}

A procedure is, slightly simplified, an abstraction of an
expression over objects.  In the example, the first definition defines a procedure
called {\cf f}.  (Note the parentheses around {\cf f x}, which
indicate that this is a procedure definition.)  The expression {\cf (f
  23)} is a \index{procedure call}procedure call, meaning,
roughly, ``evaluate {\cf (+ x 42)} (the body of the procedure) with
{\cf x} bound to 23''.

As procedures are objects, they can be passed to other
procedures:
%
\begin{scheme}
(define (f x)
  (+ x 42))

(define (g p x)
  (p x))

(g f 23) \ev 65%
\end{scheme}

In this example, the body of {\cf g} is evaluated with {\cf p}
bound to {\cf f} and {\cf x} bound to 23, which is equivalent
to {\cf (f 23)}, which evaluates to 65.

In fact, many predefined operations of Scheme are provided not by
syntax, but by variables whose values are procedures.
The {\cf +} operation, for example, which receives
special syntactic treatment in many other languages, is just a regular
identifier in Scheme, bound to a procedure that adds number objects.  The
same holds for {\cf *} and many others:

\begin{scheme}
(define (h op x y)
  (op x y))

(h + 23 42) \ev 65
(h * 23 42) \ev 966%
\end{scheme}

Procedure definitions are not the only way to create procedures.  A
{\cf lambda} expression creates a new procedure as an object, with no
need to specify a name:

\begin{scheme}
((lambda (x) (+ x 42)) 23) \ev 65%
\end{scheme}

The entire expression in this example is a procedure call; {\cf
  (lambda (x) (+ x 42))}, evaluates to a procedure that takes a single
number object and adds 42 to it.

\section{Procedure calls and syntactic keywords}

Whereas {\cf (+ 23 42)}, {\cf (f 23)}, and {\cf ((lambda (x) (+ x 42))
  23)} are all examples of procedure calls, {\cf lambda} and {\cf
  let} expressions are not.  This is because {\cf let}, even though
it is an identifier, is not a variable, but is instead a \textit{syntactic
  keyword}\index{syntactic keyword}.  A form that has a
syntactic keyword as its first subexpression obeys special rules determined by
the keyword.  The {\cf define} identifier in a definition is also a
syntactic keyword.  Hence, definitions are also not procedure calls.

The rules for the {\cf lambda} keyword specify that the first
subform is a list of parameters, and the remaining subforms are the body of
the procedure.  In {\cf let} expressions, the first subform is a list
of binding specifications, and the remaining subforms constitute a body of
expressions.

Procedure calls can generally be distinguished from these
\textit{special forms}\mainindex{special form} by
looking for a syntactic keyword in the first position of an
form: if the first position does not contain a syntactic keyword, the expression
is a procedure call.  
(So-called \textit{identifier macros} allow creating other kinds of
special forms, but are comparatively rare.)
The set of syntactic keywords of Scheme is
fairly small, which usually makes this task fairly simple.
It is possible, however, to create new bindings for syntactic keywords; see
section~\ref{macrosintrosection} below.

\section{Assignment}

Scheme variables bound by definitions or {\cf let} or {\cf lambda}
expressions are not actually bound directly to the objects specified in the
respective bindings, but to locations containing these objects.  The
contents of these locations can subsequently be modified destructively
via \textit{assignment}\index{assignment}:
%
\begin{scheme}
(let ((x 23))
  (set! x 42)
  x) \ev 42%
\end{scheme}

In this case, the body of the {\cf let} expression consists of two
expressions which are evaluated sequentially, with the value of the
final expression becoming the value of the entire {\cf let}
expression.  The expression {\cf (set! x 42)} is an assignment, saying
``replace the object in the location referenced by {\cf x} with 42''.
Thus, the previous value of {\cf x}, 23, is replaced by 42.

\section{Derived forms and macros}
\label{macrosintrosection}

Many of the special forms specified in this report
can be translated into more basic special forms.
For example, a {\cf let} expression can be translated
into a procedure call and a {\cf lambda} expression.  The following two
expressions are equivalent:
%
\begin{scheme}
(let ((x 23)
      (y 42))
  (+ x y)) \ev 65

((lambda (x y) (+ x y)) 23 42) \lev 65%
\end{scheme}

Special forms like {\cf let} expressions are called \textit{derived
  forms}\index{derived form} because their semantics can be
derived from that of other kinds of forms by a syntactic
transformation.  Some procedure definitions are also derived forms.  The
following two definitions are equivalent:

\begin{scheme}
(define (f x)
  (+ x 42))

(define f
  (lambda (x)
    (+ x 42)))%
\end{scheme}

In Scheme, it is possible for a program to create its own derived
forms by binding syntactic keywords to macros\index{macro}:

\begin{scheme}
(define-syntax def
  (syntax-rules ()
    ((def f (p ...) body)
     (define (f p ...)
       body))))

(def f (x)
  (+ x 42))%
\end{scheme}

The {\cf define-syntax} construct specifies that a parenthesized
structure matching the pattern {\cf (def f (p ...) body)}, where {\cf
  f}, {\cf p}, and {\cf body} are pattern variables, is translated to
{\cf (define (f p ...) body)}.  Thus, the {\cf def} form appearing in
the example gets translated to:

\begin{scheme}
(define (f x)
  (+ x 42))%
\end{scheme}

The ability to create new syntactic keywords makes Scheme extremely
flexible and expressive, allowing many of the features
built into other languages to be derived forms in Scheme.

\section{Syntactic data and datum values}

A subset of the Scheme objects is called \textit{datum
  values}\index{datum value}. 
These include booleans, number objects, characters, symbols,
and strings as well as lists and vectors whose elements are data.  Each
datum value may be represented in textual form as a
\textit{syntactic datum}\index{syntactic datum}, which can be written out
and read back in without loss of information.
A datum value may be represented by several different syntactic data.
Moreover, each datum value
can be trivially translated to a literal expression in a program by
prepending a {\cf\singlequote} to a corresponding syntactic datum:

\begin{scheme}
'23 \ev 23
'\schtrue{} \ev \schtrue{}
'foo \ev foo
'(1 2 3) \ev (1 2 3)
'\#(1 2 3) \ev \#(1 2 3)%
\end{scheme}

The {\cf\singlequote} shown in the previous examples
is not needed for representations of number objects or booleans.
The syntactic datum {\cf foo} represents a
symbol with name ``foo'', and {\cf 'foo} is a literal expression with
that symbol as its value.  {\cf (1 2 3)} is a syntactic datum that 
represents a list with elements 1, 2, and 3, and {\cf '(1 2 3)} is a literal
expression with this list as its value.  Likewise, {\cf \#(1 2 3)}
is a syntactic datum that represents a vector with elements 1, 2 and 3, and
{\cf '\#(1 2 3)} is the corresponding literal.

The syntactic data are a superset of the Scheme forms.  Thus, data
can be used to represent Scheme forms as data objects.  In
particular, symbols can be used to represent identifiers.

\begin{scheme}
'(+ 23 42) \ev (+ 23 42)
'(define (f x) (+ x 42)) \lev (define (f x) (+ x 42))%
\end{scheme}

This facilitates writing programs that operate on Scheme source code,
in particular interpreters and program transformers.

\section{Continuations}

Whenever a Scheme expression is evaluated there is a
\textit{continuation}\index{continuation} wanting the result of the
expression.  The continuation represents an entire (default) future
for the computation.  For example, informally the continuation of {\cf 3}
in the expression
%
\begin{scheme}
(+ 1 3)%
\end{scheme}
%
adds 1 to it.  Normally these ubiquitous continuations are hidden
behind the scenes and programmers do not think much about them.  On
rare occasions, however, a programmer may need to deal with
continuations explicitly.  The {\cf call-with-current-continuation}
procedure (see section~\ref{call-with-current-continuation}) allows
Scheme programmers to do that by creating a procedure that reinstates
the current continuation.  The {\cf call-with-current-continuation}
procedure accepts a procedure, calls it immediately with an argument
that is an \textit{escape procedure}\index{escape procedure}.  This
escape procedure can then be called with an argument that becomes the
result of the call to {\cf call-with-current-continuation}.  That is,
the escape procedure abandons its own continuation, and reinstates the
continuation of the call to {\cf call-with-current-continuation}.

In the following example, an escape procedure representing the
continuation that adds 1 to its argument is bound to {\cf escape}, and
then called with 3 as an argument.  The continuation of the call to
{\cf escape} is abandoned, and instead the 3 is passed to the
continuation that adds 1:
%
\begin{scheme}
(+ 1 (call-with-current-continuation
       (lambda (escape)
         (+ 2 (escape 3))))) \lev 4%
\end{scheme}
%
An escape procedure has unlimited extent: It can be called after the
continuation it captured has been invoked, and it can be called
multiple times.  This makes {\cf call-with-current-continuation}
significantly more powerful than typical non-local control constructs
such as exceptions in other languages.

\section{Libraries}
\label{librariesintrosection}

Scheme code can be organized in components called
\textit{libraries}\index{library}.  Each library contains 
definitions and expressions.  It can import definitions
from other libraries and export definitions to other libraries.

The following library called {\cf (hello)} exports a definition called
{\cf hello-world},  and imports the base library (see
chapter~\ref{baselibrarychapter}) and the simple I/O library (see
library section~\extref{lib:simpleiosection}{Simple I/O}).  The {\cf
  hello-world} export is a procedure that displays {\cf Hello World}
on a separate line:
%
\begin{scheme}
(library (hello)
  (export hello-world)
  (import (rnrs base)
          (rnrs io simple))
  (define (hello-world)
    (display "Hello World")
    (newline)))%
\end{scheme}

\section{Top-level programs}

A Scheme program is invoked via a \textit{top-level
  program}\index{top-level program}.  Like a library, a top-level
program contains imports, definitions and expressions, and specifies
an entry point for execution.  Thus a top-level program defines, via
the transitive closure of the libraries it imports, a Scheme program.

The following top-level program obtains the first argument from the command line
via the {\cf command-line} procedure from the \rsixlibrary{programs}
library (see library chapter~\extref{lib:programlibchapter}{Command-line
  access and exit values}).  It then opens the file using {\cf
  open-file-input-port} (see library section~\extref{lib:portsiosection}),
yielding a \textit{port}, i.e.\ a connection to the file as a data
source, and calls the {\cf get-bytes-all} procedure to obtain the
contents of the file as binary data.  It then uses {\cf put-bytes} to
output the contents of the file to standard output:
%
\begin{scheme}
\#!r6rs
(import (rnrs base)
        (rnrs io ports)
        (rnrs programs))
(put-bytes (standard-output-port)
           (call-with-port
               (open-file-input-port
                 (cadr (command-line)))
             get-bytes-all))%
\end{scheme}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "r6rs"
%%% End: