1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322
|
%\vfill\eject
\chapter{Basic concepts}
\label{basicchapter}
\section{Variables, syntactic keywords, and regions}
\label{specialformsection}
\label{variablesection}
An identifier\index{identifier} may name a type of syntax, or it may name
a location where a value can be stored. An identifier that names a type
of syntax is called a {\em syntactic keyword}\mainindex{syntactic keyword}
and is said to be {\em bound} to that syntax. An identifier that names a
location is called a {\em variable}\mainindex{variable} and is said to be
{\em bound} to that location. The set of all visible
bindings\mainindex{binding} in effect at some point in a program is
known as the {\em environment} in effect at that point. The value
stored in the location to which a variable is bound is called the
variable's value. By abuse of terminology, the variable is sometimes
said to name the value or to be bound to the value. This is not quite
accurate, but confusion rarely results from this practice.
\todo{Define ``assigned'' and ``unassigned'' perhaps?}
\todo{In programs without side effects, one can safely pretend that the
variables are bound directly to the arguments. Or:
In programs without \ide{set!}, one can safely pretend that the
variable is bound directly to the value. }
\vest Certain expression types are used to create new kinds of syntax
and bind syntactic keywords to those new syntaxes, while other
expression types create new locations and bind variables to those
locations. These expression types are called {\em binding constructs}.
\mainindex{binding construct}
Those that bind syntactic keywords are listed in section~\ref{macrosection}.
The most fundamental of the variable binding constructs is the
{\cf lambda} expression, because all other variable binding constructs
can be explained in terms of {\cf lambda} expressions. The other
variable binding constructs are {\cf let}, {\cf let*}, {\cf letrec},
and {\cf do} expressions (see sections~\ref{lambda}, \ref{letrec}, and
\ref{do}).
%Note: internal definitions not mentioned here.
\vest Like Algol and Pascal, and unlike most other dialects of Lisp
except for Common Lisp, Scheme is a statically scoped language with
block structure. To each place where an identifier is bound in a program
there corresponds a \defining{region} of the program text within which
the binding is visible. The region is determined by the particular
binding construct that establishes the binding; if the binding is
established by a {\cf lambda} expression, for example, then its region
is the entire {\cf lambda} expression. Every mention of an identifier
refers to the binding of the identifier that established the
innermost of the regions containing the use. If there is no binding of
the identifier whose region contains the use, then the use refers to the
binding for the variable in the top level environment, if any
(chapters~\ref{expressionchapter} and \ref{initialenv}); if there is no
binding for the identifier,
it is said to be \defining{unbound}.\mainindex{bound}\index{top level
environment}
\todo{Mention that some implementations have multiple top level environments?}
\todo{Pitman sez: needs elaboration in case of {\tt(let ...)}}
\todo{Pitman asks: say something about vars created after scheme starts?
{\tt (define x 3) (define (f) x) (define (g) y) (define y 4)}
Clinger replies: The language was explicitly
designed to permit a view in which no variables are created after
Scheme starts. In files, you can scan out the definitions beforehand.
I think we're agreed on the principle that interactive use should
approximate that behavior as closely as possible, though we don't yet
agree on which programming environment provides the best approximation.}
\section{Disjointness of types}
\label{disjointness}
No object satisfies more than one of the following predicates:
\begin{scheme}
boolean? pair?
symbol? number?
char? string?
vector? port?
procedure?%
\end{scheme}
These predicates define the types {\em boolean}, {\em pair}, {\em
symbol}, {\em number}, {\em char} (or {\em character}), {\em string}, {\em
vector}, {\em port}, and {\em procedure}. The empty list is a special
object of its own type; it satisfies none of the above predicates.
\mainindex{type}\schindex{boolean?}\schindex{pair?}\schindex{symbol?}
\schindex{number?}\schindex{char?}\schindex{string?}\schindex{vector?}
\schindex{port?}\schindex{procedure?}\index{empty list}
Although there is a separate boolean type,
any Scheme value can be used as a boolean value for the purpose of a
conditional test. As explained in section~\ref{booleansection}, all
values count as true in such a test except for \schfalse{}.
% and possibly the empty list.
% The only value that is guaranteed to count as
% false is \schfalse{}. It is explicitly unspecified whether the empty list
% counts as true or as false.
This report uses the word ``true'' to refer to any
Scheme value except \schfalse{}, and the word ``false'' to refer to
\schfalse{}. \mainindex{true} \mainindex{false}
\section{External representations}
\label{externalreps}
An important concept in Scheme (and Lisp) is that of the {\em external
representation} of an object as a sequence of characters. For example,
an external representation of the integer 28 is the sequence of
characters ``{\tt 28}'', and an external representation of a list consisting
of the integers 8 and 13 is the sequence of characters ``{\tt(8 13)}''.
The external representation of an object is not necessarily unique. The
integer 28 also has representations ``{\tt \#e28.000}'' and ``{\tt\#x1c}'', and the
list in the previous paragraph also has the representations ``{\tt( 08 13
)}'' and ``{\tt(8 .\ (13 .\ ()))}'' (see section~\ref{listsection}).
Many objects have standard external representations, but some, such as
procedures, do not have standard representations (although particular
implementations may define representations for them).
An external representation may be written in a program to obtain the
corresponding object (see {\cf quote}, section~\ref{quote}).
External representations can also be used for input and output. The
procedure {\cf read} (section~\ref{read}) parses external
representations, and the procedure {\cf write} (section~\ref{write})
generates them. Together, they provide an elegant and powerful
input/output facility.
Note that the sequence of characters ``{\tt(+ 2 6)}'' is {\em not} an
external representation of the integer 8, even though it {\em is} an
expression evaluating to the integer 8; rather, it is an external
representation of a three-element list, the elements of which are the symbol
{\tt +} and the integers 2 and 6. Scheme's syntax has the property that
any sequence of characters that is an expression is also the external
representation of some object. This can lead to confusion, since it may
not be obvious out of context whether a given sequence of characters is
intended to denote data or program, but it is also a source of power,
since it facilitates writing programs such as interpreters and
compilers that treat programs as data (or vice versa).
The syntax of external representations of various kinds of objects
accompanies the description of the primitives for manipulating the
objects in the appropriate sections of chapter~\ref{initialenv}.
\section{Storage model}
\label{storagemodel}
Variables and objects such as pairs, vectors, and strings implicitly
denote locations\mainindex{location} or sequences of locations. A string, for
example, denotes as many locations as there are characters in the string.
(These locations need not correspond to a full machine word.) A new value may be
stored into one of these locations using the {\tt string-set!} procedure, but
the string continues to denote the same locations as before.
An object fetched from a location, by a variable reference or by
a procedure such as {\cf car}, {\cf vector-ref}, or {\cf string-ref}, is
equivalent in the sense of \ide{eqv?} % and \ide{eq?} ??
(section~\ref{equivalencesection})
to the object last stored in the location before the fetch.
Every location is marked to show whether it is in use.
No variable or object ever refers to a location that is not in use.
Whenever this report speaks of storage being allocated for a variable
or object, what is meant is that an appropriate number of locations are
chosen from the set of locations that are not in use, and the chosen
locations are marked to indicate that they are now in use before the variable
or object is made to denote them.
In many systems it is desirable for constants\index{constant} (i.e. the values of
literal expressions) to reside in read-only-memory. To express this, it is
convenient to imagine that every object that denotes locations is associated
with a flag telling whether that object is mutable\index{mutable} or
immutable\index{immutable}. In such systems literal constants and the strings
returned by \ide{symbol->string} are immutable objects, while all objects
created by the other procedures listed in this report are mutable. It is an
error to attempt to store a new value into a location that is denoted by an
immutable object.
\section{Proper tail recursion}
\label{proper tail recursion}
Implementations of Scheme are required to be
{\em properly tail-recursive}\mainindex{proper tail recursion}.
Procedure calls that occur in certain syntactic
contexts defined below are `tail calls'. A Scheme implementation is
properly tail-recursive if it supports an unbounded number of active
tail calls. A call is {\em active} if the called procedure may still
return. Note that this includes calls that may be returned from either
by the current continuation or by continuations captured earlier by
{\cf call-with-current-continuation} that are later invoked.
In the absence of captured continuations, calls could
return at most once and the active calls would be those that had not
yet returned.
A formal definition of proper tail recursion can be found
in~\cite{propertailrecursion}.
\begin{rationale}
Intuitively, no space is needed for an active tail call because the
continuation that is used in the tail call has the same semantics as the
continuation passed to the procedure containing the call. Although an improper
implementation might use a new continuation in the call, a return
to this new continuation would be followed immediately by a return
to the continuation passed to the procedure. A properly tail-recursive
implementation returns to that continuation directly.
Proper tail recursion was one of the central ideas in Steele and
Sussman's original version of Scheme. Their first Scheme interpreter
implemented both functions and actors. Control flow was expressed using
actors, which differed from functions in that they passed their results
on to another actor instead of returning to a caller. In the terminology
of this section, each actor finished with a tail call to another actor.
Steele and Sussman later observed that in their interpreter the code
for dealing with actors was identical to that for functions and thus
there was no need to include both in the language.
\end{rationale}
A {\em tail call}\mainindex{tail call} is a procedure call that occurs
in a {\em tail context}. Tail contexts are defined inductively. Note
that a tail context is always determined with respect to a particular lambda
expression.
\begin{itemize}
\item The last expression within the body of a lambda expression,
shown as \hyper{tail expression} below, occurs in a tail context.
\begin{grammar}%
(l\=ambda \meta{formals}
\>\arbno{\meta{definition}} \arbno{\meta{expression}} \meta{tail expression})
\end{grammar}%
\item If one of the following expressions is in a tail context,
then the subexpressions shown as \meta{tail expression} are in a tail context.
These were derived from rules in the grammar given in
chapter~\ref{formalchapter} by replacing some occurrences of \meta{expression}
with \meta{tail expression}. Only those rules that contain tail contexts
are shown here.
\begin{grammar}%
(if \meta{expression} \meta{tail expression} \meta{tail expression})
(if \meta{expression} \meta{tail expression})
(cond \atleastone{\meta{cond clause}})
(cond \arbno{\meta{cond clause}} (else \meta{tail sequence}))
(c\=ase \meta{expression}
\>\atleastone{\meta{case clause}})
(c\=ase \meta{expression}
\>\arbno{\meta{case clause}}
\>(else \meta{tail sequence}))
(and \arbno{\meta{expression}} \meta{tail expression})
(or \arbno{\meta{expression}} \meta{tail expression})
(let (\arbno{\meta{binding spec}}) \meta{tail body})
(let \meta{variable} (\arbno{\meta{binding spec}}) \meta{tail body})
(let* (\arbno{\meta{binding spec}}) \meta{tail body})
(letrec (\arbno{\meta{binding spec}}) \meta{tail body})
(let-syntax (\arbno{\meta{syntax spec}}) \meta{tail body})
(letrec-syntax (\arbno{\meta{syntax spec}}) \meta{tail body})
(begin \meta{tail sequence})
(d\=o \=(\arbno{\meta{iteration spec}})
\> \>(\meta{test} \meta{tail sequence})
\>\arbno{\meta{expression}})
{\rm where}
\meta{cond clause} \: (\meta{test} \meta{tail sequence})
\meta{case clause} \: ((\arbno{\meta{datum}}) \meta{tail sequence})
\meta{tail body} \: \arbno{\meta{definition}} \meta{tail sequence}
\meta{tail sequence} \: \arbno{\meta{expression}} \meta{tail expression}
\end{grammar}%
\item
If a {\cf cond} expression is in a tail context, and has a clause of
the form {\cf (\hyperi{expression} => \hyperii{expression})}
then the (implied) call to
the procedure that results from the evaluation of \hyperii{expression} is in a
tail context. \hyperii{expression} itself is not in a tail context.
\end{itemize}
Certain built-in procedures are also required to perform tail calls.
The first argument passed to \ide{apply} and to
\ide{call-with-current-continuation}, and the second argument passed to
\ide{call-with-values}, must be called via a tail call.
Similarly, \ide{eval} must evaluate its argument as if it
were in tail position within the \ide{eval} procedure.
In the following example the only tail call is the call to {\cf f}.
None of the calls to {\cf g} or {\cf h} are tail calls. The reference to
{\cf x} is in a tail context, but it is not a call and thus is not a
tail call.
\begin{scheme}%
(lambda ()
(if (g)
(let ((x (h)))
x)
(and (g) (f))))
\end{scheme}%
\begin{note}
Implementations are allowed, but not required, to
recognize that some non-tail calls, such as the call to {\cf h}
above, can be evaluated as though they were tail calls.
In the example above, the {\cf let} expression could be compiled
as a tail call to {\cf h}. (The possibility of {\cf h} returning
an unexpected number of values can be ignored, because in that
case the effect of the {\cf let} is explicitly unspecified and
implementation-dependent.)
\end{note}
|