1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283
|
\section{MLTree Extensions} \label{sec:mltree-extension}
Pattern matching over the MLTREE intermediate representation
may not be sufficient to provide access to all the registers or
operations provided on a specific architecture. MLTREE extensions is a
method of extending the MLTREE intermediate language so that it is a
better match for the target architecture.
\subsection{Why Extensions}
Pattern matching over the MLTREE intermediate representation
may not be sufficient to provide access to all the registers or
operations provided on a specific architecture. MLTREE extensions is a
method of extending the MLTREE intermediate language so that it is a
better match for the target architecture.
For example there may be special registers to support the
increment-and-test operation on loop indices, or
support for complex mathematical functions such as
square root, or access to hardware specific registers such as the
current register window pointer on the SPARC architecture. It is not
usually possible to write expression trees that would directly
generate these instructions.
Some complex operations can be generated by performing a peephole
optimization over simpler instructions, however this is not always the
most convenient or simple thing to do.
\subsection{Cyclic Dependency}
The easiest way to provide extensions is to parameterize the MLTREE
interface with types that extend the various kinds of trees. Thus if
the type \sml{sext} represented statement extensions, we might define
MLTREE statement trees as :
\begin{SML}
datatype stm
= ...
| SEXT of sext * mlrisc list * stm list
and mlrisc = GPR of rexp | FPR of fexp | CCR of ccexp
\end{SML}
where the constructor \sml{SEXT} applies the extension to a list of
arguments. This approach is unsatisfactory in several ways, for
example, if one wanted to extend MLTREEs with for-loops, then the
following could be generated:
\begin{SML}
SEXT(FORLOOP, [GPR from, GPR to, GPR step], body)
\end{SML}
First, the loop arguments have to be wrapped up in \sml{GPR} and there
is little self documentation on the order of elements that are
arguments to the for-loop. It would be better to be able to write
something like:
\begin{SML}
SEXT(FORLOOP\{from=f, to=t, step=s, body=b\})
\end{SML}
Where \sml{f}, \sml{t}, and \sml{s} are \sml{rexp} trees representing
the loop index start, end, and step size; \sml{b} is a stm list
representing the body of the loop. Unfortunately, there is a cyclic
dependency as MLTREEs are defined in terms of \sml{sext}, and {\tt
sext} is defined in terms of MLTREEs. The usual way to deal with
cyclic dependencies is to use polymorphic type variables.
\subsection{MLTREE EXTENSION}
The statement extension type \sml{sext}, is now a type constructor
with arity four, i.e.
\sml{('s, 'r, 'f, 'c) sx} where \sml{sx} is used instead of {\tt
sext}, and \sml{'s}, \sml{'r}, \sml{'f}, and \sml{'c} represents
MLTREE statement expressions, register expressions, floating point
expressions, and condition code expressions. Thus the for-loop
extension could be declared using something like:
\begin{SML}
datatype ('s,'r,'f,'c) sx
= FORLOOP of \{from: 'r, to: 'r, step: 'r, body: 's\}
\end{SML}
and the MLTREE interface is defined as:
\begin{SML}
signature MLTREE = sig
type ('s, 'r, 'f, 'c) sx
datatype stm =
= ...
| SEXT of sext
withtype sext = (stm, rexp, fexp, cexp) sx
end
\end{SML}
where \sml{sext} is the user defined statement extension but the
type variables have been instantiated to the final form the the MLTREE
\sml{stm}, \sml{rexp}, \sml{fexp}, and \sml{cexp} components.
\subsection{Compilation}
There are dedicated modules that perform pattern matching over MLTREEs
and emit native instructions, and similar modules must be written for
extensions. However, the same kinds of choices used in regular MLTREE
patterns must be repeated for extensions. For example, one may define
an extension for the Intel IA32 of the form:
\begin{SML}
datatype ('s,'r,'f,'c) sx = PUSHL of 'r | POPL of 'r | ...
\end{SML}
that translate directly to the Intel push and pop instructions; the
operands in each case are either memory locations or registers, but
immediates are allowed in the case of \sml{PUSHL}. Considerable effort
has been invested into pattern matching the extensive set of
addressing modes for the Intel architecture, and
one would like to reuse this when compiling extensions. The pattern
matching functions are exposed by a set of functions exported from the
instruction selection module, and provided in the MLTREE
interface. They are:
\begin{SML}
struture I : INSTRUCTIONS
datatype reducer =
REDUCER of \{
reduceRexp : rexp -> reg,
reduceFexp : fexp -> reg,
reduceCCexp : ccexp -> reg,
reduceStm : stm * an list -> unit,
operand : rexp -> I.operand,
reduceOperand : I.operand -> reg,
addressOf : rexp -> I.addressing_mode,
emit : I.instr * an list -> unit,
instrStream : (I.instr, I.regmap, I.cellset) stream,
mltreeStream : (stm, I.regmap, mlrisc list) stream
\}
\end{SML}
where \sml{I} is the native instruction set.
\begin{description}
\item[\tt reduceRexp]: reduces an MLTREE \sml{rexp} to a register, and
similarly for \sml{reduceFexp} and \sml{reduceCCexp}.
\item[\tt reduceStm]: reduces an MLTREE \sml{stm} to a set of instructions
that implement the set of statements.
\item[\tt operand]: reduced an MLTREE \sml{rexp} into an instruction
operand --- usually an immediate or memory address.
\item[\tt operand]: moves a native operand into a register.
\item[\tt addressOf]: reduces an MLTREE \sml{rexp} into a memory address.
\item[\tt emit]: emits an instruction together with an annotation.
\item[\tt instrStream]: is the native instruction output stream, and
\item[\tt mltreeStream]: is the MLTREE output stream.
\end{description}
Each extension must provide a function \sml{compileSext} that compiles
a statement extension into native instructions. In the
\sml{MLTREE_EXTENSION_COMP} interface we have:
\begin{SML}
val compileSext: reducer -> {stm: MLTREE.sexp, an:MLTREE.an list} -> unit
\end{SML}
The use of extensions must follow a special structure.
\begin{enumerate}
\item A module defining the extension type using a type constructor
of arity four. Let us call this structure \sml{ExtTy} and must match
the \sml{MLTREE_EXTENSION} interface.
\item The extension module must be used to specialize MLTREEs.
\item A module that describes how to compile the extension must be
created, and must match the \sml{MLTREE_EXTENSION_COMP} interace.
This module will typically be functorized over the MLTREE interface.
Let us call the result of applying the functor, \sml{ExtComp}.
\item The extension compiler must be passed as a parameter to the
instruction selection module that will invoke it whenever an extension
is seen.
\end{enumerate}
\subsection{Multiple Extensions}
Multiple extensions are handled in a similar fashion, except that the
extension type used to specialize MLTREEs is a tagged union of the
individual extensions. The functor to compile the extension dispatches
to the compilation modules for the individual extensions.
\subsection{Example}
Suppose you are in the process of writing a compiler for a digital
signal processing(\newdef{DSP}) programming language using the MLRISC
framework. This wonderful language that you are developing allows the
programmer to specify high level looping and iteration, and
aggregation constructs that are common in DSP applications.
Furthermore, since saturated and fixed point arithmetic are common
constructs in DSP applications, the language and consequently the
compiler should directly support these operators. For simplicity, we
would like to have a unified intermediate representation that can be
used to directly represent high level constructs in our language, and
low level constructs that are already present in MLTree. Since,
MLTree does not directly support these constructs, it seems that it is
not possible to use MLRISC for such a compiler infrastructure without
substantial rewrite of the core components.
Let us suppose that for illustration that we would like to
implement high level looping and aggregation constructs such as
\begin{verbatim}
for i := lower bound ... upper bound
body
x := sum{i := lower bound ... upper bound} expression
\end{verbatim}
together with saturated arithmetic mentioned above.
Here is a first attempt:
\begin{SML}
structure DSPMLTreeExtension
struct
structure Basis = MLTreeBasis
datatype ('s,'r,'f,'c) sx =
FOR of Basis.var * 'r * 'r * 's
and ('s,'r,'f,'c) rx =
SUM of Basis.var * 'r * 'r * 'r
| SADD of 'r * 'r
| SSUB of 'r * 'r
| SMUL of 'r * 'r
| SDIV of 'r * 'r
type ('s,'r,'f,'c) fx = unit
type ('s,'r,'f,'c) ccx = unit
end
structure DSPMLTree : MLTreeF
(structure Extension = DSPMLTreeExtension
...
)
\end{SML}
In the above signature, we have defined two new datatypes \newtype{sx}
and \newtype{rx} that are used for representing the DSP statement
and integer expression extensions. Integer expression extensions
include the high level sum construct, and the low levels saturated
arithmetic operators. The recursive type definition is
necessary to ``inject'' these new constructors into the basic MLTree
definition.
The following is an example of how these new constructors that we have defined can be used. Suppose the source program in our DSP language is:
\begin{verbatim}
for i := a ... b
{ s := sadd(s, table[i]);
}
\end{verbatim}
\noindent where \verb|sadd| is the saturated add operator.
For simplicity, let us also assume that all operations and addresses
are in 32-bits.
Then the translation of the above into our extended DSP-MLTree could be:
\begin{SML}
EXT(FOR(\(i\), REG(32, \(a\)), REG(32, \(b\)),
MV(32, \(s\), REXT(32, SADD(REG(32, \(s\)),
LOAD(32,
ADD(32, REG(32, \(table\)),
SLL(32, REG(32, \(i\)), LI 2)),
\(region\)))))
))
\end{SML}
One potential short coming of our DSP extension to MLTree is that
the extension does not allow any further extensions. This restriction
may be entirely satisfactory if DSP-MLTree is only used in your compiler
applications and no where else. However, if DSP-MLTree is intended
to be an extension library for MLRISC, then we must build in the flexibility
for extension. This can be done in the same way as in the base MLTree
definition, like this:
\begin{SML}
functor ExtensibleDSPMLTreeExtension
(Extension : \mlrischref{mltree/mltree-extension.sig}{MLTREE_EXTENSION}) =
struct
structure Basis = MLTreeBasis
structure Extension = Extension
datatype ('s,'r,'f,'c) sx =
FOR of Basis.var * 'r * 'r * 's
| EXT of ('s,'r,'f,'c) Extension.sx
and ('s,'r,'f,'c) rx =
SUM of Basis.var * 'r * 'r * 'r
| SADD of 'r * 'r
| SSUB of 'r * 'r
| SMUL of 'r * 'r
| SDIV of 'r * 'r
| REXT of ('s,'r,'f,'c) Extension.rx
withtype
('s,'r,'f,'c) fx = ('s,'r,'f,'c) Extension.fx
and ('s,'r,'f,'c) ccx = ('s,'r,'f,'c) Extension.ccx
end
\end{SML}
As in MLTREE, we provide two new extension
constructors \verb|EXT| and \verb|REXT| in
the definition of \sml{DSP_MLTREE}, which can
be used to further enhance the extended MLTREE language.
|