File: mlrisc-md.tex

package info (click to toggle)
mlton 20210117%2Bdfsg-3
links: PTS, VCS
area: main
in suites: sid
size: 58,464 kB
sloc: ansic: 27,682; sh: 4,455; asm: 3,569; lisp: 2,879; makefile: 2,347; perl: 1,169; python: 191; pascal: 68; javascript: 7
file content (662 lines) | stat: -rw-r--r-- 25,311 bytes
parent folder | download | duplicates (5)
\section{Machine Description}
\subsection{Overview}

  \newdef{MDGen} is a simple tool for generating 
various modules in the MLRISC customizable code generator
directly from machine descriptions.   These descriptions 
contain architectural information such as:
\begin{enumerate}
    \item How the the register file(s) are organized.   
    \item How instructions are encoded in machine code: MLRISC uses
this information to generate machine instructions directly into a byte stream.
Directly machine code generation is used in the SML/NJ compiler.
    \item How instructions are pretty printed in assembly: this is used
for debugging and also for assembly output for other non-SML/NJ backends.
    \item How instructions are internally represented in MLRISC. 
   \item Other information needed for performing optimizations, which
        include:
   \begin{enumerate}
     \item The register transfer list (RTL) that defines the 
           operational semantics of the instruction.
     \item Delay slot mechanisms.
     \item Information for performing span dependency resolution.
     \item Pipeline and reservation table characteristics.
   \end{enumerate}
\end{enumerate}

Currently, item 5 is not ready for prime time.

\subsubsection{Why MDGen?}
MLRISC manipulates all instruction sets via a set of abstract
interfaces, which allows the programmer to arbitrarily choose an
instruction representation that is most convenient for a particular 
architecture.  However, various functions that manipulate
this representation must be provided by the instruction set's programmer.  
As the number and complexities of each optimizations grow, and as
the number of architectures increases, the functions
for manipulating the instructions become more numerous and complex.
In order to keep the effort of developing and maintaining
an instruction set manageable,
the MDGen tool is developed to (partially) automate this task.
 
\subsubsection{Syntax}

   MDGen's machine descriptions are written in a syntax that is very
much like that of 
\externhref{http://cm.bell-labs.com/cm/cs/what/smlnj/sml.html}{Standard ML}. 
Most core SML constructs are recognized.  
In addition, new declaration forms specific to MDGen are 
used to specify architectural information.

\paragraph{Reserved Words}
   All SML keywords are reserved words in MDGen.
   In addition, the following keywords are also reserved:

\begin{verbatim}
   always architecture assembly at backwards big bits branching called
   candidate cell cells cellset debug delayslot dependent endian field
   fields formats forwards instruction internal little locations lowercase
   name never nodelayslot nullified opcode ordering padded pipeline predicated
   register rtl signed span storage superscalar unsigned uppercase 
   verbatim version vliw when
\end{verbatim}

   Two kinds are quotations marks are also reserved:   
\begin{SML}
   [[ ]]
   `` ''
\end{SML}

   The first \sml{[[ ]]} is for describing semantics.  The
second \sml{`` ''} is for describing assembly syntax.

\paragraph{Syntactic Sugar}

   MDGen recognizes the following syntactic sugar.
\begin{description}
\item[Record abbreviations]
Record expressions such as \sml{{x=x,y=y,z=z}} 
can be simplified to just \sml{{x,y,z}}.
\item[Binary literals]
Literals in binary can be written with the prefix \sml{0b} (for integer types)
or \sml{0wb} (for word types).   For example, \sml{0wb101111} is the same 
as \sml{0wx2f} and \sml{0w79}.
\item[Bit slices]
   A bit slice, which extracts a range of bits from a word, can be written
using an \sml{at} expression.  For example, \sml{w at [16..18]} 
means the same thing as \verb|Word32.andb(Word32.>>(w, 0w16),0w7)|, i.e.
it extracts bit 16 to 18 from \sml{w}.  
The least significant bit the zeroth bit. 

In general, we can write:
\begin{SML}
  w at [range1, range2, ..., rangen]
\end{SML}
to extract a sequence of slices from $w$ and concatenate them together.
For example, the expression
\begin{SML}
   0wxabcd at [0..3, 4..7, 8..11, 12..15]
\end{SML}
swap the 4 nybbles from the 16-bit word, and evaluates to \sml{0wxdcba}.

\item[Signature]
Signature declarations of the form
\begin{SML}
   val x y z : int -> int
\end{SML}
can be used as a shorthand for the more verbose:
\begin{SML}
   val x : int -> int
   val y : int -> int
   val z : int -> int
\end{SML}
\end{description}

\subsubsection{Elaboration Semantics}

   Unfortunately, there is no complete formal semantics of how
an MDGen specification elaborates.  
   But generally speaking, a machine description is a just a structure 
(in the SML sense).   Different components of this structure describe 
different aspects of the architecture.

\paragraph{Syntactic Overloading}
In general, the syntactic overloading are used heavily in MDGen.
There are three types of definitions:
\begin{itemize}
 \item Definitions that defines properties of the instruction set.
 \item Definitions of functions and terms that are in the RTL meta-language.
The syntax of MDGen's RTL language is borrowed heavily from Lambda-RTL, 
which in turns is borrowed heavily from SML.
 \item Definitions of functions and types that are to be included in the
output generated by the MDGen tool.  These are usually auxiliary
helper functions and definitions.
\end{itemize}
In general, entities of type 2, when appearing in other context, are
properly meta-quoted in the semantics quotations \sml{[[ ]]}.

\subsubsection{Basic Structure of A Machine Description}

   The machine description for an architecture are defined via
an \sml{architecture} declaration, which has the following general
form.

\begin{SML}
architecture name =
struct
   \Term{architecture type declaration}
   \Term{endianess declaration}
   \Term{storage class declarations}
   \Term{locations declarations}
   \Term{assembly case declarations}
   \Term{delayslot declaration}
   \Term{instruction machine encoding format declarations}
   \Term{nested structure declarations}
   \Term{instruction definition}
end
\end{SML}

\subsection{Describing the Architecture} 

\subsubsection{Architecture type}
  Architecture type declaration specifies whether the architecture is
a superscalar or a VLIW/EPIC machine.  Currently, this information is
ignored.

\begin{SML}
   \Term{architecture type declaration} ::= superscalar | vliw
\end{SML}

\subsubsection{Storage class}

Storage class declarations specify various information about the
registers in the architecture.  For example, the Alpha has 32 general
purpose registers and 32 floating point registers.  In addition, MLRISC
requires that each architecture specifies a (pseudo) register 
type\footnote{Called cellkind in MLRISC.} for 
holding condition codes (\sml{CC}). 
To specify these information in MDGen, we can say:

\begin{SML}
   storage
     GP "r" = 32 cells of 64 bits in cellset called "register"
                assembly as (fn (30,_) => "$sp"
                              | (r,_)  => "$"^Int.toString r
                            )
   | FP "f" = 32 cells of 64 bits in cellset called "floating point register"
                assembly as (fn (f,_) => "$f"^Int.toString f)
   | CC "cc" = cells of 64 bits in cellset GP called "condition code register"
                assembly as "cc"
\end{SML}

\begin{itemize}
  \item There are 32 64-bit general purpose registers,
32 64-bit floating point registers, while \sml{CC} is not a 
real register type. 
  \item Cellsets
are used by MLRISC for annotating liveness information in the program.
  The clause \sml{in cellset} states that register type \sml{GP} 
and \sml{FP} are allotted their own components in the cellset,    
while the register type \sml{CC} are put
in the same cellset component as \sml{GP}.
  \item The clause \sml{assembly as} specifies
   how each register is to be pretty printed.  On the Alpha, general 
   purpose register are pretty printed with prefix \sml{$}, while
   floating point registers are pretty printed with the prefix \sml{$f}. 
   A special case is made for register 30, which is the stack pointer, and 
   is pretty printing as \sml{$sp}.  Pseudo condition code registers
   are pretty printed with the prefix \sml{cc}.   
\end{itemize}

\subsubsection{Locations}

  Special locations in the register files can be declared using the
\sml{locations} declarations.  On the Alpha, GPR
30 is the stack pointer, GPR 28 and floating point register 30
are used as the assembly temporaries.  This special constants
can be defined as follows:

\begin{SML}
   locations
       stackptrR = $GP[30]
   and asmTmpR   = $GP[28]
   and fasmTmp   = $FP[30]
\end{SML}

\subsection{Specifying the Machine Encoding}
\subsubsection{Endianess}

The endianess declaration specifies whether the machine is little
endian or big endian so that the correct machine instruction encoding 
functions can be generated.  The general syntax of this is:

\begin{SML}
   \Term{endianess declaration} ::= little endian | big endian
\end{SML}

The Alpha is little endian, so we just say: 
\begin{SML}
    little endian
\end{SML}

\subsubsection{Defining New Instruction Formats}

   How instructions are encoded are specified using 
\sml{instruction format} declarations.  An instruction format declaration
has the following syntax:
\begin{SML}
  \Term{instruction machine encoding format declarations} ::=
     instruction formats n bits 
       \Term{format}1
     | \Term{format}2
     | \Term{format}3
     | ...
     | \Term{format}n-1
     | \Term{format}n
\end{SML}

Each encoding format can be a primitive format, or a derived format.

\paragraph{Primitive formats}

A primitive format is simply specified by giving it a name and specifying
the position, names and types of its fields.   This is usually the same
way it is described in a architectural reference manual.


Here is how we specify some of the (32 bit) primitive instruction formats 
used in the Alpha. 
\begin{SML}
   instruction formats 32 bits
     Memory\{opc:6, ra:5, rb:GP 5, disp: signed 16\} 
   | Jump\{opc:6=0wx1a,ra:GP 5,rb:GP 5,h:2,disp:int signed 14\}  
   | Memory_fun\{opc:6, ra:GP 5, rb:GP 5, func:16\}     
   | Branch\{opc:branch 6, ra:GP 5, disp:signed 21\}         
   | Fbranch\{opc:fbranch 6, ra:FP 5, disp:signed 21\}        
   | Operate0\{opc:6,ra:GP 5,rb:GP 5,sbz:13..15=0,_:1=0,func:5..11,rc:GP 5\}
   | Operate1\{opc:6,ra:GP 5,lit:signed 13..20,_:1=1,func:5..11,rc:GP 5\} 
\end{SML}

For example, the format \sml{Memory}
\begin{SML}
     Memory\{opc:6, ra:5, rb:GP 5, disp: signed 16\} 
\end{SML}
has a 6-bit opcode field, a 5-bit \sml{ra} field, a 5-bit \sml{rb}
field which always hold a general purpose register, and a 16-bit 
sign-extended displacement field.  The field to the left is positioned 
at the most significant bits, while the field to the right is positioned
at the least.  The widths of these fields must add up to 32 bits.


Similarly, the format \sml{Jump}
\begin{SML}
  Jump{opc:6=0wx1a,ra:GP 5,rb:GP 5,h:2,disp:int signed 14}  
\end{SML}
contains a 6-bit opcode field which always hold the constant \sml{0x1a},
two 5-bit fields \sml{ra} and \sml{rb} which are of type \sml{GP},
and a 14-bit sign-extended field of type integer.

  Each field in a primitive format has one of 5 forms:
\begin{SML}
   \Term{name} : \Term{position} 
   \Term{name} : \Term{position} = \Term{value} 
   \Term{name} : \Term{type} \Term{position} 
   \Term{name} : \Term{type} \Term{position} = \Term{value} 
   _           : \Term{position} = \Term{value} 
\end{SML}
where \Term{position} is either a width, or a bits range 
$n$\sml{..}$m$,
with an optional \sml{signed} prefix.  The last form, with a wild card
for the field name, can be used to specify an anonymous field that
always has a fixed value.  


  By default, a field has type \sml{Word32.word}.  If a type $T$ 
is specified, then the function \sml{emit_}$T$ is implicitly called
to convert the type into the appropriate encoding.   The function 
\sml{emit_}$T$ are generated automatically by MDGen if it is a cellkind
defined by the \sml{storage} class declaration, or if it is a primitive
type such as integer or boolean.  
There are also other ways to automatically generate this function
(more on this later.)

  For example, the format \sml{Operate1}
\begin{SML}
   Operate1\{opc:6,ra:GP 5,lit:signed 13..20,_:1=1,func:5..11,rc:GP 5\} 
\end{SML}
states that bits 26 to 31 are allocated to field \sml{opc}, 
bits 21 to 25 are allocated to field \sml{ra}, which is of type 
\sml{GP}, bits 13 to 20 are allocated to field \sml{lit}, bit 12
is a single bit of value 1, etc.


MDGen generates a function for each primitive format declaration of
the same name that can be used for emitting the instruction.  
In the case of the Alpha, the following functions are generated:
\begin{SML}
   val Memory : \{opc:Word32.word, ra:Word32.word, 
                 rb:int, disp:Word32.word\} -> unit
   val Jump   : \{ra:int, rb:int, disp:Word32.word\} -> unit
   val Operate1 : \{opc:Word32.word, ra:int, lit:Word32.word,
                   func:Word32.word, rc:int\} -> unit
\end{SML}

\paragraph{Derived formats}

   Derived formats are simply instruction formats that are defined
in terms of other formats.  On the alpha, we have a \sml{Operate}
format that simplifies to either \sml{Operate0} or \sml{Operate1},
depending on whether the second argument is a literal or a register.  
\begin{SML}
   Operate\{opc,ra,rb,func,rc\} =
     (case rb of
       I.REGop rb => Operate0\{opc,ra,rb,func,rc\}
     | I.IMMop i  => Operate1\{opc,ra,lit=itow i,func,rc\}
     | I.HILABop le => Operate1\{opc,ra,lit=High{le=le},func,rc\}
     | I.LOLABop le => Operate1\{opc,ra,lit=Low{le=le},func,rc\}
     | I.LABop le => Operate1\{opc,ra,lit=itow(LabelExp.valueOf le),func,rc\}
     )
\end{SML}

\subsubsection{Generating Encoding Functions}

   In MLRISC, we represent an instruction as a set of ML datatypes.
Some of these datatypes represent specific fields or 
opcodes of the instructions.
MDGen lets us to associate a machine encoding to each datatype constructor
directly in the specification, and automatically generates an 
encoding function for these datatypes.

There are two different ways of specifying an encoding.  The first way
is just to write the machine encoding directly next the constructor.
Here's an example directly from the Alpha description:
\begin{SML}
   structure Instruction =
   struct
      datatype branch! =  (* table C-2 *)
         BR   0x30
                   | BSR 0x34
                              | BLBC 0x38
       | BEQ  0x39 | BLT 0x3a | BLE  0x3b
       | BLBS 0x3c | BNE 0x3d | BGE  0x3e
       | BGT  0x3f

      datatype fbranch! = (* table C-2 *)
                     FBEQ 0x31 | FBLT 0x32
       | FBLE 0x33             | FBNE 0x35
       | FBGE 0x36 | FBGT 0x37

      ...
   end
\end{SML}

The datatypes \sml{branch} and \sml{fbranch} represent specific
branch opcodes for integer branches \sml{BRANCH}, or floating point
branches \sml{FBRANCH}.  On the Alpha, instruction \sml{BR} is encoded
with an opcode of \sml{0x30}, instruction \sml{BSR} is encoded 
as \sml{0x34} etc.  MDGen automatically generates two functions
\begin{SML}
    val emit_branch : branch -> Word32.word
    val emit_fbranch : branch -> Word32.word
\end{SML}
that perform this encoding.    

In the specification for the instruction set, we state that the
\sml{BRANCH} instruction should be encoded using format \sml{Branch},
while the \sml{FBRANCH} instruction should be encoded using
format \sml{Fbranch}.
\begin{SML}
   structure MC =
   struct
      (* Auxiliary function for computing the displacement of a label *)
      fun disp ... = ...
      ...
   end

   ...

   instruction
     ...

   | BRANCH of branch * $GP * Label.label
     Branch\{opc=branch,ra=GP,disp=disp label\}

   | FBRANCH of fbranch * $FP * Label.label
     Fbranch\{opc=fbranch,ra=FP,disp=disp label\}

   | ...
\end{SML}

Since the primitive instructions formats \sml{Branch} and \sml{FBranch}
are defined with branch and fbranch as the type in the opcode field
\begin{SML}
   | Branch\{opc:branch 6, ra:GP 5, disp:signed 21\}          
   | Fbranch\{opc:fbranch 6, ra:FP 5, disp:signed 21\}       
\end{SML}
the functions \sml{emit_branch} and \sml{emit_fbranch} are implicitly
called.

 
Another way to specify an encoding is to specify a range, as 
in the following example:
\begin{SML}
   datatype fload[0x20..0x23]! = LDF | LDG | LDS | LDT

   datatype fstore[0x24..0x27]! = STF | STG | STS | STT
\end{SML}

This states that \sml{LDF} should be assigned the encoding \sml{0x20},
\sml{LDG} the encoding \sml{0x21} etc.  This form is useful for 
specifying a consecutive range.

\subsubsection{Encoding Variable Length Instructions}

   Most architectures nowadays have fixed length encodings for instructions.  
There are some notatable exceptions, however.  
The Intel x86 architecture uses a legacy
variable length encoding.   Modern RISC machines developed for 
embedded systems may utilize space-reduction compression schemes in their
instruction sets.  Finally, VLIW machines usually have some form
of NOP compression scheme for compacting issue packets. 

\subsection{Specifying the Assembly Formats}

\subsubsection{Assembly Case Declaration}  

  The assembly case declaration specifies whether the assembly should be
emitted in lower case, upper case, or verbatim.  If either lower case
or upper case is specified, all literal strings are converted to the 
appropriate case.  The general syntax of this declaration is:

\begin{SML}
   \Term{assembly case declaration} ::=
      lowercase assembly
    | uppercase assembly
    | verbatim  assembly
\end{SML}

\subsubsection{Assembly Annotations}

   Assembly output are specified in the assembly meta quotations
\sml{`` ''}, or string quotations \sml{" "}.   
For example, here is a fragment from the Alpha description:

\begin{SML}
   instruction
     ...
   | LOAD of \{ldOp:load, r: $GP, b: $GP, d:operand, mem:Region.region\}
     ``<ldOp>\t<r>, <d>()<mem>''

   | STORE of \{stOp:store, r: $GP, b: $GP, d:operand, mem:Region.region\}
     ``<stOp>\t<r>, <d>()<mem>''

   | BRANCH of branch * $GP * Label.label
     ``<branch>\t<GP>, <label>''

   | FBRANCH of fbranch * $FP * Label.label
     ``<fbranch>\t<FP>, <label>''

   | CMOVE of \{oper:cmove, ra: $GP, rb:operand, rc: $GP\}
     ``<oper>\t<ra>, <rb>, <rc>''

   | FOPERATE of \{oper:foperate, fa: $FP, fb: $FP, fc: $FP\}
     ``<oper>\t<fa>, <fb>, <fc>''

   | ...
\end{SML}

   All characters within the quotations \sml{`` ''} have the same 
interpretation as in the string quotation \sml{" "}, except when
they are delimited by the \newdef{backquotes}
\verb|< >|.
Here's how the backquote is interpreted:
\begin{itemize}
\item If it is \verb|<|$x$\verb.>. and $x$ is a variable name of type $t$,
  and if an assembly function of type $t$ is defined, then it will be invoked
  to convert $x$ to the appropriate text.
\item If it is \verb|<|$x$\verb.>. and $x$ is a variable name of type $t$,
  and if an assembly function of type $t$ is NOT defined, 
  then the function \sml{emit_}$x$ will be called to pretty print $x$.
 \item If it is \verb|<|$e$\verb.>. where $e$ is a general expression, then
  it will be used directly. 
\end{itemize}

\subsubsection{Generating Assembly Functions}
   Similar to machine encodings, we can attach assembly annotations to
datatype definitions and let MDGen generate the assembly functions for us.  
Annotations take two forms, explicit or implicit.
Explicit annotations are enclosed within assembly quotations \sml{`` ''}.

 
For example, on the Alpha the datatype \sml{operand} is used to represent
an integer operand.  This datatype is defined as follows:
\begin{SML}
   datatype operand =
       REGop of $GP                     ``<GP>'' 
     | IMMop of int                     ``<int>''
     | HILABop of LabelExp.labexp       ``hi(<labexp>)''
     | LOLABop of LabelExp.labexp       ``lo(<labexp>)''
     | LABop of LabelExp.labexp         ``<labexp>''
     | CONSTop of Constant.const        ``<const>''
\end{SML}
Basicaly this states that \sml{REGop r} should be pretty printed 
as \sml{$r}, \sml{IMMop i} 
as \sml{i}, \sml{HILABexp le} 
as \sml{hi(le)},
etc.
 
Implicit assembly annotations are specified by simply attaching 
an exclamation mark at the end of the datatype name.  This states
that the assembly output is the same as the name of the datatype 
constructor\footnote{But appropriately modified by the assembly case 
declaration.}. For example,
the datatype \sml{operate} is a listing of all integer opcodes 
used in MLRISC.  
\begin{SML}
   datatype operate! = (* table C-5 *)
       ADDL  (0wx10,0wx00)                       | ADDQ (0wx10,0wx20)
                           | CMPBGE(0wx10,0wx0f) | CMPEQ (0wx10,0wx2d)
     | CMPLE (0wx10,0wx6d) | CMPLT (0wx10,0wx4d) | CMPULE (0wx10,0wx3d)
     | CMPULT(0wx10,0wx1d) | SUBL  (0wx10,0wx09)
     | SUBQ  (0wx10,0wx29)
     | S4ADDL(0wx10,0wx02) | S4ADDQ (0wx10,0wx22) | S4SUBL (0wx10,0wx0b)
     | S4SUBQ(0wx10,0wx2b) | S8ADDL (0wx10,0wx12) | S8ADDQ (0wx10,0wx32)
     | S8SUBL(0wx10,0wx1b) | S8SUBQ (0wx10,0wx3b)

     | AND   (0wx11,0wx00) | BIC    (0wx11,0wx08) | BIS (0wx11,0wx20)
                                                  | EQV (0wx11,0wx48)
     | ORNOT (0wx11,0wx28) | XOR    (0wx11,0wx40)

     | EXTBL (0wx12,0wx06) | EXTLH  (0wx12,0wx6a) | EXTLL(0wx12,0wx26)
     | EXTQH (0wx12,0wx7a) | EXTQL  (0wx12,0wx36) | EXTWH(0wx12,0wx5a)
     | EXTWL (0wx12,0wx16) | INSBL  (0wx12,0wx0b) | INSLH(0wx12,0wx67)
     | INSLL (0wx12,0wx2b) | INSQH  (0wx12,0wx77) | INSQL(0wx12,0wx3b)
     | INSWH (0wx12,0wx57) | INSWL  (0wx12,0wx1b) | MSKBL(0wx12,0wx02)
     | MSKLH (0wx12,0wx62) | MSKLL  (0wx12,0wx22) | MSKQH(0wx12,0wx72)
     | MSKQL (0wx12,0wx32) | MSKWH  (0wx12,0wx52) | MSKWL(0wx12,0wx12)
     | SLL   (0wx12,0wx39) | SRA    (0wx12,0wx3c) | SRL  (0wx12,0wx34)
     | ZAP   (0wx12,0wx30) | ZAPNOT (0wx12,0wx31)
     | MULL  (0wx13,0wx00)                        | MULQ (0wx13,0wx20)
                           | UMULH  (0wx13,0wx30)
     | SGNXL "addl" (0wx10,0wx00) (* same as ADDL *)
\end{SML}
This definitions states that \sml{ADDL} should be pretty printed
as \sml{addl}, \sml{ADDQ} as \sml{addq}, etc.  However, the opcode 
\sml{SGNXL} is pretty printed as \sml{addl} since it has been explicitly
overridden.

\subsection{Defining the Instruction Set}

How the instruction set is represented is declared using the
\sml{instruction} declaration.  For example, here's how the Alpha instruction
set is defined:
\begin{SML}
  instruction 
     DEFFREG of $FP
   | LDA of \{r: $GP, b: $GP, d:operand\}	
   | LDAH of \{r: $GP, b: $GP, d:operand\} 
   | LOAD of \{ldOp:load, r: $GP, b: $GP, d:operand, mem:Region.region\}
   | STORE of \{stOp:store, r: $GP, b: $GP, d:operand, mem:Region.region\}
   | FLOAD of \{ldOp:fload, r: $FP, b: $GP, d:operand, mem:Region.region\}
   | FSTORE of \{stOp:fstore, r: $FP, b: $GP, d:operand, mem:Region.region\}
   | JMPL of \{r: $GP, b: $GP, d:int\} * Label.label list
   | JSR of \{r: $GP, b: $GP, d:int\} * C.cellset * C.cellset * Region.region
   | RET of \{r: $GP, b: $GP, d:int\} 
   | BRANCH of branch * $GP * Label.label   
   | FBRANCH of fbranch * $FP * Label.label  
   | ...
\end{SML}

The \sml{instruction} declaration defines a datatype and specifies
that this datatype is used to represent the instruction set.  Generally
speaking, the instruction set's designer has complete freedom in how the
datatype is structured, but there are a few simple rules that she should
follow:
\begin{itemize}
  \item If a field represents a register, it should be typed
with the appropriate storage types \sml{$GP}, 
\sml{$FP}, etc.~instead 
of \sml{int}.   MDGen will treat its value in the correct manner; for
example, during assembly emission a field declared type \sml{int} is
printed as an integer, while a field declared type \sml{$GP} is displayed
as a general purpose register.
  \item MDGen recognizes the following special 
types: \sml{label}, \sml{labexp}, \sml{region}, and \sml{cellset}.
\end{itemize}

\subsection{Specifying Instruction Semantics}
   MLRISC performs all optimizations at 
the granulariy of individual instructions,
specialized to the architecture at hand.  Many
optimizations are possible only if the ``semantics'' of the 
instructions set to are properly specified.  MDGen contains a 
\emph{register transfer language} (RTL) sub-language that let us to describe
instruction semantics in a modular and succinct manner.  
 
   The semantics of this RTL sub-language has been borrowed heavily from
Norman Ramsey's and Jack Davidson's Lambda RTL.  There
are a few main differences, however: 
\begin{itemize}
  \item The syntax of our RTL language 
        is closer to that of ML than Lambda RTL.  
  \item Our RTL language, like that of MDGen, is tied closely to MLRISC.
\end{itemize}

\subsection{How to Run the Tool}

\subsection{Machine Description}
Here are some machine descriptions in varing degree of completion.

\begin{itemize}
 \item \mlrischref{sparc/sparc.mdl}{Sparc} 
 \item \mlrischref{hppa/hppa.mdl}{Hppa} 
 \item \mlrischref{alpha/alpha.mdl}{Alpha}
 \item \mlrischref{ppc/ppc.mdl}{PowerPC} 
 \item \mlrischref{x86/x86.mdl}{X86} 
\end{itemize}

\subsection{ Syntax Highlighting Macros }

\begin{itemize}
  \item \href{md.vim}{For vim 5.3}
\end{itemize}