1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662
|
\section{Machine Description}
\subsection{Overview}
\newdef{MDGen} is a simple tool for generating
various modules in the MLRISC customizable code generator
directly from machine descriptions. These descriptions
contain architectural information such as:
\begin{enumerate}
\item How the the register file(s) are organized.
\item How instructions are encoded in machine code: MLRISC uses
this information to generate machine instructions directly into a byte stream.
Directly machine code generation is used in the SML/NJ compiler.
\item How instructions are pretty printed in assembly: this is used
for debugging and also for assembly output for other non-SML/NJ backends.
\item How instructions are internally represented in MLRISC.
\item Other information needed for performing optimizations, which
include:
\begin{enumerate}
\item The register transfer list (RTL) that defines the
operational semantics of the instruction.
\item Delay slot mechanisms.
\item Information for performing span dependency resolution.
\item Pipeline and reservation table characteristics.
\end{enumerate}
\end{enumerate}
Currently, item 5 is not ready for prime time.
\subsubsection{Why MDGen?}
MLRISC manipulates all instruction sets via a set of abstract
interfaces, which allows the programmer to arbitrarily choose an
instruction representation that is most convenient for a particular
architecture. However, various functions that manipulate
this representation must be provided by the instruction set's programmer.
As the number and complexities of each optimizations grow, and as
the number of architectures increases, the functions
for manipulating the instructions become more numerous and complex.
In order to keep the effort of developing and maintaining
an instruction set manageable,
the MDGen tool is developed to (partially) automate this task.
\subsubsection{Syntax}
MDGen's machine descriptions are written in a syntax that is very
much like that of
\externhref{http://cm.bell-labs.com/cm/cs/what/smlnj/sml.html}{Standard ML}.
Most core SML constructs are recognized.
In addition, new declaration forms specific to MDGen are
used to specify architectural information.
\paragraph{Reserved Words}
All SML keywords are reserved words in MDGen.
In addition, the following keywords are also reserved:
\begin{verbatim}
always architecture assembly at backwards big bits branching called
candidate cell cells cellset debug delayslot dependent endian field
fields formats forwards instruction internal little locations lowercase
name never nodelayslot nullified opcode ordering padded pipeline predicated
register rtl signed span storage superscalar unsigned uppercase
verbatim version vliw when
\end{verbatim}
Two kinds are quotations marks are also reserved:
\begin{SML}
[[ ]]
`` ''
\end{SML}
The first \sml{[[ ]]} is for describing semantics. The
second \sml{`` ''} is for describing assembly syntax.
\paragraph{Syntactic Sugar}
MDGen recognizes the following syntactic sugar.
\begin{description}
\item[Record abbreviations]
Record expressions such as \sml{{x=x,y=y,z=z}}
can be simplified to just \sml{{x,y,z}}.
\item[Binary literals]
Literals in binary can be written with the prefix \sml{0b} (for integer types)
or \sml{0wb} (for word types). For example, \sml{0wb101111} is the same
as \sml{0wx2f} and \sml{0w79}.
\item[Bit slices]
A bit slice, which extracts a range of bits from a word, can be written
using an \sml{at} expression. For example, \sml{w at [16..18]}
means the same thing as \verb|Word32.andb(Word32.>>(w, 0w16),0w7)|, i.e.
it extracts bit 16 to 18 from \sml{w}.
The least significant bit the zeroth bit.
In general, we can write:
\begin{SML}
w at [range1, range2, ..., rangen]
\end{SML}
to extract a sequence of slices from $w$ and concatenate them together.
For example, the expression
\begin{SML}
0wxabcd at [0..3, 4..7, 8..11, 12..15]
\end{SML}
swap the 4 nybbles from the 16-bit word, and evaluates to \sml{0wxdcba}.
\item[Signature]
Signature declarations of the form
\begin{SML}
val x y z : int -> int
\end{SML}
can be used as a shorthand for the more verbose:
\begin{SML}
val x : int -> int
val y : int -> int
val z : int -> int
\end{SML}
\end{description}
\subsubsection{Elaboration Semantics}
Unfortunately, there is no complete formal semantics of how
an MDGen specification elaborates.
But generally speaking, a machine description is a just a structure
(in the SML sense). Different components of this structure describe
different aspects of the architecture.
\paragraph{Syntactic Overloading}
In general, the syntactic overloading are used heavily in MDGen.
There are three types of definitions:
\begin{itemize}
\item Definitions that defines properties of the instruction set.
\item Definitions of functions and terms that are in the RTL meta-language.
The syntax of MDGen's RTL language is borrowed heavily from Lambda-RTL,
which in turns is borrowed heavily from SML.
\item Definitions of functions and types that are to be included in the
output generated by the MDGen tool. These are usually auxiliary
helper functions and definitions.
\end{itemize}
In general, entities of type 2, when appearing in other context, are
properly meta-quoted in the semantics quotations \sml{[[ ]]}.
\subsubsection{Basic Structure of A Machine Description}
The machine description for an architecture are defined via
an \sml{architecture} declaration, which has the following general
form.
\begin{SML}
architecture name =
struct
\Term{architecture type declaration}
\Term{endianess declaration}
\Term{storage class declarations}
\Term{locations declarations}
\Term{assembly case declarations}
\Term{delayslot declaration}
\Term{instruction machine encoding format declarations}
\Term{nested structure declarations}
\Term{instruction definition}
end
\end{SML}
\subsection{Describing the Architecture}
\subsubsection{Architecture type}
Architecture type declaration specifies whether the architecture is
a superscalar or a VLIW/EPIC machine. Currently, this information is
ignored.
\begin{SML}
\Term{architecture type declaration} ::= superscalar | vliw
\end{SML}
\subsubsection{Storage class}
Storage class declarations specify various information about the
registers in the architecture. For example, the Alpha has 32 general
purpose registers and 32 floating point registers. In addition, MLRISC
requires that each architecture specifies a (pseudo) register
type\footnote{Called cellkind in MLRISC.} for
holding condition codes (\sml{CC}).
To specify these information in MDGen, we can say:
\begin{SML}
storage
GP "r" = 32 cells of 64 bits in cellset called "register"
assembly as (fn (30,_) => "$sp"
| (r,_) => "$"^Int.toString r
)
| FP "f" = 32 cells of 64 bits in cellset called "floating point register"
assembly as (fn (f,_) => "$f"^Int.toString f)
| CC "cc" = cells of 64 bits in cellset GP called "condition code register"
assembly as "cc"
\end{SML}
\begin{itemize}
\item There are 32 64-bit general purpose registers,
32 64-bit floating point registers, while \sml{CC} is not a
real register type.
\item Cellsets
are used by MLRISC for annotating liveness information in the program.
The clause \sml{in cellset} states that register type \sml{GP}
and \sml{FP} are allotted their own components in the cellset,
while the register type \sml{CC} are put
in the same cellset component as \sml{GP}.
\item The clause \sml{assembly as} specifies
how each register is to be pretty printed. On the Alpha, general
purpose register are pretty printed with prefix \sml{$}, while
floating point registers are pretty printed with the prefix \sml{$f}.
A special case is made for register 30, which is the stack pointer, and
is pretty printing as \sml{$sp}. Pseudo condition code registers
are pretty printed with the prefix \sml{cc}.
\end{itemize}
\subsubsection{Locations}
Special locations in the register files can be declared using the
\sml{locations} declarations. On the Alpha, GPR
30 is the stack pointer, GPR 28 and floating point register 30
are used as the assembly temporaries. This special constants
can be defined as follows:
\begin{SML}
locations
stackptrR = $GP[30]
and asmTmpR = $GP[28]
and fasmTmp = $FP[30]
\end{SML}
\subsection{Specifying the Machine Encoding}
\subsubsection{Endianess}
The endianess declaration specifies whether the machine is little
endian or big endian so that the correct machine instruction encoding
functions can be generated. The general syntax of this is:
\begin{SML}
\Term{endianess declaration} ::= little endian | big endian
\end{SML}
The Alpha is little endian, so we just say:
\begin{SML}
little endian
\end{SML}
\subsubsection{Defining New Instruction Formats}
How instructions are encoded are specified using
\sml{instruction format} declarations. An instruction format declaration
has the following syntax:
\begin{SML}
\Term{instruction machine encoding format declarations} ::=
instruction formats n bits
\Term{format}1
| \Term{format}2
| \Term{format}3
| ...
| \Term{format}n-1
| \Term{format}n
\end{SML}
Each encoding format can be a primitive format, or a derived format.
\paragraph{Primitive formats}
A primitive format is simply specified by giving it a name and specifying
the position, names and types of its fields. This is usually the same
way it is described in a architectural reference manual.
Here is how we specify some of the (32 bit) primitive instruction formats
used in the Alpha.
\begin{SML}
instruction formats 32 bits
Memory\{opc:6, ra:5, rb:GP 5, disp: signed 16\}
| Jump\{opc:6=0wx1a,ra:GP 5,rb:GP 5,h:2,disp:int signed 14\}
| Memory_fun\{opc:6, ra:GP 5, rb:GP 5, func:16\}
| Branch\{opc:branch 6, ra:GP 5, disp:signed 21\}
| Fbranch\{opc:fbranch 6, ra:FP 5, disp:signed 21\}
| Operate0\{opc:6,ra:GP 5,rb:GP 5,sbz:13..15=0,_:1=0,func:5..11,rc:GP 5\}
| Operate1\{opc:6,ra:GP 5,lit:signed 13..20,_:1=1,func:5..11,rc:GP 5\}
\end{SML}
For example, the format \sml{Memory}
\begin{SML}
Memory\{opc:6, ra:5, rb:GP 5, disp: signed 16\}
\end{SML}
has a 6-bit opcode field, a 5-bit \sml{ra} field, a 5-bit \sml{rb}
field which always hold a general purpose register, and a 16-bit
sign-extended displacement field. The field to the left is positioned
at the most significant bits, while the field to the right is positioned
at the least. The widths of these fields must add up to 32 bits.
Similarly, the format \sml{Jump}
\begin{SML}
Jump{opc:6=0wx1a,ra:GP 5,rb:GP 5,h:2,disp:int signed 14}
\end{SML}
contains a 6-bit opcode field which always hold the constant \sml{0x1a},
two 5-bit fields \sml{ra} and \sml{rb} which are of type \sml{GP},
and a 14-bit sign-extended field of type integer.
Each field in a primitive format has one of 5 forms:
\begin{SML}
\Term{name} : \Term{position}
\Term{name} : \Term{position} = \Term{value}
\Term{name} : \Term{type} \Term{position}
\Term{name} : \Term{type} \Term{position} = \Term{value}
_ : \Term{position} = \Term{value}
\end{SML}
where \Term{position} is either a width, or a bits range
$n$\sml{..}$m$,
with an optional \sml{signed} prefix. The last form, with a wild card
for the field name, can be used to specify an anonymous field that
always has a fixed value.
By default, a field has type \sml{Word32.word}. If a type $T$
is specified, then the function \sml{emit_}$T$ is implicitly called
to convert the type into the appropriate encoding. The function
\sml{emit_}$T$ are generated automatically by MDGen if it is a cellkind
defined by the \sml{storage} class declaration, or if it is a primitive
type such as integer or boolean.
There are also other ways to automatically generate this function
(more on this later.)
For example, the format \sml{Operate1}
\begin{SML}
Operate1\{opc:6,ra:GP 5,lit:signed 13..20,_:1=1,func:5..11,rc:GP 5\}
\end{SML}
states that bits 26 to 31 are allocated to field \sml{opc},
bits 21 to 25 are allocated to field \sml{ra}, which is of type
\sml{GP}, bits 13 to 20 are allocated to field \sml{lit}, bit 12
is a single bit of value 1, etc.
MDGen generates a function for each primitive format declaration of
the same name that can be used for emitting the instruction.
In the case of the Alpha, the following functions are generated:
\begin{SML}
val Memory : \{opc:Word32.word, ra:Word32.word,
rb:int, disp:Word32.word\} -> unit
val Jump : \{ra:int, rb:int, disp:Word32.word\} -> unit
val Operate1 : \{opc:Word32.word, ra:int, lit:Word32.word,
func:Word32.word, rc:int\} -> unit
\end{SML}
\paragraph{Derived formats}
Derived formats are simply instruction formats that are defined
in terms of other formats. On the alpha, we have a \sml{Operate}
format that simplifies to either \sml{Operate0} or \sml{Operate1},
depending on whether the second argument is a literal or a register.
\begin{SML}
Operate\{opc,ra,rb,func,rc\} =
(case rb of
I.REGop rb => Operate0\{opc,ra,rb,func,rc\}
| I.IMMop i => Operate1\{opc,ra,lit=itow i,func,rc\}
| I.HILABop le => Operate1\{opc,ra,lit=High{le=le},func,rc\}
| I.LOLABop le => Operate1\{opc,ra,lit=Low{le=le},func,rc\}
| I.LABop le => Operate1\{opc,ra,lit=itow(LabelExp.valueOf le),func,rc\}
)
\end{SML}
\subsubsection{Generating Encoding Functions}
In MLRISC, we represent an instruction as a set of ML datatypes.
Some of these datatypes represent specific fields or
opcodes of the instructions.
MDGen lets us to associate a machine encoding to each datatype constructor
directly in the specification, and automatically generates an
encoding function for these datatypes.
There are two different ways of specifying an encoding. The first way
is just to write the machine encoding directly next the constructor.
Here's an example directly from the Alpha description:
\begin{SML}
structure Instruction =
struct
datatype branch! = (* table C-2 *)
BR 0x30
| BSR 0x34
| BLBC 0x38
| BEQ 0x39 | BLT 0x3a | BLE 0x3b
| BLBS 0x3c | BNE 0x3d | BGE 0x3e
| BGT 0x3f
datatype fbranch! = (* table C-2 *)
FBEQ 0x31 | FBLT 0x32
| FBLE 0x33 | FBNE 0x35
| FBGE 0x36 | FBGT 0x37
...
end
\end{SML}
The datatypes \sml{branch} and \sml{fbranch} represent specific
branch opcodes for integer branches \sml{BRANCH}, or floating point
branches \sml{FBRANCH}. On the Alpha, instruction \sml{BR} is encoded
with an opcode of \sml{0x30}, instruction \sml{BSR} is encoded
as \sml{0x34} etc. MDGen automatically generates two functions
\begin{SML}
val emit_branch : branch -> Word32.word
val emit_fbranch : branch -> Word32.word
\end{SML}
that perform this encoding.
In the specification for the instruction set, we state that the
\sml{BRANCH} instruction should be encoded using format \sml{Branch},
while the \sml{FBRANCH} instruction should be encoded using
format \sml{Fbranch}.
\begin{SML}
structure MC =
struct
(* Auxiliary function for computing the displacement of a label *)
fun disp ... = ...
...
end
...
instruction
...
| BRANCH of branch * $GP * Label.label
Branch\{opc=branch,ra=GP,disp=disp label\}
| FBRANCH of fbranch * $FP * Label.label
Fbranch\{opc=fbranch,ra=FP,disp=disp label\}
| ...
\end{SML}
Since the primitive instructions formats \sml{Branch} and \sml{FBranch}
are defined with branch and fbranch as the type in the opcode field
\begin{SML}
| Branch\{opc:branch 6, ra:GP 5, disp:signed 21\}
| Fbranch\{opc:fbranch 6, ra:FP 5, disp:signed 21\}
\end{SML}
the functions \sml{emit_branch} and \sml{emit_fbranch} are implicitly
called.
Another way to specify an encoding is to specify a range, as
in the following example:
\begin{SML}
datatype fload[0x20..0x23]! = LDF | LDG | LDS | LDT
datatype fstore[0x24..0x27]! = STF | STG | STS | STT
\end{SML}
This states that \sml{LDF} should be assigned the encoding \sml{0x20},
\sml{LDG} the encoding \sml{0x21} etc. This form is useful for
specifying a consecutive range.
\subsubsection{Encoding Variable Length Instructions}
Most architectures nowadays have fixed length encodings for instructions.
There are some notatable exceptions, however.
The Intel x86 architecture uses a legacy
variable length encoding. Modern RISC machines developed for
embedded systems may utilize space-reduction compression schemes in their
instruction sets. Finally, VLIW machines usually have some form
of NOP compression scheme for compacting issue packets.
\subsection{Specifying the Assembly Formats}
\subsubsection{Assembly Case Declaration}
The assembly case declaration specifies whether the assembly should be
emitted in lower case, upper case, or verbatim. If either lower case
or upper case is specified, all literal strings are converted to the
appropriate case. The general syntax of this declaration is:
\begin{SML}
\Term{assembly case declaration} ::=
lowercase assembly
| uppercase assembly
| verbatim assembly
\end{SML}
\subsubsection{Assembly Annotations}
Assembly output are specified in the assembly meta quotations
\sml{`` ''}, or string quotations \sml{" "}.
For example, here is a fragment from the Alpha description:
\begin{SML}
instruction
...
| LOAD of \{ldOp:load, r: $GP, b: $GP, d:operand, mem:Region.region\}
``<ldOp>\t<r>, <d>()<mem>''
| STORE of \{stOp:store, r: $GP, b: $GP, d:operand, mem:Region.region\}
``<stOp>\t<r>, <d>()<mem>''
| BRANCH of branch * $GP * Label.label
``<branch>\t<GP>, <label>''
| FBRANCH of fbranch * $FP * Label.label
``<fbranch>\t<FP>, <label>''
| CMOVE of \{oper:cmove, ra: $GP, rb:operand, rc: $GP\}
``<oper>\t<ra>, <rb>, <rc>''
| FOPERATE of \{oper:foperate, fa: $FP, fb: $FP, fc: $FP\}
``<oper>\t<fa>, <fb>, <fc>''
| ...
\end{SML}
All characters within the quotations \sml{`` ''} have the same
interpretation as in the string quotation \sml{" "}, except when
they are delimited by the \newdef{backquotes}
\verb|< >|.
Here's how the backquote is interpreted:
\begin{itemize}
\item If it is \verb|<|$x$\verb.>. and $x$ is a variable name of type $t$,
and if an assembly function of type $t$ is defined, then it will be invoked
to convert $x$ to the appropriate text.
\item If it is \verb|<|$x$\verb.>. and $x$ is a variable name of type $t$,
and if an assembly function of type $t$ is NOT defined,
then the function \sml{emit_}$x$ will be called to pretty print $x$.
\item If it is \verb|<|$e$\verb.>. where $e$ is a general expression, then
it will be used directly.
\end{itemize}
\subsubsection{Generating Assembly Functions}
Similar to machine encodings, we can attach assembly annotations to
datatype definitions and let MDGen generate the assembly functions for us.
Annotations take two forms, explicit or implicit.
Explicit annotations are enclosed within assembly quotations \sml{`` ''}.
For example, on the Alpha the datatype \sml{operand} is used to represent
an integer operand. This datatype is defined as follows:
\begin{SML}
datatype operand =
REGop of $GP ``<GP>''
| IMMop of int ``<int>''
| HILABop of LabelExp.labexp ``hi(<labexp>)''
| LOLABop of LabelExp.labexp ``lo(<labexp>)''
| LABop of LabelExp.labexp ``<labexp>''
| CONSTop of Constant.const ``<const>''
\end{SML}
Basicaly this states that \sml{REGop r} should be pretty printed
as \sml{$r}, \sml{IMMop i}
as \sml{i}, \sml{HILABexp le}
as \sml{hi(le)},
etc.
Implicit assembly annotations are specified by simply attaching
an exclamation mark at the end of the datatype name. This states
that the assembly output is the same as the name of the datatype
constructor\footnote{But appropriately modified by the assembly case
declaration.}. For example,
the datatype \sml{operate} is a listing of all integer opcodes
used in MLRISC.
\begin{SML}
datatype operate! = (* table C-5 *)
ADDL (0wx10,0wx00) | ADDQ (0wx10,0wx20)
| CMPBGE(0wx10,0wx0f) | CMPEQ (0wx10,0wx2d)
| CMPLE (0wx10,0wx6d) | CMPLT (0wx10,0wx4d) | CMPULE (0wx10,0wx3d)
| CMPULT(0wx10,0wx1d) | SUBL (0wx10,0wx09)
| SUBQ (0wx10,0wx29)
| S4ADDL(0wx10,0wx02) | S4ADDQ (0wx10,0wx22) | S4SUBL (0wx10,0wx0b)
| S4SUBQ(0wx10,0wx2b) | S8ADDL (0wx10,0wx12) | S8ADDQ (0wx10,0wx32)
| S8SUBL(0wx10,0wx1b) | S8SUBQ (0wx10,0wx3b)
| AND (0wx11,0wx00) | BIC (0wx11,0wx08) | BIS (0wx11,0wx20)
| EQV (0wx11,0wx48)
| ORNOT (0wx11,0wx28) | XOR (0wx11,0wx40)
| EXTBL (0wx12,0wx06) | EXTLH (0wx12,0wx6a) | EXTLL(0wx12,0wx26)
| EXTQH (0wx12,0wx7a) | EXTQL (0wx12,0wx36) | EXTWH(0wx12,0wx5a)
| EXTWL (0wx12,0wx16) | INSBL (0wx12,0wx0b) | INSLH(0wx12,0wx67)
| INSLL (0wx12,0wx2b) | INSQH (0wx12,0wx77) | INSQL(0wx12,0wx3b)
| INSWH (0wx12,0wx57) | INSWL (0wx12,0wx1b) | MSKBL(0wx12,0wx02)
| MSKLH (0wx12,0wx62) | MSKLL (0wx12,0wx22) | MSKQH(0wx12,0wx72)
| MSKQL (0wx12,0wx32) | MSKWH (0wx12,0wx52) | MSKWL(0wx12,0wx12)
| SLL (0wx12,0wx39) | SRA (0wx12,0wx3c) | SRL (0wx12,0wx34)
| ZAP (0wx12,0wx30) | ZAPNOT (0wx12,0wx31)
| MULL (0wx13,0wx00) | MULQ (0wx13,0wx20)
| UMULH (0wx13,0wx30)
| SGNXL "addl" (0wx10,0wx00) (* same as ADDL *)
\end{SML}
This definitions states that \sml{ADDL} should be pretty printed
as \sml{addl}, \sml{ADDQ} as \sml{addq}, etc. However, the opcode
\sml{SGNXL} is pretty printed as \sml{addl} since it has been explicitly
overridden.
\subsection{Defining the Instruction Set}
How the instruction set is represented is declared using the
\sml{instruction} declaration. For example, here's how the Alpha instruction
set is defined:
\begin{SML}
instruction
DEFFREG of $FP
| LDA of \{r: $GP, b: $GP, d:operand\}
| LDAH of \{r: $GP, b: $GP, d:operand\}
| LOAD of \{ldOp:load, r: $GP, b: $GP, d:operand, mem:Region.region\}
| STORE of \{stOp:store, r: $GP, b: $GP, d:operand, mem:Region.region\}
| FLOAD of \{ldOp:fload, r: $FP, b: $GP, d:operand, mem:Region.region\}
| FSTORE of \{stOp:fstore, r: $FP, b: $GP, d:operand, mem:Region.region\}
| JMPL of \{r: $GP, b: $GP, d:int\} * Label.label list
| JSR of \{r: $GP, b: $GP, d:int\} * C.cellset * C.cellset * Region.region
| RET of \{r: $GP, b: $GP, d:int\}
| BRANCH of branch * $GP * Label.label
| FBRANCH of fbranch * $FP * Label.label
| ...
\end{SML}
The \sml{instruction} declaration defines a datatype and specifies
that this datatype is used to represent the instruction set. Generally
speaking, the instruction set's designer has complete freedom in how the
datatype is structured, but there are a few simple rules that she should
follow:
\begin{itemize}
\item If a field represents a register, it should be typed
with the appropriate storage types \sml{$GP},
\sml{$FP}, etc.~instead
of \sml{int}. MDGen will treat its value in the correct manner; for
example, during assembly emission a field declared type \sml{int} is
printed as an integer, while a field declared type \sml{$GP} is displayed
as a general purpose register.
\item MDGen recognizes the following special
types: \sml{label}, \sml{labexp}, \sml{region}, and \sml{cellset}.
\end{itemize}
\subsection{Specifying Instruction Semantics}
MLRISC performs all optimizations at
the granulariy of individual instructions,
specialized to the architecture at hand. Many
optimizations are possible only if the ``semantics'' of the
instructions set to are properly specified. MDGen contains a
\emph{register transfer language} (RTL) sub-language that let us to describe
instruction semantics in a modular and succinct manner.
The semantics of this RTL sub-language has been borrowed heavily from
Norman Ramsey's and Jack Davidson's Lambda RTL. There
are a few main differences, however:
\begin{itemize}
\item The syntax of our RTL language
is closer to that of ML than Lambda RTL.
\item Our RTL language, like that of MDGen, is tied closely to MLRISC.
\end{itemize}
\subsection{How to Run the Tool}
\subsection{Machine Description}
Here are some machine descriptions in varing degree of completion.
\begin{itemize}
\item \mlrischref{sparc/sparc.mdl}{Sparc}
\item \mlrischref{hppa/hppa.mdl}{Hppa}
\item \mlrischref{alpha/alpha.mdl}{Alpha}
\item \mlrischref{ppc/ppc.mdl}{PowerPC}
\item \mlrischref{x86/x86.mdl}{X86}
\end{itemize}
\subsection{ Syntax Highlighting Macros }
\begin{itemize}
\item \href{md.vim}{For vim 5.3}
\end{itemize}
|