1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118
|
\section{A small assembler for the MIPS}
This is part of the code generator for Standard ML of New Jersey.
We generate code in several stages.
This is nearly the lowest stage; it is like an assembler.
The user can call any function in the [[MIPSCODER]] signature.
Each one corresponds to an assembler pseudo-instruction.
Most correspond to single MIPS instructions.
The assembler remembers all the instructions that have been
requested, and when [[codegen]] is called it generates MIPS
code for them.
Some other structure will be able to use the MIPS structure to implement
a [[CMACHINE]], which is the abstract machine that ML thinks it is running
on.
(What really happens is a functor maps some structure
implementing [[MIPSCODER]] to a different structure implementing
[[CMACHINE]].)
{\em Any function using a structure of this signature must avoid
touching registers 1~and~31.
Those registers are reserved for use by the assembler.}
@ Here is the signature of the assembler, [[MIPSCODER]].
It can be extracted from this file by
$$\hbox{\tt notangle mipsinstr.nw -Rsignature}.$$
<<signature>>=
signature MIPSCODER = sig
(* Assembler for the MIPS chip *)
eqtype Label
datatype Register = Reg of int
(* Registers 1 and 31 are reserved for use by this assembler *)
datatype EA = Direct of Register | Immed of int | Immedlab of Label
(* effective address *)
structure M : sig
(* Emit various constants into the code *)
val emitstring : string -> unit (* put a literal string into the
code (null-terminated?) and
extend with nulls to 4-byte
boundary. Just chars, no
descriptor or length *)
exception BadReal of string
val low_order_offset : int (* does the low-order word of a
floating point literal come
first (0) or second (1) *)
val realconst : string -> unit (* emit a floating pt literal *)
val emitlong : int -> unit (* emit a 4-byte integer literal *)
(* Label bindings and emissions *)
val newlabel : unit -> Label (* new, unbound label *)
val define : Label -> unit (* cause the label to be bound to
the code about to be generated *)
val emitlab : int * Label -> unit (* L3: emitlab(k,L2) is equivalent to
L3: emitlong(k+L2-L3) *)
(* Control flow instructions *)
val slt : Register * EA * Register -> unit
(* (operand1, operand2, result) *)
(* set less than family *)
val beq : bool * Register * Register * Label -> unit
(* (beq or bne, operand1, operand2, branch address) *)
(* branch equal/not equal family *)
val jump : Register -> unit (* jump register instruction *)
val slt_double : Register * Register -> unit
(* floating pt set less than *)
val seq_double : Register * Register -> unit
(* floating pt set equal *)
val bcop1 : bool * Label -> unit (* floating pt conditional branch *)
(* Arithmetic instructions *)
(* arguments are (operand1, operand2, result) *)
val add : Register * EA * Register -> unit
val and' : Register * EA * Register -> unit
val or : Register * EA * Register -> unit
val xor : Register * EA * Register -> unit
val sub : Register * Register * Register -> unit
val div : Register * Register * Register -> unit
(* first arg is some register
guaranteed to overflow when
added to itself. Used to
detect divide by zero. *)
val mult : Register * Register * Register -> unit
val mfhi : Register -> unit (* high word of 64-bit multiply *)
(* Floating point arithmetic *)
val neg_double : Register * Register -> unit
val mul_double : Register * Register * Register -> unit
val div_double : Register * Register * Register -> unit
val add_double : Register * Register * Register -> unit
val sub_double : Register * Register * Register -> unit
(* Move pseudo-instruction : move(src,dest) *)
val move : EA * Register -> unit
(* Load and store instructions *)
(* arguments are (destination, source address, offset) *)
val lbu : Register * EA * int -> unit (* bytes *)
val sb : Register * EA * int -> unit
val lw : Register * EA * int -> unit (* words *)
val sw : Register * EA * int -> unit
val lwc1: Register * EA * int -> unit (* floating point coprocessor *)
val swc1: Register * EA * int -> unit
val lui : Register * int -> unit
(* Shift instructions *)
(* arguments are (shamt, operand, result) *)
(* shamt as Immedlab _ is senseless *)
val sll : EA * Register * Register -> unit
val sra : EA * Register * Register -> unit
(* Miscellany *)
val align : unit -> unit (* cause next data to be emitted on
a 4-byte boundary *)
val mark : unit -> unit (* emit a back pointer,
also called mark *)
val comment : string -> unit
end (* signature of structure M *)
val codegen : unit->unit
val codestats : outstream -> unit (* write statistics on stream *)
end (* signature MIPSCODER *)
@ The basic strategy of the implementation is to hold on, via the [[kept]]
pointer, to the list of instructions generated so far.
We use [[instr]] for the type of an instruction, so
[[kept]] has type [[instr list ref]].
The instructions will be executed in the following order: the
instruction at the head of the [[!kept]] is executed last.
This enables us to accept calls in the order of execution but
add the new instruction(s) to the list in constant time.
@
We structure the instruction stream a little bit by factoring
out the different load and store instructions that can
occur: we have load byte, load word, and load to coprocessor (floating point).
<<types auxiliary to [[instr]]>>=
datatype size = Byte | Word | Floating
@
Here are the instructions that exist.
We list them in more or less the order of the MIPSCODER signature.
<<definition of [[instr]]>>=
<<types auxiliary to [[instr]]>>
datatype instr =
STRINGCONST of string (* constants *)
| EMITLONG of int
| DEFINE of Label (* labels *)
| EMITLAB of int * Label
| SLT of Register * EA * Register (* control flow *)
| BEQ of bool * Register * Register * Label
| JUMP of Register
| SLT_D of Register * Register
| SEQ_D of Register * Register
| BCOP1 of bool * Label
| NOP (* no-op for delay slot *)
| ADD of Register * EA * Register (* arithmetic *)
| AND of Register * EA * Register
| OR of Register * EA * Register
| XOR of Register * EA * Register
| SUB of Register * Register * Register
| MULT of Register * Register
| DIV of Register * Register
| MFLO of Register (* mflo instruction used with
64-bit multiply and divide *)
| MFHI of Register
| NEG_D of Register * Register
| MUL_D of Register * Register * Register
| DIV_D of Register * Register * Register
| ADD_D of Register * Register * Register
| SUB_D of Register * Register * Register
| MOVE of EA * Register (* put something into a register *)
| LDI_32 of int * Register (* load in a big immediate constant (>16 bits) *)
| LUI of Register * int (* Mips lui instruction *)
| LOAD of size * Register * EA * int (* load and store *)
| STORE of size * Register * EA * int
| SLL of EA * Register * Register (* shift *)
| SRA of EA * Register * Register
| COMMENT of string (* generates nothing *)
| MARK (* a backpointer *)
| BREAK of int (* break instruction *)
@
Here is the code that handles the generated stream, [[kept]].
It begins life as [[nil]] and returns to [[nil]] every time code is
generated.
The function [[keep]] is a convenient way of adding a single [[instr]] to
the list; it's very terse.
Sometimes we have to add multiple [[instr]]s; then we use [[keeplist]].
We also define a function [[delay]] that is just like a [[keep]] but
it adds a NOP in the delay slot.
<<instruction stream and its functions>>=
val kept = ref nil : instr list ref
fun keep f a = kept := f a :: !kept
fun delay f a = kept := NOP :: f a :: !kept
fun keeplist l = kept := l @ !kept
<<reinitialize [[kept]]>>=
kept := nil
@
\subsection{Exporting functions for {\tt MIPSCODER}}
We now know enough to implement most of the functions called for in
[[MIPSCODER]].
We still haven't decided on an implementation of labels,
and there is one subtlety in multiplication and division,
but the rest is set.
<<[[MIPSCODER]] functions>>=
val emitstring = keep STRINGCONST (* literals *)
exception BadReal = IEEEReal.BadReal
val low_order_offset = Emitter.low_order_offset
val realconst = keep (STRINGCONST o order_real o IEEEReal.realconst)
val emitlong = keep EMITLONG
<<label functions>> (* labels *)
val slt = keep SLT (* control flow *)
val beq = delay BEQ
val jump = delay JUMP
val slt_double = delay SLT_D
val seq_double = delay SEQ_D
val bcop1 = delay BCOP1
val add = keep ADD (* arithmetic *)
val and' = keep AND
val or = keep OR
val xor = keep XOR
val op sub = keep SUB
<<multiplication and division functions>>
val neg_double = keep NEG_D
val mul_double = keep MUL_D
val div_double = keep DIV_D
val add_double = keep ADD_D
val sub_double = keep SUB_D
val move = keep MOVE
fun lbu (a,b,c) = delay LOAD (Byte,a,b,c) (* load and store *)
fun lw (a,b,c) = delay LOAD (Word,a,b,c)
fun lwc1 (a,b,c) = delay LOAD (Floating,a,b,c)
fun sb (a,b,c) = keep STORE (Byte,a,b,c)
fun sw (a,b,c) = keep STORE (Word,a,b,c)
fun swc1 (a,b,c) = delay STORE (Floating,a,b,c)
val lui = keep LUI
val sll = keep SLL (* shift *)
val sra = keep SRA
fun align() = () (* never need to align on MIPS *)
val mark = keep (fn () => MARK)
val comment = keep COMMENT
@
Multiplication has a minor complication; the
result has to be fetched from the LO register.
<<multiplication and division functions>>=
fun mult (op1, op2, result) = keeplist [MFLO result, MULT (op1, op2)]
val mfhi = keep MFHI
@
Division has a major complication; I must test for divide by zero since
the hardware does not.
If the divisor is zero, I cause an overflow exception by
adding [[limitreg]] to itself.
<<multiplication and division functions>>=
fun op div (op1, op2, result) =
let val next = newlabel()
in keeplist [
MFLO result, (* get the result *)
DEFINE next, (* skip to here if nonzero *)
BREAK 7, (* signals zerodivide *)
DIV (op1, op2), (* divide in delay slot *)
BEQ (false, Reg 0, op2, next) (* skip if divisor nonzero *)
]
end
@
For now, labels are just pointers to integers.
During code generation, those integers will be set to positions
in the instruction stream, and then they'll be useful as addresses
relative to the program counter pointer (to be held in [[Reg pcreg]]).
<<definition of [[Label]]>>=
type Label = int ref
<<label functions>>=
fun newlabel () = ref 0
val define = keep DEFINE
val emitlab = keep EMITLAB
@
Here's the overall plan of this structure:
<<*>>=
functor MipsCoder(Emitter: EMITTER) : MIPSCODER = struct
open Emitter
<<definition of [[Label]]>>
datatype Register = Reg of int
datatype EA = Direct of Register
| Immed of int
| Immedlab of Label
<<definition of [[instr]]>>
<<instruction stream and its functions>>
structure M = struct
<<[[MIPSCODER]] functions>>
end
open M
<<functions that assemble [[instr]]s into code>>
<<statistics>>
end (* MipsInstr *)
@ \subsection{Sizes of {\tt instr}s}
Now let's consider the correspondence between our [[instr]] type and the
actual MIPS instructions we intend to emit.
One important problem to solve is figuring out how big things are,
so that we know what addresses to generate for the various labels.
We will also want to know what address is currently stored in the program
counter regsiter ([[pcreg]]),
because we'll need to know when something is close
enough that we can use a sixteen-bit address relative to that register.
The kind of address we can use will determine how big things are.
We'll rearrange the code so that we have a list of [[ref int * instr]] pairs,
where the [[ref int]] stores the position in the list.
(Positions start at zero.)
Since in the MIPS all instructions are the same size, we measure
position as number of instructions.
While we're at it, we reverse the list so that the head will execute first,
then the rest of the list.
We begin with each position set to zero, and make a pass over the list
trying to set the value of each position.
We do this by estimating the size of (number of MIPS instructions
generated for) each [[instr]].
Since there are forward references, we may not have all the distances right
the first time, so we have to make a second pass.
But during this second pass we could find that something is further
away than we thought, and we have to switch from using a pc-relative mode to
something else (or maybe grab the new pc?), which changes the size again,
and moves things even further away.
Because we can't control this process, we just keep making passes over the
list until the process quiesces (we get the same size twice).
In order to guarantee termination, we have to make sure later passes only
increase the sizes of things.
This is sufficient since there is a maximum number of MIPS instructions
we can generate for each [[instr]].
While we're at it, we might want to complicate things by making the function
that does the passes also emit code.
For a single pass we hand an optional triple of emitters, the initial position,
an [[int option]] for the program counter pointer (if known), and the
instructions.
I'm not sure what explains the use of the [[ref int]] to track the position,
instead of just an [[int]]---it might be a desire to avoid the
overhead of creating a bunch of new objects, or it might be really hard
to do the passes cheaply.
It should think a variation on [[map]] would do the job, but maybe I'm
missing something.
@
[[emit : int * int -> unit]] emits one instruction,
and [[emit_string : int -> string -> unit]] emits a string constant.
[[emit_string]] could be specified as a function of [[emit]],
but the nature of the function would depend on whether the target
machine was little-endian or big-endian, and we don't want to have
that dependency built in.
[[instrs]] is the
list of instructions (in execute-head-last order).
The second argument to [[pass]] indicates for what instructions code
is to be generated.
It is a record (position of next instruction, program counter pointer if any,
remaining instructions to generate [with positions]).
\indent [[prepare]] produces two results: the instruction stream with
size pointers added, and the total size of code to be generated.
We add the total size because that is the only way to find the number
of [[bltzal]]s, which are implicit in the instruction stream.
<<assembler>>=
fun prepare instrs =
let fun add_positions(done, inst::rest) =
add_positions( (ref 0, inst) :: done, rest)
| add_positions(done, nil) = done
val instrs' = add_positions(nil, instrs) (* reverse and add [[ref int]]s*)
fun passes(oldsize) =
(* make passes with no emission until size is stable*)
let val size = pass false (0,NONE,instrs')
in if size=oldsize then size
else passes size
end
in {size = passes 0, stream = instrs'}
end
fun assemble instrs =
pass true (0,NONE,#stream (prepare instrs))
<<functions that assemble [[instr]]s into code>>=
fun get (SOME x) = x
| get NONE = ErrorMsg.impossible "missing pcptr in mipscoder"
<<[[pcptr]] functions>>
<<single pass>>
<<assembler>>
fun codegen () = (
assemble (!kept);
<<reinitialize [[kept]]>>
)
@
The program counter pointer is a device that enables us to to addressing
relative to the pcp register, register 31.
The need for it arises when we want to access a data element which we know
only by its label.
The labels give us addresses relative to the beginning of the function,
but we can only use addresses relative to some register.
The answer is to set register~31 with a [[bltzal]] instruction,
then use that for addressing.
The function [[needs_a_pcptr]] determines when it is necessary
to have a known value in register~31.
That is, we need the program counter pointer
\begin{itemize}
\item
at [[NOP]] for a reason to be named later?
\item
at any operation that uses an effective address that refers to a label
(since all labels have to be relative to the program counter).
\item
BEQ's and BCOP1's to very far away,
since we have to compute the address for a JUMP
knowing the value of the program counter pointer.
\end{itemize}
<<[[pcptr]] functions>>=
fun needs_a_pcptr(_,SLT(_,Immedlab _,_)) = true
| needs_a_pcptr(_,ADD(_,Immedlab _,_)) = true
| needs_a_pcptr(_,AND(_,Immedlab _,_)) = true
| needs_a_pcptr(_,OR(_,Immedlab _,_)) = true
| needs_a_pcptr(_,XOR(_,Immedlab _,_)) = true
| needs_a_pcptr(_,MOVE(Immedlab _,_)) = true
| needs_a_pcptr(_,LOAD(_,_,Immedlab _,_)) = true
| needs_a_pcptr(_,STORE(_,_,Immedlab _,_)) = true
| needs_a_pcptr(_,SLL(Immedlab _,_,_)) = true
| needs_a_pcptr(_,SRA(Immedlab _,_,_)) = true
| needs_a_pcptr(1, BEQ _) = false (* small BEQ's dont need pcptr *)
| needs_a_pcptr(_, BEQ _) = true (* but large ones do *)
| needs_a_pcptr(1, BCOP1 _) = false (* small BCOP1's dont need pcptr *)
| needs_a_pcptr(_, BCOP1 _) = true (* but large ones do *)
| needs_a_pcptr _ = false
@
Creating the program counter pointer once, with a [[bltzal]], is not
enough; we have to invalidate the program counter pointer at every
label, since control could arrive at the label from God knows where, and
therefore we don't know what the program counter pointer is.
We use the function [[makepcptr]] to create a new program counter pointer
``on the fly'' while generating code for other [[instrs]].
(I chose not to create a special [[instr]] for [[bltzal]], which I
could have inserted at appropriate points in the instruction stream.)
To try and find an odd bug, I'm adding no-ops after each [[bltzal]].
I don't really believe they're necessary.
The function [[gen]], which generates the instructions (or computes
their size), takes three arguments.
Third: the list of instructions to be generated (paired with pointers
to their sizes); first: the position (in words) at which to generate
those instructions; second: the current value of the program counter
pointer (register~31), if known.
The mutual recursion between [[gen]] and [[makepcptr]] maintains
the program counter pointer.
[[gen]] invalidates it at labels, and calls [[makepcptr]] to create a valid
one when necessary (as determined by [[needs_a_pcptr]]).
<<single pass>>=
fun pass emit_now =
let fun makepcptr(i,x) =
(* may need to emit NOP for delay slot if next instr is branch *)
let val size = case x of ((_,BEQ _)::rest) => 2
| ((_,BCOP1 _)::rest) => 2
| _ => 1
in if emit_now then (emit(Opcodes.bltzal(0,0));
if size=2 then emit(Opcodes.add(0,0,0)) else ())
else ();
gen(i+size, SOME (i+2), x)
end
and gen(i,_,nil) = i
| gen(i, _, (_,DEFINE lab) :: rest) = (lab := i; gen(i,NONE, rest))
(* invalidate the pc pointer at labels *)
(* may want to do special fiddling with NOPs *)
| gen(pos, pcptr, x as ((sizeref as ref size, inst) :: rest)) =
if (pcptr=NONE andalso needs_a_pcptr(size, inst)) then makepcptr(pos,x)
else if emit_now
then
<<emit MIPS instructions>>
else
<<compute positions>>
in gen
end
@ \subsection{Generating the instructions}
Now we need to consider the nitty-gritty details of just what instructions
are generated for each [[instr]].
In early passes, we'll just need to know how many instructions are
required (and that number may change from pass to pass, so it must be
recomputed).
In the last pass, the sizes are stable (by definition), so we can look
at the sizes to see what instructions to generate.
We'll consider the [[instrs]] in groups, but first, here's the
way we will structure things:
<<compute positions>>=
let <<functions for computing sizes>>
val newsize = case inst of
<<cases for sizes to be computed>>
in if newsize > size then sizeref := newsize else ();
gen(pos+(!sizeref) (* BUGS -- was pos+size*),pcptr,rest)
end
<<emit MIPS instructions>>=
let fun gen1() = gen(pos+size,pcptr,rest)
(* generate the rest of the [[instr]]s *)
open Bits
open Opcodes
<<declare reserved registers [[tempreg]] and [[pcreg]]>>
<<functions for emitting instructions>>
in case inst of
<<cases of instructions to be emitted>>
end
@ When we get around to generating code, we may need to use a temporary
register.
For example, if we want to load into a register
an immediate constant that won't fit
into 16~bits, we will have to load the high-order part of the constant
with [[lui]], then use [[addi]] to add then the low-order part.
The MIPS assembler has a similar problem, and on page D-2 of
the MIPS book we notice that register~1 is reserved for the use of the
assembler.
So we do the same.
We need to reserve a second register for use in pointing to the program
counter.
We will use register 31 because the [[bltzal]] instruction automatically
sets register 31 to the PC.
<<declare reserved registers [[tempreg]] and [[pcreg]]>>=
val tempreg = 1
val pcreg = 31
@
Before showing the code for the actual instructions, we should
point out that
we have two different ways of emitting a long word.
[[emitlong]] just splits the bits into two pieces for those cases
when it's desirable to put a word into the memory image.
[[split]] gives something that will load correctly
when the high-order piece is loaded into a high-order halfword
(using [[lui]]),
and the low-order piece is sign-extended and then added to the
high-order piece.
This is the way we load immediate constants of more than sixteen bits.
It is also useful for generating load or store instructions with
offsets of more than sixteen bits: we [[lui]] the [[hi]] part and
add it to the base regsiter, then use the [[lo]] part as an offset.
<<functions for emitting instructions>>=
fun emitlong i = emit(rshift(i,16), andb(i,65535))
(* emit one long word (no sign fiddling) *)
fun split i = let val hi = rshift(i,16) and lo = andb(i,65535)
in if lo<32768 then (hi,lo) else (hi+1, lo-65536)
end
@ We begin implementing [[instrs]] by considering those that emit constants.
String constants are padded with nulls out to a word boundary.
Integer constants are just emitted with [[emitlong]].
<<cases for sizes to be computed>>=
STRINGCONST s => Integer.div(String.length(s)+3,4)
| EMITLONG _ => 1
<<cases of instructions to be emitted>>=
STRINGCONST s =>
let val s' = s ^ "\000\000\000\000"
in gen1(emit_string (4*size) s')
(* doesn't know Big vs Little-Endian *)
end
| EMITLONG i => gen1(emitlong i)
@
Next consider the labels.
A [[DEFINE]] should never reach this far, and [[EMITLAB]] is almost like
an [[EMITLONG]].
<<cases for sizes to be computed>>=
| DEFINE _ => ErrorMsg.impossible "generate code for DEFINE in mipscoder"
| EMITLAB _ => 1
<<cases of instructions to be emitted>>=
| DEFINE _ => gen1(ErrorMsg.impossible "generate code for DEFINE in mipscoder")
| EMITLAB(i, ref d) => gen1(emitlong((d-pos)*4+i))
@
Now we have to start worrying about instructions with [[EA]] in them.
The real difficulty these things present is that they may have an
immediate operand that won't fit in 16~bits.
So we'll need to get this large immediate operand into a register,
sixteen bits at a time, and then do the operation on the register.
Since all of the arithmetic instructions have this difficulty, and since
we can use them to implement the others, we'll start with those and
catch up with the control-flow instructions later.
@ [[SUB]], [[MULT]], [[DIV]], and [[MFLO]] all use registers only,
so they are easy.
The other arithmetic operations get treated exactly the same, so we'll
use a function to compute the size.
{\bf move this to follow the definition of [[arith]]?}
<<cases for sizes to be computed>>=
| ADD(_, ea, _) => easize ea
| AND(_, ea, _) => easize ea
| OR (_, ea, _) => easize ea
| XOR(_, ea, _) => easize ea
| SUB _ => 1
| DIV (_,_) => 1
| MULT (_,_) => 1
| MFLO _ => 1
| MFHI _ => 1
@ Register operations take one instruction.
Immediate operations take one instruction for 16~bit constants,
and 3 for larger constants (since it costs two instructions to load
a big immediate constant into a register).
An immediate instruction with [[Immedlab l]] means that the operand
is intended to be the machine address associated with that label.
To compute that address, we need to add
[[4*(l-pcptr)]] to the contents of
register~[[pcreg]] (which holds [[4*pcptr]]),
put the results in a register, and operate on that register.
This tells us enough to compute the sizes.
<<functions for computing sizes>>=
fun easize (Direct _) = 1
| easize (Immed i) = if abs(i)<32768 then 1 else 3
| easize (Immedlab(ref lab)) = 1 + easize(Immed (4*(lab-(get pcptr))))
@
As we have seen,
to implement any arithmetic operation, we need to know the register
form and the sixteen-bit immediate form.
We will also want the operator from [[instr]], since we do the
large immediate via a recursive call.
We'll set up a function, [[arith]], that does the job.
<<functions for emitting instructions>>=
fun arith (opr, rform, iform) =
let fun ar (Reg op1, Direct (Reg op2), Reg result) =
gen1(emit(rform(result,op1,op2)))
| ar (Reg op1, Immed op2, Reg result) =
(case size of
1 (* 16 bits *) => gen1(emit(iform(result,op1,op2)))
| 3 (* 32 bits *) =>
gen(pos,pcptr,
(ref 2, LDI_32(op2, Reg tempreg))::
(ref 1, opr(Reg op1, Direct(Reg tempreg), Reg result))::
rest)
| _ => gen(ErrorMsg.impossible
"bad size in arith Immed in mipscoder")
)
| ar (Reg op1, Immedlab (ref op2), Reg result) =
gen(pos, pcptr,
(ref (size-1),
ADD(Reg pcreg,Immed(4*(op2-(get pcptr))), Reg tempreg))::
(ref 1, opr(Reg op1, Direct(Reg tempreg), Reg result))::
rest)
in ar
end
@
The generation itself may be a bit anticlimactic.
The MIPS has no ``subtract immediate'' instruction, and [[SUB]] has
a different type than the others, so we emit it directly.
<<cases of instructions to be emitted>>=
| ADD stuff => arith (ADD,add,addi) stuff
| AND stuff => arith (AND,and',andi) stuff
| OR stuff => arith (OR,or,ori) stuff
| XOR stuff => arith (XOR,xor,xori) stuff
| SUB (Reg op1, Reg op2, Reg result) => gen1(emit(sub(result,op1,op2)))
| DIV (Reg op1, Reg op2) => gen1(emit(div(op1,op2)))
| MULT(Reg op1, Reg op2) => gen1(emit(mult(op1,op2)))
| MFLO(Reg result) => gen1(emit(mflo(result)))
| MFHI(Reg result) => gen1(emit(mfhi(result)))
@ Floating point arithmetic is pretty easy because we always do it in
registers.
We also support only one format, double precision.
<<cases for sizes to be computed>>=
| NEG_D _ => 1
| MUL_D _ => 1
| DIV_D _ => 1
| ADD_D _ => 1
| SUB_D _ => 1
@ When emitting instructions we have to remember the Mips instructions
use result on the left, but the [[MIPSCODER]] signature requires result
on the right.
<<cases of instructions to be emitted>>=
| NEG_D (Reg op1,Reg result) => gen1(emit(neg_fmt(D_fmt,result,op1)))
<<functions for emitting instructions>>=
fun float3double instruction (Reg op1,Reg op2,Reg result) =
gen1(emit(instruction(D_fmt,result,op1,op2)))
<<cases of instructions to be emitted>>=
| MUL_D x => float3double mul_fmt x
| DIV_D x => float3double div_fmt x
| ADD_D x => float3double add_fmt x
| SUB_D x => float3double sub_fmt x
@ We offer a separate [[MOVE]] instruction because of large immediate
constants.
It is always possible to do [[move(src,dest)]] by doing
[[add(Reg 0,src,dest)]], but the general form [[add(Reg i, Immed c, dest)]]
takes three instructions when [[c]] is a large constant (more than 16 bits).
Rather than clutter up the code for [[add]] (and [[or]] and [[xor]]) by
trying to recognize register~0, we provide [[move]] explicitly.
\indent [[LDI_32]] takes care of the particular case in which we are
loading a 32-bit immediate constant into a register.
It dates from the bad old days before [[MOVE]], and it might be a good idea
to remove it sometime.
<<functions for emitting instructions>>=
fun domove (Direct (Reg src), Reg dest) = gen1(emit(add(dest,src,0)))
| domove (Immed src, Reg dest) =
(case size of
1 (* 16 bits *) => gen1(emit(addi(dest,0,src)))
| 2 (* 32 bits *) =>
gen(pos,pcptr,(ref 2, LDI_32(src, Reg dest))::rest)
| _ => gen(ErrorMsg.impossible "bad size in domove Immed in mipscoder")
)
| domove (Immedlab (ref src), Reg dest) =
gen(pos, pcptr,
(ref size,
ADD(Reg pcreg,Immed(4*(src-(get pcptr))), Reg dest))::rest)
@ Notice we use [[easize]] and not [[movesize]] in the third clause
because when we reach this point the treatment of a [[MOVE]] is the same
as that of an [[ADD]].
<<functions for computing sizes>>=
fun movesize (Direct _) = 1
| movesize (Immed i) = if abs(i)<32768 then 1 else 2
| movesize (Immedlab(ref lab)) = easize(Immed (4*(lab-(get pcptr))))
<<cases for sizes to be computed>>=
| MOVE (src,_) => movesize src
| LDI_32 _ => 2
| LUI _ => 1
<<cases of instructions to be emitted>>=
| MOVE stuff => domove stuff
| LDI_32 (immedconst, Reg dest) =>
let val (hi,lo) = split immedconst
in gen1(emit(lui(dest,hi));emit(addi(dest,dest,lo)))
end
| LUI (Reg dest,immed16) => gen1(emit(lui(dest,immed16)))
@
Now that we've done arithmetic, we can see how to do control flow without
too much trouble.
[[SLT]] can be treated just like an arithmetic operator.
[[BEQ]] is simple if the address to which we branch is close enough.
Otherwise we use the following sequence for [[BEQ(Reg op1, Reg op2, ref dest)]]:
\begin{verbatim}
bne op1,op2,L
ADD (Reg pcreg, Immed (4*(dest-pcptr)), Reg tempreg)
jr tempreg
L: ...
\end{verbatim}
Notice we don't have to put a [[NOP]] in the delay slot of the [[bne]].
We don't need one after the jump unless we needed one after the
original [[BEQ]], in which case one will be there.
If the branch is taken, we're doing as well as we can.
If the branch is not taken, we will have executed an [[add]] or [[lui]] in the
delay slot of the [[bne]], but the results just get thrown away.
<<cases for sizes to be computed>>=
| SLT(_, ea, _) => easize ea
| BEQ(_,_,_,ref dest) =>
if abs((pos+1)-dest) < 32768 then 1 (* single instruction *)
else 2+easize (Immed (4*(dest-(get pcptr))))
| JUMP _ => 1
| SLT_D _ => 1
| SEQ_D _ => 1
| BCOP1(_,ref dest) =>
if abs((pos+1)-dest) < 32768 then 1 (* single instruction *)
else 2+easize (Immed (4*(dest-(get pcptr))))
| NOP => 1
@ The implementation is as described, except we use a
non-standard [[nop]].
There are many Mips instructions that have no effect, and the standard
one is the word with all zeroes ([[sll 0,0,0]]).
We use [[add]], adding 0 to 0 and store the result in 0, because it
will be easy to distinguish from a data word that happens to be zero.
<<cases of instructions to be emitted>>=
| SLT stuff => arith (SLT,slt,slti) stuff
| BEQ(b, Reg op1, Reg op2, ref dest) =>
if size = 1 then
gen1(emit((if b then beq else bne)(op1,op2,dest-(pos+1))))
else gen(pos,pcptr,
(ref 1, BEQ(not b, Reg op1, Reg op2, ref(pos+size)))
::(ref (size-2),
ADD(Reg pcreg, Immed(4*(dest-(get pcptr))), Reg tempreg))
::(ref 1, JUMP(Reg tempreg))
::rest)
| JUMP(Reg dest) => gen1(emit(jr(dest)))
| SLT_D (Reg op1, Reg op2) =>
gen1(emit(c_lt(D_fmt,op1,op2)))
| SEQ_D (Reg op1, Reg op2) =>
gen1(emit(c_seq(D_fmt,op1,op2)))
| BCOP1(b, ref dest) =>
let fun bc1f offset = cop1(8,0,offset)
fun bc1t offset = cop1(8,1,offset)
in if size = 1 then
gen1(emit((if b then bc1t else bc1f)(dest-(pos+1))))
else gen(pos,pcptr,
(ref 1, BCOP1(not b, ref(pos+size)))
::(ref (size-2),
ADD(Reg pcreg, Immed(4*(dest-(get pcptr))), Reg tempreg))
::(ref 1, JUMP(Reg tempreg))
::rest)
end
| NOP => gen1(emit(add(0,0,0))) (* one of the many MIPS no-ops *)
@
Our next problem is to tackle load and store.
The major difficulty is if the offset is too large to fit in
sixteen bits; if so, we have to create a new base register.
If we have [[Immedlab]], we do it as an offset from [[pcreg]].
<<functions for emitting instructions>>=
fun memop(rform,Reg dest, Direct (Reg base), offset) =
(case size
of 1 => gen1(emit(rform(dest,offset,base)))
| 3 => let val (hi,lo) = split offset
in gen1(emit(lui(tempreg,hi)); (* tempreg = hi @<< 16 *)
emit(add(tempreg,base,tempreg));(* tempreg += base *)
emit(rform(dest,lo,tempreg)) (* load dest,lo(tempreg) *)
)
end
| _ => gen1(ErrorMsg.impossible "bad size in memop Direct in mipscoder")
)
| memop(rform,Reg dest, Immed address, offset) =
(case size
of 1 => gen1(emit(rform(dest,offset+address,0)))
| 2 => let val (hi,lo) = split (offset+address)
in gen1(emit(lui(tempreg,hi));
emit(rform(dest,lo,tempreg))
)
end
| _ => gen1(ErrorMsg.impossible "bad size in memop Immed in mipscoder")
)
| memop(rform,Reg dest, Immedlab (ref lab), offset) =
memop(rform, Reg dest, Direct (Reg pcreg), offset+4*(lab - get pcptr))
@ The actual registers don't matter for computing sizes, and in fact
the value of [[pcreg]] is not visible here, so we use an arbitrary
register ([[Reg 0]]) to compute the size.
<<functions for computing sizes>>=
fun adrsize(_, Reg _, Direct _, offset) =
if abs(offset)<32768 then 1 else 3
| adrsize(_, Reg _, Immed address, offset) =
if abs(address+offset) < 32768 then 1 else 2
| adrsize(x, Reg dest, Immedlab (ref lab), offset) =
adrsize(x, Reg dest, Direct (Reg 0 (* pcreg in code *) ),
offset+4*(lab-(get pcptr)))
<<cases for sizes to be computed>>=
| LOAD x => adrsize x
| STORE x => adrsize x
<<cases of instructions to be emitted>>=
| LOAD (Byte,dest,address,offset) => memop(lbu,dest,address,offset)
| LOAD (Word,dest,address,offset) => memop(lw,dest,address,offset)
| LOAD (Floating,dest,address,offset) => memop(lwc1,dest,address,offset)
| STORE (Byte,dest,address,offset) => memop(sb,dest,address,offset)
| STORE (Word,dest,address,offset) => memop(sw,dest,address,offset)
| STORE (Floating,dest,address,offset) => memop(swc1,dest,address,offset)
@
For the shift instructions, only register and immediate operands
make sense.
Immediate operands make sense if and only if they are representable
in five bits.
If everything is right, these are single instructions.
<<cases for sizes to be computed>>=
| SLL _ => 1
| SRA _ => 1
<<cases of instructions to be emitted>>=
| SLL (Immed shamt, Reg op1, Reg result) => gen1(
if (shamt >= 0 andalso shamt < 32) then emit(sll(result,op1,shamt))
else ErrorMsg.impossible ("bad sll shamt "
^ (Integer.makestring shamt) ^ " in mipscoder"))
| SLL (Direct(Reg shamt), Reg op1, Reg result) =>
gen1(emit(sllv(result,op1,shamt)))
| SLL (Immedlab _,_,_) => ErrorMsg.impossible "sll shamt is Immedlab in mipscoder"
| SRA (Immed shamt, Reg op1, Reg result) => gen1(
if (shamt >= 0 andalso shamt < 32) then emit(sra(result,op1,shamt))
else ErrorMsg.impossible ("bad sra shamt "
^ (Integer.makestring shamt) ^ " in mipscoder"))
| SRA (Direct(Reg shamt), Reg op1, Reg result) =>
gen1(emit(srav(result,op1,shamt)))
| SRA (Immedlab _,_,_) => ErrorMsg.impossible "sra shamt is Immedlab in mipscoder"
@
Finally, comments are ignored, and marks (backpointers) are written into the
instruction stream.
Comments are used by the front end to give diagnostics.
In the bad old days we would have had two different [[MIPSCODER]]s, one
which generated machine code (and ignored comments), and one which
wrote out assembly code (and copied comments).
Today we have just one, which means the rerouting of comments takes place
at a much higher level. Look in [[cps/mipsglue.nw]].
<<cases for sizes to be computed>>=
| COMMENT _ => 0
| MARK => 1 (* backpointer takes one word *)
| BREAK _ => 1 (* break instruction *)
@ Just for the record, here's the description of what a mark (backpointer)
is.
``Take the byte address at which the mark resides and add 4, giving
the byte address of the object following the mark.
(That object is the marked object.)
Subtract the byte address of the initial word that marks the
start of this instruction stream.
Now divide by 4, giving the distance in words between the
beginning of the block and the marked object.
Take that quantity and shift it left by multiplying by [[power_tags]],
and indicate the result is a mark by adding the tag bits [[tag_backptr]]
into the low order part.''
[[pos+1]] is exactly the required distance in words.
<<cases of instructions to be emitted>>=
| COMMENT _ => gen1()
| MARK => gen1(
let open System.Tags
in emitlong((pos+1) * power_tags + tag_backptr)
end)
| BREAK n => gen1(
if n < 0 orelse n > 32 then ErrorMsg.impossible "bad break code"
else emit(break n))
@
\subsection{Optimization}
The first step towards optimization is to take statistics.
We will count: [[instrs]], Mips words, [[NOP]]s in load and branch delays,
and [[bltzal]]s.
In the current implementation the [[bltzal]]s are implicit, so there
is no way to count them or optimize them.
<<statistics>>=
fun printstats stream
{inst : int, code : int, data : int,
load : int, branch : int, compare : int, size : int} =
let val print = output stream
val nop = load+branch+compare
val bltzal = size - (code + data)
val code = code + bltzal
<<definition of [[sprintf]]>>
fun P x = substring(makestring(100.0 * x),0,4) (* percent *)
fun printf f d = print (sprintf f d)
in printf ["Counted "," instrs in "," words (",
" code, "," data)\n" ^
"Used "," NOPs ("," load, "," branch,"," compare) and "," bltzals\n" ^
"","% of code words were NOPs; ","% were bltzals\n" ^
"","% of all words were code; ","% of all words were NOPs\n"]
[I inst, I size, I code, I data,
I nop, I load, I branch, I compare, I bltzal,
P (real nop / real code), P (real bltzal / real code),
P (real code / real size), P (real nop / real size)]
handle Overflow => print "[Overflow in computing Mips stats]\n"
end
<<statistics>>=
<<definition of [[iscode]]>>
fun addstats (counts as {inst,code,data,load,branch,compare}) =
fn nil => counts
| (sizeref,first)::(_,NOP)::rest => addstats
{inst=inst+2, code=code+(!sizeref)+1, data=data,
load=load+ (case first of LOAD _ => 1 | _ => 0),
branch=branch +(case first of BEQ _ => 1 | JUMP _ => 1
| BCOP1 _ => 1 | _ => 0),
compare=compare+(case first of SLT_D _ => 1 | SEQ_D _ => 1
| _ => 0)
} rest
| (sizeref,first)::rest => addstats
{inst=inst+1,
code = code + if iscode(first) then !sizeref else 0,
data = data + if not (iscode first) then !sizeref else 0,
load=load,
branch=branch,
compare=compare
} rest
fun codestats outfile =
let val {size,stream=instrs} = prepare (!kept)
val zero = {inst=0, code=0, data=0, load=0, branch=0, compare=0}
val counts as {inst,code,data,load,branch,compare} =
addstats zero instrs
in printstats outfile
{inst=inst,code=code,data=data,
load=load,branch=branch,compare=compare,size=size}
end
<<definition of [[iscode]]>>=
val iscode = fn
STRINGCONST _ => false
| EMITLONG _ => false
| DEFINE _ => false
| EMITLAB _ => false
| SLT _ => true
| BEQ _ => true
| JUMP _ => true
| NOP => true
| SLT_D _ => true
| SEQ_D _ => true
| BCOP1 _ => true
| ADD _ => true
| AND _ => true
| OR _ => true
| XOR _ => true
| SUB _ => true
| MULT _ => true
| DIV _ => true
| MFLO _ => true
| MFHI _ => true
| NEG_D _ => true
| MUL_D _ => true
| DIV_D _ => true
| ADD_D _ => true
| SUB_D _ => true
| MOVE _ => true
| LDI_32 _ => true
| LUI _ => true
| LOAD _ => true
| STORE _ => true
| SLL _ => true
| SRA _ => true
| COMMENT _ => false
| MARK => false
| BREAK _ => true
<<definition of [[sprintf]]>>=
val I = Integer.makestring
val R = Real.makestring
exception Printf
fun sprintf format values =
let fun merge([x],nil) = [x]
| merge(nil,nil) = nil
| merge(x::y,z::w) = x::z:: merge(y,w)
| merge _ = raise Printf
in implode(merge(format,values))
end
@
At the moment these functions are meaningless junk.
<<functions that remove pipeline bubbles>>=
val rec squeeze =
fn (x as LOAD(_,Reg d, m, i))::NOP::instr::rest =>
if use(instr,d) then ??
else squeeze(x::instr::rest)
| (x as STORE _)::(y as LOAD _)::rest =>
x :: squeeze(y::rest)
| instr::(x as LOAD(_,Reg d, Direct(Reg s), i))::NOP::rest =>
if use(instr, d) orelse gen(instr, s) then ??
else squeeze(x::instr::rest)
| instr::(x as LOAD(_,Reg d, _, i))::NOP::rest =>
if use(instr,d) then ??
else squeeze(x::instr::rest)
| (x as MFLO _):: (y as MULDIV _) :: rest =>
x :: squeeze (y::rest)
| (x as MFLO(Reg d))::instr::rest =>
if (use(instr,d) orelse gen(instr,d) then ??
else squeeze(instr::x::rest)
| instr :: (x as MULDIV(Reg a, Reg b)) :: rest =>
if gen(instr,a) orelse gen(instr,b) then ??
else squeeze(x::instr::rest)
val rec final =
fn
| instr::(x as LOAD(_,Reg d, Direct(Reg s), i))::NOP::rest =>
if gen(instr, s) then instr::final(x::NOP::rest)
else x::instr::(final rest)
| instr :: (x as JUMP _) :: NOP :: rest =>
x :: instr :: final rest
| instr :: (x as BEQ(_,Reg a, Reg b, _)) :: NOP :: rest =>
if gen(instr,a) orelse gen(instr,b) then instr::x::NOP::(final rest)
else x::instr::(final rest)
@
\section{Indices}
\subsection{Chunks}
\nowebchunks
\subsection{Identifiers}
\nowebindex
|