File: mlrisc-ir-rep.tex

package info (click to toggle)
mlton 20210117%2Bdfsg-3
links: PTS, VCS
area: main
in suites: sid
size: 58,464 kB
sloc: ansic: 27,682; sh: 4,455; asm: 3,569; lisp: 2,879; makefile: 2,347; perl: 1,169; python: 191; pascal: 68; javascript: 7
file content (158 lines) | stat: -rw-r--r-- 7,481 bytes
parent folder | download | duplicates (5)
\section{MLRISC Intermediate Representation}
   The MLRISC intermediate language is called 
   \newdef{MLTREE} At the lowest level, the core of MLTREE is a 
    \italics{Register Transfer Language (RTL)} 
   but represented in tree form. The tree
   form makes it convenient to use tree pattern matching tools like
   BURG (where appropriate) to do target instruction selection. Thus a
   tree such as: 

   \begin{SML}  
  MV(32, t, 
     ADDT(32, MULT(32, REG(32, b), REG(32, b)),
              MULT(32, MULT(REG(32, a), LI(4)), REG(32, c))))
   \end{SML}

   computes \sml{t := b*b + 4*a*c} to 32-bit precision. 
   The nodes \sml{ADDT} and
   \sml{MULT} are the trapping form of addition and multiplication,
   and \sml{LI} is used for integer constants. An infinite number
   of registers are assumed by the model, however depending on the
   target machine the first \sml{0..K} registers map onto the first
   \sml{K} registers on the target machine. Everything else is
   assumed to be a pseudo-register. The \sml{REG} node is used to
   indicate a  general purpose register. 

   
   The core MLTREE language makes no assumptions about instructions or
   calling convections of the target architecture. Trees can be
   created and combined in almost any form, with certain meaningless
   trees such as \sml{LOAD(32, FLOAD(64, LI 0))} being forbidden by the
   MLTREE type structure.

    Such pure trees are nice but inadequate in real compilers. One
   needs to be able to propagate front end specific information, such
   as frame sizes and frame offsets where the actual values are only
   available after register allocation and spilling. One could add
   support for frames in MLRISC, however this becomes a slippery slope
   because some compilers (e.g. SML/NJ) do not have a conventional
   notion of frames --- indeed there is no runtime stack in the
   execution of SML/NJ. A frame organization for one person may not
   meet the needs for another, and so on.  In MLRISC, the special
   requirements of different compilers is communicated into the MLTREE
   language, and subsequently into the optimizations phases, by
   specializing the MLTREE data structure with client specific
   information. There are currently \emph{five} dimensions over
   which one could specialize the MLTREE language.

  \begin{description} 
    \item[Constants] Constants are an
    abstraction for integer literals whose value is known after
    certain phases of code generation. Frame sizes and offsets are an
    example.  
    \image{MLRISC intermediate representation}{pictures/png/mlrisc-ir.png}{align=right}
    \item[Regions] While the data
    dependencies between arithmetic operations is implicit in the
    instruction, the data dependencies between memory operations is
    not. Regions are an abstract view of memory that make this
    dependence explicit and is specially useful for instruction
    reordering. 

    \item[Pseudo-ops] Pseudo-ops are
    intended to correspond to pseudo-op directives provided by native
    assemblers to lay out data, jump tables, and perform alignment.

    \item[Annotations]
    \href{annotations.html}{Annotations} are used
    for injecting semantics and other program information from the front-end 
    into the backend.  For example, a probability annotation can be
    attached to a branch instruction.  Similarly, line number annotations
    can be attached to basic blocks to aid debugging.   
    In many language implementations function local variables are
    spilled to activation frames on the stack. Spill slots contribute
    to the size of a function's frame. When an instruction produces a
    spill, we may need to update the frame associated to that
    instruction (increase the size of its spilling area). The frame
    for the current function can be injected in an annotation, which
    can be later examined by the spill callback during register allocation. 

     Annotations are
    implemented as an universal type and can be arbitrarily extended.
    Individual annotations can be associated
    with compiler objects of varying granularity, 
    from compilation units, to regions, basic blocks, flow edges,
    and down to the instructions.


    \item[User Defined Extensions]
    In the most extreme case, the basic constructors defined in the MLTREE
    language may be inadequate for the task at hand.  
    MLTREE allows the client to arbitrarily extend
    the set of statements and expressions to more closely match the
    source language and the target architecture(s). 
    
     For example, when using MLRISC for the backend of a DSP compiler 
     it may be useful to extend the set of MLRISC operators to include 
     fix point and saturated arithmetic.  
     Similarly, when developing a language for loop parallelization, it may
     be useful to extend the MLTREE language with higher-level loop 
     constructs.
  \end{description} 

\subsection{Examples}
   
   In the SML/NJ compiler, an encoding of a list of registers
   is passed to the garbage collector as the roots of live
   variables. This encoding cannot be computed until register
   allocation has been performed, therefore the integer literal
   encoding is represented as an abstract 
   \href{constants.html}{constant}.

    Again, in the SML/NJ compiler, most stores are for initializing 
   records in the allocation space, therefore representing every slot in
   the allocation space as a unique region allows one to commute
   most store instructions. Similarly, most loads are from
   \emph{immutable} records, and a simple analysis marks these are
   being accesses to \emph{read-only} memory. Read-only memory is
   characterized as having multiple \emph{uses} but no
   \emph{definitions}.

    In the TIL compiler, a \emph{trace table} is generated for
   every call site that records the set of live variables, their
   location (register or stack offset), and the type associated with
   the variable. This table is integrated into the program using the
   abstract pseudo-op mechanism. An interesting aspect of these tables
   is that they may need adjustment based on the results of register
   spilling.

    The more convention use of the psuedo-op abstraction is to
   propagate function prologue and epilogue information.

    The constants abstraction are created by a tree node called
   \sml{CONST}. In the SML/NJ compiler, the tree that communicates
   garbage collection information looks like:

\begin{verbatim}
   MV(32, maskReg, CONST{r110,r200,r300,r400 ...})
\end{verbatim}

  where \sml{maskReg} is a dedicated register. On the DEC Alpha,
  this would get translated to:

\begin{verbatim}
   LDA maskReg, {encode(r110,r200,r300,r400, ...)}
\end{verbatim}

   which indicates that the alpha instruction set (and optimization
   suite) know about these types of values. Further, after
   register allocation, the \sml{LDA} instruction may not be
   sufficient as the encoding may result in a value that is too large
   as an operand to \sml{LDA}. Two instructions may ultimately be
   required to load the encoding into the \sml{maskReg}
   register. This expansion is done during 
   \href{span-dep.html}{span-dependency resolution}.

    All these examples are intended to indicate that one
   intermediate representation and optimization suite does not fit
   all, but that the intermediate representation and optimization
   suite needs to be specialized to the needs of the client.