===========================
QBE Intermediate Language
===========================
- Table of Contents
-------------------
1. <@ Basic Concepts >
* <@ Input Files >
* <@ BNF Notation >
* <@ Sigils >
* <@ Spacing >
2. <@ Types >
* <@ Simple Types >
* <@ Subtyping >
3. <@ Constants and Vals >
4. <@ Linkage >
5. <@ Definitions >
* <@ Aggregate Types >
* <@ Data >
* <@ Functions >
6. <@ Control >
* <@ Blocks >
* <@ Jumps >
7. <@ Instructions >
* <@ Arithmetic and Bits >
* <@ Memory >
* <@ Comparisons >
* <@ Conversions >
* <@ Cast and Copy >
* <@ Call >
* <@ Variadic >
* <@ Phi >
8. <@ Instructions Index >
- 1. Basic Concepts
-------------------
The intermediate language (IL) is a higher-level language
than the machine's assembly language. It smoothes most
of the irregularities of the underlying hardware and
allows an infinite number of temporaries to be used.
This higher abstraction level lets frontend programmers
focus on language design issues.
~ Input Files
~~~~~~~~~~~~~
The intermediate language is provided to QBE as text.
Usually, one file is generated per compilation unit from
the frontend input language. An IL file is a sequence of
<@ Definitions > for data, functions, and types. Once
processed by QBE, the resulting file can be assembled and
linked using a standard toolchain (e.g., GNU binutils).
Here is a complete "Hello World" IL file which defines a
function that prints to the screen. Since the string is
not a first class object (only the pointer is) it is
defined outside the function's body. Comments start with
a # character and finish with the end of the line.
# Define the string constant.
data $str = { b "hello world", b 0 }
export function w $main() {
@start
# Call the puts function with $str as argument.
%r =w call $puts(l $str)
ret 0
}
If you have read the LLVM language reference, you might
recognize the example above. In comparison, QBE makes a
much lighter use of types and the syntax is terser.
~ BNF Notation
~~~~~~~~~~~~~~
The language syntax is loosely described in the sections
below using BNF syntax. The different BNF constructs used
are listed below.
* Keywords are enclosed between quotes;
* `... | ...` expresses alternatives;
* `( ... )` groups syntax;
* `[ ... ]` marks the nested syntax as optional;
* `( ... ),` designates a comma-separated list of the
enclosed syntax;
* `...*` and `...+` are used for arbitrary and
at-least-once repetition respectively.
~ Sigils
~~~~~~~~
The intermediate language makes heavy use of sigils: all
user-defined names are prefixed with a sigil. This avoids
keyword conflicts and makes it easy to spot the
scope and nature of identifiers.
* `:` is for user-defined <@ Aggregate Types>
* `$` is for globals (represented by a pointer)
* `%` is for function-scope temporaries
* `@` is for block labels
In this BNF syntax, we use `?IDENT` to designate an identifier
starting with the sigil `?`.
~ Spacing
~~~~~~~~~
`bnf
NL := '\n'+
Individual tokens in IL files must be separated by one or
more spacing characters. Both spaces and tabs are recognized
as spacing characters. In data and type definitions, newlines
may also be used as spaces to prevent overly long lines. When
exactly one of two consecutive tokens is a symbol (for example
`,` or `=` or `{`), spacing may be omitted.
- 2. Types
----------
~ Simple Types
~~~~~~~~~~~~~~
`bnf
BASETY := 'w' | 'l' | 's' | 'd' # Base types
EXTTY := BASETY | 'b' | 'h' # Extended types
The IL makes minimal use of types. By design, the types
used are restricted to what is necessary for unambiguous
compilation to machine code and C interfacing. Unlike LLVM,
QBE does not use types as a means to safety; they are only
here for semantic purposes.
The four base types are `w` (word), `l` (long), `s` (single),
and `d` (double). They stand respectively for 32-bit and
64-bit integers, and 32-bit and 64-bit floating-point numbers.
There are no pointer types available; pointers are typed
by an integer type sufficiently wide to represent all memory
addresses (e.g., `l` on 64-bit architectures). Temporaries
in the IL can only have a base type.
Extended types contain base types plus `b` (byte) and `h`
(half word), respectively for 8-bit and 16-bit integers.
They are used in <@ Aggregate Types> and <@ Data> definitions.
For C interfacing, the IL also provides user-defined aggregate
types as well as signed and unsigned variants of the sub-word
extended types. Read more about these types in the
<@ Aggregate Types > and <@ Functions > sections.
~ Subtyping
~~~~~~~~~~~
The IL has a minimal subtyping feature, for integer types only.
Any value of type `l` can be used in a `w` context. In that
case, only the 32 least significant bits of the long value
are used.
Note that this is the opposite of the usual subtyping on
integers (in C, we can safely use an `int` where a `long`
is expected). A word value cannot be used in a long context.
The rationale is that a word can be signed or unsigned, so
extending it to a long could be done in two ways, either
by zero-extension, or by sign-extension.
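The asymmetry can be modeled in a few lines of Python. This is only an
illustrative sketch: the helper names `as_word`, `zero_extend`, and
`sign_extend` are hypothetical and not part of QBE.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def as_word(long_bits):
    """Use a 64-bit value in a word context: keep the 32 low bits."""
    return long_bits & MASK32

def zero_extend(word_bits):
    """Extend a word to a long, treating it as unsigned."""
    return word_bits & MASK32

def sign_extend(word_bits):
    """Extend a word to a long, treating it as signed (two's complement)."""
    w = word_bits & MASK32
    return (w | ~MASK32) & MASK64 if w & 0x80000000 else w
```

The word `0xFFFFFFFF` extends to two different longs depending on the
helper chosen, which is why the IL asks for an explicit `extsw` or
`extuw` instead of an implicit conversion.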
- 3. Constants and Vals
-----------------------
`bnf
CONST :=
['-'] NUMBER # Decimal integer
| 's_' FP # Single-precision float
| 'd_' FP # Double-precision float
| $IDENT # Global symbol
DYNCONST :=
CONST
| 'thread' $IDENT # Thread-local symbol
VAL :=
DYNCONST
| %IDENT
Constants come in two kinds: compile-time constants and
dynamic constants. Dynamic constants include compile-time
constants and other symbol variants that are only known at
program-load time or execution time. Consequently, dynamic
constants can only occur in function bodies.
The representation of integers is two's complement.
Floating-point numbers are represented using the
single-precision and double-precision formats of the
IEEE 754 standard.
Constants specify a sequence of bits and are untyped.
They are always parsed as 64-bit blobs. Depending on
the context surrounding a constant, only some of its
bits are used. For example, in the program below, the
two variables defined have the same value since the first
operand of the subtraction is a word (32-bit) context.
%x =w sub -1, 0
%y =w sub 4294967295, 0
Because specifying floating-point constants by their bits
makes the code less readable, syntactic sugar is provided
to express them. Standard scientific notation is prefixed
with `s_` and `d_` for single and double precision numbers
respectively. Once again, the following example defines the
same double-precision constant twice.
%x =d add d_0, d_-1
%y =d add d_0, -4616189618054758400
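The equivalence of the two second operands can be checked outside QBE.
The following Python sketch (the function name `double_bits` is only
illustrative) reinterprets an IEEE 754 double as a signed 64-bit integer.

```python
import struct

def double_bits(x):
    """Return the bits of a double reinterpreted as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", x))[0]
```

`double_bits(-1.0)` evaluates to `-4616189618054758400`, the integer
constant used for `%y` above.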
Global symbols can also be used directly as constants;
they will be resolved and turned into actual numeric
constants by the linker.
When the `thread` keyword prefixes a symbol name, the
symbol's numeric value is resolved at runtime in the
thread-local storage.
Vals are used as arguments in regular, phi, and jump
instructions within function definitions. They are
either constants or function-scope temporaries.
- 4. Linkage
------------
`bnf
LINKAGE :=
'export' [NL]
| 'thread' [NL]
| 'section' SECNAME [NL]
| 'section' SECNAME SECFLAGS [NL]
SECNAME := '"' .... '"'
SECFLAGS := '"' .... '"'
Function and data definitions (see below) can specify
linkage information to be passed to the assembler and
eventually to the linker.
The `export` linkage flag marks the defined item as
visible outside the current file's scope. If absent,
the symbol can only be referred to locally. Functions
compiled by QBE and called from C need to be exported.
The `thread` linkage flag can only qualify data
definitions. It mandates that the object defined is
stored in thread-local storage. Each time a runtime
thread starts, the supporting platform runtime is in
charge of making a new copy of the object for the
fresh thread. Objects in thread-local storage must
be accessed using the `thread $IDENT` syntax, as
specified in the <@ Constants and Vals > section.
A `section` flag can be specified to tell the linker to
put the defined item in a certain section. The use of
the section flag is platform dependent and we refer the
user to the documentation of their assembler and linker
for relevant information.
section ".init_array"
data $.init.f = { l $f }
The section flag can be used to add function pointers to
a global initialization list, as depicted above. Note
that some platforms provide a BSS section that can be
used to minimize the footprint of uniformly zeroed data.
When this section is available, QBE will automatically
make use of it and no section flag is required.
The section and export linkage flags should each appear
at most once in a definition. If multiple occurrences
are present, QBE is free to use any.
- 5. Definitions
----------------
Definitions are the essential components of an IL file.
They can define three types of objects: aggregate types,
data, and functions. Aggregate types are never exported
and do not compile to any code. Data and function
definitions have file scope and are mutually recursive
(even across IL files). Their visibility can be controlled
using linkage flags.
~ Aggregate Types
~~~~~~~~~~~~~~~~~
`bnf
TYPEDEF :=
# Regular type
'type' :IDENT '=' ['align' NUMBER]
'{'
( SUBTY [NUMBER] ),
'}'
| # Union type
'type' :IDENT '=' ['align' NUMBER]
'{'
(
'{'
( SUBTY [NUMBER] ),
'}'
)+
'}'
| # Opaque type
'type' :IDENT '=' 'align' NUMBER '{' NUMBER '}'
SUBTY := EXTTY | :IDENT
Aggregate type definitions start with the `type` keyword.
They have file scope, but types must be defined before being
referenced. The inner structure of a type is expressed by a
comma-separated list of types enclosed in curly braces.
type :fourfloats = { s, s, d, d }
For ease of IL generation, a trailing comma is tolerated by
the parser. In case many items of the same type are
sequenced (like in a C array), the shorter array syntax
can be used.
type :abyteandmanywords = { b, w 100 }
By default, the alignment of an aggregate type is the
maximum alignment of its members. The alignment can be
explicitly specified by the programmer.
type :cryptovector = align 16 { w 4 }
Union types allow the same chunk of memory to be used with
different layouts. They are defined by enclosing multiple
regular aggregate type bodies in a pair of curly braces.
Size and alignment of union types are set to the maximum size
and alignment of each variation or, in the case of alignment,
can be explicitly specified.
type :un9 = { { b } { s } }
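As a rough illustration of the size and alignment rules, the Python
sketch below lays out fields the way a C compiler typically would. It
assumes natural alignment for each extended type and padding between
members; those details, like the names `SIZES`, `struct_layout`, and
`union_layout`, are assumptions made for illustration and not statements
about QBE's implementation.

```python
# Size in bytes of each extended type; natural alignment is assumed
# to be equal to the size.
SIZES = {"b": 1, "h": 2, "w": 4, "l": 8, "s": 4, "d": 8}

def struct_layout(fields, align=None):
    """fields is a list of (type letter, count); returns (size, alignment)."""
    offset, max_align = 0, 1
    for ty, count in fields:
        a = SIZES[ty]
        offset = (offset + a - 1) // a * a    # pad up to the member alignment
        offset += SIZES[ty] * count
        max_align = max(max_align, a)
    align = align or max_align                # explicit `align N` wins
    size = (offset + align - 1) // align * align
    return size, align

def union_layout(variants):
    """A union takes the maximum size and alignment over its variants."""
    layouts = [struct_layout(v) for v in variants]
    return max(s for s, _ in layouts), max(a for _, a in layouts)
```

Under these assumptions, `:fourfloats` is 24 bytes aligned on 8,
`:abyteandmanywords` is 404 bytes aligned on 4 (three padding bytes
follow the leading byte), and `:un9` is 4 bytes aligned on 4.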
Opaque types are used when the inner structure of an
aggregate cannot be specified; the alignment for opaque
types is mandatory. They are defined simply by enclosing
their size between curly braces.
type :opaque = align 16 { 32 }
~ Data
~~~~~~
`bnf
DATADEF :=
LINKAGE*
'data' $IDENT '=' ['align' NUMBER]
'{'
( EXTTY DATAITEM+
| 'z' NUMBER ),
'}'
DATAITEM :=
$IDENT ['+' NUMBER] # Symbol and offset
| '"' ... '"' # String
| CONST # Constant
Data definitions express objects that will be emitted in the
compiled file. Their visibility and location in the compiled
artifact are controlled with linkage flags described in the
<@ Linkage > section.
They define a global identifier (starting with the sigil
`$`), that will contain a pointer to the object specified
by the definition.
Objects are described by a sequence of fields that start with
a type letter. This letter can either be an extended type,
or the `z` letter. If the letter used is an extended type,
the data item following specifies the bits to be stored in
the field. When several data items follow a letter, they
initialize multiple fields of the same size.
The members of a struct will be packed. This means that
padding has to be emitted by the frontend when necessary.
Alignment of the whole data objects can be manually specified,
and when no alignment is provided, the maximum alignment from
the platform is used.
When the `z` letter is used the number following indicates
the size of the field; the contents of the field are zero
initialized. It can be used to add padding between fields
or zero-initialize big arrays.
Here are various examples of data definitions.
# Three 32-bit values 1, 2, and 3
# followed by a 0 byte.
data $a = { w 1 2 3, b 0 }
# A thousand bytes 0 initialized.
data $b = { z 1000 }
# An object containing two 64-bit
# fields, one with all bits set and the
# other containing a pointer to the
# object itself.
data $c = { l -1, l $c }
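To make the packing rule concrete, this Python sketch builds the bytes
that `$a` and `$b` would occupy. It assumes a little-endian target and
handles integer fields only; `pack_fields` is a hypothetical helper, not
a QBE API.

```python
import struct

# struct format codes for the integer extended types. The "<" prefix
# selects little-endian with no padding, mirroring the packed layout.
CODES = {"b": "b", "h": "h", "w": "i", "l": "q"}

def pack_fields(fields):
    """fields is a list of (type letter, values) or ("z", byte count)."""
    out = bytearray()
    for ty, arg in fields:
        if ty == "z":
            out += bytes(arg)             # zero-initialized bytes
        else:
            for v in arg:
                out += struct.pack("<" + CODES[ty], v)
    return bytes(out)
```

With this model, `$a` takes 13 bytes (three words followed directly by
one byte, with no padding) and `$b` takes 1000 zero bytes.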
~ Functions
~~~~~~~~~~~
`bnf
FUNCDEF :=
LINKAGE*
'function' [ABITY] $IDENT '(' (PARAM), ')' [NL]
'{' NL
BLOCK+
'}'
PARAM :=
ABITY %IDENT # Regular parameter
| 'env' %IDENT # Environment parameter (first)
| '...' # Variadic marker (last)
SUBWTY := 'sb' | 'ub' | 'sh' | 'uh' # Sub-word types
ABITY := BASETY | SUBWTY | :IDENT
Function definitions contain the actual code to emit in
the compiled file. They define a global symbol that
contains a pointer to the function code. This pointer
can be used in `call` instructions or stored in memory.
The type given right before the function name is the
return type of the function. All return values of this
function must have this return type. If the return
type is missing, the function must not return any value.
The parameter list is a comma separated list of
temporary names prefixed by types. The types are used
to correctly implement C compatibility. When an argument
has an aggregate type, a pointer to the aggregate is passed
by the caller. In the example below, we have to use a load
instruction to get the value of the first (and only)
member of the struct.
type :one = { w }
function w $getone(:one %p) {
@start
%val =w loadw %p
ret %val
}
If a function accepts or returns values that are smaller
than a word, such as `signed char` or `unsigned short` in C,
one of the sub-word types must be used. The sub-word types
`sb`, `ub`, `sh`, and `uh` stand, respectively, for signed
and unsigned 8-bit values, and signed and unsigned 16-bit
values. Parameters associated with a sub-word type of bit
width N only have their N least significant bits set and
have base type `w`. For example, the function
function w $addbyte(w %a, sb %b) {
@start
%bw =w extsb %b
%val =w add %a, %bw
ret %val
}
needs to sign-extend its second argument before the
addition. Dually, return values with sub-word types do
not need to be sign or zero extended.
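The need for the explicit extension can be sketched in Python. In the
usage below, the upper 24 bits of the incoming word are deliberately
filled with garbage to show that they must not be relied upon. The
functions `extsb` and `addbyte` mirror the IL names above but are
otherwise a hypothetical model.

```python
MASK32 = 0xFFFFFFFF

def extsb(w):
    """Sign-extend the 8 low bits of a word to a full 32-bit word."""
    b = w & 0xFF
    return (b - 0x100) & MASK32 if b & 0x80 else b

def addbyte(a, b):
    """Model of $addbyte: extend the sub-word parameter, then add as words."""
    return (a + extsb(b)) & MASK32
```

Calling `addbyte(10, 0xABCDEFFF)` yields 9: only the low byte `0xFF`
(that is, -1 as a signed byte) of the second argument takes part in
the addition.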
If the parameter list ends with `...`, the function is
a variadic function: it can accept a variable number of
arguments. To access the extra arguments provided by
the caller, use the `vastart` and `vaarg` instructions
described in the <@ Variadic > section.
Optionally, the parameter list can start with an
environment parameter `env %e`. This special parameter is
a 64-bit integer temporary (i.e., of type `l`). If the
function does not use its environment parameter, callers
can safely omit it. This parameter is invisible to a C
caller: for example, the function
export function w $add(env %e, w %a, w %b) {
@start
%c =w add %a, %b
ret %c
}
must be given the C prototype `int add(int, int)`.
The intended use of this feature is to pass the
environment pointer of closures while retaining a
very good compatibility with C. The <@ Call > section
explains how to pass an environment parameter.
Since global symbols are mutually recursive,
there is no need for function declarations: a function
can be referenced before its definition.
Similarly, functions from other modules can be used
without previous declaration. All the type information
necessary to compile a call is in the instruction itself.
The syntax and semantics for the body of functions
are described in the <@ Control > section.
- 6. Control
------------
The IL represents programs as textual transcriptions of
control flow graphs. The control flow is serialized as
a sequence of blocks of straight-line code which are
connected using jump instructions.
~ Blocks
~~~~~~~~
`bnf
BLOCK :=
@IDENT NL # Block label
( PHI NL )* # Phi instructions
( INST NL )* # Regular instructions
JUMP NL # Jump or return
All blocks have a name that is specified by a label at
their beginning. Then follows a sequence of instructions
that have "fall-through" flow. Finally one jump terminates
the block. The jump can either transfer control to another
block of the same function or return; jumps are described
further below.
The first block in a function must not be the target of
any jump in the program. If a jump to the function start
is needed, the frontend must insert an empty prelude block
at the beginning of the function.
When one block jumps to the next block in the IL file,
it is not necessary to write the jump instruction; it
will be automatically added by the parser. For example,
the start block in the example below jumps directly
to the loop block.
function $loop() {
@start
@loop
%x =w phi @start 100, @loop %x1
%x1 =w sub %x, 1
jnz %x1, @loop, @end
@end
ret
}
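For readers less used to phi instructions, the control flow of `$loop`
can be traced with a small Python model. The variable names follow the
temporaries of the IL; the iteration counter and the `pred` variable are
additions made only for illustration.

```python
def loop():
    """Trace $loop and return how many times the @loop block runs."""
    iterations = 0
    pred = "start"          # block we arrived from; selects the phi operand
    x1 = None
    while True:
        # %x =w phi @start 100, @loop %x1
        x = 100 if pred == "start" else x1
        x1 = (x - 1) & 0xFFFFFFFF   # %x1 =w sub %x, 1
        iterations += 1
        if x1 != 0:                 # jnz %x1, @loop, @end
            pred = "loop"
            continue
        return iterations           # @end: ret
```

The phi picks 100 on the first pass, because control arrives from
`@start`, and `%x1` on every later pass, so the loop body runs 100 times.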
~ Jumps
~~~~~~~
`bnf
JUMP :=
'jmp' @IDENT # Unconditional
| 'jnz' VAL, @IDENT, @IDENT # Conditional
| 'ret' [VAL] # Return
| 'hlt' # Termination
A jump instruction ends every block and transfers the
control to another program location. The target of
a jump must never be the first block in a function.
The four kinds of jumps available are described in
the following list.
1. Unconditional jump.
Simply jumps to another block of the same function.
2. Conditional jump.
When its word argument is non-zero, it jumps to its
first label argument; otherwise it jumps to the other
label. The argument must be of word type; because of
subtyping a long argument can be passed, but only its
least significant 32 bits will be compared to 0.
3. Function return.
Terminates the execution of the current function,
optionally returning a value to the caller. The value
returned must be of the type given in the function
prototype. If the function prototype does not specify
a return type, no return value can be used.
4. Program termination.
Terminates the execution of the program with a
target-dependent error. This instruction can be used
when it is expected that the execution never reaches
the end of the block it closes; for example, after
having called a function such as `exit()`.
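The subtyping remark in the conditional jump item above is easy to
overlook. In this Python sketch, `jnz_target` is a hypothetical helper
that returns the label a conditional jump would take.

```python
def jnz_target(value, if_nonzero, if_zero):
    """Pick the jump target by testing only the 32 low bits of value."""
    return if_nonzero if value & 0xFFFFFFFF else if_zero
```

A long equal to `1 << 32` is non-zero, yet the jump goes to the second
label because its low word is 0. When all 64 bits matter, the long
should first be reduced to a word with a comparison instruction.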
- 7. Instructions
-----------------
Instructions are the smallest piece of code in the IL; they
form the body of <@ Blocks >. The IL uses a three-address
code, which means that one instruction computes an operation
between two operands and assigns the result to a third one.
An instruction has both a name and a return type; this
return type is a base type that defines the size of the
instruction's result. The type of the arguments can be
unambiguously inferred using the instruction name and the
return type. For example, for all arithmetic instructions,
the type of the arguments is the same as the return type.
The two additions below are valid if `%y` is a word or a long
(because of <@ Subtyping >).
%x =w add 0, %y
%z =w add %x, %x
Some instructions, like comparisons and memory loads
have operand types that differ from their return types.
For instance, two floating points can be compared to give a
word result (1 if the comparison succeeds, 0 if it fails).
%c =w cgts %a, %b
In the example above, both operands must be of single type.
This is made explicit by the instruction suffix.
The types of instructions are described below using a short
type string. A type string specifies all the valid return
types an instruction can have, its arity, and the type of
its arguments depending on its return type.
Type strings begin with acceptable return types, then
follows, in parentheses, the possible types for the arguments.
If the N-th return type of the type string is used for an
instruction, the arguments must use the N-th type listed for
them in the type string. When an instruction does not have a
return type, the type string only contains the types of the
arguments.
The following abbreviations are used.
* `T` stands for `wlsd`
* `I` stands for `wl`
* `F` stands for `sd`
* `m` stands for the type of pointers on the target; on
64-bit architectures it is the same as `l`
For example, consider the type string `wl(F)`. It states
that the instruction has only one argument and that if the
return type used is long, the argument must be of type double.
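The reading rule for type strings can be written down as a short Python
function. This is a sketch of the convention described above, not code
taken from QBE; `ABBREV`, `expand`, and `arg_types` are hypothetical
names.

```python
ABBREV = {"T": "wlsd", "I": "wl", "F": "sd"}

def expand(spec):
    """Replace an abbreviation letter by the type letters it stands for."""
    return ABBREV.get(spec, spec)

def arg_types(type_string, ret=None):
    """Return the argument types required for the given return type."""
    rets, args = type_string.rstrip(")").split("(")
    rets = expand(rets)
    # Position of the chosen return type; 0 when there is no return type.
    idx = rets.index(ret) if rets else 0
    result = []
    for spec in args.split(","):
        spec = expand(spec)
        # A one-letter spec offers a single possibility.
        result.append(spec[idx] if len(spec) > 1 else spec)
    return result
```

For instance, `arg_types("wl(F)", "l")` gives `["d"]`, and
`arg_types("I(I,ww)", "l")` gives `["l", "w"]`: a shift returning a long
takes a long and a word.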
~ Arithmetic and Bits
~~~~~~~~~~~~~~~~~~~~~
* `add`, `sub`, `div`, `mul` -- `T(T,T)`
* `neg` -- `T(T)`
* `udiv`, `rem`, `urem` -- `I(I,I)`
* `or`, `xor`, `and` -- `I(I,I)`
* `sar`, `shr`, `shl` -- `I(I,ww)`
The base arithmetic instructions in the first bullet are
available for all types, integers and floating points.
When `div` is used with word or long return type, the
arguments are treated as signed. The unsigned integral
division is available as `udiv` instruction. When the
result of a division is not an integer, it is truncated
towards zero.
The signed and unsigned remainder operations are available
as `rem` and `urem`. The sign of the remainder is the same
as that of the dividend. Its magnitude is smaller than
that of the divisor. These two instructions and `udiv` are only
available with integer arguments and result.
Bitwise OR, AND, and XOR operations are available for both
integer types. Logical operations of typical programming
languages can be implemented using <@ Comparisons > and
<@ Jumps >.
Shift instructions `sar`, `shr`, and `shl`, shift right or
left their first operand by the amount from the second
operand. The shifting amount is taken modulo the size of
the result type. Shifting right can either preserve the
sign of the value (using `sar`), or fill the newly freed
bits with zeroes (using `shr`). Shifting left always
fills the freed bits with zeroes.
Note that an arithmetic shift right (`sar`) is only
equivalent to a division by a power of two for non-negative
numbers. This is because the shift right "truncates"
towards minus infinity, while the division truncates
towards zero.
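The difference between `sar` and `div` on negative numbers, together
with the modulo rule for shift amounts and the sign rule for `rem`, can
be checked with the Python model below. It works on raw bit patterns;
the helper `to_signed` and the `bits` parameter are assumptions of the
sketch.

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's complement number."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def sar(x, amount, bits=32):
    """Arithmetic shift right; the amount is taken modulo the type size."""
    return (to_signed(x, bits) >> (amount % bits)) & ((1 << bits) - 1)

def div(x, y, bits=32):
    """Signed division truncating towards zero."""
    a, b = to_signed(x, bits), to_signed(y, bits)
    q = abs(a) // abs(b)
    return (q if (a < 0) == (b < 0) else -q) & ((1 << bits) - 1)

def rem(x, y, bits=32):
    """Signed remainder; its sign follows the dividend."""
    a, b = to_signed(x, bits), to_signed(y, bits)
    r = abs(a) % abs(b)
    return (-r if a < 0 else r) & ((1 << bits) - 1)
```

Shifting -7 right by one gives -4, while dividing -7 by 2 gives -3; for
the non-negative value 8 both operations agree on 4.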
~ Memory
~~~~~~~~
* Store instructions.
* `stored` -- `(d,m)`
* `stores` -- `(s,m)`
* `storel` -- `(l,m)`
* `storew` -- `(w,m)`
* `storeh` -- `(w,m)`
* `storeb` -- `(w,m)`
Store instructions exist to store a value of any base type
and any extended type. Since halfwords and bytes are not
first class in the IL, `storeh` and `storeb` take a word
as argument. Only the 16 or 8 least significant bits of this word will
be stored in memory at the address specified in the second
argument.
* Load instructions.
* `loadd` -- `d(m)`
* `loads` -- `s(m)`
* `loadl` -- `l(m)`
* `loadsw`, `loaduw` -- `I(mm)`
* `loadsh`, `loaduh` -- `I(mm)`
* `loadsb`, `loadub` -- `I(mm)`
For types smaller than long, two variants of the load
instruction are available: one will sign extend the loaded
value, while the other will zero extend it. Note that
all loads smaller than long can load to either a long or
a word.
The two instructions `loadsw` and `loaduw` have the same
effect when they are used to define a word temporary.
A `loadw` instruction is provided as syntactic sugar for
`loadsw` to make explicit that the extension mechanism
used is irrelevant.
* Blits.
* `blit` -- `(m,m,w)`
The blit instruction copies in-memory data from its
first address argument to its second address argument.
The third argument is the number of bytes to copy. The
source and destination spans are required to be either
non-overlapping, or fully overlapping (source address
identical to the destination address). The byte count
argument must be a nonnegative numeric constant; it
cannot be a temporary.
One blit instruction may generate a number of
instructions proportional to its byte count argument;
consequently, it is recommended to keep this argument
relatively small. If large copies are necessary, it is
preferable that frontends generate calls to a supporting
`memcpy` function.
* Stack allocation.
* `alloc4` -- `m(l)`
* `alloc8` -- `m(l)`
* `alloc16` -- `m(l)`
These instructions allocate a chunk of memory on the
stack. The number ending the instruction name is the
alignment required for the allocated slot. QBE will
make sure that the returned address is a multiple of
that alignment value.
Stack allocation instructions are used, for example,
when compiling C local variables, because their
address can be taken. When compiling Fortran,
temporaries can be used directly instead, because
it is illegal to take the address of a variable.
The following example makes use of some of the memory
instructions. Pointers are stored in long temporaries.
%A0 =l alloc4 8 # stack allocate an array A of 2 words
%A1 =l add %A0, 4
storew 43, %A0 # A[0] <- 43
storew 255, %A1 # A[1] <- 255
%v1 =w loadw %A0 # %v1 <- A[0] as word
%v2 =w loadsb %A1 # %v2 <- A[1] as signed byte
%v3 =w add %v1, %v2 # %v3 is 42 here
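The result 42 may look surprising at first. The Python transcription
below replays the same steps on a byte array, assuming a little-endian
target; the comments give the IL instruction each line stands for.

```python
import struct

mem = bytearray(8)                    # %A0 =l alloc4 8
A0, A1 = 0, 4                         # %A1 =l add %A0, 4

struct.pack_into("<I", mem, A0, 43)   # storew 43, %A0
struct.pack_into("<I", mem, A1, 255)  # storew 255, %A1

v1 = struct.unpack_from("<I", mem, A0)[0]   # %v1 =w loadw %A0
v2 = struct.unpack_from("<b", mem, A1)[0]   # %v2 =w loadsb %A1 (sign-extends)
v3 = (v1 + v2) & 0xFFFFFFFF                 # %v3 =w add %v1, %v2
```

The byte at `A[1]` is `0xFF`, which `loadsb` sign-extends to -1, hence
43 + (-1) = 42.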
~ Comparisons
~~~~~~~~~~~~~
Comparison instructions return an integer value (either a word
or a long), and compare values of arbitrary types. The returned
value is 1 if the two operands satisfy the comparison
relation, or 0 otherwise. The names of comparisons respect
a standard naming scheme in three parts.
1. All comparisons start with the letter `c`.
2. Then comes a comparison type. The following
types are available for integer comparisons:
* `eq` for equality
* `ne` for inequality
* `sle` for signed lower or equal
* `slt` for signed lower
* `sge` for signed greater or equal
* `sgt` for signed greater
* `ule` for unsigned lower or equal
* `ult` for unsigned lower
* `uge` for unsigned greater or equal
* `ugt` for unsigned greater
Floating point comparisons use one of these types:
* `eq` for equality
* `ne` for inequality
* `le` for lower or equal
* `lt` for lower
* `ge` for greater or equal
* `gt` for greater
* `o` for ordered (no operand is a NaN)
* `uo` for unordered (at least one operand is a NaN)
Because floating point types always have a sign bit,
all the comparisons available are signed.
3. Finally, the instruction name is terminated with a
basic type suffix specifying the type of the operands
to be compared.
For example, `cod` (`I(dd,dd)`) compares two double-precision
floating point numbers and returns 1 if the two floating points
are not NaNs, or 0 otherwise. The `csltw` (`I(ww,ww)`)
instruction compares two words representing signed numbers and
returns 1 when the first argument is smaller than the second one.
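The effect of the signedness prefix and of the ordered test can be
modeled in Python. The function names reuse the IL instruction names for
readability, but the code is only a sketch of their described behavior.

```python
import math

def to_signed32(x):
    """Interpret the 32 low bits of x as a two's complement number."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def csltw(a, b):
    """Signed lower-than on words."""
    return int(to_signed32(a) < to_signed32(b))

def cultw(a, b):
    """Unsigned lower-than on words."""
    return int((a & 0xFFFFFFFF) < (b & 0xFFFFFFFF))

def cod(a, b):
    """Ordered test on doubles: 1 when neither operand is a NaN."""
    return int(not (math.isnan(a) or math.isnan(b)))
```

The same bit pattern `0xFFFFFFFF` is lower than 0 for `csltw` (it reads
as -1) but not for `cultw` (it reads as 4294967295).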
~ Conversions
~~~~~~~~~~~~~
Conversion operations change the representation of a value,
possibly modifying it if the target type cannot hold the value
of the source type. Conversions can extend the precision of a
temporary (e.g., from signed 8-bit to 32-bit), or convert a
floating point into an integer and vice versa.
* `extsw`, `extuw` -- `l(w)`
* `extsh`, `extuh` -- `I(ww)`
* `extsb`, `extub` -- `I(ww)`
* `exts` -- `d(s)`
* `truncd` -- `s(d)`
* `stosi` -- `I(ss)`
* `stoui` -- `I(ss)`
* `dtosi` -- `I(dd)`
* `dtoui` -- `I(dd)`
* `swtof` -- `F(ww)`
* `uwtof` -- `F(ww)`
* `sltof` -- `F(ll)`
* `ultof` -- `F(ll)`
Extending the precision of a temporary is done using the
`ext` family of instructions. Because QBE types do not
specify the signedness (like in LLVM), extension instructions
exist to sign-extend and zero-extend a value. For example,
`extsb` takes a word argument and sign-extends the 8
least-significant bits to a full word or long, depending on
the return type.
The instructions `exts` (extend single) and `truncd` (truncate
double) are provided to change the precision of a floating
point value. When the double argument of `truncd` cannot
be represented as a single-precision floating point, it is
truncated towards zero.
Converting between integers and floating points is done
using `stosi` (single to signed integer), `stoui` (single to
unsigned integer), `dtosi` (double to signed integer), `dtoui`
(double to unsigned integer), `swtof` (signed word to float),
`uwtof` (unsigned word to float), `sltof` (signed long to
float) and `ultof` (unsigned long to float).
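The reason signed and unsigned variants both exist can be seen in a small C model. The helper names are hypothetical and fix one return type each. The truncation toward zero in `dtosi_w` is the behavior of the C cast; the text above does not state how the IL instruction rounds, so treat that part as an assumption.

```c
#include <stdint.h>

/* Model of `swtof` producing a double: the word is read as signed. */
double swtof_d(uint32_t w) {
    return (double)(int32_t)w;
}

/* Model of `uwtof` producing a double: the word is read as unsigned. */
double uwtof_d(uint32_t w) {
    return (double)w;
}

/* Model of `dtosi` producing a word (C semantics: fraction discarded). */
int32_t dtosi_w(double d) {
    return (int32_t)d;
}
```

The same bit pattern `0xFFFFFFFF` converts to -1.0 through the signed model and to 4294967295.0 through the unsigned one.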
Because of <@ Subtyping >, there is no need to have an
instruction to lower the precision of an integer temporary.
~ Cast and Copy
~~~~~~~~~~~~~~~
The `cast` and `copy` instructions return the bits of their
argument verbatim. However, a `cast` will change an integer
into a floating point of the same width and vice versa.
* `cast` -- `wlsd(sdwl)`
* `copy` -- `T(T)`
Casts can be used to perform bitwise operations on the
representation of floating point numbers. For example,
the following program computes the negation of the
single-precision floating point number `%f` into `%rs`.
%b0 =w cast %f
%b1 =w xor 2147483648, %b0 # flip the msb
%rs =s cast %b1
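For comparison, the same bit manipulation can be sketched in C, where `memcpy` plays the role of `cast`. The function name is hypothetical, and the sketch assumes `float` is an IEEE 754 single-precision value whose most significant bit is the sign.

```c
#include <stdint.h>
#include <string.h>

/* Negate a float by flipping its sign bit, mirroring the IL sequence. */
float negate_via_bits(float f) {
    uint32_t b;
    memcpy(&b, &f, sizeof b);   /* %b0 =w cast %f */
    b ^= 2147483648u;           /* %b1 =w xor 2147483648, %b0 */
    memcpy(&f, &b, sizeof f);   /* %rs =s cast %b1 */
    return f;
}
```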
~ Call
~~~~~~
`bnf
CALL := [%IDENT '=' ABITY] 'call' VAL '(' (ARG), ')'
ARG :=
ABITY VAL # Regular argument
| 'env' VAL # Environment argument (first)
| '...' # Variadic marker
SUBWTY := 'sb' | 'ub' | 'sh' | 'uh' # Sub-word types
ABITY := BASETY | SUBWTY | :IDENT
The call instruction is special in several ways. It is not
a three-address instruction and requires the type of all
its arguments to be given. Also, the return type can be
either a base type or an aggregate type. These specifics
are required to compile calls with C compatibility (i.e.,
to respect the ABI).
When an aggregate type is used as argument type or return
type, the value respectively passed or returned needs to be
a pointer to a memory location holding the value. This is
because aggregate types are not first-class citizens of
the IL.
Sub-word types are used for arguments and return values
of width less than a word. Details on these types are
presented in the <@ Functions > section. Arguments with
sub-word types need not be sign or zero extended according
to their type. Calls with a sub-word return type define
a temporary of base type `w` with its most significant bits
unspecified.
Unless the called function does not return a value, a
return temporary must be specified, even if it is never
used afterwards.
An environment parameter can be passed as first argument
using the `env` keyword. The passed value must be a 64-bit
integer. If the called function does not expect an environment
parameter, it will be safely discarded. See the <@ Functions >
section for more information about environment parameters.
When the called function is variadic, there must be a `...`
marker separating the named and variadic arguments.
~ Variadic
~~~~~~~~~~
The `vastart` and `vaarg` instructions provide a portable
way to access the extra parameters of a variadic function.
* `vastart` -- `(m)`
* `vaarg` -- `T(mmmm)`
The `vastart` instruction initializes a *variable argument
list* used to access the extra parameters of the enclosing
variadic function. It is safe to call it multiple times.
The `vaarg` instruction fetches the next argument from
a variable argument list. It is currently limited to
fetching arguments that have a base type. This instruction
is essentially effectful: calling it twice in a row will
return two consecutive arguments from the argument list.
Both instructions take a pointer to a variable argument
list as sole argument. The size and alignment of variable
argument lists depend on the target used. However, it
is possible to conservatively use the maximum size and
alignment required by all the targets.
type :valist = align 8 { 24 } # For amd64_sysv
type :valist = align 8 { 32 } # For arm64
type :valist = align 8 { 8 } # For rv64
The following example defines a variadic function adding
its first three arguments.
function s $add3(s %a, ...) {
@start
%ap =l alloc8 32
vastart %ap
%r =s call $vadd(s %a, l %ap)
ret %r
}
function s $vadd(s %a, l %ap) {
@start
%b =s vaarg %ap
%c =s vaarg %ap
%d =s add %a, %b
%e =s add %d, %c
ret %e
}
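A rough C counterpart of the two functions is sketched below, with `va_start` and `va_arg` standing in for `vastart` and `vaarg`. It is not a literal translation: C promotes `float` variadic arguments to `double`, so the sketch uses `double` throughout, whereas the IL example works on `s` values.

```c
#include <stdarg.h>

/* Counterpart of $vadd: consumes two extra arguments from the list. */
double vadd(double a, va_list ap) {
    double b = va_arg(ap, double);   /* %b =s vaarg %ap */
    double c = va_arg(ap, double);   /* %c =s vaarg %ap */
    return a + b + c;
}

/* Counterpart of $add3: initializes the list and delegates to vadd. */
double add3(double a, ...) {
    va_list ap;
    va_start(ap, a);                 /* vastart %ap */
    double r = vadd(a, ap);
    va_end(ap);
    return r;
}
```

As in the IL version, each fetch advances the list, so the two `va_arg` calls return consecutive arguments.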
~ Phi
~~~~~
`bnf
PHI := %IDENT '=' BASETY 'phi' ( @IDENT VAL ),
First and foremost, phi instructions are NOT necessary when
writing a frontend to QBE. One solution to avoid having to
deal with SSA form is to use stack allocated variables for
all source program variables and perform assignments and
lookups using <@ Memory > operations. This is what LLVM
users typically do.
Another solution is to simply emit code that is not in SSA
form! Contrary to LLVM, QBE is able to fix up programs not
in SSA form without requiring the boilerplate of loading
and storing in memory. For example, the following program
will be correctly compiled by QBE.
@start
%x =w copy 100
%s =w copy 0
@loop
%s =w add %s, %x
%x =w sub %x, 1
jnz %x, @loop, @end
@end
ret %s
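Read as C, the program above is an ordinary loop in which both variables are reassigned on every iteration, which is exactly what SSA form forbids. The function name in this sketch is hypothetical.

```c
/* C rendering of the loop: x and s are overwritten on each iteration. */
int sum_down(void) {
    int x = 100;        /* %x =w copy 100 */
    int s = 0;          /* %s =w copy 0 */
    do {
        s = s + x;      /* %s =w add %s, %x */
        x = x - 1;      /* %x =w sub %x, 1 */
    } while (x != 0);   /* jnz %x, @loop, @end */
    return s;           /* ret %s */
}
```

The loop adds 100, 99, and so on down to 1, so the function returns 5050.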
Now, if you want to know what phi instructions are and how
to use them in QBE, you can read the following.
Phi instructions are specific to SSA form. In SSA form,
values can only be assigned once; without phi instructions,
this requirement is too strong to represent many programs.
For example, consider the following C program.
int f(int x) {
int y;
if (x)
y = 1;
else
y = 2;
return y;
}
The variable `y` is assigned twice. To translate the program
into SSA form, insert a phi instruction.
@ifstmt
jnz %x, @ift, @iff
@ift
jmp @retstmt
@iff
jmp @retstmt
@retstmt
%y =w phi @ift 1, @iff 2
ret %y
Phi instructions return one of their arguments depending
on where the control came from. In the example, `%y` is
set to 1 if the `@ift` branch is taken, or it is set to
2 otherwise.
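One way to picture this selection is a small C model in which the predecessor block is tracked explicitly. The `enum` and function names are hypothetical and exist only for this sketch; a real compiler does not materialize the predecessor as a value.

```c
/* Labels of the two blocks that jump to @retstmt. */
enum pred { IFT, IFF };

/* Model of `%y =w phi @ift 1, @iff 2`: pick the value tied to the
   block control came from. */
int phi_y(enum pred from) {
    return from == IFT ? 1 : 2;
}

/* Model of the whole fragment for a given %x. */
int f_ssa(int x) {
    enum pred from = x ? IFT : IFF;   /* jnz %x, @ift, @iff */
    return phi_y(from);               /* %y =w phi ...; ret %y */
}
```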
An important remark about phi instructions is that QBE
assumes that if a variable is defined by a phi it respects
all the SSA invariants. So it is critical to not use phi
instructions unless you know exactly what you are doing.
- 8. Instructions Index
-----------------------
* <@ Arithmetic and Bits >:
* `add`
* `and`
* `div`
* `mul`
* `neg`
* `or`
* `rem`
* `sar`
* `shl`
* `shr`
* `sub`
* `udiv`
* `urem`
* `xor`
* <@ Memory >:
* `alloc16`
* `alloc4`
* `alloc8`
* `blit`
* `loadd`
* `loadl`
* `loads`
* `loadsb`
* `loadsh`
* `loadsw`
* `loadub`
* `loaduh`
* `loaduw`
* `loadw`
* `storeb`
* `stored`
* `storeh`
* `storel`
* `stores`
* `storew`
* <@ Comparisons >:
* `ceqd`
* `ceql`
* `ceqs`
* `ceqw`
* `cged`
* `cges`
* `cgtd`
* `cgts`
* `cled`
* `cles`
* `cltd`
* `clts`
* `cned`
* `cnel`
* `cnes`
* `cnew`
* `cod`
* `cos`
* `csgel`
* `csgew`
* `csgtl`
* `csgtw`
* `cslel`
* `cslew`
* `csltl`
* `csltw`
* `cugel`
* `cugew`
* `cugtl`
* `cugtw`
* `culel`
* `culew`
* `cultl`
* `cultw`
* `cuod`
* `cuos`
* <@ Conversions >:
* `dtosi`
* `dtoui`
* `exts`
* `extsb`
* `extsh`
* `extsw`
* `extub`
* `extuh`
* `extuw`
* `sltof`
* `ultof`
* `stosi`
* `stoui`
* `swtof`
* `uwtof`
* `truncd`
* <@ Cast and Copy >:
* `cast`
* `copy`
* <@ Call >:
* `call`
* <@ Variadic >:
* `vastart`
* `vaarg`
* <@ Phi >:
* `phi`
* <@ Jumps >:
* `hlt`
* `jmp`
* `jnz`
* `ret`