#+TITLE: JIT Compiler TODO list
* VM integration
** DONE Stack walker for current position
Currently we mark the 'current position' in the JIT entry label at the
start of every basic block, the start-and-end of frame handlers, and
the start-and-end of inlines. This is major code bloat, for a feature
that is only necessary in exceptional cases.
The concept of the stack walker is very simple:
#+BEGIN_SRC asm
mov rcx, 1 ; rsp = []
call foo ; rsp = [label:]
label: mov rcx, rax; rsp = []
...
foo: ; stack (from rsp) looks like: [label:]
push rbp ; [label:,rbp]
mov rbp, rsp ; rbp now points at the saved rbp
sub rsp, 0xff ; allocate frame space (stack grows down); rsp = [label:,rbp, ? x 0xff]
...
add rsp, 0xff ; release it again; rsp = [label:,rbp]
pop rbp ; rsp = [label:]
ret ; rsp = []
#+END_SRC
- On POSIX, arg 0 = rdi, arg 1 = rsi, arg2 = rdx.
- On Windows, arg0 = rcx, arg1 = rdx, arg2 = r8.
- On linux, names are generally used as-is, mac wants them prefixed by an underscore.
A desirable thing: limit the depth of stack walking to some reasonable number (say, 5 or so):
#+BEGIN_SRC asm
walk_stack_posix:
_walk_stack_posix:
mov rcx, rdi ; base pointer
mov r8, rdx ; maximum number of steps
mov rdx, rsi ; end pointer
_walk_stack_win64:
; rcx = base pointer, rdx = end pointer, r8 = maximum number of steps
push rbp
mov r9, rsp
loop:
dec r8 ; counter
jz done
mov rax, qword ptr [r9+0x8]
mov r9, qword ptr [r9]
cmp rax, rcx
jb loop ; below the base pointer: keep walking (unsigned compare, these are addresses)
cmp rax, rdx
ja loop ; above the end pointer: keep walking
done:
;; rax is now within range by definition, or we're too deep
pop rbp
ret
#+END_SRC
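For illustration, here is how the C side might call into this (a
minimal sketch; the prototype is implied by the register assignments
above, and =current_jit_position= is a hypothetical helper, not an
existing function):
#+BEGIN_SRC C
#include <stddef.h>

/* implemented in the assembly above: returns the first return address
 * on the stack that lies within [base, end), walking at most max_steps
 * frames (otherwise whatever was read last) */
extern void *walk_stack_posix(void *base, void *end, long max_steps);

/* hypothetical helper: find our current position within a JIT-compiled
 * code region, given its start and size */
void *current_jit_position(void *code_start, size_t code_size) {
    return walk_stack_posix(code_start, (char *)code_start + code_size, 5);
}
#+END_SRC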
There are three things to do (though this doesn't have to start in the
expr JIT):
- Figure out where we need it. As far as I can tell, this is separate
from the jit_entry_label thing, and we will never *set* the
jit_entry_label from the result of this walk, as that might lead to
a jump right behind the handler, and in the case of a THROWISH_POST,
an infinite loop. Indeed throwish_pre and throwish_post don't change.
- src/exceptions.c: search_frame_handlers (we compare the current
jit label, but we're interested in the current position); other
than that, the only updates are to the goto_handlers, and/or
setting the resume labels, but that only ever happens with
throwobj, and that one is explicitly throwish anyway, so the
jit_entry_label will be set correctly.
- src/core/frame.c: assignments from predefined labels, but, also,
MVM_frame_find_contextual_by_name, which uses it as a location
marker. For frames higher in the callstack, that is correct,
though, so we need to distinguish the top frame from the rest.
- src/spesh/deopt.c: for upper frames, we use jit_entry_label as
current location marker, which is correct as it relies on exact
matches, and anything invoking anything that could deopt_all must
set the label anyway.
- Finally, configure our toolchain so it passes
=-fno-omit-frame-pointer= portably; this is spelled =[[https://msdn.microsoft.com/en-us/library/2kxx5t2c.aspx][/Oy-]]= in
Microsoft land.
- Integrate this in the build system. clang and gcc can build this
just fine (clang is ... whiny about comment syntax). Microsoft
has [[https://docs.microsoft.com/en-us/cpp/assembler/masm/masm-for-x64-ml64-exe][ml64]]. It also supports intel syntax, but can be a bit fuzzy
about directives. I don't want to ask our users to install another
assembler, but what I can do is use the C preprocessor to smooth
out the differences (with $(CC) -E or whatever the equivalent is on
Windows).
*** DONE stack walker itself
*** DONE hide jit_entry_label from core
*** DONE activate stack walker
*** DONE remove eager dynamic label storage
*** DONE build system integration
This works without issues on linux/gcc, but I'll have to see about
MSVC and apple/clang. I think I need to define a rule for '.s' just
like there is for '.c'.
*** DONE replace with return address pointer pointer
Even with =/Oy-= (and =-fno-omit-frame-pointer=) we can't reliably
walk the stack on windows, because things other than =rbp= can be
stored under =rsp= - so the 'linked list' is simply not respected.
What we *can* do instead is take a pointer to where we know the
return address will be (which is =rsp-0x8=, because that is fully
under our control). This works, but (somehow) uncovers an existing
bug in the expr JIT.
*** DONE the windows bug
- This appears to be a broken label issue (backwards halfway into an
instruction)
- I need to know where that label is supposed to point to, though, and
how far it is supposed to go
** DONE Replace 'invokish' and 'throwish' check code with return address updates
Currently, we have to check around every 'invokish' opcode whether or
not we have in fact invoked (by comparing the current frame sequence
number with the stored frame sequence number). If we have, we need to
jump out to the interpreter.
For throwish opcodes, we go a bit further; because we may require a
jump within the frame (to go to the correct frame handler), we load
the jit-entry-label after the opcode, then jump towards that after the
throwish opcode (if we haven't already jumped out due to different
frame numbers). If we haven't in fact thrown, this is effectively a no-op.
We can completely avoid doing either of those things if we update the
return address instead.
- invokish/throwish opcodes that need to trampoline to the interpreter
can do so by setting the return label to the =-exit= pointer
instead, see [[https://corsix.github.io/dynasm-doc/reference.html#dasm_setupglobal][dasm_setupglobal]] on how to get this label
- throwish opcodes that need to do an 'internal' jump can also update
this pointer, and execution will automatically continue there.
Have to investigate if there are more options than that, but I don't
really think so. This makes invokish ops nearly free if not actually
invoking.
We can *not* update an invokee's frame that way, because the relevant
=rbx= register will point to the wrong frame. (Unfortunately!)
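To make the mechanism concrete, here is a minimal sketch (with
illustrative stand-in types; the real fields live in MoarVM's frame
and JIT code structures):
#+BEGIN_SRC C
/* illustrative stand-ins for the real structures */
typedef struct { void **jit_return_address; } Frame;
typedef struct { void *exit_label; } JitCode;

/* trampoline out: the JIT code's next 'ret' lands on the exit label,
 * which hands control back to the interpreter */
static void trampoline(Frame *frame, JitCode *code) {
    *frame->jit_return_address = code->exit_label;
}

/* within-frame jump: the next 'ret' continues at a handler's label */
static void goto_handler(Frame *frame, void *handler_label) {
    *frame->jit_return_address = handler_label;
}
#+END_SRC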
- [x] get exit label.
we unfortunately can't get the exact constant name because it is
only defined in the 'dasc' file
- [x] assign a new label
- [x] frame.c
- frame entry
- frame exit
- unwind
- [x] deopt.c
- frame exit (never the current frame though, always the caller, so
no work there)
- [x] exceptions.c
- frame exit
- within-frame jump (GOTO)
- [x] emit.dasc
- control handlers
- invocations
- [x] continuation.c
- =MVM_continuation_control= tries to find a frame with a matching tag,
then invokes *another* frame with that continuation.
- It does not currently store the current JIT position (for the
purpose of reentry), so that might explain infinite-looping.
- Also, the invoke of code will try to set the jit return address to
a label, reading from a frame not associated with the JIT code
(potentially not *having* any JIT code, hence crashing).
- What this needs is both jit-leave-frame, and
jit-store-current-position.
- =MVM_continuation_reset= either calls =MVM_continuation_invoke=
(on a continuation) or =STABLE(code)->invoke= (on any other code
object) - if that is interpreter code, it should normally pass
through =MVM_frame_invoke=, and so we have that handled.
- =MVM_continuation_invoke= also calls =STABLE(code)->invoke=, so
same goes there
- [x] nativecall.c
- somehow tell the JIT not to emit code assigning a jit_return_address?
- a flag on the graph probably
- [x] nativecall_dyncall / nativecall_libffi
- when handling a callback, these are the only conceivable ways
of running a recursive interpreter frame; and we store a jmp_buf
Should we always set =tc->jit_exit_label= to the JIT code's exit
label? That would simplify things a bit.
*** DONE wrap in tidy interface and check assignment
We want some form of abstraction, if only for 'set current position',
so that we can check we're setting a value within bounds.
I'm not sure if there are more such abstractions to be had... But this
will help me debug for sure.
Let's add:
- [x] MVM_jit_code_set_current_position()
- [x] MVM_jit_code_trampoline()
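A sketch of what the bounds check might look like (field names are
assumptions, not the actual structure layout):
#+BEGIN_SRC C
#include <assert.h>
#include <stddef.h>

typedef struct { char *bytecode; size_t size; } JitCode; /* illustrative */
typedef struct { void **jit_return_address; } Frame;     /* illustrative */

/* refuse any position that lies outside the compiled code region */
static void set_current_position(Frame *frame, JitCode *code, void *position) {
    assert((char *)position >= code->bytecode &&
           (char *)position <  code->bytecode + code->size);
    *frame->jit_return_address = position;
}
#+END_SRC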
Turns out, if we have an exception and we set a return-after handler,
then we can't bloody well jump to the lower frame. It won't work.
*** DONE clean up
- [x] graph.h
- [x] graph.c
- [x] emit.dasc
- [x] expr.c
* Tools
** Create =make-gdb-script.pl= script
It should be relatively easy to create a script to start up GDB (or
another debugger) with the correct context. I've been writing a few of
those by hand now.
** Make =jit-bisect.pl= figure out if something is expr-bisectable
Currently we rely on a flag. But the jit-bisect.pl tool *can* figure
out which combination of flags breaks or unbreaks a run. So why not for
the expr JIT as well?
** Make =jit-dump.pl= work on windows
This should be doable, really, if rather than pipe-and-fork, we simply
use a temporary file and use that as an intermediate.
The first problem is to make it capture the output in the first
place. (Backticks work, it seems, so we can always try that.)
** Make =jit-comparify-asm.pl= warn about broken labels
Currently, we try to resolve the link address directly. What we should
probably do is compute a table of linked addresses and code addresses,
and then warn about anything that's not a link address.
** Make =expr-template-compiler.pl= warn for conditional-use-before-unconditional-use
This is a common cause of errors, because the conditional use will
'define' the value, and the tiler will not emit code to redefine it
for the unconditional use. Especially dangerous is that this can
happen in macro expansions (like write-barrier).
This should probably be a hard error. How to go about checking it?
First, some observations:
- we can only get this problem in DAGs not trees
- we can only create a DAG with named references (=$1=, =$foo=)
- we can only have a conditional use if we have one of the following node types:
- WHEN (second node is conditional)
- IF/IFV (second and third nodes are conditional)
- ALL/ANY (all but first, and the 'conditionality' is relative)
I'm thinking of using a stack to indicate the conditional order of
definitions and uses, but I'm not sure how just yet.
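For what it's worth, a rough sketch of such a check (in C for
concreteness, though the template compiler itself is Perl; all names
are illustrative, and a real check would also have to distinguish
sibling branches at the same depth):
#+BEGIN_SRC C
#include <stdio.h>
#include <stdlib.h>

/* first_use_depth[node] = conditional nesting depth at the node's first
 * use (which is where the tiler will define it), or -1 if not yet seen */
static void check_use(int *first_use_depth, int node, int cond_depth) {
    if (first_use_depth[node] < 0) {
        first_use_depth[node] = cond_depth;
    } else if (cond_depth < first_use_depth[node]) {
        /* defined under a condition, then used unconditionally */
        fprintf(stderr, "node %d: conditional use before unconditional use\n",
                node);
        exit(EXIT_FAILURE);
    }
}
#+END_SRC
The walker would call =check_use= for every named reference,
incrementing =cond_depth= when descending into the conditional
children of WHEN, IF/IFV, ALL and ANY.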
* Expression Tree
** Reduce tree node size to 32 bits
Tree nodes are currently 64 bits wide to allow them to coexist with
constant pointers. This is handy, but not really required, since we
could use a lookup table to get the pointers (as long as we can
declare pointers, for which I think we can still use the '@' sigil),
e.g.:
#+BEGIN_EXAMPLE
(template: say
    (call (const @MVM_string_say ptr_sz)
        (arglist 2
            (carg (tc) ptr)
            (carg $0 ptr))))
#+END_EXAMPLE
The @MVM_string_say pointer can be stashed in an array:
#+BEGIN_SRC C
static const void *MVM_jit_expr_ptrs[] = {
...
MVM_string_say,
...
};
#+END_SRC
And the pointer itself replaced by the index.
We could argue against dealing with 64 bit constants in general, but
unfortunately, const_i64 prevents us from doing that. Ways of
dealing with that:
+ A 'large constants' table per tree (into which we could copy both the
i64 constants and the function pointer constants)
+ We could store this entire table in the data section, too
+ A 'large constants' op, which could take the space to store the 64
bit constant directly; one of the advantages of that is that we
could specialise tiling to that (e.g. there is no advantage to
including a very large constant in the ADD tile since the underlying
'add' instruction cannot handle it).
+ Or both: have a large_const op and a large_const table, and only
have the large_const op refer to the large_const table (i.e. not the
regular const)
NB - I didn't do the @ notation, because it turns out we need to
specially handle the indirected constants, and because of platform
differences, we also make the distinction between large constants in
general (MVMint64 or MVMnum64) and pointer constants, because pointers
have different sizes per architecture, and general large constants do
not.
** Encode (parts of) info as flags
Even when we'd have 32 bit operands and we'd (wastefully) use 16 bits
for the operator (rather than 8, which would still be 4x more than we
use now), we still have 16 perfectly good bits left. What I want to do
is to make info mostly sparse, so that:
- we can use a hashtable to represent all nodes that do have info
- we can encode all relevant attributes 'inline'
Which would be:
- number-of-args (redundant but nice, especially if it gets rid of the
first-child-node-is-arg-count nonsense)
- we could maximize this to 16? (4 bits) or 32 (5 bits)
- size (1,2,4,8) (2 bits or 3 bits)
- that may be a bit aggressive, we may want to support SIMD?
- type flag (int/num/str/obj) (2 bits or 5 bits if we want to encode
the whole field)
All that would remain of info would be spesh_ins, meaning that this
would become sufficiently sparse to use a hash table, which would mean
we no longer need to resize the number of args.
#+BEGIN_SRC C
typedef union {
    struct {
        MVMint16 o;     /* operator */
        MVMuint8 c;     /* number of children */
        MVMuint8 s : 4; /* size of result */
        MVMuint8 t : 4; /* type of result */
    };                  /* anonymous struct (C11) */
    MVMint32 r;         /* reference */
} MVMJitExprNode;
#+END_SRC
** REPR-Specialized expression code
Parts needed:
+ A hook for instructions with operands of known repr type to insert a template
+ So how do we know which instruction/operand this is? (Hardcode with a switch, maybe)
+ Runtime hook should be similar to spesh hook
+ We should probably pass the tree and let the repr do manipulations itself for maximum flexibility
+ and have a default hook which attempts to apply a template
+ return root if successful, otherwise -1 (in which case we can fall back to the normal mode)
+ should have a separate jit flags entry which is also settable by
the specializer (for jittivity, template destructiveness, possibly
other things)
+ operands loading must be public / template apply must become 'public methods'
+ Compile-time support for arbitrary templates in the expression templates
+ I think adding to a makefile list is acceptable, in general, but
it would be nice if we could have a substitution rule that would
make sure the expression templates are compiled 'automatically'
#+BEGIN_SRC makefile
EXPR_TEMPLATES=src/jit/core_expr.h \
    src/6model/reprs/VMArray_expr.h \
    src/6model/reprs/NativeRef_expr.h \
    src/6model/reprs/MultiDimArray_expr.h
# preferably, we'd match the .expr with the file name automatically
src/6model/reprs/%.c: src/6model/reprs/%_expr.h # would be ideal, but this is not automatically picked up
# Expression list tables
%_expr_tables.h: %.expr tools/expr-template-compiler.pl src/core/oplist src/jit/expr_ops.h
$(PERL) -Itools/ tools/expr-template-compiler.pl -o $@ $<
#+END_SRC
** FLAGVAL ALL/ANY
Basically, flagval all/any is legal according to the type system, but
it will just never work. We should translate it to (IF (ALL|ANY ..)
(CONST 1 1) (CONST 0 1)).
The problem is, replacing all references to the node. (This is common
with the optimizer, which also needs it).
We don't actually need this yet, but we don't guard against it
either. (So maybe install an oops in analyze first).
** Use explicit stack for tree walking
Simple, mechanical transformation. I wonder if we can have a maximum
depth; probably not, if we allow revisits. More importantly, this
should allow for some control over the iteration order.
** Right-to-left evaluation
E.g. for (STORE addr value sz), it usually makes sense to calculate the
value before the address. There are a bunch of these things, and then
again a bunch of things that rely on left-to-right evaluation:
+ IF/IFV
+ ALL/ANY
+ DO/DOV
So the thing is probably to:
+ store a preference per op
+ add a policy for the traverser (default, left-to-right, right-to-left)
* Register Allocator
** Switch to storing register numbers in a stack rather than in a ring
Using a ring has the disadvantage that register values are
'continuously moving', even when they do not need to be.
** Dump register allocator graph
I think it should be possible to dump the (result) of register
allocation. That is to say, create a graph that displays all tiles,
their basic block structure, the live range structure, and their
spills.
** Support multiple register classes
I want to distinguish register classes using ranges, i.e. on x86-64,
0-15 are GPR, 16-31 would be FPR. The trick is mostly:
*** Find out if register selection for FPRs is supported
*** Support register buffers per class
** Generalized 3-operand to 2-operand conversion
Already implemented for direct-memory binary ops, but needs to be
extended to take into account indirect-access ops and memory base +
indexed ops.
More to the point, I'd like this to be a restriction we can build into
the allocator itself, so it doesn't need last-minute patchup.
*** Use register stack rather than ring buffer
Ring-buffer register allocation 'cycles' through registers and thereby
causes more moves than a stack would.
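A minimal sketch of the stack variant (plain C, names illustrative):
#+BEGIN_SRC C
#include <stdint.h>

enum { NUM_ALLOCATABLE = 16 };

typedef struct {
    int8_t reg[NUM_ALLOCATABLE]; /* free registers; most recently freed last */
    int32_t num_free;
} RegisterStack;

/* pop the most recently freed register; a ring would instead cycle
 * through all registers, moving values that could have stayed put */
static int8_t take_register(RegisterStack *s) {
    return s->reg[--s->num_free]; /* caller ensures num_free > 0 */
}

static void release_register(RegisterStack *s, int8_t r) {
    s->reg[s->num_free++] = r;
}
#+END_SRC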
** Reduce spills
*** Maintain memory backed positions
Currently, when we need to spill a value, we always treat it as if it
were a temporary, i.e. we store it to a *new* location in the local
memory buffer. We increment the local memory buffer, too. This is
suboptimal for values that are not temporaries, i.e. values that are
stored to the local value buffer anyway. There are two classes of
such values:
+ values stored to a local value
+ values directly retrieved from a local value
There is no need to ever spill such values to memory.
#+BEGIN_SRC c
/* Return -1 if not a local store, 0 <= i < frame->work_size if it is */
MVMint32 is_local_store(MVMJitExprTree *tree, MVMint32 node) {
    if (tree->nodes[node] != MVM_JIT_STORE)
        return -1;
    /* the first child must be an ADDR relative to the LOCAL base */
    node = tree->nodes[node + 1];
    if (tree->nodes[node] != MVM_JIT_ADDR)
        return -1;
    if (tree->nodes[tree->nodes[node + 1]] != MVM_JIT_LOCAL)
        return -1;
    return tree->nodes[node + 2];
}

MVMint32 has_local_location(MVMJitExprTree *tree, MVMint32 node) {
    MVMSpeshIns *ins = tree->info[node].spesh_ins;
    if (ins == NULL || ins->info->num_operands == 0 ||
        (ins->info->operands[0] & MVM_operand_rw_mask) != MVM_operand_write_reg)
        return -1;
    return ins->operands[0].reg.orig;
}
#+END_SRC
*** Don't spill-and-load directly between definition and use
Or rather, if we can prove that there can be no 'spills' in between a
definition and use (and they are in the same basic block), let's
'merge' the atomic live ranges.
*** Don't spill constants
- We can either do that as part of the optimizer, or as part of the
allocator, or both.
- It is *simpler* to do it for the allocator (if a value we're
spilling has a single definition, and that definition is a constant,
copy it; see the sketch after this list)
- It might be more effective to do it in the expression optimizer
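A sketch of the allocator-side variant (assuming a live-range record
with a known single definition; names are illustrative, and
=MVM_JIT_CONST= is stubbed so the fragment stands alone):
#+BEGIN_SRC C
#include <stdint.h>

enum { MVM_JIT_CONST = 1 }; /* stub; the real value comes from expr_ops.h */

typedef struct {
    int32_t num_defs;
    int32_t def_node; /* tree node of the single definition */
    int32_t remat;    /* 1 = re-load the constant instead of spilling */
} LiveRange;

/* when asked to spill a single-definition constant, don't store it;
 * mark it so each use re-emits the immediate instead */
static int32_t try_rematerialize(LiveRange *value, const int64_t *nodes) {
    if (value->num_defs == 1 && nodes[value->def_node] == MVM_JIT_CONST) {
        value->remat = 1;
        return 1;
    }
    return 0; /* caller emits a real store to a spill slot */
}
#+END_SRC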
** Generalized register requirements
Bunch of options possible:
- it's a requirement for an output register
- the register is allocatable
- which is /free/, in which case we can just take it (how do I
know it's free? by a register map, which we need to make)
- which is /not free/, in which case we need to /spill/ the
current register
- the register is not allocatable (e.g. %rax)
- I'm going to go ahead and assume that it is free nevertheless,
otherwise we'd have to record the set of non-allocatable
registers clobbered
- However, if the value is to live, it's probably best to copy it
to an allocatable register
- it is a requirement for an input register
- that is not yet a problem I have (because I made %rax the spare
register), but most of the considerations of clobbering described
below apply
- it is an existing problem for ARGLIST compilation, but there it is
handled separately (although it is fairly similar, and might generalize!)
- it clobbers a register (not necessarily one it uses), e.g. div, which
clobbers %rdx to store the remainder (and %rax for the quotient).
- if free, no problem whatever
- if non-free, we again need to start moving registers, but I'm not
sure this requires the full shuffling requirements of ARGLIST.
*** Precoloring
I'd like to try and figure out if we can add 'preferred registers' to
tiles based on definition or use in tile requirements.
** Try to use 'holes' for allocation.
Not 100% sure this is worth the additional complexity since it means
that a register can have multiple occupants, which means you'll want
to use a linked list, and a heap for maintaining the first-to-expire
set, or a double-ended priority queue, etc.
Simplest thing to do is try and prove that the live range will be
'embedded' within the hole in all cases. But this is tricky when there
might be a spill in between.
** Support loops in lifetime hole finding
Note that Wimmer's paper describes computing holes and live range
extents in a single step, so we might implement that as well.
* Optimizer
Not implemented at all, so we need some new things.
** DONE An equivalence function
We have one (MVM_jit_expr_tree_eqv), but its problem is that it tries
to iterate to the leaves, whereas it should iterate to a certain depth
(but how to specify the depth in different branches?)
** DONE A replacement 'function'
Basically we require the possibility to update all uses of a node with
another one, including roots, if necessary.
Now, there will never be more uses than nodes, so we can build a
'usage' table-of-linked-list from a single block of memory.
Walks should be single-visit.
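Sketch of that table (one memory block would actually hold all three
arrays; they are split here for clarity):
#+BEGIN_SRC C
#include <stdint.h>

typedef struct {
    int32_t *head;    /* head[node] = first use record of node, or -1 */
    int32_t *next;    /* next[use]  = following use record, or -1 */
    int32_t *user;    /* user[use]  = node in which the use occurs */
    int32_t num_uses;
} UseChains;

/* record that 'user' refers to 'used'; since every use is an edge in
 * the tree, there are at most as many use records as child references */
static void add_use(UseChains *u, int32_t used, int32_t user) {
    int32_t use = u->num_uses++;
    u->user[use] = user;
    u->next[use] = u->head[used]; /* push onto the chain of 'used' */
    u->head[used] = use;
}
#+END_SRC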
** TODO Example optimizations
*** TODO common subexpression elimination
- idea: (hash) table of expr, node
- table is created bottom-up
- all children are replaced with equivalent (according to the table)
- then the parent is itself 'hashed' to a record, and potentially
replaced
- barriers should clear out the table
*** DONE IDX CONST to ADDR conversion
Uses one register fewer, simpler operation
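For example, something like =(LOAD (IDX base (CONST 8 8) 8) int_sz)=
could be rewritten to =(LOAD (ADDR base 0x40) int_sz)= - the index
times the scale folds into a fixed displacement (the exact operand
order here is illustrative).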
*** TODO ADD CONST to ADDR conversion
Only allowed if user is pointerlike (e.g. LOAD)
*** DONE COPY insertion
- Values that are LOAD-ed and used by multiple operations might
benefit from inserting a COPY, so they don't each use indirect
memory operands
- Basic idea: count number of users of 'load', if > 1, insert the
COPY node and replace the refs
- Possibly a pessimization because it requires more registers!
*** DONE COPY elimination
- possibly the first step, removing redundant copies
- especially for CONST nodes
*** TODO CONST copying
- A const never needs to be kept in memory, and it is just as well
to keep just a single reference to it. (This might be better in
the register allocator though)