In this lab, you will design and implement a register allocator
for your Tiger compiler, to allocate variables
to physical registers.
This lab consists of two parts: first, in part A, you will design and implement a baseline allocator to allocate variables to stack frames. In part B, you will implement linear scan register allocator.
First check out the source code we offered you for lab 5:
$ git commit -am 'my solution to lab4'
$ git checkout -b lab5 origin/lab5
these commands will first commit your changes to the lab4 branch of your local Git repository, and then create a local lab5 branch and check out the remote lab5 branch into the new local lab5 branch.
Again, you will need to merge your code from lab4 into the new lab5 branch:
$ git merge lab4
Do not forget to resolve any conflicts before commiting to the local lab5 branch:
$ git commit -am 'lab5 init'
You should first import the new lab5 code into your editor, and make sure the code compiles. There are a bunch of new files that you should browse through:
regalloc/*: register allocators
When you finished this lab, zip you code and submit to the online teaching system.
We start by building a baseline stack register allocator, to allocate
all variables to stack frames instead of physical registers
(i.e., all variables are spilled).
While this allocator generates inefficient code with
many memory load/stores, it is easier
to implement and serves as the baseline allocator in most production
compilers (i.e., the -O0 optimization level).
As we are allocating variables to stack slots in a stack
frame, it is essential to keep track of where each variable
has been allocated to. We use a data structure TempMap
to record the stack frame offset for each variable.
regalloc/TempMap.java, to
understand the data structure.
It should be noted that while this data structure
allows two options for allocation: InReg
and InStack, we only use InStack
in this part of the lab.
The frame data structure keeps track of how many slots have been allocated for a given function.
regalloc/Frame.java,
to understand the implementation of a frame.
You do not need to write any code for this exercise.
With these data structures, we can allocate variables. The allocation consists of three major steps: 1) allocating stack slots for variables (i.e., arguments and locals) in a given function; 2) rewriting each assembly instruction by inserting necessary load/stores of the variables from/to the stack frame; and 3) adding necessary function prologue and epilogue to adjust the function frame when entering and exiting the function.
The second step deserves more explanation. The instruction rewriting is syntax-directed by adding loads before each instruction and stores after it. Take a binary operation instruction as an example (for simplicity, we write instruction in TAC form instead of in assembly form without losing generality):
x = y + z; // uses = [y, z], defs = [x]
suppose the stack slot allocated for x,
y, and z are l_x,
l_y, and l_z, respectively. Then
the above instruction will be translated to:
r1 = M[l_y]; r2 = M[l_z]; r1 = r1 + r2; M[l_x] = r1;
which used 2 scratch registers r1 and
r2, where r1 is reused to hold
both the use y and the def x.
StackAllocator.java,
to allocate variables to physical registers.
You may also need to extend or modify
the pretty printer in regalloc/PpAssem.java.
A function prologue is a fragment of assembly code at the beginning of a function, which prepare the call stack and registers for use within the function. Similarly, a function epilogue appears at the end of the function, and restores the stack and registers to the state they were in before the function was called.
%rbp
as the stack frame base pointer.
To generate complete executables, we need link the Tiger-produced assembly against a runtime.
src/runtime/runtime.c
for the code of the runtime.
Modify and extend this code to add other runtime functions
that your Tiger-produced assembly can invoke.
Specifically, you may use any strategy you think appropriate
to implement memory allocations. In the next lab, you will
design and implement a garbage collector to manage the
Java heap.
To this point, your Tiger compiler can compile any MiniJava programs to produce assembly. Furthermore, when linked against the above runtime, your Tiger compiler can produce final executables which, when executed, should produce correct result. Do not forget to test your Tiger compiler, fix any potential bugs before continuing.
In this part, you will design and implement a linear scan register allocator, as described in this paper. Your should first read this paper carefully to understand the algorithm, before writing any code.
You will first write a liveness analysis to calculate, for each instruction and transfer in a function, the live-in and live-out variable sets.
live_in[s] = uses[s] ∪ (live_out[s] - defs[s])
live_out[s] = ∪(live_in[p]) // for any p ∈ succ[s];
with all live_in[] and live_out[] initialized to an empty set ∅.
live_in[b] = uses[b] ∪ (live_out[b] - defs[b])
live_out[b] = ∪(live_in[p]) // for any p ∈ succ[b];
where the live variable set uses[] and defs[]
for each block b is calculated by a one-pass backward
iteration of all statements in this block.
Implement this strategy to speed up your liveness analysis.
A
topological sorting
linearizes the basic blocks in a control-flow
graph, so that each basic block b (hence
each instruction or transfer
in that block) will have a unique linear order number n
with total order.
Given a specific topological order, we can calculate, for each
variable x, its live interval [low, high].
Intuitively, the variable x is "always" live in this
interval, hence requiring a unique
physical register r to hold its value.
x, its live interval
[low, high]. Think carefully about the underlying data
structure designs.
Variables can be allocated to registers by a linear scan
of all live intervals. Live intervals [low, high]
should be increasingly ordered according to its left coordinate
low.
To this point, your Tiger compiler can compile any MiniJava programs without spilling to assembly. Furthermore, when linked against the above runtime, your Tiger compiler can produce final executables which, when executed, should produce correct result. Do not forget to test your Tiger compiler, fix any potential bugs before continuing.
Variables that cannot be allocated to registers should be spilled to stack frames. Extra load/stores should be added for the spilled variables.
After implementing spilling, your Tiger compiler can compile any MiniJava programs to produce assembly. Test your Tiger compiler before continuing.
Coalescing is an optimization to remove move instructions where the source and target are the same. Specifically, for move instruction
movq %x, %y // %y = %x
if both variables x and y can be
allocated to the same physical register (say %r), then
the resulting move instruction:
movq %r, %r
is essentially a nop, hence can be removed subsequently.
all move instructions from
the programs being compiled.
To this point, your Tiger compiler should compile any MiniJava programs by allocating variables to registers. Do not forget to test your Tiger compiler and fix potential bugs.
This completes the lab. Remember to hand in your solution to the online teaching system.