Lab 5: Register Allocation


Overview

In this lab, you will design and implement a register allocator for your Tiger compiler, to allocate variables to physical registers.

This lab consists of two parts: first, in part A, you will design and implement a baseline allocator to allocate variables to stack frames. In part B, you will implement linear scan register allocator.

Getting Started

First check out the source code we offered you for lab 5:

    $ git commit -am 'my solution to lab4'
    $ git checkout -b lab5 origin/lab5

these commands will first commit your changes to the lab4 branch of your local Git repository, and then create a local lab5 branch and check out the remote lab5 branch into the new local lab5 branch.

Again, you will need to merge your code from lab4 into the new lab5 branch:

    $ git merge lab4

Do not forget to resolve any conflicts before commiting to the local lab5 branch:

    $ git commit -am 'lab5 init'

You should first import the new lab5 code into your editor, and make sure the code compiles. There are a bunch of new files that you should browse through:

    regalloc/*:    register allocators

Hand-in Procedure

When you finished this lab, zip you code and submit to the online teaching system.


Part A: A Stack Allocator

We start by building a baseline stack register allocator, to allocate all variables to stack frames instead of physical registers (i.e., all variables are spilled). While this allocator generates inefficient code with many memory load/stores, it is easier to implement and serves as the baseline allocator in most production compilers (i.e., the -O0 optimization level).

Temp Maps

As we are allocating variables to stack slots in a stack frame, it is essential to keep track of where each variable has been allocated to. We use a data structure TempMap to record the stack frame offset for each variable.

Exercise 1. Read the code in regalloc/TempMap.java, to understand the data structure. It should be noted that while this data structure allows two options for allocation: InReg and InStack, we only use InStack in this part of the lab.

Function Frames

The frame data structure keeps track of how many slots have been allocated for a given function.

Exercise 2. Read the frame data structure in regalloc/Frame.java, to understand the implementation of a frame. You do not need to write any code for this exercise.

Stack Allocation

With these data structures, we can allocate variables. The allocation consists of three major steps: 1) allocating stack slots for variables (i.e., arguments and locals) in a given function; 2) rewriting each assembly instruction by inserting necessary load/stores of the variables from/to the stack frame; and 3) adding necessary function prologue and epilogue to adjust the function frame when entering and exiting the function.

The second step deserves more explanation. The instruction rewriting is syntax-directed by adding loads before each instruction and stores after it. Take a binary operation instruction as an example (for simplicity, we write instruction in TAC form instead of in assembly form without losing generality):

x = y + z; // uses = [y, z], defs = [x]

suppose the stack slot allocated for x, y, and z are l_x, l_y, and l_z, respectively. Then the above instruction will be translated to:

r1 = M[l_y];
r2 = M[l_z];
r1 = r1 + r2;
M[l_x] = r1;

which used 2 scratch registers r1 and r2, where r1 is reused to hold both the use y and the def x.

Exercise 3. Finish the code in StackAllocator.java, to allocate variables to physical registers. You may also need to extend or modify the pretty printer in regalloc/PpAssem.java.

Prologue and Epilogue

A function prologue is a fragment of assembly code at the beginning of a function, which prepare the call stack and registers for use within the function. Similarly, a function epilogue appears at the end of the function, and restores the stack and registers to the state they were in before the function was called.

Exercise 4. Implement function prologue and epilogue for Tiger. You can treat the size of the function frame as 0 (which will be determined by the register allocator in the next lab). To keep things simple, you can save all callee-saved registers in the prologue and restore them in the epilogue, regardless of whether the variable is used in the callee. Furthermore, your Tiger compiler can always reserve %rbp as the stack frame base pointer.

A Runtime

To generate complete executables, we need link the Tiger-produced assembly against a runtime.

Exercise 5. Read the file src/runtime/runtime.c for the code of the runtime. Modify and extend this code to add other runtime functions that your Tiger-produced assembly can invoke. Specifically, you may use any strategy you think appropriate to implement memory allocations. In the next lab, you will design and implement a garbage collector to manage the Java heap.

To this point, your Tiger compiler can compile any MiniJava programs to produce assembly. Furthermore, when linked against the above runtime, your Tiger compiler can produce final executables which, when executed, should produce correct result. Do not forget to test your Tiger compiler, fix any potential bugs before continuing.


Part B: A Linear Scan Allocator

In this part, you will design and implement a linear scan register allocator, as described in this paper. Your should first read this paper carefully to understand the algorithm, before writing any code.

Liveness Analysis

You will first write a liveness analysis to calculate, for each instruction and transfer in a function, the live-in and live-out variable sets.

Exercise 6. Write a liveness analysis for Tiger, to calculate live-in and live-out information. We are free to propose any data structures and algorithms to calculate this liveness information. To keep things simple, you can write the analysis on a single instruction granularity, to use the following dataflow equation:
live_in[s]  = uses[s] ∪ (live_out[s] - defs[s])
live_out[s] = ∪(live_in[p])                     // for any p ∈ succ[s];
with all live_in[] and live_out[] initialized to an empty set ∅.
Challenge. Most dataflow analysis should follow an order of the flow of data. For liveness analysis, the caculation should be backward from the last instruction in a function to the first one. Implement this strategy to speed up your analysis.
Challenge. To speed up the liveness analysis, you may calculate liveness information on a block granularity instead of on an instruction granularity. That is, you can use the following dataflow equation:
live_in[b]  = uses[b] ∪ (live_out[b] - defs[b])
live_out[b] = ∪(live_in[p])                     // for any p ∈ succ[b];
where the live variable set uses[] and defs[] for each block b is calculated by a one-pass backward iteration of all statements in this block. Implement this strategy to speed up your liveness analysis.

Topological Sorting and Live Intervals

A topological sorting linearizes the basic blocks in a control-flow graph, so that each basic block b (hence each instruction or transfer in that block) will have a unique linear order number n with total order.

Exercise 7. Implement topological sorting of basic blocks to linearize them. It should be noted that, generally speaking, a control-flow graph might not have a strict topological sorting due to the existence of loops. Hence, your algorithm will really produce a quasi-topological order.

Given a specific topological order, we can calculate, for each variable x, its live interval [low, high]. Intuitively, the variable x is "always" live in this interval, hence requiring a unique physical register r to hold its value.

Exercise 8. Calculate, for each variable x, its live interval [low, high]. Think carefully about the underlying data structure designs.

Allocation

Variables can be allocated to registers by a linear scan of all live intervals. Live intervals [low, high] should be increasingly ordered according to its left coordinate low.

Exercise 9. Allocate variables to physical registers by implementing the linear scan algorithm. Rewrite the assembly instructions according to the allocation result.

To this point, your Tiger compiler can compile any MiniJava programs without spilling to assembly. Furthermore, when linked against the above runtime, your Tiger compiler can produce final executables which, when executed, should produce correct result. Do not forget to test your Tiger compiler, fix any potential bugs before continuing.

Spilling

Variables that cannot be allocated to registers should be spilled to stack frames. Extra load/stores should be added for the spilled variables.

Challenge. Implement spilling, so that your Tiger compiler can compile programs with arbitrary variable interferences.

After implementing spilling, your Tiger compiler can compile any MiniJava programs to produce assembly. Test your Tiger compiler before continuing.

Coalescing

Coalescing is an optimization to remove move instructions where the source and target are the same. Specifically, for move instruction

movq %x, %y     // %y = %x

if both variables x and y can be allocated to the same physical register (say %r), then the resulting move instruction:

movq %r, %r

is essentially a nop, hence can be removed subsequently.

Challenge. Implement coalescing, so that your Tiger compiler can eliminate practically all move instructions from the programs being compiled.

To this point, your Tiger compiler should compile any MiniJava programs by allocating variables to registers. Do not forget to test your Tiger compiler and fix potential bugs.


Hand-in

This completes the lab. Remember to hand in your solution to the online teaching system.