In this lab, you will design and implement an instruction
selector for your Tiger
compiler, to translate
MiniJava programs to x64 assembly.
This lab consists of three parts: first, in part A, you will design and implement data structures for the x64 assembly language serving as the target for instruction selection. Second, in part B, you will design and implement a simple object model for MiniJava and determine field offsets, method offsets, and class sizes, among others. Finally, in part C, you will implement a maximum munch algorithm to lower down the CFG to the x64 assembly.
First check out the source we offered you for lab 4:
$ git commit -am 'my solution to lab3' $ git checkout -b lab4 origin/lab4
these commands will first commit your changes to the lab3 branch of your local Git repository, and then create a local lab4 branch and check out the remote lab4 branch into the new local lab4 branch.
Again, you will need to merge your code from lab3 into the new lab4 branch:
$ git merge lab3
Do not forget to resolve any conflicts before commiting to the local lab4 branch:
$ git commit -am 'lab4 init'
You should first import the new lab4 code into your favorite IDE. There are a bunch of new files that you should browse through:
codegen/*: data structures and operations for x64
When you finished this lab, zip you code and submit to the online teaching system.
Intel x64 is a widely used CSIC instruction set architecture (ISA) we will generate code for in this lab. In this part of the lab, you will first design data structures and operations for x64.
While the x64 is a complex ISA, we will, in this lab, only use a small subset of it which is enough to compile MiniJava. You may want to refer to some manual for this subset, if you are not familiar with this ISA.
The data structures defining x64 is essentially an assembly-level control-flow graph (as we have discussed in the previous lab). The most interesting part is how we represent assembly instructions.
At this phase, an assembly instruction may still contain variables (we call them pseudo-registers), which will be allocated to physical registers by a register allocator (as will be discussed in the next lab). For example, an addition instruction may look like:
addq %y, %x // %x = %x + %y
where both %x
and %y
are
variables yet to be allocated to registers.
Hence, an assembly instruction in this phase
can be treated as a function mapping variables to
a concrete assembly instruction string in which
pseudo-registers substituted by physical registers.
For example, if we can determine this allocation:
[%x -> %rax, %y -> %rbx]
Then the aforementioned instruction can be converted into:
addq %rbx, %rax
With this key insight, we can encode an assembly instruction using a higher-order function taking varying components (i.e., variables) as arguments. Specifically, we will use a λ taking the variables as parameters to concatenate the final assembly string. Consider the above example, its λ encoding may look like:
λ([%x, %y]).(addq %y, %x)
Furthermore, to distinguish variable uses and defs, we put two arguments for a λ, one for the variable uses and the other one for the defs. With this elaboration, the above instruction can be encoded by the following two-argument λ:
λ([%x, %y], [%x]).(addq %y, %x)
where [%x, %y]
are uses, and [%x]
is def.
codegen/X64.java
, to
understand data structures defining x64.
Feel free to extend these data structures and operations
when necessary.
In this part of the lab, you will implement an object layout algorithm to determine the layout of each object, consisting of two sub-algorithms. First, for a class, your algorithm will determine the total size of that class as well as the layout of its virtual method table. Second, for an object, the algorithm will determine the layout of its containing fields, including implicit fields such as the virtual method table pointer.
In subsequent labs, you will extend this object layout to support other OO features, such as garbage collections (or monitors in full Java).
codegen/Layout.java
to calculate the size of a class as well as offsets
for each field in a class, respectively.
For simplicity, you can treat the size of an MiniJava
integer (i.e., type int
) as 8,
which is the same as a reference (pointer).
In this part of the lab, you will implement an instruction selector based on the maximum munch algorithm to lower down the CFG to x64 assembly.
codegen/Munch.java
,
to select x64 instructions for statements in the CFG.
A calling convention specifies how arguments and returned values are passed between a callee and a caller. Before implementing the calling convention for X64, You should first get familiarized yourself with the X64 ABI (Section 3.2). As MiniJava only has 64-bits integers and references, so make sure to answer these questions: 1) how arguments are passed from the caller to the callee? 2) how returned values are passed from the callee to the caller
To this point, you Tiger compiler should generate X64 assembly for all MiniJava programs. Do not forget to test your compiler before continuing.
This completes the lab. Remember to hand in your solution to the online teaching system.