Lab 4: Instruction Selection


Overview

In this lab, you will design and implement an instruction selector for your Tiger compiler, to translate MiniJava programs to x64 assembly.

This lab consists of three parts. First, in part A, you will design and implement data structures for the x64 assembly language, which serves as the target of instruction selection. Second, in part B, you will design and implement a simple object model for MiniJava, determining field offsets, method offsets, and class sizes, among other things. Finally, in part C, you will implement a maximum munch algorithm to lower the CFG to x64 assembly.

Getting Started

First check out the source we offered you for lab 4:

    $ git commit -am 'my solution to lab3'
    $ git checkout -b lab4 origin/lab4

These commands first commit your changes to the lab3 branch of your local Git repository, then create a local lab4 branch and check out the remote lab4 branch into it.

Again, you will need to merge your code from lab3 into the new lab4 branch:

    $ git merge lab3

Do not forget to resolve any conflicts before committing to the local lab4 branch:

    $ git commit -am 'lab4 init'

You should first import the new lab4 code into your favorite IDE. There are a bunch of new files that you should browse through:

    codegen/*:    data structures and operations for x64

Hand-in Procedure

When you have finished this lab, zip your code and submit it to the online teaching system.


Part A: X64 Assembly

Intel x64 is a widely used CISC instruction set architecture (ISA), and it is the target we will generate code for in this lab. In this part of the lab, you will first design data structures and operations for x64.

Data Structures

While x64 is a complex ISA, in this lab we will only use a small subset of it, which is enough to compile MiniJava. You may want to refer to an x64 manual for this subset if you are not familiar with the ISA.

The data structures defining x64 essentially form an assembly-level control-flow graph (as discussed in the previous lab). The most interesting part is how we represent assembly instructions.

In this phase, an assembly instruction may still contain variables (we call them pseudo-registers), which will be allocated to physical registers by a register allocator (as will be discussed in the next lab). For example, an addition instruction may look like:

addq %y, %x   // %x = %x + %y

where both %x and %y are variables yet to be allocated to registers. Hence, an assembly instruction in this phase can be treated as a function that maps variables to a concrete assembly instruction string in which the pseudo-registers are substituted by physical registers. For example, if we determine this allocation:

[%x -> %rax, %y -> %rbx]

Then the aforementioned instruction can be converted into:

addq %rbx, %rax

With this key insight, we can encode an assembly instruction using a higher-order function taking the varying components (i.e., the variables) as arguments. Specifically, we will use a λ that takes the variables as parameters and concatenates the final assembly string. For the above example, the λ encoding may look like:

λ([%x, %y]).(addq %y, %x)

Furthermore, to distinguish variable uses from defs, we give the λ two arguments: one for the uses and one for the defs. With this refinement, the above instruction can be encoded by the following two-argument λ:

λ([%x, %y], [%x]).(addq %y, %x)

where [%x, %y] are the uses and [%x] is the def.
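
To make this concrete, below is a minimal sketch in Java of how such a λ-encoded instruction might be represented and later instantiated by the register allocator. The class and member names (Temp, Instr, toAssembly, and so on) are illustrative only and are not necessarily those used in codegen/X64.java.

    // A minimal sketch (hypothetical names): an instruction stores its uses
    // and defs as pseudo-registers, plus a formatting function that is only
    // applied after register allocation has mapped each pseudo-register to a
    // physical register.
    import java.util.List;
    import java.util.Map;
    import java.util.function.BiFunction;

    final class Temp {
        final String name;            // e.g., "%x", "%y"
        Temp(String name) { this.name = name; }
    }

    final class Instr {
        final List<Temp> uses, defs;
        // λ(uses, defs).(concrete assembly string)
        final BiFunction<List<String>, List<String>, String> format;

        Instr(List<Temp> uses, List<Temp> defs,
              BiFunction<List<String>, List<String>, String> format) {
            this.uses = uses; this.defs = defs; this.format = format;
        }

        // Instantiate the λ once an allocation (pseudo -> physical) is known.
        String toAssembly(Map<Temp, String> alloc) {
            List<String> u = uses.stream().map(alloc::get).toList();
            List<String> d = defs.stream().map(alloc::get).toList();
            return format.apply(u, d);
        }
    }

    class Demo {
        public static void main(String[] args) {
            Temp x = new Temp("%x"), y = new Temp("%y");
            // λ([%x, %y], [%x]).(addq %y, %x)
            Instr add = new Instr(List.of(x, y), List.of(x),
                    (u, d) -> "addq " + u.get(1) + ", " + d.get(0));
            // With the allocation [%x -> %rax, %y -> %rbx], this prints:
            // addq %rbx, %rax
            System.out.println(add.toAssembly(Map.of(x, "%rax", y, "%rbx")));
        }
    }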

Exercise 1. Read the code in codegen/X64.java to understand the data structures defining x64. Feel free to extend these data structures and operations when necessary.

Part B: Object Layout

In this part of the lab, you will implement an object layout algorithm to determine the layout of each object. It consists of two sub-algorithms: first, for a class, your algorithm will determine the total size of that class as well as the layout of its virtual method table; second, for an object, the algorithm will determine the layout of its fields, including implicit fields such as the virtual method table pointer.

In subsequent labs, you will extend this object layout to support other OO features, such as garbage collection (or monitors in full Java).

Exercise 2. Finish the code in the file codegen/Layout.java to calculate the size of a class as well as the offset of each field in a class. For simplicity, you can treat the size of a MiniJava integer (i.e., type int) as 8 bytes, the same as a reference (pointer).
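
As a starting point, here is one possible way to compute field offsets, virtual method slots, and class sizes under the assumptions above (every field and the virtual method table pointer occupy 8 bytes, the pointer sits at offset 0, and a subclass places its superclass's fields and methods first). The names are hypothetical; the actual structure of codegen/Layout.java may differ.

    // A sketch of a per-class layout record (hypothetical names). A subclass
    // starts from its superclass's layout so that inherited field offsets and
    // virtual method slots remain valid in the subclass.
    import java.util.LinkedHashMap;
    import java.util.Map;

    class ClassLayout {
        static final int WORD = 8;                 // size of int and reference
        final Map<String, Integer> fieldOffsets = new LinkedHashMap<>();
        final Map<String, Integer> methodOffsets = new LinkedHashMap<>();
        int size = WORD;                           // slot 0: the vtable pointer

        ClassLayout() {}                           // class with no superclass

        ClassLayout(ClassLayout parent) {          // inherit the parent's layout
            fieldOffsets.putAll(parent.fieldOffsets);
            methodOffsets.putAll(parent.methodOffsets);
            size = parent.size;
        }

        // A new field is appended after all fields laid out so far.
        void addField(String name) {
            fieldOffsets.put(name, size);
            size += WORD;
        }

        // An overriding method reuses the inherited slot; a new method gets
        // the next free slot in the virtual method table.
        void addMethod(String name) {
            methodOffsets.computeIfAbsent(name, m -> methodOffsets.size() * WORD);
        }
    }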

Part C: Instruction Selection

In this part of the lab, you will implement an instruction selector based on the maximum munch algorithm to lower the CFG to x64 assembly.

Maximum Munch Algorithm

Exercise 3. Fill in the code in the file codegen/Munch.java to select x64 instructions for the statements in the CFG.
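
The sketch below illustrates the general shape of a munch method: each case tries to match the largest statement pattern (tile) it recognizes and emits x64 instructions for that whole tile, with smaller tiles serving as fallbacks so every statement is covered. The CFG node names are hypothetical, and the sketch emits plain strings rather than the λ-encoded instructions of Part A purely for brevity; your code in codegen/Munch.java should emit real instruction objects.

    // A sketch of maximum munch over CFG statements (hypothetical node names).
    import java.util.ArrayList;
    import java.util.List;

    class Munch {
        interface Stm {}
        record Temp(String name) {}
        // dst = left op right
        record Assign(Temp dst, Temp left, String op, Temp right) implements Stm {}

        final List<String> instrs = new ArrayList<>();

        void munchStm(Stm s) {
            // Largest tile first:  x = y + z  ==>  movq %y, %x ; addq %z, %x
            if (s instanceof Assign a && a.op().equals("+")) {
                emit("movq " + a.left().name() + ", " + a.dst().name());
                emit("addq " + a.right().name() + ", " + a.dst().name());
                return;
            }
            // ... cases for the remaining, smaller tiles go here.
        }

        void emit(String instr) { instrs.add(instr); }
    }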

Calling Convention

A calling convention specifies how arguments and return values are passed between a caller and a callee. Before implementing the calling convention for x64, you should first familiarize yourself with the x64 ABI (Section 3.2). As MiniJava only has 64-bit integers and references, make sure you can answer these questions: 1) how are arguments passed from the caller to the callee? 2) how are return values passed from the callee to the caller?
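
For reference, the sketch below records the register assignments prescribed by the x64 System V ABI (Section 3.2) for integer and reference values; any arguments beyond the sixth are passed on the stack. It is only a summary table, and how you encode it in your compiler is up to you.

    // Register assignments from the x64 System V ABI for integer/pointer
    // values; arguments beyond the sixth go on the stack.
    import java.util.List;

    class CallingConvention {
        // Argument registers, in order.
        static final List<String> ARG_REGS =
                List.of("%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9");
        // The return value comes back in this register.
        static final String RET_REG = "%rax";
    }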

Exercise 4. Extend Tiger's calling convention implementation to support more than six arguments, so that your Tiger compiler can compile programs with an arbitrary number of arguments.
Challenge. The red zone is a special area in the stack frame: under the x64 System V ABI, it is the 128 bytes just below the stack pointer, which leaf functions in particular may use without adjusting %rsp. Propose a strategy to exploit the red zone to improve the efficiency of the generated code, and implement your idea in your Tiger compiler.

At this point, your Tiger compiler should generate x64 assembly for all MiniJava programs. Do not forget to test your compiler before continuing.


Hand-in

This completes the lab. Remember to hand in your solution to the online teaching system.