Lab 3: Intermediate Representation


Overview

In this lab, you will design and implement intermediate representations for your Tiger compiler, and lower down MiniJava programs by translating the abstract syntax trees to the intermediate representation you designed.

This lab consists of four parts. In part A, you will design and implement a specific intermediate representation: a control-flow graph (CFG). In part B, you will eliminate the object-oriented features in MiniJava by using a prefixing algorithm. In part C, you will implement a translator to lower down the abstract syntax tree to the control-flow graph. Finally, in part D, your will design and implement several classical program analysis and optimizations for CFG.

Getting Started

First check out the source we offered you for lab 3:

    $ git commit -am 'my solution to lab2'
    $ git checkout -b lab3 origin/lab3

these commands will first commit your changes to the lab2 branch of your local Git repository, and then create a local lab3 branch and check out the remote lab3 branch into the new local lab3 branch.

Again, you will need to merge your code from lab2 into the new lab3 branch:

    $ git merge lab2

Do not forget to resolve any conflicts before commiting to the local lab3 branch:

    $ git commit -am 'lab3 init'

You should first import the new Lab3 code into your editor, and make sure the code compiles. There are a bunch of new files that you should browse through:

    cfg/*:    control-flow graph data structures and operations

Hand-in Procedure

When you finished this lab, zip you code and submit to the online teaching system.


Part A: Control-flow Graph

A control-flow graph (CFG) is a graph-based program intermediate representation with basic blocks as nodes and control transfers between blocks as edges. A control-flow graph is a good intermediate representation as it makes every aspect of program control- and data-flow properties explicit thus easier to compute. As a result, in modern compilers, CFGs serve as not only carriers for compiler optimizations, but also backbones for more advanced intermediate representations such as static single-assignment forms (SSA).

Data Structures

In this part of the lab, you will first design data structures defining the control-flow graph.

Exercise 1. Read the code in cfg/Cfg.java, make sure you understand the Java code defining CFG data structures we offered you. Your job is to extend the CFG data structure to support all language features of MiniJava. You can work iteratively, that is, you can continue to the subsequent exercises, and go back to extend the CFG data structure when necessary.

To inspect the generated data structure, you might compile the test case we provide. For example, you can compile the test case test/SumRec.java to inspect the generated CFG:

    $ java --enable-preview -cp ./bin Tiger -builtin SumRec.java
    -trace cfg.Translate.doitProgram

It is often important to measure the sizes of the generated CFG, which reflects the memory occupation. For example, after you have implemented a compiler optimization to shrink the target CFG, you can evaluate the effectiveness of your optimization by comparing the sizes before and after that optimization.

Exercise 2. Write a size() method for each syntactic category in the file cfg/Cfg.java, to measure the size of the CFG in terms of the numbers of functions, basic blocks, and statements. The output might consist of a list of triples <methodNam, nBlock, nStms>. For example, the following output
<#methods, #blocks, #statements>
---------------------------------
<"f1", 5,  300>
<"g",  30, 400>
...
---------------------------------
subtotal 140, 25000
specifies the method f1 has 5 basic blocks and 300 statements, whereas the method g has 30 basic blocks and 400 statements, and so on. In a summary, the program has 140 basic blocks with 25,000 statements.

Graph Visualization

It will be nice to visualize a control-flow graph, making subsequent graph analysis more intuitive. Nevertheless, to draw a figure prettily is a nontrivial task and may require too much programming effort, especially for large and complex graphs. Fortunately, there are many off-the-shelf graph drawing utilities, so that we do not need to reinvent the wheels. Specifically, your Tiger compiler will make use of the Graphviz, a very popular graph visualization software, to visualize the control-flow graphs generated by your Tiger compiler.

Exercise 3. Download graphviz and install it on your machine. Do not forget to add it to your PATH. To make sure you have installed graphviz correctly, you can run the following command on your prompt:
$ dot --help
Usage: dot [-Vv?] [-(GNE)name=val] [-(KTlso)<val>] <dot files>
(additional options for neato)    [-x] [-n<v>]
(additional options for fdp)      [-L(gO)] [-L(nUCT)<val>]
(additional options for config)  [-cv]
...
Also, you can take a look at its manual, if you are interested in.

The Tiger compiler has rudimentary support to visualize a CFG, that is, it can draw a graph for each function in a program, respectively.

Exercise 4. Finish the code dot() in the file cfg/Cfg.java, to visualize each function. You can run the test case test/SumRec.java:
$ java --enable-preview -cp ./bin Tiger -builtin SumRec.java -dot cfg
to draw a sample CFG. Your task is to extend the dot() to draw statements and transfers in each block besides block labels.

Do not forget to test your Tiger compiler extensively before continuing. Especially, make sure your Tiger compiler can compile all testcases under the directory ./test/.


Part B: Class Elimination

The class elimination pass implements object-oriented features in MiniJava by lowering down them to low-level constructs. Specifically, this pass finishes three sub-tasks: 1) it eliminates classes by translating them to C-style structures; 2) it closes each (non-static) method by introducing an explicit this as the first parameter; and 3) it turns each method invocation into a virtual function call. In the following, we will discuss the first two steps and leave the last step to the next section (part C).

The Prefixing Algorithm

The prefixing algorithm builds an inheritance tree, and eliminates class inheritance by a level-order tree walking.

Exercise 5. Finish the method buildInheritTree0() to build an inherit tree, with each tree node containing a class. When finished, you might visualize the tree to test your implementation.
Exercise 6. Finish the method prefixOneClass() to implement the prefixing algorithm.

To this, a MiniJava class will be translated into two components: a structure (like struct in C), a virtual function table holding a list of function pointers to all the constituting methods in that class. Test your Tiger compiler and fix any bugs before continuing.

Closing Method

Exercise 7. Extend your implementation of the method prefixOneClass(), to close each method by introducing an extra this argument. Pay special attention to its type. For simplicity, you can generate an empty CFG function for each AST method serving as place-holders to be filled in subsequently.

Part C: CFG Generation

The CFG generation phase generates a CFG representation for the given abstract syntax tree. The generation makes use of a recursive-decedent algorithm.

Exercise 8. Finish the methods doitMethod(), doitStm(), and doitExp(), to translate a given method, statement, and expression to its corresponding CFG representation, respectively.

Do not forget to test your Tiger compiler using the test cases in test/ as well as your own ones. Fix any bugs before continuing.


Part D: CFG-based Program Analysis and Optimizations

In this part of the lab, you will familiarize yourself with program analysis and compiler optimizations by writing several classical program analysis (e.g., liveness analysis, reaching definitions, available expressions), and optimizations (e.g., constant propagation, copy propagation, dead code elimination).

You will write most data-flow analysis algorithms with the following general template:

data_flow_analysis()
    calculate the gen and kill sets for each statement and transfer;
    calculate the in and out sets for each statement and transfer.   // fix-point

For specific data-flow algorithms you will write next, the above template differs in 1) how gen and kill sets are defined, 2) how in and out sets are calculated, 3) in what order the iteration should be conducted; and 4) when to stop. Hence, you will need to modify this algorithm template to finish the specific analysis.

Liveness Analysis

A liveness analysis calculates, for each statement and transfer in a function, the live-in and live-out variable sets. In liveness analysis, you should calculate in and live_out for the in and out separately; and the fix-point code should calculate the in and out sets for each basic block in a reverse topo-sort order. But for other data-flow analysis, these details may be different.

Exercise 9. Finish the liveness analysis code in the source file cfg/Liveness.java by implementing the aforementioned algorithm.

A particularly interesting application of liveness analysis is to detect uninitialized variables. An uninitialized variable is used before assigned a value, which always indicate some kind of programming errors.

Exercise 10. Design and implement an algorithm to detect uninitialized variables, by leveraging the results of liveness analysis.

Dead-code Elimination

An assignment statement x = v is dead, if the variable x does not live out this statement. Hence, this statement can be safely removed.

Exercise 11. Finish the dead code elimination in cfg/DeadCode.java by implementing the aforementioned algorithm. You should avoid the pitfalls discussed in the Tiger book by not removing live statements. Also note that removing one dead code may have cascading effects to make other code dead, so make sure that your dead-code elimination optimization can eliminate all dead code.

Reaching Definitions

Intuitively, a reaching definition analyzes which definition can reach a given use.

Exercise 12. Implement the reaching definition analysis cfg/ReachDef.java, as described in Table 17.2 of the Tiger book. This algorithm is similar to the liveness analysis algorithm, except for it performs the analysis in a forward manner.

Constant Propagation

A constant propagation propagates a constant definition to it uses. For example, for any statement s of the form x = y op z, where the variable y is defined in by a statement t like y = c where c is a constant. And suppose the definition t is the unique definition that reaches the statement s. Then we can replace, in the statement s, the variable y by the constant c.

Exercise 13. Finish the constant propagation code in the file cfg/ConstProp.java. Note that to this point, the Tiger compiler will perform all these analysis and optimizations in a linear order, however, this is not the desired order for optimizations. So, you may want to modify the cfg/Optimize.java code to use a fancy order.

Copy Propagation

Copy propagation is like constant propagation. Suppose, for any statement s of the form x = y or x = y op z, where y is defined in some statement t like y = m with a variable m. And suppose the definition t is the unique definition that reaches the statement s. Then you can replace the variable y, in the statement s, with the variable m from the statement t.

Exercise 14. Finish the constant propagation code in the file cfg/CopyProp.java.

Available Expressions

Consider the statement x = y op z, if the expression y op z has been computed before this statement, then the expression y op z can reuse the computed result. Hence, you can compute whether the expression y op z is available here.

Exercise 15. Finish the available expression analysis in the file cfg/AvailExp.java.

Common Sub-Expression Elimination (CSE)

For the statement x = y op z, if the expression y op z is available. We can substitute the expression y op z with the most recent definition using y op z. You may refer to the Tiger book section 17.3 for this algorithm.

Exercise 16. Implement common sub-expression elimination in cfg/Cse.java.

Hand-in

This completes the lab. Remember to hand in your solution to the online teaching system.