In this lab, you will design and implement intermediate
representations for your Tiger
compiler, and lower
down MiniJava programs by translating
the abstract syntax trees to the intermediate representation
you designed.
This lab consists of four parts. In part A, you will design and implement a specific intermediate representation: a control-flow graph (CFG). In part B, you will eliminate the object-oriented features in MiniJava by using a prefixing algorithm. In part C, you will implement a translator to lower down the abstract syntax tree to the control-flow graph. Finally, in part D, your will design and implement several classical program analysis and optimizations for CFG.
First check out the source we offered you for lab 3:
$ git commit -am 'my solution to lab2' $ git checkout -b lab3 origin/lab3
these commands will first commit your changes to the lab2 branch of your local Git repository, and then create a local lab3 branch and check out the remote lab3 branch into the new local lab3 branch.
Again, you will need to merge your code from lab2 into the new lab3 branch:
$ git merge lab2
Do not forget to resolve any conflicts before commiting to the local lab3 branch:
$ git commit -am 'lab3 init'
You should first import the new Lab3 code into your editor, and make sure the code compiles. There are a bunch of new files that you should browse through:
cfg/*: control-flow graph data structures and operations
When you finished this lab, zip you code and submit to the online teaching system.
A control-flow graph (CFG) is a graph-based program intermediate representation with basic blocks as nodes and control transfers between blocks as edges. A control-flow graph is a good intermediate representation as it makes every aspect of program control- and data-flow properties explicit thus easier to compute. As a result, in modern compilers, CFGs serve as not only carriers for compiler optimizations, but also backbones for more advanced intermediate representations such as static single-assignment forms (SSA).
In this part of the lab, you will first design data structures defining the control-flow graph.
cfg/Cfg.java
, make sure you understand
the Java code defining CFG data structures we offered you.
Your job is to extend the CFG
data structure to support all language
features of MiniJava. You can work iteratively, that is,
you can continue to the subsequent
exercises, and go back to extend the CFG data structure when necessary.
To inspect the generated data structure, you might compile the test
case we provide. For example,
you can compile the test case test/SumRec.java
to
inspect the generated CFG:
$ java --enable-preview -cp ./bin Tiger -builtin SumRec.java -trace cfg.Translate.doitProgram
It is often important to measure the sizes of the generated CFG, which reflects the memory occupation. For example, after you have implemented a compiler optimization to shrink the target CFG, you can evaluate the effectiveness of your optimization by comparing the sizes before and after that optimization.
size()
method for each syntactic
category in the file
cfg/Cfg.java
, to measure the size of the CFG
in terms of the numbers of functions,
basic blocks, and statements. The output might
consist of a list of triples <methodNam, nBlock, nStms>.
For example, the following output
<#methods, #blocks, #statements> --------------------------------- <"f1", 5, 300> <"g", 30, 400> ... --------------------------------- subtotal 140, 25000specifies the method
f1
has
5 basic blocks and 300 statements, whereas the method
g
has 30 basic blocks and 400 statements, and
so on.
In a summary, the program has 140 basic blocks
with 25,000 statements.
It will be nice to visualize a control-flow graph, making subsequent graph analysis more intuitive. Nevertheless, to draw a figure prettily is a nontrivial task and may require too much programming effort, especially for large and complex graphs. Fortunately, there are many off-the-shelf graph drawing utilities, so that we do not need to reinvent the wheels. Specifically, your Tiger compiler will make use of the Graphviz, a very popular graph visualization software, to visualize the control-flow graphs generated by your Tiger compiler.
$ dot --help
Usage: dot [-Vv?] [-(GNE)name=val] [-(KTlso)<val>] <dot files> (additional options for neato) [-x] [-n<v>] (additional options for fdp) [-L(gO)] [-L(nUCT)<val>] (additional options for config) [-cv] ...Also, you can take a look at its manual, if you are interested in.
The Tiger compiler has rudimentary support to visualize a CFG, that is, it can draw a graph for each function in a program, respectively.
dot()
in the file
cfg/Cfg.java
, to visualize
each function. You can run the test case test/SumRec.java
:
$ java --enable-preview -cp ./bin Tiger -builtin SumRec.java -dot cfgto draw a sample CFG. Your task is to extend the
dot()
to draw statements and transfers in each block besides block labels.
Do not forget to test your Tiger compiler extensively
before continuing. Especially, make sure your Tiger compiler
can compile all testcases under the directory ./test/
.
The class elimination pass implements object-oriented features
in MiniJava by lowering down them to low-level constructs.
Specifically, this pass finishes three sub-tasks:
1) it eliminates
classes by translating them to C-style structures;
2) it closes each (non-static) method by introducing
an explicit this
as the first parameter; and
3) it turns each method invocation into a virtual function call.
In the following, we will discuss the first two steps and leave
the last step to the next section (part C).
The prefixing algorithm builds an inheritance tree, and eliminates class inheritance by a level-order tree walking.
buildInheritTree0()
to build an inherit
tree, with each tree node containing a class.
When finished, you might visualize the tree to test
your implementation.
prefixOneClass()
to implement the
prefixing algorithm.
To this, a MiniJava class will be translated into
two components: a structure (like struct
in C),
a virtual function table holding a list of function pointers to
all the constituting methods in that class.
Test your Tiger compiler and fix any bugs before continuing.
prefixOneClass()
, to close each method by introducing
an extra this
argument.
Pay special attention to its type. For simplicity, you can
generate an empty CFG function for each AST method serving as
place-holders to be filled in subsequently.
The CFG generation phase generates a CFG representation for the given abstract syntax tree. The generation makes use of a recursive-decedent algorithm.
doitMethod()
, doitStm()
, and
doitExp()
, to translate a given method, statement, and
expression to its corresponding CFG representation, respectively.
Do not forget to test your Tiger compiler using
the test cases in test/
as well as your own
ones.
Fix any bugs before continuing.
In this part of the lab, you will familiarize yourself with program analysis and compiler optimizations by writing several classical program analysis (e.g., liveness analysis, reaching definitions, available expressions), and optimizations (e.g., constant propagation, copy propagation, dead code elimination).
You will write most data-flow analysis algorithms with the following general template:
data_flow_analysis() calculate the gen and kill sets for each statement and transfer; calculate the in and out sets for each statement and transfer. // fix-point
For specific data-flow algorithms you will write next,
the above template differs in
1) how gen
and kill sets
are defined, 2) how in
and out
sets are calculated, 3) in what order the iteration
should be conducted; and 4) when to stop.
Hence, you will need to
modify this algorithm template to finish the specific analysis.
A
liveness analysis
calculates, for each statement
and transfer in a function, the live-in and live-out variable sets.
In liveness analysis, you should calculate
in
and live_out
for the in
and out
separately; and the fix-point code should
calculate the in
and out
sets for
each basic block in a reverse topo-sort order.
But for other data-flow analysis, these details may be different.
cfg/Liveness.java
by implementing the
aforementioned algorithm.
A particularly interesting application of liveness analysis is to detect uninitialized variables. An uninitialized variable is used before assigned a value, which always indicate some kind of programming errors.
An assignment statement x = v
is dead,
if the variable x
does not live out this
statement. Hence, this statement can be safely
removed.
cfg/DeadCode.java
by implementing the
aforementioned algorithm. You should avoid the pitfalls
discussed in the Tiger book by not removing
live statements. Also note that removing one dead code may
have cascading effects to make
other code dead, so make sure that your dead-code elimination
optimization can eliminate all dead code.
Intuitively, a reaching definition analyzes which definition can reach a given use.
cfg/ReachDef.java
, as described
in Table 17.2 of the Tiger book.
This
algorithm is similar to the liveness analysis algorithm, except for
it performs the analysis in a forward manner.
A constant propagation propagates a constant definition
to it uses. For example, for any statement s
of the form x = y op z
, where the variable
y
is defined in
by a statement t like y = c
where c
is a constant. And
suppose the definition t is the unique definition that reaches
the statement s. Then we can replace, in
the statement s, the variable y
by
the constant c
.
cfg/ConstProp.java
. Note that to this point, the
Tiger compiler will perform all these analysis and optimizations
in a linear order, however, this is not the desired order
for optimizations. So, you may want to modify the
cfg/Optimize.java
code to use a
fancy order.
Copy propagation is like constant propagation.
Suppose, for any statement s
of the form x = y
or x = y op z
, where y
is defined in
some statement t like y = m
with a variable m
. And
suppose the definition t is the unique definition that reaches
the statement s. Then you can replace the variable y
, in
the statement s, with the variable m
from the statement t.
cfg/CopyProp.java
.
Consider the statement x = y op z
, if the expression
y op z
has been computed before this statement, then
the expression y op z
can reuse the computed result.
Hence, you can compute whether the expression y op z
is available here.
cfg/AvailExp.java
.
For the statement x = y op z
, if the expression
y op z
is available. We can substitute the
expression y op z
with the most recent definition
using y op z
. You may refer to the Tiger book
section 17.3 for this algorithm.
cfg/Cse.java
.
This completes the lab. Remember to hand in your solution to the online teaching system.