In this course, you will implement a series of classical program analyses and optimizations on compiler intermediate representations (IRs) using the provided Abstract Syntax Tree (AST) and Control Flow Graph (CFG) data structures. You will use Java as the implementation language. By the end of this course, you will not only learn the theory and practice of compilers, but also gain a deeper understanding about Java.
There are 6 labs planned:
This is the lab 1. In this lab, you will design a translator to transfer AST to CFG, and implement several CFG operations and optimizations.
In this lab, you will use Java as your implementation language, please first check the reference page to install necessary software, if you have not installed the Java platform. Make sure to install the latest Java compiler and runtime (version 24), other versions of Java do NOT work as the Dragon compiler makes use of some latest Java feature in version 24 (e.g., pattern matching or string templates). You can use any Java IDEs (e.g., IntelliJ, or VS code).
In each lab, we will supply skeleton code for you to start with and modify. These files are controlled and distributed using the Git version control system. So, to access these files, you will first need a Git client installed on your machine. On Linux or Mac, it is easy to install a Git client:
$ sudo apt install git
$ brew install git
On Windows, there are many Git clients available, among which msysgit is a very popular one, you can install and try it if you are using Windows.
For those who have no prior experience of using Git, do not forget to refer to the official manual when in doubt.
This year, we have put the skeleton code repository on GitHub (hence this courseware is open-source). To check out the code repository for lab 1 to your machine, you should run the following commands:
$ git clone https://github.com/csslab-ustc/dragon.git
which will create a new folder dragon
in
which reside all the source files just checked out for the
lab 1 branch (the default branch).
Git is a distributed version control system, allowing one to work with the local version of repository on your machine. For instance, if you have finished the exercise 5 in lab 1, you can commit your changes by:
$ git commit -am 'my solution to lab 1, exercise 5'
which will commit the changes to your local repository.
Now import the project into your favorite Java IDE and browse the source code:
src/ast/*: the data structures of Abstract Syntax Tree src/cfg/*: the data structures of Control-flow Graph src/frontend/*: the lexer and parser src/control/*: to control the behavior of the compiler src/main/Dragon.java: the "Main" class serving as compiler entry src/util/*: utility classes test/*: C programs as testcases (so do not compile them as compiler sources)
Now, build this project on your machine. When the
building process finished, a new bin
folder
would be created. Note that the name of the output folder
depends on your specific IDE configurations.
For example, the folder may be named out
or
something else. But we will always name it bin
in the following and subsequent labs.
When you finished your lab, zip you solutions and submit to the USTC online teaching system.
In a compiler, source files undergo two processing steps: lexical analysis and syntax analysis, to be converted into an intermediate representation called Abstract Syntax Tree (AST). The compiler then performs semantic analysis on AST to check for program semantic errors. If the AST passes semantic analysis, the compiler considers the program valid and proceeds to optimization phases. However, AST is generally not ideal data structures for program optimization. To facilitate subsequent optimizations, the compiler converts the AST into another intermediate representation called Control Flow Graph (CFG) that better suits optimization needs.
In our Dragon MiniC compiler, we have already implemented the lexical analysis,
syntax analysis, and semantic analysis (located in src/frontend
)
for you, and translated the source code into AST.
In this part of the lab, you will do some programming exercises to translate
AST into CFG. Don't worry, we've prepared most of the code for you.
You can study the existing codes and fill in the missing parts.
src/ast/Ast.java
and src/cfg/Cfg.java
,
make sure you understand them and how they are created and used.
As a concrete example, you can see file src/ast/SamplePrograms.java
to see how AST is used. You do not need to write any code for this exercise.
src/cfg/lab1/Translate.java
, fill in the
missing parts, and make sure you can translate all AST classes to their
corresponding CFG classes. Do not forget to test your code when finished,
and fix any bugs before continuing.
Once we have CFG, many awsome optimization operations can be applied on it to improve program efficiency. For example, we can reduce code size by removing unreachable code that will never execute. But, before diving into more advanced topics, let's first learn how to use CFG with a simple exercise.
CFGInstrCounter
in file src/cfg/lab1/CFGInstrCounter.java
,
print the number of instructions for each functions, and basic blocks.
For example, for CFG
int add(int a, int b) { int %x_1; L_1: %x_1 = +(a, b, ); ret %x_1; }the output looks like
Function add: - Basic block L_1: 2 - Total: 2 Total instructions in program: 2
In the process of analyzing and optimizing control flow graphs, compilers need to perform various operations and transformations on them. Since the structure of a control flow graph is inherently a directed graph, all operations applicable to directed graphs can be applied to control flow graphs. In this part, you need to implement several CFG operation algorithms.
Depth-first traversal is a crucial graph algorithm and one of the fundamental algorithms for control flow graph operations.
src/util/Graph.java
,
complete the method dfsDoit()
.
You should use method testDFS()
in file
src/util/UtilTest.java
to test your code.
Fix any bugs before continuing.
The analysis algorithms for control flow graphs (e.g., data flow analysis) often require traversing nodes of directed graphs in specific orders. To explicitly specify the desired traversal order, linear ordering of nodes in the control flow graph is typically necessary. To represent the sequential execution order of nodes, we typically employ topological sorting. However, traditional topological sorting algorithms can only handle directed acyclic graphs (DAGs) and fail to process control flow graphs with cyclic structures. Thus, a pseudo-topological sorting must be implemented based on the concept of depth-first traversal.
topoSortDoit()
in file
src/util/Graph.java
.
You should use method testTopoSort()
in file
src/util/UtilTest.java
to test your code.
Fix any bugs before continuing.
It will be nice to visualize a control-flow graph, making subsequent graph analysis more intuitive. Nevertheless, to draw a figure prettily is a nontrivial task and may require too much programming effort, especially for large and complex graphs. Fortunately, there are many off-the-shelf graph drawing utilities, so that we do not need to reinvent the wheels. Specifically, your Dragon compiler will make use of the Graphviz, a very popular graph visualization software, to visualize the control-flow graphs generated by your Dragon compiler.
$ dot --help
Usage: dot [-Vv?] [-(GNE)name=val] [-(KTlso)<val>] <dot files> (additional options for neato) [-x] [-n<v>] (additional options for fdp) [-L(gO)] [-L(nUCT)<val>] (additional options for config) [-cv] ...Also, you can take a look at its manual, if you are interested in.
In a control flow graph, the edge <a, b> is a critical edge if its predecessor node a has multiple successors and its successor node b has multiple predecessors. Critical edges in the control flow graph may cause errors in certain operations, therefore, they should be split before performing the operations. The following Exercise 7 require the use of Graphviz, so make sure to complete Exercise 6.
findCriticalEdges()
and
splitEdge()
in file src/util/Graph.java
.
Think carefully how to handle the parameters dataSupplier
and
dataConnector
.
You should use method testCriticalEdgeSplit()
in file
src/util/UtilTest.java
to test your code before continuing.
Once the test is passed, you will obtain a file test-graph.dot
and a graph test-graph.dot.png
, check them.
This part explores essential compiler optimization techniques to improve program performance. You will implement local value numbering, which eliminates redundant computations by identifying and reusing equivalent expressions, and function inlining, which replaces function calls with their actual body to reduce call overhead and enable further optimizations.
cfg/Cfg.java
, make sure you understand
the Java code defining CFG data structures we offered you.
Then finish the code in cfg/lab1/ValueNumber.java
,
to eliminate redundant computations.
Your task is to identify and replace equivalent expressions within a basic block.
Do not forget to test your code when finished, and fix any bugs before continuing.
cfg/lab1/Inline.java
,
to reduce function call overhead and improve performance.
Do not forget to test your code when finished,
and fix any bugs before continuing.
This completes the lab. Remember to submit your solution to the online teaching system.