Lab 1: Control-flow Graphs


Overview

In this course, you will implement a series of classical program analyses and optimizations on compiler intermediate representations (IRs) using the provided Abstract Syntax Tree (AST) and Control Flow Graph (CFG) data structures. You will use Java as the implementation language. By the end of this course, you will not only learn the theory and practice of compilers, but also gain a deeper understanding about Java.

There are 6 labs planned:

This is the lab 1. In this lab, you will design a translator to transfer AST to CFG, and implement several CFG operations and optimizations.

Software Setup

In this lab, you will use Java as your implementation language, please first check the reference page to install necessary software, if you have not installed the Java platform. Make sure to install the latest Java compiler and runtime (version 24), other versions of Java do NOT work as the Dragon compiler makes use of some latest Java feature in version 24 (e.g., pattern matching or string templates). You can use any Java IDEs (e.g., IntelliJ, or VS code).

Version control and Git

In each lab, we will supply skeleton code for you to start with and modify. These files are controlled and distributed using the Git version control system. So, to access these files, you will first need a Git client installed on your machine. On Linux or Mac, it is easy to install a Git client:

On Windows, there are many Git clients available, among which msysgit is a very popular one, you can install and try it if you are using Windows.

For those who have no prior experience of using Git, do not forget to refer to the official manual when in doubt.

Getting Started

This year, we have put the skeleton code repository on GitHub (hence this courseware is open-source). To check out the code repository for lab 1 to your machine, you should run the following commands:

    $ git clone https://github.com/csslab-ustc/dragon.git

which will create a new folder dragon in which reside all the source files just checked out for the lab 1 branch (the default branch).

Git is a distributed version control system, allowing one to work with the local version of repository on your machine. For instance, if you have finished the exercise 5 in lab 1, you can commit your changes by:

    $ git commit -am 'my solution to lab 1, exercise 5'

which will commit the changes to your local repository.

Now import the project into your favorite Java IDE and browse the source code:

    src/ast/*:              the data structures of Abstract Syntax Tree
    src/cfg/*:              the data structures of Control-flow Graph
    src/frontend/*:         the lexer and parser
    src/control/*:          to control the behavior of the compiler
    src/main/Dragon.java:   the "Main" class serving as compiler entry
    src/util/*:             utility classes
    test/*:                 C programs as testcases (so do not compile them as compiler sources)

Now, build this project on your machine. When the building process finished, a new bin folder would be created. Note that the name of the output folder depends on your specific IDE configurations. For example, the folder may be named out or something else. But we will always name it bin in the following and subsequent labs.

Hand-in Procedure

When you finished your lab, zip you solutions and submit to the USTC online teaching system.


Part A: Translate into CFG

In a compiler, source files undergo two processing steps: lexical analysis and syntax analysis, to be converted into an intermediate representation called Abstract Syntax Tree (AST). The compiler then performs semantic analysis on AST to check for program semantic errors. If the AST passes semantic analysis, the compiler considers the program valid and proceeds to optimization phases. However, AST is generally not ideal data structures for program optimization. To facilitate subsequent optimizations, the compiler converts the AST into another intermediate representation called Control Flow Graph (CFG) that better suits optimization needs.

In our Dragon MiniC compiler, we have already implemented the lexical analysis, syntax analysis, and semantic analysis (located in src/frontend) for you, and translated the source code into AST. In this part of the lab, you will do some programming exercises to translate AST into CFG. Don't worry, we've prepared most of the code for you. You can study the existing codes and fill in the missing parts.

Exercise 1. Read the class definitions of AST and CFG in files src/ast/Ast.java and src/cfg/Cfg.java, make sure you understand them and how they are created and used. As a concrete example, you can see file src/ast/SamplePrograms.java to see how AST is used. You do not need to write any code for this exercise.
Exercise 2. Understand the code in file src/cfg/lab1/Translate.java, fill in the missing parts, and make sure you can translate all AST classes to their corresponding CFG classes. Do not forget to test your code when finished, and fix any bugs before continuing.

Once we have CFG, many awsome optimization operations can be applied on it to improve program efficiency. For example, we can reduce code size by removing unreachable code that will never execute. But, before diving into more advanced topics, let's first learn how to use CFG with a simple exercise.

Exercise 3. As a simple exercise of CFG, you can write a class to count total instructions in the program. Finish the class CFGInstrCounter in file src/cfg/lab1/CFGInstrCounter.java, print the number of instructions for each functions, and basic blocks. For example, for CFG
    int add(int a, int b) {
        int %x_1;
        L_1:
            %x_1 = +(a, b, );
            ret %x_1;
    }
the output looks like
    Function add:
        - Basic block L_1: 2
        - Total: 2
    Total instructions in program: 2

Part B: CFG Operations

In the process of analyzing and optimizing control flow graphs, compilers need to perform various operations and transformations on them. Since the structure of a control flow graph is inherently a directed graph, all operations applicable to directed graphs can be applied to control flow graphs. In this part, you need to implement several CFG operation algorithms.

Depth-first Traversal

Depth-first traversal is a crucial graph algorithm and one of the fundamental algorithms for control flow graph operations.

Exercise 4. Understand the code in file src/util/Graph.java, complete the method dfsDoit(). You should use method testDFS() in file src/util/UtilTest.java to test your code. Fix any bugs before continuing.

Node Sorting

The analysis algorithms for control flow graphs (e.g., data flow analysis) often require traversing nodes of directed graphs in specific orders. To explicitly specify the desired traversal order, linear ordering of nodes in the control flow graph is typically necessary. To represent the sequential execution order of nodes, we typically employ topological sorting. However, traditional topological sorting algorithms can only handle directed acyclic graphs (DAGs) and fail to process control flow graphs with cyclic structures. Thus, a pseudo-topological sorting must be implemented based on the concept of depth-first traversal.

Exercise 5. Complete the method topoSortDoit() in file src/util/Graph.java. You should use method testTopoSort() in file src/util/UtilTest.java to test your code. Fix any bugs before continuing.

Graph Visualization

It will be nice to visualize a control-flow graph, making subsequent graph analysis more intuitive. Nevertheless, to draw a figure prettily is a nontrivial task and may require too much programming effort, especially for large and complex graphs. Fortunately, there are many off-the-shelf graph drawing utilities, so that we do not need to reinvent the wheels. Specifically, your Dragon compiler will make use of the Graphviz, a very popular graph visualization software, to visualize the control-flow graphs generated by your Dragon compiler.

Exercise 6. Download graphviz and install it on your machine. Do not forget to add it to your PATH. To make sure you have installed graphviz correctly, you can run the following command on your prompt:
$ dot --help
Usage: dot [-Vv?] [-(GNE)name=val] [-(KTlso)<val>] <dot files>
(additional options for neato)    [-x] [-n<v>]
(additional options for fdp)      [-L(gO)] [-L(nUCT)<val>]
(additional options for config)  [-cv]
...
Also, you can take a look at its manual, if you are interested in.

Critical Edge Split

In a control flow graph, the edge <a, b> is a critical edge if its predecessor node a has multiple successors and its successor node b has multiple predecessors. Critical edges in the control flow graph may cause errors in certain operations, therefore, they should be split before performing the operations. The following Exercise 7 require the use of Graphviz, so make sure to complete Exercise 6.

Exercise 7. Complete the method findCriticalEdges() and splitEdge() in file src/util/Graph.java. Think carefully how to handle the parameters dataSupplier and dataConnector. You should use method testCriticalEdgeSplit() in file src/util/UtilTest.java to test your code before continuing. Once the test is passed, you will obtain a file test-graph.dot and a graph test-graph.dot.png, check them.

Part C: Optimizations

This part explores essential compiler optimization techniques to improve program performance. You will implement local value numbering, which eliminates redundant computations by identifying and reusing equivalent expressions, and function inlining, which replaces function calls with their actual body to reduce call overhead and enable further optimizations.

Exercise 8. Read the code in cfg/Cfg.java, make sure you understand the Java code defining CFG data structures we offered you. Then finish the code in cfg/lab1/ValueNumber.java, to eliminate redundant computations. Your task is to identify and replace equivalent expressions within a basic block. Do not forget to test your code when finished, and fix any bugs before continuing.
Exercise 9. Suppose we inline a callee function into its caller only when the following conditions are met: Finish the code in cfg/lab1/Inline.java, to reduce function call overhead and improve performance. Do not forget to test your code when finished, and fix any bugs before continuing.

Hand-in

This completes the lab. Remember to submit your solution to the online teaching system.