Lab 2: Data-flow analysis


Overview

In this lab, you will design and implement a dominator tree algorithm, a foundational tool for code optimization. Additionally, you will develop a liveness analysis, a classical backward data flow analysis algorithm, and a reaching definitions analysis, a classical forward data flow analysis algorithm.

Getting Started

First check out the code skeleton for this lab:

    $ git commit -am 'my solution to lab1'
    $ git fetch
    $ git checkout -b lab2 origin/lab2

which will commit your changes to 'lab1' branch to your local Git repository, and create and check out a new lab2 branch from the remote lab2 branch.

You will now need to merge the lab1 into the lab2:

    $ git merge lab1

In some cases, Git may not be able to figure out how to merge your changes with the new lab assignment (e.g., if you modified some of the code that is changed in the second lab assignment). In that case, the git merge command will report you which files are conflicted, and you should first resolve the conflict (by editing the relevant files) and then commit the resulting files:

    $ git commit -am 'lab2 init'

Lab 2 contains the following new source files, which you should browse through:

    cfg/lab2/Dominator.java:            dominator tree algorithm framework
    cfg/lab2/Liveness.java:             liveness analyses framework
    cfg/lab2/ReachDefinition.java:      reaching definition analyses framework

Hand-in Procedure

When you finished your lab, zip you code and submit to the online teaching system.


Part A: Dominator Tree

In program optimization and analysis, Dominator Tree is a tree structure used to represent the dominance relationship between nodes in a Control Flow Graph (CFG). It is an important tool in compiler design and static program analysis, especially in optimizations (such as loop optimization, dead code elimination) and data flow analysis.

Let DOM[n] represent the dominator of node n. A simple and intuitive way to construct a dominating tree is to utilize the following data-flow equation:

        DOM[n] = {n} ⋃ (⋂p∈ pred[n] DOM[p])
where DOM[n0] = {n0} for the entry block n0.

While the data-flow equation based algorithm is easy to engineer, it is worst-time complexity is O(n2) due to its use of iterations to reach the fixpoint. In this part, you should implement a more efficient algorithm to construct dominator tree using Cooper's algorithm.

Exercise 1. Complete the missing code in file src/cfg/lab2/Dominator.java and build a dominator tree from the control flow graph. You can refer to the readings we provided and this paper to learn how Cooper's algorithm works.

Part B: Liveness Analysis

A liveness analysis calculates, for each statement and transfer in a function, the live-in and live-out variable sets. In liveness analysis, you should calculate in and live_out for the in and out separately; and the fix-point code should calculate the in and out sets for each basic block in a reverse topo-sort order. But for other data-flow analysis, these details may be different.

Exercise 2. Finish the liveness analysis code in the source file cfg/lab2/Liveness.java by implementing the aforementioned algorithm.

Part C: Reaching Definition Analysis

In this part of the lab, You will implement a reaching definition analysis , which tracks the values of variables in a program. Through reaching definition analysis, we can detect the use of undefined variables in the program or perform constant propagation and copy propagation.

The reaching definition analysis algorithm we designed primarily consists of two steps: analyzing variable definition points and iteratively solving data flow equations.

Variable Definition Points Analysis

The analysis of variable definition points is primarily performed to compute the information required for data flow equations. This step involves traversing each function and recording every definition point of variables within the function.

Exercise 3. Read and understand the source code in the file cfg/lab2/ReachDefinition.java, then fill in the function defsForFunction().
You can utilize defSitesProp to record variable definition site.

Gen-Kill Computation

Prior to solving data flow equations, we must also implement functions to compute the gen/kill sets for each program statement. For each statement d: t = x1 op x2, the GEN set is computed as Gen[d] = {d}, and the KILL set is calculated by Kill[d]=defs[t]-{d}.

Exercise 4. Read and understand the source code in the file cfg/lab2/ReachDefinition.java, then fill in the function gkStm().

Solving Data Flow Equations

The solution of data flow equations typically employs a fixed-point algorithm. This involves maintaining a state table where each iteration recomputes the current state. If the state in the current iteration differs from the previous iteration, the algorithm proceeds to the next round until a stable convergence state—referred to as the fixed point—is reached.
It is noteworthy that reaching definition analysis is a forward data flow analysis. For each statement s, its In set is the union of the Out sets from all its predecessor statements: In[s] = ⋃p∈pred[s]Out[pred], while its Out set is computed by Out[s] = Gen[s] ⋃ (In[s] - Kill[s]).

Exercise 5. Implement the fixed-point algorithm in the doitFunction() and complete the doitStm() and doitTransfer() functions to compute the In/Out sets for each program statement/transfer. Afterwards, you can validate your implementation by invoking the test() function within the embedded class UnitTest.

Hand-in

This completes the lab. Remember to hand in your solution to the online teaching system.