Lab 4: Theory of EUF


Overview

In this lab, you will be familiarizing yourself with the theory of equality and uninterpreted functions (EUF), using Z3.

This lab is divided into two parts, each of which contains both some tutorials and exercises. The first part covers some basic SMT problems for EUF; and the second part is about building a translation validator for a simple optimizing compiler. Some problems are tagged with Exercise, which you should solve. And several problems are tagged with Challenge, which are optional. Download this code template to start with.

Hand-in Procedure

When you finished the lab, zip you code files with file name <ID>-lab-4.zip (e.g., SA19225789-lab-4.zip), and submit it to USTC Online Teaching Platform.


Part A: Basic EUF theory

In this part, let us start to learn how to solve basic SMT problems in EUF theory with Z3. To declare an uninterpreted function and term, you can declare a new sort, and terms and functions on that sort:

    S = DeclareSort('S')
    e = Const('e', S)
    f = Function('f', S, S)
    g = Function('g', S, S, S)

the above code declared a new abstract sort S, and a term e on S, and a function f with signature S->S, that is, function f accepts argument of type S, and returns a value of type S. And function g is very much like the function f, except for function g accepts two arguments, both of them are type S.

With the above declarations, we can ask Z3 to solve EUF constraints. For example, running the following Z3 constraints

    solve(e != e)
    solve(e == f(e))
    solve(e == f(g(e, e))

will generate the following output:

    no solution
    [e = S!val!0, f = [else -> S!val!0]]
    [e = S!val!1, g = [else -> S!val!0], f = [else -> S!val!1]]

which indicates that:

  1. the first constraint has no solution;
  2. the second constraint has a solution that the variable e is equal to a constant S!val!0; and
  3. the third and final constraint has a solution that the variable e is equal to the constant S!val!1, the function g and f maps all arguments to the constants S!val!0 and S!val!1, respectively.
Exercise 1: Find euf.py file and run it. Explain the output to yourself.

The most important rule to reason about EUF is the congruence rule:

            s1=t1      ...      sn=tn
    ------------------------------------------- (Congruence)
         f(s1, ..., sn) = f(t1, ..., tn)

which is often used to prove the equivalence of different implementations of the same algorithm. In the class, we discuss such an example to calculate the integer powers, in the following exercise, you'll be implementing the equivalence of the two algorithms.

Exercise 2: Finish the code in the equiv.py file, to prove the equivalence of the two algorithms.

Part B: Translation validation

In this part, you'll be implementing a translation validator for an optimizing compiler for a small language.

A compiler is a software translating a program in a source language into an equivalent program in a target language. For example, a C compiler may translate source C programs into functional equivalently x86 target programs. Furthermore, an optimizing compiler will optimize the program during the translation, in order to improve the code (e.g., to improve execution efficiency or to reduce code size).

Nevertheless, a compiler is one of the most important pieces of the software stack, which we must guarantee the high confidence of its correctness. For this purpose, we can engineer a verified compiler, like the CompCert, in which every phase of a compiler is mechanically proved to be correct. Though this approach offers a very high confidence of correctness, the engineering effort is considerable, due to the fact that a significant amount of the correctness proofs must be carried out manually.

In contrast, a translation validation approach does not try to the correctness of the translation, but to try to validate the output from each compiler run is equivalent with the input program. This can be summarized in the following figure:

    input(source program) ---> compiler translation ---> output(target program)
      |                                                       |
      ------------------------> translation validator <--------

It is not hard to see, this is weaker than the fully verified approach, because we do not prove that the compiler is fully correct, but just verify the output of each compiler run. Though this is a lightweight approach, the key advantage of this approach is this can be fully automated, saving considerable engineering efforts.

The Source Language

In this part of the lab, the source language we want to compile is called Calc, which is a shorthand for the calculator language. The context free grammar for this language is given by:

    bop ::= + | - | * | /
    E   ::= x | E bop E
    S   ::= x=E
    F   ::= f(x1, ..., xn){S;* return E;}

The non-terminal bop stands for a binary operator, consisting of a bunch of syntactic forms: addition, subtraction, multiplication, and division.

The non-terminal E stands for an expression, consisting of a basic variable x, or binary operation.

A statement S contains only one syntactic form: the assigment.

A function F has list of statements S, followed by a return statement.

The first step is to encode the syntax in Python, this is included in the calc.py file.

Exercise 3: Read the calc.py and counter.py files. Your task is to finish the pretty printing function pp_func, pp_stm and pp_exp.

As we discussed in the class, in order to reason about programs in this language, we should convert programs in this language into static single assignment form, or SSA form. SSA conversion, generally speaking, is a little tricky to do, but for the purpose of this language Calc, the conversion is not very complicated, as Calc does not contain control-flow constructs. The key algorithm is to generate a fresh variable name for each variable being assigned to, and rewrite expressions to refer to the latest variable names.

Exercise 4: Continue on the calc.py file, and make sure you understand the functions to_ssa_func(), to_ssa_stm(), and to_ssa_exp(). We have provide you all code, so that you do not need to write any code.

Once we have the SSA form, we can generate Z3 constraints on that form.

Exercise 5: Continue on the calc.py file, and make sure you understand the functions gen_cons_func(), gen_cons_stm() and gen_cons_exp(). You don't need to write any code.

The Target Language

Now let us continue to discuss the target language. The language we defined is called Tac, which stands for three address code, one of the popular compiler intermediate languages. The context free grammar for Tac is presented below:

    S ::= x=y | x=y+z | x=y-z | x=y*z | x=y/z
    F ::= f(x1, ..., xn){S;* return x;}

a statement S consists of several syntactic forms: variable assignment, addition, subtraction, multiplication, and division. As the grammar shows: one essential feature of Tac is that all computation are atomic, that is, expressions are not nested.

Exercise 6: Read the tac.py, and make sure you understand the data structures Stm and Function for defining the syntax of Tac. Please finish the pretty printing method pp_func, pp_stm, and pp_exp. After finishing this part of code, do not forget to test your implementation.
Exercise 7: Continue on the tac.py file, and finish the SSA conversion function to_ssa_func(), to_ssa_stm(), and to_ssa_exp(). Test your implementation when finished.
Exercise 8: Continue on the tac.py file, and finish the Z3 constraint generation functions gen_cons_func(), gen_cons_stm(), and gen_cons_exp(). Test your implementation when finished.

The Compiler

Having both the source language Calc and the target language Tac, we can now implement a compiler translating from a source program to a target program. The compiler implementation is in the file compiler.py, with three key functions:

Exercise 9: Finish the compiler implementation by filling in the missing code in compile_func(), compile_stm(), and compile_exp() in the source file compiler.py. Test your implementation when finished.

Now, we have finished the implementation of the compiler, we can implement the translation validator for the compiler. Essentially, a translation validator will check the equivalence of the constraints generated from the source program and the target program generated by the compiler.

Exercise 10: Finish the translation validator in compiler.py by filling in the missing code in the function translation_validation(). Test your implementation when finished.
Challenge: Scaling the translation validation techniques to real production optimizing compilers like GCC or LLVM. You can first extend the source and target languages, by supporting more language features; then you will extend your compiler. Although the principle is essentially the same, the compiler being validated will get complicated to support more language features. You can take a look at this paper to get some ideas. (Note that in different translation validators, the notion of program equivalence may be different.)


This completes this lab. Remember to zip you homework with file name <ID>-lab-4.zip (e.g., SA19225789-lab-4.zip), and submit it to USTC Online Teaching Platform.

Happy hacking!