Lab 6: Theories for Data Structures


Overview

In the class, we've discussed several important theories to reason about data structures: bit vectors, arrays, pointers. etc., which are widely used in practical reasoning and verifications.

In this assignment, you'll be familiarizing yourself with these theories, with the reasoning of practical data structures, using Z3's support for these theories. To be specific, this assignment is divided into three parts, each of which contains both some tutorials and problems. The first part is about the theory of bit vectors; the second part covers theory of arrays; and the third part covers the theory about pointers. Some problems are tagged with Exercise, which you should solve. And several problems are tagged with Challenge, which are optional. Download this code templates to start with.

Before getting started with this assignment, make sure you've finished Software Setup in the lab 1, and have Z3 and Python (3.12+ recommended) properly installed on your computer. For any problems, please feel free to contact us for help.

Hand-in Procedure

When you finished the lab, zip you code files with file name <ID>-lab-6.zip (e.g., SA19225789-lab-6.zip), and submit it to USTC Online Teaching Platform. Any late submission will NOT be accepted.


Part A: Bit vectors

A bit vector is a very compact data structure with every bit 0 or 1, in a fixed constant width \(l\): $$ b_{l-1}, \ldots, b_0 $$ where \(b_0\) is the least-significant bit, and \(b_{l-1}\) is the most significant bit.

Bit vectors support a large group of operations, as we will discuss in the following. So from the data structure point of view, a bit vector is a concrete data type (CDT).

Basic properties

To build the models and reason the properties of bit vectors, Z3 has built-in support for bit vectors, for instance, the following code:

    x, y = BitVecs('x y', 2)
    solve(x+y==9)

declares two variable bit vectors x and y with fixed-width \(2\), then asks Z3 whether the equality x+y==9 can be satisfiable. And Z3 will generate the result.

Exercise 1: Run this code for this exercise we give in the bit_vectors.py Python file, what output do you get? Is the number "9" representable using just 2 bits? And explain to yourself how the modulo semantics work here. You don't need to write any code here.

Z3 has a very rich set of operations for bit vectors, you can refer to the official APIs. For your convenience, here is a table containing most operations:

Operations Signed Unsigned Sample code
Addition: + + x + y
Subtraction: - - x - y
Multiplication: * * x * y
Division: / UDiv x/y, UDiv(x, y)
Bitwise-and: & & x&y
Bitwise-or: | | x|y
Bitwise-xor: ^ ^ x^y
Bitwise-not: ~ ~ ~x
EQ: == == x==y
NE: != != x!=y
GE: >= UGE x>=y, UGE(x, y)
GT: > UGT x>y, UGT(x, y)
LE: <= ULE x<=y, ULE(x, y)
LT: < ULT x < y, ULT(x, y)
Left Shift: << << x << y
Arithmetic Right Shift: >> >> x >> y
Logical Right Shift: LShR LShR LShR(x,y)
Mod: % % x%y
Remainder: SRem URem SRem(x, y),URem(x, y)
Position: + + +x
Negation: - - -x
Bit-vector Concatenation: Concat Concat Concat(x,y)
Bit-vector Extraction: Extract Extract Extract(x,y,z)/Extract(StringVal,x,y)
Rotate to the Left: RotateLeft RotateLeft RotateLeft(x,y)
Rotate to the right: RotateRight RotateRight RotateRight(x,y)
Extra sign-bits: SignExt SignExt SignExt(x,y)
Extra zero-bits: ZeroExt ZeroExt ZeroExt(x,y)
RepeatBitVec: RepeatBitVec RepeatBitVec RepeatBitVec(x,y)
Reduction-and: BVRedAnd BVRedAnd BVRedAnd(x)
Reduction-or: BVRedOr BVRedOr BVRedOr(x)
Bit-vector addition does not overflow: BVAddNoOverflow BVAddNoOverflow BVAddNoOverflow(x,y,signed)
Bit-vector addition does not underflow: BVAddNoUnderflow BVAddNoUnderflow BVAddNoUnderflow(x,y)
Bit-vector multiplication does not overflow: BVMulNoOverflow BVMulNoOverflow BVMulNoOverflow(x,y)
Bit-vector signed multiplication does not underflow: BVMulNoUnderflow BVMulNoUnderflow BVMulNoUnderflow(x,y,signed)
Bit-vector signed division does not overflow: BVSDivNoOverflow BVSDivNoOverflow BVMulNoUnderflow(x,y)
Bit-vector unary negation does not overflow: BVSNegNoOverflow BVSNegNoOverflow BVSNegNoOverflow(x)
Bit-vector subtraction does not overflow: BVSubNoOverflow BVSubNoOverflow BVSubNoOverflow(x,y)
Bit-vector subtraction does not underflow: BVSubNoUnderflow BVSubNoUnderflow BVSubNoUnderflow(x,y,signed)

In the following, you'll often need to refer to this table when necessary.

Exercise 2: In this exercise, you will write a function count_one_in_bit_vector() to count the amount of 1 in the bit vector. You can start with the code bit_vectors.py we provide you.

Integer overflows

As bit vectors have only fixed width, so they have very different properties from the normal mathematical integers (with arbitrary width), hence, it's very tricky to reason about bit vectors.

Recall that in the class, we discussed a buggy version of the binary search algorithm reported by Joshua Bloch. The bug is demonstrated by the following Java code:

    // To search the value "target", in the input array "arr".
    // Return the index when searching succeeded, or return -1 when failed.
    // This is the implementation in an early version of the "java.util.Arrays"
    int binarySearch(int[] arr, int target){
        int low = 0;
        int high = arr.length - 1;

        while(low <= high){
            int middle = (low + high)/2;

            if(arr[middle] == target)
                return middle;

            if(arr[middle] > target)
                high = middle - 1;
            else
                low = middle + 1;
        }
        return -1;
    }
Exercise 3: In this exercise, you're required to find the bug about the integer overflow in the Java code showing above and using z3's bit-vector to reproduce the bug. You can start with the code bit_vectors.py we provide you. To compute the correct result of integer average, we've given a correct function int_average_correct() and a checking function check_average(), to check whether or not a given input function "f" that your provide is correct.
Exercise 4: Read the code
        check_average(int_average_v1, True)
    
to check whether the function int_average_v1() we give in the bit_vectors.py Python file is correct. Does Z3 complain? Why or why not?

Given a Java source code which accepts two parameters, that is provided by z3 after running the function check_average():

    public class IntAverage {
        public static void main(String[] args) {
            String x = args[0];
            String y = args[1];

            int_average(Integer.parseInt(x), Integer.parseInt(y));
        }

        private static int int_average(int x, int y) {
            int z = (x + y) / 2;
            assert z >= 0 : "Generating a integer overflow bug!";
            return z;
        }
    }
Exercise 5: We build the Java source code into a Jar and provide a python function invoke_java_int_average in the bit_vectors.py to automatically get the z3's output and invoke this Jar. Run the code, and what's your observation? What conclusion you can draw from this code.

Joshua J. Bloch proposed the following solution to solve integer overflow problem:

    def int_average_v2(x, y):
        return x + (y-x)/2
Exercise 6: Read the code in the bit_vectors.py, and to check whether the above function is correct. Does Z3 complain? Why or why not?

Joshua J. Bloch proposed a second solution to solve integer overflow problem: $$ (x+y) \ggg 1 \tag{A1}\label{A1} $$

Exercise 7: Complete the function int_average_v3() which implement formula \(\eqref{A1}\) in the bit_vectors.py file, (note that \(\ggg\) means logical right shift).

In the function int_average_v3(), we only provide bit vectors which are all non-negation, and what will happen, if we provided negation bit vectors.

Exercise 8: Read the code in the bit_vectors.py, and to check whether the function int_average_v3() is correct. Does Z3 complain? Why or why not?

How to write a correct function, which takes as input two bit vectors x and y, to compute the average of two arbitrary integers (negative or non-negative)?

Exercise 9: In this exercise, you're required to write a function int_average(x, y), which takes as input two bit vectors x and y, to compute the average of two arbitrary integers (negative or non-negative). You can start with the code bit_vectors.py we provide you. Note that we have given you, in that code, a correct implementation of the integer average, you can read it for some idea, but you must write you own code, and don't copy the version we provide.

Hint 1: this exercise is harder than you may imagine, you may want to search online for the correct implementation of average on fix-width integers. For instance, you can refer to the Hacker's Delight book (page 9, section 2.5) by Henry S. Warren (this is a very good book containing many delighting programming tricks).

Integer overflows are a notorious source of bugs and are very difficult to debug. Read the following C code:

    int myfunction(int *array, int len){
        int *arr, i;

        arr = malloc(len * sizeof(int));    /* [1] */
        if(arr == NULL){
            return -1;
        }

        for(i = 0; i < len; i++){              /* [2] */
            arr[i] = array[i];
        }

        return arr;
    }

There is an Integer overflow bug at [1], which will cause buffer overflow at [2].

Exercise 10: Explain to yourself why there exists an integer overflow bug at [1], and why it triggers a buffer overflow at [2]. Then read the multi_overflow() method in bit_vectors.py Python file, to understand how we can use Z3 to justify the existence of overflows. You don't need to write any code in this exercise.
As we have shown, Z3 is powerful enough to detect such integer overflows. The idea is to model this program using Z3, and asks Z3 whether some tricky constraints can be satisfied.
Exercise 11: Read the code in bit_vectors.py, complete the multi_with_overflow() which has the capability to detect overflows.

In the class, we discussed the famous Fermat's Last Theorem, which states the equation: $$ x^n + y^n = z^n $$ has no integer solution for n>=3. With aid of computers, for any given n, we can enumerate all possible values for x, y and z, and to check whether the equation helds. Although due to the limited computing resources of modern computer, we can not verify this equation for any n, however, we can at least verify the equation for large enough x, y and z. And this at least give us more evidence that this theorem does not have solution. (Of course, if we do find a solution, then we have proved Fermat's Last Theorem does not hold at all.)

Exercise 12: Read the code in bit_vectors.py, and finish the constraint generation function fermat(x, y, z, n) for any n. For n==3, how long does your program run to obtain the result?
Challenge: So far, we use Z3py's API to solve problem related to bit vector theory. In this part of the assignment, we'll implement the bit-blasting algorithm, as described in the class. First, let's define the basic data structures to represent bit vector syntax:
    E ::= c | x | E&E | E|E | ~E
    R ::= E=E | E!=E
    P ::= R | P/\P
    
Read the bit_blast.py Python file, complete the missing code to implement the bit-blasting algorithm.

Part B: Arrays

Z3 has a very rich set of operations for arrays, you can refer to the official APIs. For your convenience, here is a table containing most operations:

Definition Operations Sample code
Store a value into array: Store Store(Array, index , value)
Get a value from array: Select Select(Array, index)

In the following, you'll often need to refer to this table when necessary.

In order to formulate a programme of a mathematical theory for computation, McCarthy proposed a basic theory of arrays as characterized by the select-store axioms. The expression $$ Select(A, i) \tag{B1}\label{B1} $$ returns the value stored at position \(i\) of the array \(A\); and the expression $$ Store(A, i, v) \tag{B2}\label{B2} $$ returns a new array identical to the original array \(A\), except for on position \(i\) it contains the value \(v\). In Z3Py, we can write expression \(\eqref{B1}\) as Select(Array, index) or Array[i], and write expression \(\eqref{B2}\) as Store(Array, index, value).

Now look at the following code:

    # Use I as an alias for IntSort()
    I = IntSort()
    # A is an array from integer to integer
    A = Array('A', I, I)
    x = Int('x')
    print(A[x])
    print(Select(A, x))
    print(Store(A, x, 10))
    print(simplify(Select(Store(A, 2, x+1), 2)))
Exercise 13: Read the code above and run the array_test() method it in array.py, try to get familiarize yourself with the basic usage of Array in Z3. For more details, you may want to refer to the official ArrayRef documents.

In class, we've discussed the formula: $$ Store(A, i, x)[i] \ge x \tag{B3}\label{B3} $$ In order to validate it, we can make use of the array support in Z3 directly.

Exercise 14: Complete the missing code of array_proof() method in the array.py Python file, make it can prove formulae \(\eqref{B3}\)

In class, we also discussed the array property, which poses some restrictions on the possible shape of index, etc. One implication of the array property is that some array formulae are undecidable. For instance, the formulae like the following $$ Store(A, i*i - i*i, x)[0] \ge x \tag{B4}\label{B4} $$ is undecidable, as it contain non-linear arithmetic. In general, for such formulae, Z3 may not solve it and will return the result unknown.

Exercise 15: Read the array_non_linear_proof() method in array.py Python file. Try to convert formulae \(\eqref{B4}\) into a Z3 constraint and investigate the output. Is the result unknown? Explain your observation.

In the class, we've discussed that the array interfaces can be implemented by reducing these interfaces to functions. To understand this, you'll implement a small array module by using anonymous function: lambdas, in Python.

Exercise 16: Read the lambda_array() method in array.py Python file. The lambda_array() method implemented an array interface using lambda (function) by 3 APIs:
  1. array_new(): create a new array
  2. array_select(): array reading
  3. array_store(): array writing
complete the missing code in array_store() method.
Challenge: So far, we use Z3py's API to solve problem related to array theory. In this part of the assignment, we'll implement the array elimination algorithm, as described in the class. First, let's define the basic data structures to represent the array syntax:
    E ::= c | x | select(E, E) | update(E, E, E)
    R ::= E = E | E != E
    P ::= R | P/\P
    
Read the array_elim.py Python file, complete the missing code to implement the array elimination algorithm.

Part C: Pointers

Z3 hasn't built-in support for pointer structure. In this part, we'll design and implement a translator which converts pointer logic into EUF and array problem. After translation, we can use Z3 to solve the target formulae.

Before implementing the translator, we need to design the structure of pointer logic first. Recall the syntax of pointers we mentioned in the lecture is:

    T ::= x | T + E | &x | &*T | *T | NULL
    E ::= x | n | E + E | E - E | *T
    R ::= T = T | T < T | E = E | E < E
    P ::= R
        | ~R
        | P ∧ P
Exercise 17: Read the code in pointer.py, try to understand how to represent the above syntax using Python. And finished count_stars() method which count amount of star symbols (*) in our pointer logic. Test your implementation by the unit test provided.

Next step is the translating procedure, The basic idea is to eliminate pointers by encoding their semantics using the store S and the heap H. We'll be using the semantics of pointers discussed in the lecture to do the elimination. Here are the rules to eliminate a pointer \(T\): $$ \begin{align*} [\![x]\!] &= H(S(x)) \\ [\![T + E]\!] &= [\![T]\!] + [\![E]\!] \\ [\![\&x]\!] &= S(x) \\ [\![\&*T]\!] &= [\![T]\!] \\ [\![*T]\!] &= H([\![T]\!]) \\ [\![{\tt NULL}]\!] &= 0 \end{align*} $$ Here are the rules to eliminate an expression \(E\): $$ \begin{align*} [\![n]\!] &= n \\ [\![x]\!] &= H(S(x)) \\ [\![E + E]\!] &= [\![E]\!] + [\![E]\!] \\ [\![E - E]\!] &= [\![E]\!] - [\![E]\!] \\ [\![*T]\!] &= H([\![T]\!]) \end{align*} $$ Here are the rules to eliminate a relation \(R\): $$ \begin{align*} [\![E = E]\!] &= [\![E]\!] = [\![E]\!] \\ [\![E < E]\!] &= [\![E]\!] < [\![E]\!] \\ [\![T=T]\!] &= [\![T]\!] = [\![T]\!] \\ [\![T < T]\!] &= [\![T]\!] < [\![T]\!] \end{align*} $$ And finally, here are the rules to eliminate a proposition \(P\): $$ \begin{align*} [\![P \land Q]\!] &= [\![P]\!] \land [\![Q]\!] \\ [\![\neg R]\!] &= \neg [\![R]\!] \end{align*} $$

For instance, the formulae: $$ ((p = \&a) \land (a = 1)) \to (*p = 1) $$ will be translated into the following EUF (with application ->, but does not matter): $$ (H(S(p)) = S(a)) \land (H(S(a)) = 1) \to (H(H(S(p))) = 1), $$ where the \(H\) and \(S\) are uninterrupted functions. The uninterrupted functions are decided by the memory model we use. In exercise 8, we assume that the heap H only contains the Integer type, and the address is also of Integer type. So, the uninterrupted functions \(H\) and \(S\) can be defined in z3:

    S = Function("S", IntSort(), IntSort())
    H = Function("H", IntSort(), IntSort())
Exercise 18: Based on the semantics of pointers logic above, finish the to_z3() methods in the file pointer.py to generate Z3 constraints. Test your implementation by the unit test provided.

To simplify the problem, our implementation assumes that the heap H only contains Integer type, and we represent memory address using Integer type, though simple and enough to model many practical problem, this model is a little bit coarse. To model practical memory more accurately, we can design a fine-grained memory model, thus we can reason more fancy properties of the memory.

Challenge: Try to implement pointer logic which variables can take difference types. We can partition the heap H into at least three regions: address's heap, real number's heap, integer number's heap, with each partition storing variables of different types. You can start with the code in pointer.py, or you can write your own code from scratch.


This completes this lab. Remember to zip you homework with file name <ID>-lab-6.zip (e.g., SA19225789-lab-6.zip), and submit it to USTC Online Teaching Platform .

Happy hacking!