In the class, we've discussed several important theories to reason about data structures: bit vectors, arrays, pointers. etc., which are widely used in practical reasoning and verifications.
In this assignment, you'll be familiarizing yourself with these theories, with the reasoning of practical data structures, using Z3's support for these theories. To be specific, this assignment is divided into three parts, each of which contains both some tutorials and problems. The first part is about the theory of bit vectors; the second part covers theory of arrays; and the third part covers the theory about pointers. Some problems are tagged with Exercise, which you should solve. And several problems are tagged with Challenge, which are optional. Download this code templates to start with.
Before getting started with this assignment, make sure you've finished Software Setup in the lab 1, and have Z3 and Python (3.12+ recommended) properly installed on your computer. For any problems, please feel free to contact us for help.
When you finished the lab, zip you code files with file name <ID>-lab-6.zip (e.g., SA19225789-lab-6.zip), and submit it to USTC Online Teaching Platform. Any late submission will NOT be accepted.
A bit vector is a very compact data structure with every bit 0 or 1, in a fixed constant width \(l\): $$ b_{l-1}, \ldots, b_0 $$ where \(b_0\) is the least-significant bit, and \(b_{l-1}\) is the most significant bit.
Bit vectors support a large group of operations, as we will discuss in the following. So from the data structure point of view, a bit vector is a concrete data type (CDT).
To build the models and reason the properties of bit vectors, Z3 has built-in support for bit vectors, for instance, the following code:
x, y = BitVecs('x y', 2) solve(x+y==9)
declares two variable bit vectors x and y with fixed-width \(2\), then asks Z3 whether the equality x+y==9 can be satisfiable. And Z3 will generate the result.
Z3 has a very rich set of operations for bit vectors, you can refer to the official APIs. For your convenience, here is a table containing most operations:
Operations | Signed | Unsigned | Sample code |
Addition: | + | + | x + y |
Subtraction: | - | - | x - y |
Multiplication: | * | * | x * y |
Division: | / | UDiv | x/y, UDiv(x, y) |
Bitwise-and: | & | & | x&y |
Bitwise-or: | | | | | x|y |
Bitwise-xor: | ^ | ^ | x^y |
Bitwise-not: | ~ | ~ | ~x |
EQ: | == | == | x==y |
NE: | != | != | x!=y |
GE: | >= | UGE | x>=y, UGE(x, y) |
GT: | > | UGT | x>y, UGT(x, y) |
LE: | <= | ULE | x<=y, ULE(x, y) |
LT: | < | ULT | x < y, ULT(x, y) |
Left Shift: | << | << | x << y |
Arithmetic Right Shift: | >> | >> | x >> y |
Logical Right Shift: | LShR | LShR | LShR(x,y) |
Mod: | % | % | x%y |
Remainder: | SRem | URem | SRem(x, y),URem(x, y) |
Position: | + | + | +x |
Negation: | - | - | -x |
Bit-vector Concatenation: | Concat | Concat | Concat(x,y) |
Bit-vector Extraction: | Extract | Extract | Extract(x,y,z)/Extract(StringVal,x,y) |
Rotate to the Left: | RotateLeft | RotateLeft | RotateLeft(x,y) |
Rotate to the right: | RotateRight | RotateRight | RotateRight(x,y) |
Extra sign-bits: | SignExt | SignExt | SignExt(x,y) |
Extra zero-bits: | ZeroExt | ZeroExt | ZeroExt(x,y) |
RepeatBitVec: | RepeatBitVec | RepeatBitVec | RepeatBitVec(x,y) |
Reduction-and: | BVRedAnd | BVRedAnd | BVRedAnd(x) |
Reduction-or: | BVRedOr | BVRedOr | BVRedOr(x) |
Bit-vector addition does not overflow: | BVAddNoOverflow | BVAddNoOverflow | BVAddNoOverflow(x,y,signed) |
Bit-vector addition does not underflow: | BVAddNoUnderflow | BVAddNoUnderflow | BVAddNoUnderflow(x,y) |
Bit-vector multiplication does not overflow: | BVMulNoOverflow | BVMulNoOverflow | BVMulNoOverflow(x,y) |
Bit-vector signed multiplication does not underflow: | BVMulNoUnderflow | BVMulNoUnderflow | BVMulNoUnderflow(x,y,signed) |
Bit-vector signed division does not overflow: | BVSDivNoOverflow | BVSDivNoOverflow | BVMulNoUnderflow(x,y) |
Bit-vector unary negation does not overflow: | BVSNegNoOverflow | BVSNegNoOverflow | BVSNegNoOverflow(x) |
Bit-vector subtraction does not overflow: | BVSubNoOverflow | BVSubNoOverflow | BVSubNoOverflow(x,y) |
Bit-vector subtraction does not underflow: | BVSubNoUnderflow | BVSubNoUnderflow | BVSubNoUnderflow(x,y,signed) |
In the following, you'll often need to refer to this table when necessary.
As bit vectors have only fixed width, so they have very different properties from the normal mathematical integers (with arbitrary width), hence, it's very tricky to reason about bit vectors.
Recall that in the class, we discussed a buggy version of the binary search algorithm reported by Joshua Bloch. The bug is demonstrated by the following Java code:
// To search the value "target", in the input array "arr". // Return the index when searching succeeded, or return -1 when failed. // This is the implementation in an early version of the "java.util.Arrays" int binarySearch(int[] arr, int target){ int low = 0; int high = arr.length - 1; while(low <= high){ int middle = (low + high)/2; if(arr[middle] == target) return middle; if(arr[middle] > target) high = middle - 1; else low = middle + 1; } return -1; }
check_average(int_average_v1, True)to check whether the function int_average_v1() we give in the bit_vectors.py Python file is correct. Does Z3 complain? Why or why not?
Given a Java source code which accepts two parameters, that is provided by z3 after running the function check_average():
public class IntAverage { public static void main(String[] args) { String x = args[0]; String y = args[1]; int_average(Integer.parseInt(x), Integer.parseInt(y)); } private static int int_average(int x, int y) { int z = (x + y) / 2; assert z >= 0 : "Generating a integer overflow bug!"; return z; } }
Joshua J. Bloch proposed the following solution to solve integer overflow problem:
def int_average_v2(x, y): return x + (y-x)/2
Joshua J. Bloch proposed a second solution to solve integer overflow problem: $$ (x+y) \ggg 1 \tag{A1}\label{A1} $$
In the function int_average_v3(), we only provide bit vectors which are all non-negation, and what will happen, if we provided negation bit vectors.
How to write a correct function, which takes as input two bit vectors x and y, to compute the average of two arbitrary integers (negative or non-negative)?
Integer overflows are a notorious source of bugs and are very difficult to debug. Read the following C code:
int myfunction(int *array, int len){ int *arr, i; arr = malloc(len * sizeof(int)); /* [1] */ if(arr == NULL){ return -1; } for(i = 0; i < len; i++){ /* [2] */ arr[i] = array[i]; } return arr; }
There is an Integer overflow bug at [1], which will cause buffer overflow at [2].
In the class, we discussed the famous Fermat's Last Theorem, which states the equation: $$ x^n + y^n = z^n $$ has no integer solution for n>=3. With aid of computers, for any given n, we can enumerate all possible values for x, y and z, and to check whether the equation helds. Although due to the limited computing resources of modern computer, we can not verify this equation for any n, however, we can at least verify the equation for large enough x, y and z. And this at least give us more evidence that this theorem does not have solution. (Of course, if we do find a solution, then we have proved Fermat's Last Theorem does not hold at all.)
E ::= c | x | E&E | E|E | ~E R ::= E=E | E!=E P ::= R | P/\PRead the bit_blast.py Python file, complete the missing code to implement the bit-blasting algorithm.
Z3 has a very rich set of operations for arrays, you can refer to the official APIs. For your convenience, here is a table containing most operations:
Definition | Operations | Sample code |
Store a value into array: | Store | Store(Array, index , value) |
Get a value from array: | Select | Select(Array, index) |
In the following, you'll often need to refer to this table when necessary.
In order to formulate a programme of a mathematical theory for computation, McCarthy proposed a basic theory of arrays as characterized by the select-store axioms. The expression $$ Select(A, i) \tag{B1}\label{B1} $$ returns the value stored at position \(i\) of the array \(A\); and the expression $$ Store(A, i, v) \tag{B2}\label{B2} $$ returns a new array identical to the original array \(A\), except for on position \(i\) it contains the value \(v\). In Z3Py, we can write expression \(\eqref{B1}\) as Select(Array, index) or Array[i], and write expression \(\eqref{B2}\) as Store(Array, index, value).
Now look at the following code:
# Use I as an alias for IntSort() I = IntSort() # A is an array from integer to integer A = Array('A', I, I) x = Int('x') print(A[x]) print(Select(A, x)) print(Store(A, x, 10)) print(simplify(Select(Store(A, 2, x+1), 2)))
In class, we've discussed the formula: $$ Store(A, i, x)[i] \ge x \tag{B3}\label{B3} $$ In order to validate it, we can make use of the array support in Z3 directly.
In class, we also discussed the array property, which poses some restrictions on the possible shape of index, etc. One implication of the array property is that some array formulae are undecidable. For instance, the formulae like the following $$ Store(A, i*i - i*i, x)[0] \ge x \tag{B4}\label{B4} $$ is undecidable, as it contain non-linear arithmetic. In general, for such formulae, Z3 may not solve it and will return the result unknown.
In the class, we've discussed that the array interfaces can be implemented by reducing these interfaces to functions. To understand this, you'll implement a small array module by using anonymous function: lambdas, in Python.
E ::= c | x | select(E, E) | update(E, E, E) R ::= E = E | E != E P ::= R | P/\PRead the array_elim.py Python file, complete the missing code to implement the array elimination algorithm.
Z3 hasn't built-in support for pointer structure. In this part, we'll design and implement a translator which converts pointer logic into EUF and array problem. After translation, we can use Z3 to solve the target formulae.
Before implementing the translator, we need to design the structure of pointer logic first. Recall the syntax of pointers we mentioned in the lecture is:
T ::= x | T + E | &x | &*T | *T | NULL E ::= x | n | E + E | E - E | *T R ::= T = T | T < T | E = E | E < E P ::= R | ~R | P ∧ P
Next step is the translating procedure, The basic idea is to eliminate pointers by encoding their semantics using the store S and the heap H. We'll be using the semantics of pointers discussed in the lecture to do the elimination. Here are the rules to eliminate a pointer \(T\): $$ \begin{align*} [\![x]\!] &= H(S(x)) \\ [\![T + E]\!] &= [\![T]\!] + [\![E]\!] \\ [\![\&x]\!] &= S(x) \\ [\![\&*T]\!] &= [\![T]\!] \\ [\![*T]\!] &= H([\![T]\!]) \\ [\![{\tt NULL}]\!] &= 0 \end{align*} $$ Here are the rules to eliminate an expression \(E\): $$ \begin{align*} [\![n]\!] &= n \\ [\![x]\!] &= H(S(x)) \\ [\![E + E]\!] &= [\![E]\!] + [\![E]\!] \\ [\![E - E]\!] &= [\![E]\!] - [\![E]\!] \\ [\![*T]\!] &= H([\![T]\!]) \end{align*} $$ Here are the rules to eliminate a relation \(R\): $$ \begin{align*} [\![E = E]\!] &= [\![E]\!] = [\![E]\!] \\ [\![E < E]\!] &= [\![E]\!] < [\![E]\!] \\ [\![T=T]\!] &= [\![T]\!] = [\![T]\!] \\ [\![T < T]\!] &= [\![T]\!] < [\![T]\!] \end{align*} $$ And finally, here are the rules to eliminate a proposition \(P\): $$ \begin{align*} [\![P \land Q]\!] &= [\![P]\!] \land [\![Q]\!] \\ [\![\neg R]\!] &= \neg [\![R]\!] \end{align*} $$
For instance, the formulae: $$ ((p = \&a) \land (a = 1)) \to (*p = 1) $$ will be translated into the following EUF (with application ->, but does not matter): $$ (H(S(p)) = S(a)) \land (H(S(a)) = 1) \to (H(H(S(p))) = 1), $$ where the \(H\) and \(S\) are uninterrupted functions. The uninterrupted functions are decided by the memory model we use. In exercise 8, we assume that the heap H only contains the Integer type, and the address is also of Integer type. So, the uninterrupted functions \(H\) and \(S\) can be defined in z3:
S = Function("S", IntSort(), IntSort()) H = Function("H", IntSort(), IntSort())
To simplify the problem, our implementation assumes that the heap H only contains Integer type, and we represent memory address using Integer type, though simple and enough to model many practical problem, this model is a little bit coarse. To model practical memory more accurately, we can design a fine-grained memory model, thus we can reason more fancy properties of the memory.
This completes this lab. Remember to zip you homework with file name <ID>-lab-6.zip (e.g., SA19225789-lab-6.zip), and submit it to USTC Online Teaching Platform .
Happy hacking!