Lab 2: Abstract Syntax Tree and Type Checker

In this lab, you will design and implement an abstract syntax tree as well as a type checker for Tiger. To implement an abstract syntax tree, you will design a bunch of Java classes, and add semantic actions to the parser to generate abstract syntax trees automatically. To implement the type checker, you will design symbol tables as well as type checking rules.

Getting Started

which will commit your changes to 'lab1' branch to your local Git repository, and create and check out a new lab2 branch from the remote lab2 branch.

In some cases, Git may not be able to figure out how to merge your changes with the new lab assignment (e.g., if you modified some of the code that is changed in the second lab assignment). In that case, the git merge command will report you which files are conflicted, and you should first resolve the conflict (by editing the relevant files) and then commit the resulting files:

Hand-in Procedure

Part A: Abstract Syntax Trees

In this part of the lab, you will first design and implement an abstract syntax tree for Tiger.

Read and understand the source code in the file ast/Ast.java, which encodes the abstract syntax for MiniJava. Refer to the MiniJava reference manual for the MiniJava syntax.

Program Representation as Trees

Given the data structure definitions for the MiniJava syntax, one can (in principle) manually encode any MiniJava program. While this method is, at first glance, boring and error-prone to do in everyday programming practice, it does allow a direct representation of program structures, and also is the dominant programming styles in some functional languages, like Lisp or Scheme.

In the file ast/SamplePrograms.java, we have offered you some code to encode several sample MiniJava program. Read the code and make sure you understand how those sample programs are represented. Then write code to encode the MiniJava program test/Sum.java.

Pretty Printing

As you have done in lab 1, you will implement a pretty printer in this part, which makes use of the pattern matching feature in Java.

Read the code in the file ast/PrettyPrinter.java, finish the pretty printer by filling in the missing code. Do not forget to test your code when you finished, you can start with the unit test code in ast/Test.java.

While the pretty printer you have implemented is powerful enough to process MiniJava code, it may be not that general and powerful. Study some general pretty printing algorithms and implement them into your Tiger compilers. For example, you may start with Wadler's printer.

Tree Generation

The parser can construct the abstract syntax tree automatically by its semantic action attached with the parsing methods.

Modify the parser src/parser/Parser.java to add semantic actions to each parsing method. As the result of parsing, your Tiger compiler will return an abstract syntax tree for subsequent processing.

To this point, your Tiger compiler should parse all legal MiniJava programs, generate abstract syntax trees, and perform pretty printing on the trees. Do not forget to test your compiler:

If you have tried parser generators from the previous lab, then you may generate abstract syntax trees from that parser by adding semantic actions into productions.

Part B: Type Checker

A type checker inspects the abstract syntax trees for well-formedness, according to the MiniJava language specification which is in turn based on Java specification. Well-formedness guarantees that the input MiniJava program obey all constraint rules. For example, a variable must be declared before its use; the "+" operator must apply to two operands of integer type; the methods being invoked on an object must have been defined in its class or some superclass; and so on. In this part, you will be implementing a type checker for Tiger.

Symbol Tables

Just like the pretty printing, a type checker performs some kind of (post-order) tree traversal. However, there is a key difference: the type checking traversal is context-sensitive, that is, it must record necessary context information. We can use dictionary data structures to record these information. For historical reason, such dictionaries are called symbol tables.

The design of the symbol table is highly-dependent on the language being compiled. For MiniJava, we have designed two symbol tables: 1) a class table; and 2) a method table. A class table maps a class name to its to instance variables and methods. And a method table map a method name to its parameters and locals variables.

Read the code defining symbol table data structures in src/checker/*. We have provided you all code, make sure to understand the overall organization of these data structures. To debug your compiler implementation, you often need to dump the class table or method table. You can run your Tiger compiler with:

 $ java --enable-preview -cp bin Tiger -dump classtable
           -dump methodtable <inputFile>

Implementing Type Checking Rules

A type checker performs a post-order traversal of an abstract syntax tree. That is, a type checker first inspects each subtree and calculates a type for that subtree, respectively. When the type checker returns to the root of the tree, it compares the types returned from all subtrees and guarantees those types can be combined properly into a result type before returning. For example, for the tree node e1+e2, the type checker first inspects the left subtree e1 to calculate a type, say t1, then inspects the right subtree e2 to calculate a type t2. At the type checker returns to the root node "+", it will check that both t1 and t2 are int, before returning a type int as the final result.

Finish the code in the type checker src/checker/Checker.java, by filling in the missing code. In implementing the type checker, you can run the following command

 $ java --enable-preview -cp bin Tiger -builtin SumRec.java

in case that your parser has bugs.

To this point, your Tiger compiler can type check all MiniJava programs. Do not forget to test your compiler and fix any bugs before continuing.

Error Recovery

A type checker should report type errors to the programmers and should recover from type errors to detect and report as many errors as possible. The error recovery strategy in Tiger's current implementation is preliminary in that it only reports one error before exiting, but does not try to recover from an error.

Propose a better error recovery strategy and implement it in your Tiger compiler.

Declaration and Usage

Java, like many static languages, obeys a declaration-before-use rule, meaning that each variable in a program should have a unique static declaration site before all its usages. However, a variable declaration may have not been used at all. For such scenarios, the variable is said to be an unused variable. A good compiler might generate warning like:

Extend your Tiger compiler to issue warnings for unused variables.

In well-behaved Java programs, every variable should have been initialized before its usages, otherwise the usage of such a variable might produce unpredictable results. For uninitialized variables, good compilers might generate the following error:

        Error: Variable 'x' might not have been initialized

Propose a strategy to detect uninitialized variables, and implement your idea into your Tiger compiler.

Hand-in

This completes the lab. Remember to hand in your solution to the online teaching system.