Lab 1: Compiler


Overview

A compiler is an important software infrastructure to translate source programs developed using high-level language (e.g., C, Python, or Java) into semantic equivalent binary programs in low-level languages (e.g., x86, RISC, or ARM). Furthermore, the generated binary programs by a compiler are linked against libraries to create the final executables.

In this lab, you will gain a deeper understanding of compiler by creating a toy compiler from a minimal subset of C to RISC-V. In the meanwhile, you will implement a minimal subset of C libraries to understand how to create language libraries (runtimes) in general.

Hand-in Procedure

When you finished the lab, zip your code files with the file name student ID-lab-1.zip (e.g., SA19225789-lab-1.zip) and submit it to the Online Teaching Platform. The deadline is 23:59, Oct. 13, 2023 (Beijing time). Any late submission will NOT be accepted.


Part A: Libraries

C language library files are a collection of precompiled functions and data that can be reused in programs. Library files provide a modular way to organize and share code for reuse in different programs. In this section, let us start by implementing a minimal C library: mini-lib. You should download this code skeleton to start with.

The following figure depicts the overall structure of the software stack: on top of the underlying hardware are operating systems which manages not only resources but also applications; libraries reside in the middle level which utilized operating system resources via system calls; applications reside on top of the software stack, which may invoke functions in libraries or operating systems.

Exercise 1: We have provided the implementation of a library puts(). In this exercise, your job is to understand the implementation the function, but do not need to write any new code. The puts() function sends a string to the standard output. You can find the code in the mini-lib.c file in lab1.
    int puts(char *s) {
        long n = strlen(s);
        long r;
        asm(CALL(SYS_WRITE)
        "movq $1, %%rdi\n"
        "movq %1, %%rsi\n"
        "movq %2, %%rdx\n"
        "syscall\n"
        "movq %%rax, %0\n"
        : "=r"(r)
        : "r"(s), "r"(n)
        : "%rax", "%rdi", "%rsi", "%rdx");
        return (int)r;
    }
    
This library makes use of system calls to implement its functionalities. Specifically, we make use of the Linux x86-64 system calls, which is useful to understand this code. You may also want to refer to the inline assembly specification.

Here, we make a self-contained explanation to the above code: this code implements the system call

        write(fd, buf, size)
    
according to the system call specification, the system call number (i.e., $1) is loaded into the %rax register, and the arguments fd, buf, and size are loaded into registers %rdi, %rsi, and %rdx, respectively. When the system call returns, the return value resides in the %rax register.
Exercise 2: Implement the exit() function, which terminates the execution of a program and exits, with the following prototype:
    void exit(int status);
    
Exercise 3: Implement the alarm() functions, which sets up a timer that fires a signal (SIGALARM) after a specified time interval, with the following prototype:
    int alarm(unsigned int seconds);
    

After implementing the puts(), exit(), and alarm() functions, compile and run the project to justify your answer.

    css-lab@tiger:~$ ./notmain.out
    

The result are similar to the output below.

    hello, world
    hello, world
    hello, world
    hello, world
    hello, world
    Alarm clock
    
Challenge. Implement the output function printf().
Hint: you may find the var-arg feature useful.
Challenge. Implement sleep().
Hint: recent Linux uses nanosleep, which takes argument of type struct time spec.

Part B: The mcc Compiler

In this section, we are going to read and understand a compiler called mcc. mcc is a toy compiler from MiniC (a minimal subset of C), to x86-64. Then, you will extend the mcc compiler to support RISC-V, a recent and popular RISC ISA.

Software setup

Before writing any code, you should first set up required software. Specifically, we need flex and bison to build the compiler front-end: flex is a tool used for generating lexical analyzers, whereas Bison is a tool used for generating syntax analyzers.

    css-lab@tiger:~$ sudo apt-get install flex
    css-lab@tiger:~$ sudo apt-get install bison

To double-check the installation is successful:

    css-lab@tiger:~$ flex --version
    css-lab@tiger:~$ bison --version

You do not need to write any code for flex and bison in this lab. But you can refer to the flex document and bison document, if you are interested.

To use mcc compile code, you need make it first, and then you can use mcc to compile your code. For example:

    css-lab@tiger:~/lab1/mcc$ make
    css-lab@tiger:~/lab1/mcc$ ./mcc input.c

mcc will compile input.c and generate an executable named a.out in current directory.

Exercise 4. Read and understand the code for compiler pipeline. What does the compiler do when the parameters "-o", "-S", "-c", "-m" and "-v" are added or not, and how can these parameters be used in combination?
Exercise 5. First, you should read the code generation for x86-64, and understand the code generation algorithm. How does mcc implement an assignment statement like a = a + 5. What about an output statement print(a)?
Exercise 6. Implement code generation for RISC-V.

Hint: You should install the RISC-V environment on your Ubuntu.
Install RISC-V cross-compiler tool:

        css-lab@tiger:~$ sudo apt install gcc-riscv64-linux-gnu
Select a directory to pull the source of spike and compile it:
        css-lab@tiger:~$ sudo apt install g++
        css-lab@tiger:~$ sudo apt-get install device-tree-compiler
        css-lab@tiger:~$ git clone https://github.com/riscv/riscv-isa-sim
        css-lab@tiger:~$ cd riscv-isa-sim
        css-lab@tiger:~/riscv-isa-sim$ mkdir build
        css-lab@tiger:~/riscv-isa-sim$ cd build
        css-lab@tiger:~/riscv-isa-sim/build$ ../configure
        css-lab@tiger:~/riscv-isa-sim/build$ make
        css-lab@tiger:~/riscv-isa-sim/build$ sudo make install
    
To double-check the installation is successful:
        css-lab@tiger:~$ spike --help
    
Use static link to generate final executables. For example, to compiler the file hello.c:
        css-lab@tiger:~/lab1/mcc$ riscv64-linux-gnu-gcc -static hello.c
        css-lab@tiger:~/lab1/mcc$ spike pk a.out
        bbl loader
        hello, world
    
Challenge. Implement JIT for mcc.


This completes this lab. Remember to zip your homework with the file name student ID-lab-1.zip (e.g., SA19225789-lab-1.zip), and submit it to the Online Teaching Platform.

Happy hacking!