MPCS 51300 Compilers
This class teaches the theory and practice of how to write a compiler, including lexical analysis, grammars, lexers and parsers, type checking, and code generation. For decades, compilers have been the most dynamic and challenging branch in computer science. The main part of this class will focus on providing the basics of the different phases of compilation. Through the course, students will develop appreciation for the implementation strategies behind making an efficient and robust compiler.
The objective of this class is to learn:
the implementation of a real compiler from scratch
the answers of how programming language works
the technical challenges that compilers face
the basics of reaching performance
This course meets once a week for lectures and once a week for an optional TA session. Even if not recorded, attendance in class is strongly encouraged. Most of the coursework revolves around 4 homework assignments, as the primary focus will be for students to code. There will be one presentation (Week 5) and one Final Exam (Week 10). The course calendar, including the contents of each lecture and programming assignment due dates, is shown at the end of this page.
On Week 11, volunteers will present a topic on compilers (the grade will provide extra points to your homework assignment).
Grune, Dick. Modern Compiler Design. New York, NY: Springer, 2012.
Cooper, Keith D. (Keith Daniel), and Linda Torczon. Engineering a Compiler. San Francisco, Calif.: Morgan Kaufmann, 2004.
Aho, Alfred V. Compilers: Principles, Techniques, & Tools. Boston: Pearson Addison-Wesley, 2007.
The final grade will be based on the following:
Written assignments: 15%
Programming assignments: 50%
Midterm Exam (project presentation): 10%
Final Exam: 25%
We will assign simple written problems at the end of each lecture that will be due the following week. The goal of these assignments is to make you understand the course better. Some content can be hard to appreciate in class, and we encourage you to work them at home the same day. They will constitute 15% of the course grade.
Compiler Programming Project
We will assign four programming assignments over the course of the term. You can choose to work on the programming assignment alone, however, it is strongly recommended to work by group of 2 people. You will need to inform us of your choice by Week 2. These assignments will typically due on Sunday 5pm CT. The goal of this assignment will be to make you create your own compiler. The assignments will be interconnected; they will guide you to the creation of your own compiler. They will constitute 50% of the course grade.
The assignments 1, 2, 3, 4 will be due on Week 3, Week 6, Week 8, and Week 10 respectively.
Week 5: A presentation of your work on your own compiler. This presentation will not be longer than 10 minutes. The grade of this presentation will constitute 10% of the course grade.
During this presentation, you will talk about:
- assumptions (grammar, type checking,...)
- technical challenges
- future work
Prorated points will be applied according to the number of late days.
1 day late: 75% of the grade will be considered.
2 days late: 50% of the grade will be considered.
3 days late: 25% of the grade will be considered.
After 3 days, no assignment will be accepted.
We will be giving a final exam at Week 10. The final exam is worth 25% of the grade.
The final exam will be a closed-book written exam on the whole content of the class.
The questions of this exam will be on the content of the lecture and the compiler project.
Week 1: General Introduction; Compiler structure, different phases
(Grune: Chapter 1)
What is a compiler? (e.g. GNU c compiler) How is it designed? How have compilers evolved since 1945? We cover the different phases of compilations from the human language to the binary (code source, pre-processor, compiler, assembler, linker, intermediate representation); compiler fundamentals (trust, reliability, efficiency, time-limited optimization, predictable; and notions of front-end, back-end, compiler porting, addressing mode. A refresher on memory structure (stack, dynamic data, static data) will be conducted.
Week 2: Lexical analysis, Regular Expressions, Lex
(Grune: Chapter 2)
What is a regular expression? How do we create them? We introduce Finite Automaton (states, alphabet, transition functions, start state, final states), build a language scanner with a finite-state machine, and discuss the curse of dimensionality and methods for reducing the number of states. We experiment with Lex on a small example defining simple token (integer, floating, string).
Week 3: Parsing, Grammars, LL/LR grammar, LR Parsing
(Grune: Chapter 3, Sections 3.1-3.4)
What is parsing? How does it interact with a token scanner and the symbol table? We learn how to define a grammar and use the example of the LL grammar (derivation tree). We show the limitation of left recursion grammar and how to transform them. We differentiate the LL and LR grammar.
Week 4: Symbol tables, Type Checking
We show why a symbol table is critical in compilation and how to design one efficiently. We explain the different steps (storage, verification, scope determination) and how to have different locality, overloading (using same names at different places). We use the table to implement the type checking. Then we define a typing system (type equivalent, cast)
Week 5: Mid-term
We will review the programming assignment progress of each group during a session where each project will be presented.
Week 6: Memory management, stack/heap/register allocation, Function calls
(Aho: Chapter 7)
This part explains how complex data structure are handled at the machine level (since the machine only knows memory addressing). We introduce the representation of basic type and more complex type such as arrays and explain the two addressing type. We define static/dynamic allocation and end with function calls.
Week 7: Naive Code generation, Stack Machine (PDA) vs Register Machine
(Aho: Chapter 8)
We discuss implementation of: arithmetic expression, logic expression, conditional instructions, loops, function calls. We demonstrate the difference between a 1-pass generation and many-pass generation.
Week 8: Advanced Code Generation, Intermediate representation, Registers allocation
(Aho: Chapter 8)
We show the importance of having an intermediate representation for optimizations: facilitating code generation, code handling/transformations, abstraction level. We describe the SSA form.
Week 9: Code optimization
Optimizations are critical for performance. We demonstrate how to modify the compiler to make code generation more efficient: control flow analysis, control data analysis.
We develop how to save branches, registers, and how to increase the degree of parallelism.
Week 10: Final Exam
Week 11: Additional Presentations
Grading Group (% of the whole grade)
< 70: Dealt on a case-by-case basis
20% - Part 1
20% - Part 2
50% - Part 3
10% - Part 4
Core - Systems
Tuesdays 5:30 - 8:30