Lessons from building a compiler (and other semi-related musings)
Monday, June 15th, 2009Note: I had a hard time thinking of a title, my apologies
This past Spring, I took a course on Programming Languages and Compilers which has been far and away the most challenging class I’ve taken in college. My friend (and project partner) Gary hit the nail on the head when he said something like: the project is insane, tests are incredibly hard, and even homeworks are non-trivial. However, after going through the course, both of us will tell you that we found the course incredibly valuabe. I feel like I’ve matured as a programmer and computer scientist over the course of the last semester and it’s largely because of my compilers class. I wish I could have taken it earlier because now I feel like I’ll get a lot more out of my future courses. Of course, had I taken it earlier, I wouldn’t have been able to keep up, and maybe wouldn’t have gotten as much out of it, introducing an interesting paradox.
As a background, the course covers a lot of the basics of compiler design and implementation. We don’t officially have a textbook, and our professor (Paul Hilfinger) wrote up notes that served as a text for the first half of the course. Unfortunately, as the class project heated up he spent more time working on that and didn’t get around to updating the notes. Not to rag on him, he spent more time working on the project than many of the groups and we frequently saw him responding to newsgroup posts as late as 2 AM. I hope that the next time he teaches the course, he does get to finish the notes as I like the material that he presented. Anyway, we covered most of the basic topics including lexical analysis, parsing, static semantics, types and type inference, code generation, runtime structures (functions, classes, exceptions) and some code optimizations. I actually found most of the course material very interesting which is unusual as typically I lose interest in certain parts of a course.
I mentioned we had a class project. The project was in a nutshell to build a compiler for python. Of course the project was broken into phases (lexing and parsing, static analysis, and code generation) to make sure we didn’t procrastinate and utterly fail. After each phase, the professor would release his implementation, which we could use to build the next phase, just so that we didn’t shoot ourselves in the foot with bad programming practices (which my group did for phase 1). The professor also provided us a runtime library and a virtual machine with a clean instruction set that we could use as a python interpreter, to split the challenges of code generation into two “easier” tasks.
I wanted to write a bit about this project because I feel like it’s one of my crowning acheivements so far as a young programmer. It’s one of the larger code bases I’ve worked intimately with, meaning I could tell you anything about any part of the code base without really looking at it. It’s a project that I spent so much time with in such a short period of time, that as nerdy as it sounds, I found myself dreaming about problems we had and waking up with solutions to them. And it’s definitely one of the most complex systems I’ve developed and completed successfully.
Of course, I didn’t do it alone. My partners in crime were Gary (mentioned above) and Brian, both hard workers and amicable people, the latter quality being especially important because we spent a LOT of time together. Our final project was due the Monday after the last week of class, and that weekend, we pretty much lived in our computer science building. Our hard work paid off and we submitted a mostly working final project (mostly working in the sense that we didn’t do some tricky things like using function values as keys to dictionaries, because we modified the function representation).
Apart from compiler design, our professor emphasized good programming practices that he drilled in by looking at our source code and grading us on our cleanliness. I learned some valuable lessons about documenting code, refactoring all the time, repeating yourself, thorough and systematic testing etc. Lessons that I read about all the time, but that really only get ingrained when you work on a large project and NEED to do them to maintain your productivity (and your sanity when it’s late at night). I’m happy to say that now days, I use comments wisely and am always looking for ways to avoid repeating myself.
Another auxiliary skill I developed while building this compiler was the ability to hand write ia32 assembly code. My only experience with an assembly language had been with MIPS and the two languages are very different. Hilfinger is big on practical knowledge, and since few (if any?) use MIPS outside of academia, he figured it’d be better for us to learn ia32 and use it as the target language. I, being very results oriented, wanted to learn only as much of ia32 as I needed to compile python code but we naturally ran into a lot of problems in the compilation. The only way we could figure out what we were doing wrong was to look at the assembly, figure out what it was doing, and then pinpoint what it was doing wrong, so that involved me learning a lot of the details that I didn’t set out to learn. By the end of it, my way of debugging was to look at a python program, hand-write the assembly for it, and then work to make our compiler produce similar output. I actually got pretty fast at it, much to my
surpise.
The most rewarding thing about the project was that it was really cool to see it work correctly. We tested by comparing the output of our binaries with the standard python interpreter (which actually made things really easy to test), but it was really cool to see all of our test programs compile and run correctly. When we finally got all of our tests to pass, I definitely felt a sense of accomplishment from completing such a monumental task. I don’t know if you get that sense in real software development, because your software is never “done” but the relief/joy/pride in completing almost made all the effort worthwhile.
The project isn’t something you can show off to a lot of people (like our pacman projects from AI), because I wouldn’t even expected non-computer-saavy people to know what a compiler is, but I find this to be further validation of how cool the project is. When you build a website and show it to people, they generally react positively, but when you build something that people don’t understand and they stare blankely at you, it validates that you are good at what you do. It’s similar to how my friends were awestruck when they found out their GSI (graduate student instructor) was studying quasi-smooth derived manifolds. When you are studying something that “no one” knows about and you’re successful, it’s very rewarding.
Everyone at Berkeley warns against taking compilers, because of how hard it’s supposed to be. However, my advice to Berkeley CS undergrads: definitely take the compilers class. I think it’s the single class that I’ve learned the most in, and it’s actually pretty rewarding too. But be prepared to work really hard.