Archive for the ‘projects’ Category

Capsules: Designing Web Applications For Review

Sunday, May 2nd, 2010

I just came back from WWW 2010 where I presented a research project I’ve been working on for the majority of my undergraduate years. The project is about building web applications with high level security properties that can be verified in a code review. This post is about the project and why I think it’s really cool.

I’d like to start off by convincing you that we need web applications with verifiable high-level security properties. First, what is a high level security property? It’s an application-specific guarantee about the privacy and integrity of user data maintained by the application. For example, as a user of an online banking application, I’d like some assurance that I cannot lose money unless I have authorized a transaction. I’d also like some guarantee that only I can view my account balances and transaction histories. These are high-level properties regarding the integrity and privacy of my data. And these are the kinds of guarantees that we want with our web applications; it’s not enough to just defend an app from XSS and CSRF.

Since these properties can be violated by the application itself (as opposed to external attackers), we have to make sure that the application does not violate them. This requires a code review. Unfortunately, with state-of-the-art technologies, and with sophisticated, complex applications, these code review are incredibly challenging. I would argue that they are infeasible. Why? Because verifying that a high-level property is achieved involves an exhaustive review of the application. With current application architectures, every object has the privileges necessary to violate whatever security property we’re interested in. In order guarantee that the application satisfies a property, we therefore have to make sure that the entire application does not violate it. Since applications are enormous, this is very challenging.

My project looked at making these code review easier by partitioning an application into components and granting only specific privileges to each component. Partitioning an application and exposing limited privileges facilitates a code review because now only parts of the application have the privileges needed to violate any particular security property. Auditors may not even have to look at certain application modules because they can guarantee a priori that those modules cannot violate the properties we are interested in.

All we’re doing here is bringing the idea of least privilege to web applications. We used an object capability approach to achieving least privilege in application components. Our goal was to confine each application component to a reduced-privilege context. We took a multi-faceted approach. First, we prevented application components from constructing additional privilege. We did this by requiring that applications are written in an object-capability language (in our case Joe-E). Second, we prevented the application from maintaining state outside of a semi-persistent session object (By semi-persistent, I mean that it lives in memory but is maintained across multiple HTTP requests). Combined, these two properties imply that all privileges to user data and resources must reside within the session object. Finally, we use wrapper objects to expose only a subset of the session object to each application component. This effectively confines each application component into a reduced-privilege context.

In terms of implementation, we built Capsules, a prototype framework that extends the Java Servlet Framework with these ideas. As mentioned, we require that applications are written in Joe-E, an object capability subset of Java. We use several Joe-E features to achieve the three aspects of our approach. First, Joe-E prevents objects from constructing privileges from scratch. Secondly, Joe-E allows us to declare application components (called Servlets) as immutable, which, in short, means that they cannot maintain state. Finally, Joe-E allows us to construct wrapper objects that actually encapsulate their internal state, so that Servlets must go through the interfaces exposed by the wrapper rather than using reflection to obtain a reference to the underlying session object. In this way, Joe-E helps us establish these reduced-privilege contexts.

We also conducted an evaluation of this framework by building a simple web mail application and verifying that the application maintains the privacy and integrity of user mailboxes. In this analysis, we discovered that there were several application components that we could completely ignore, simply because they had no way to violate the privacy and integrity properties. While our application was simple, we believe that this kind of analysis will also apply to more sophisticated applications, making it more practical to review these kinds of high-level properties.

So that’s a overview of the Capsules project. I’ve ommited most of the technical details so that I could concisely convey the main points. If you are interested, I encourage you to read our paper or see the slides for my talk (although I don’t think the slides will be very helpful apart from the pretty pictures). Finally, please feel free to contact me if you have questions or are interested in talking to me about the project.

Lessons from building a compiler (and other semi-related musings)

Monday, June 15th, 2009

Note: I had a hard time thinking of a title, my apologies

This past Spring, I took a course on Programming Languages and Compilers which has been far and away the most challenging class I’ve taken in college. My friend (and project partner) Gary hit the nail on the head when he said something like: the project is insane, tests are incredibly hard, and even homeworks are non-trivial. However, after going through the course, both of us will tell you that we found the course incredibly valuabe. I feel like I’ve matured as a programmer and computer scientist over the course of the last semester and it’s largely because of my compilers class. I wish I could have taken it earlier because now I feel like I’ll get a lot more out of my future courses. Of course, had I taken it earlier, I wouldn’t have been able to keep up, and maybe wouldn’t have gotten as much out of it, introducing an interesting paradox.

As a background, the course covers a lot of the basics of compiler design and implementation. We don’t officially have a textbook, and our professor (Paul Hilfinger) wrote up notes that served as a text for the first half of the course. Unfortunately, as the class project heated up he spent more time working on that and didn’t get around to updating the notes. Not to rag on him, he spent more time working on the project than many of the groups and we frequently saw him responding to newsgroup posts as late as 2 AM. I hope that the next time he teaches the course, he does get to finish the notes as I like the material that he presented. Anyway, we covered most of the basic topics including lexical analysis, parsing, static semantics, types and type inference, code generation, runtime structures (functions, classes, exceptions) and some code optimizations. I actually found most of the course material very interesting which is unusual as typically I lose interest in certain parts of a course.

I mentioned we had a class project. The project was in a nutshell to build a compiler for python. Of course the project was broken into phases (lexing and parsing, static analysis, and code generation) to make sure we didn’t procrastinate and utterly fail. After each phase, the professor would release his implementation, which we could use to build the next phase, just so that we didn’t shoot ourselves in the foot with bad programming practices (which my group did for phase 1). The professor also provided us a runtime library and a virtual machine with a clean instruction set that we could use as a python interpreter, to split the challenges of code generation into two “easier” tasks.

I wanted to write a bit about this project because I feel like it’s one of my crowning acheivements so far as a young programmer. It’s one of the larger code bases I’ve worked intimately with, meaning I could tell you anything about any part of the code base without really looking at it. It’s a project that I spent so much time with in such a short period of time, that as nerdy as it sounds, I found myself dreaming about problems we had and waking up with solutions to them. And it’s definitely one of the most complex systems I’ve developed and completed successfully.

Of course, I didn’t do it alone. My partners in crime were Gary (mentioned above) and Brian, both hard workers and amicable people, the latter quality being especially important because we spent a LOT of time together. Our final project was due the Monday after the last week of class, and that weekend, we pretty much lived in our computer science building. Our hard work paid off and we submitted a mostly working final project (mostly working in the sense that we didn’t do some tricky things like using function values as keys to dictionaries, because we modified the function representation).

Apart from compiler design, our professor emphasized good programming practices that he drilled in by looking at our source code and grading us on our cleanliness. I learned some valuable lessons about documenting code, refactoring all the time, repeating yourself, thorough and systematic testing etc. Lessons that I read about all the time, but that really only get ingrained when you work on a large project and NEED to do them to maintain your productivity (and your sanity when it’s late at night). I’m happy to say that now days, I use comments wisely and am always looking for ways to avoid repeating myself.

Another auxiliary skill I developed while building this compiler was the ability to hand write ia32 assembly code. My only experience with an assembly language had been with MIPS and the two languages are very different. Hilfinger is big on practical knowledge, and since few (if any?) use MIPS outside of academia, he figured it’d be better for us to learn ia32 and use it as the target language. I, being very results oriented, wanted to learn only as much of ia32 as I needed to compile python code but we naturally ran into a lot of problems in the compilation. The only way we could figure out what we were doing wrong was to look at the assembly, figure out what it was doing, and then pinpoint what it was doing wrong, so that involved me learning a lot of the details that I didn’t set out to learn. By the end of it, my way of debugging was to look at a python program, hand-write the assembly for it, and then work to make our compiler produce similar output. I actually got pretty fast at it, much to my
surpise.

The most rewarding thing about the project was that it was really cool to see it work correctly. We tested by comparing the output of our binaries with the standard python interpreter (which actually made things really easy to test), but it was really cool to see all of our test programs compile and run correctly. When we finally got all of our tests to pass, I definitely felt a sense of accomplishment from completing such a monumental task. I don’t know if you get that sense in real software development, because your software is never “done” but the relief/joy/pride in completing almost made all the effort worthwhile.

The project isn’t something you can show off to a lot of people (like our pacman projects from AI), because I wouldn’t even expected non-computer-saavy people to know what a compiler is, but I find this to be further validation of how cool the project is. When you build a website and show it to people, they generally react positively, but when you build something that people don’t understand and they stare blankely at you, it validates that you are good at what you do. It’s similar to how my friends were awestruck when they found out their GSI (graduate student instructor) was studying quasi-smooth derived manifolds. When you are studying something that “no one” knows about and you’re successful, it’s very rewarding.

Everyone at Berkeley warns against taking compilers, because of how hard it’s supposed to be. However, my advice to Berkeley CS undergrads: definitely take the compilers class. I think it’s the single class that I’ve learned the most in, and it’s actually pretty rewarding too. But be prepared to work really hard.

Crazy Python Features

Friday, May 15th, 2009

I just finished taking a compilers class (well except for the final), and it’s been the single hardest class I’ve taken in college. Homeworks were non-trivial, exams were very difficult, and the class project was to write a compiler for a subset of python. As far as class projects go, this one has been completely ridiculous, my project partners and I have devoted weeks to building this compiler and fortunately we managed to finish it, but many groups I believe did not.

Anyway, in designing the compiler, I developed a new understanding of python features, despite having programmed in python for over a year.  My group and I found a lot of really crazy features of python that lead to problems when designing our compiler. I’m not going to talk about all the cool things that everyone knows about python, like lambdas, function values, nested functions and so on. I thought it would be fun to mention some of the really cool stuff here.

Dynamically Adding to Classes

Python basically implements classes as dictionaries. Instances are simply copies of the dictionary (well technically they are copied lazily, using copy on write), but instances naturally contain all of the mappings that the class contains (as they should). However, you can dynamically add mappings to classes so I can do something like this:

class A(object):
    x = 3
a = A()
print a.x ==> 3
print a.y ==> AttributeError
A.y = 7
print a.y ==>7

And I can even do this for methods:

class A(object):
    x = 3
A.foo = (lambda self,y: (self.x, y))
a = A()
print a.foo(6) ==> (3, 6)

When we discovered this, my project partners and I were amazed. It’s a really cool feature but I can hardly see a use for it (if you do see one, please let me know). Further, it makes implementing classes pretty inefficient since you can’t use a virtual table approach that C++ and Java can use.

Differentiating functions and methods within classes

We submitted our project after thinking we were done and “thoroughly” testing everything. A couple of hours later, we got feedback from the auto-grading system saying we failed (among a couple others) a test case that did something like this:

class A(object):
    x = 4
def g(n):
    print n
a = A()
a.x = g
a.x(5) ==> 5

We failed this test case because we assumed that any call to an attribute reference was a method call, and so we implicitly pushed on a as the first argument. This isn’t the correct behavior as we then pushed on one more argument than we should have. When we saw this, we were again amazed. Python somehow knows that a.x is a function and not a method, so it knows when it should implicitly push on self.

We hacked together a fix that I thought was pretty cool. We were representing all functions as boxed 12 byte function values, where the first byte pointed to a dummy virtual table, the second byte pointed to the code for the function and the 3rd byte was the static link for that function. Since from this perspective, functions and methods were exactly the same thing, we had no way to tell whether the object we had was a function value or a method value, so we were pretty stuck. Instead, we added a byte to all function values that kept track of whether they were functions or methods (yeah this is a huge waste of space, but performance wasn’t our priority at this point). Then at runtime, we are forced to check whether the value we’re trying to call is a function or a method, and behave appropriately in each case.

Unfortunately there’s a huge increase in code size. Now every call in python requires the check mentioned above, and two distinct call instructions, of which only one would get executed (because we were required to push on the number of arguments as a parameter to the callee function). We discovered this bug like 2 hours before the deadline, and we weren’t being graded on performance, so we didn’t really care about this issue. I am interested to know how python does it. My guess is that in the class dictionary, methods aren’t stored as “function values” so you would be able to tell right away, but I haven’t looked it up. If you know how python does it, let me know.

If-Expressions

I didn’t really use If Expressions before this class, and now that I’m more familiar with them, I think they are actually quite useful in writing concise code. I’m not talking about the standard if-elif-else clauses, but more of the analogue to x ? y : z syntax in C. And this feature isn’t that crazy, in fact it’s what I expect python would do. So an if expression looks like x if y else z and the semantics are that this statement returns x if y evaluates to True and otherwise it returns z. So it’s just like the ?: construct in many other languages. It’s cool when you use it with function values. For example:

def foo(x): print x
def bar(x): print -x
y = True
(foo if y else bar)(4) ==> 4
z = False
(foo if y else bar)(4) ==> -4

And that’s really cool. I can really concisely choose what function I want to call at runtime. I haven’t thought of a lot of uses for this exactly, but I can definitely see myself taking advantage of it’s presence.

The global keyword
There are some really cool features with global that I think are pretty standard in most languages but that I’m not used to seeing. For example:

def g():
    global a
    a = 3
print a ==> NameError
g()
print a ==> 3

This means that the when you declare a as global, if it’s not already bound in the global environment, you make a binding for it. I’m not exactly sure how this works in other languages, but I can totally see a language throwing an error if a is not bound in the global environment. Here though, python is smart enough to make the binding for you, and then you can use it after the call to g. This means that there are some really interesting ways that you can dynamically add names to the global environment.

So I’m sure there are more really interesting features of python that I haven’t discovered, and I’m pretty sure there are some that I have found that I’m just no remembering right now but these are some of the coolest language features I’ve seen. As I mentioned throughout, some are definitely more useful than others and I’m sure that in some cases these may be undesirable. If you know of any crazy python features that I didn’t highlight, I’d be interested in hearing about them.

dude… use python

Thursday, July 31st, 2008

So haven’t written a techy/nerdy post in awhile and I presume that’s because my life outside of work hasn’t been that nerdy during the summer. But I thought it’s time to return to my normal self and so… lets talk about python.

This article is titled the way it is because I’ve heard that phrase uttered so many times in the past year or two that it’s insanse. So I spent a little bit of time last summer learning python and the interns and I worked on a project in python, but it wasn’t too involved and I promptly forgot a lot of it in the months to come. I still used php or perl for a lot of my scripting purposes because I was familiar with them and my friends would continuously nag me to make the move over to python, but I never really got around to it.

So this summer, I was initially given the option to write in whatever language I wanted and thought that it would be a good time to learn python, especially because it’s used in a class I’m taking next semester anyway. I took about a week in the beginning of my internship to learn about python and ported some of my old code over to python. I already knew that python was pretty cool, but it seriously makes programming much much faster. And even though I ended up using perl for my intern project, I’m pretty happy that I learned python and I think I’ll be using it more in the years to come.

Ok so why is python so cool. First there are the reasons that everyone says, it’s like writing pseudo-code, but then it actually works. That’s pretty cool because programmers learn to think in pseudo-code but then have to translate that to whatever language. The closer the language is to pseudo-code the better.

It’s interpreted, which has pros and cons. Interpreted languages are generally slower than compiled languages, and yes I think python is slower than C, but for internal scripts where performance isn’t that critical, python being interpreted is pretty helpful.

Python is a lot like scheme/lisp with lambda expressions and functions as first-class data. I was disappointed that you can only have 1 line lambdas, but they’re still pretty useful. And passing functions around is amazing and extremely useful. Also has built in functions like map() and filter() which just make list manipulation so much easier.

Consider this, lets say I have a list of numbers and I want to make a sorted list of all the even numbers. In python…
sorted ( filter (lambda x: return x % 2 == 0, myList))
In most other languages (except lisp)… I’m pretty sure it would be a lot less concise. And yes the same sort of thing can be done in scheme, but I like scheme a lot too.

I like the way that python allows me to include my test cases of a module within that actual module. The if __name__='__main__': feature is pretty cool. You can’t really do that in perl and it’s annoying because I need to have a testModule.pl script for every Module.pm. With python I pretty much reduce the number of files I have to worry about by a factor of 2.

Also python is pretty easy to read and that makes your programs easier to maintain. I’ve been experiencing this problem where perl isn’t that easy to read (especially if you use a lot of the uncommon features) and now that I have to go back and make changes to my code it’s pretty hard to figure out where I’m supposed to look and how I’m supposed to make the change. And not that I’ve done this, but I think it’d be a lot easier if I wrote in python.

Anyway, I’m not really an expert on python or programming languages in general. In fact I haven’t even used python for a substantial project or anything, but I have been reading about it and playing with it so I found some things that I like. And I think I’ll be using python more in the future so I’ll become more and more familiar with it and probably find more things I like and also some things that I can’t stand.

Disclaimer: If my facts are wrong here please feel free to correct me because these were the impressions that I got and it would be good if I got them corrected.

Rumor Mill

Wednesday, January 9th, 2008

During this break, I’ve had a pretty hard to keeping myself occupied. I’ve been playing a lot of music and hanging out with my friends a lot, but I haven’t really been in touch with my nerdy programming self apart from this company that I’m working on. At the same time, my brother has been talking about this whole “viral” movement and we’ve been discussing strategies to get your product out to tons of users with minimal effort. One of the obvious solutions is to make a Facebook application which is immediately available to the however-many-million users there are on Facebook and is also easily spreadable via the invitation process. I decided that I too would jump on this all-too-techie bandwagon and make myself a Facebook application.

Over the summer, I was very against the whole Facebook platform movement (you can read about it here) and I’m still not really happy with a lot of things that I’ve heard about Facebook, but I made this app more for entertainment than for anything else. I’m also not very good at using client libraries and api’s and whatnot (I’m really impatient and don’t like to read things) so I figured this would give some practice at working with an api that I definitely needed for the success of my app.

I’ll cut to the chase. You should add this application. Now you may ask: what exactly is it? Well, it’s called Rumor Mill and essentially it’s a way for you to make up, or publicize rumors about your friends. It’ll also keep you up to date with all the latest rumors (real or not) about your friends and let you determine if the rumor is true or not. Once you make this judgment, it’ll show you what other people thought about the rumor. So, why should you add it? Well first of all, what’s the harm? It’s very unobtrusive; it takes no space on your profile page and it remains dormant until someone makes up a rumor about you, when you’ll get a mini-feed notification about aforementioned rumor. Secondly, it’s fun; who doesn’t enjoy make up random stories about your friends and seeing what other people think about them? I think that there’s tons of potential in this application and hope to see the subscriptions skyrocket in the near future.

In reality, I didn’t spend that much time making the application (Sunday and Monday) and I don’t really plan to spend much more fixing it or doing any other Facebook development. I kind of wanted to see what all the commotion about Facebook development was about. I thought I could see how popular it gets (you should add it! and use it!). I also want to switch hosting companies and this other company has a discount (it’s free) if you’re hosting a facebook app on your server.

Anyway that’s my little bit of publicity, add the Rumor Mill!