The news is making me depressed

July 19th, 2009

I started reading the news recently (yes I used to not read the news or really care what was going on in the world) and it’s making me really depressed. I subscribed to the New York Times’ RSS feed and here’s what I read about today.

I guess there is some light in all of this but it doesn’t seem like much consolation

  • Mauritians See Chance to Break Cycle of Coups - After years of living under military regimes, the Mauritians appear to be having a legitimate election. Of course, the current ruler is a pretty strong favorite to win, and he first came to power via a coup, so while this is good, it’s not all that amazing.
  • Signs of Hope Emerge in West Bank - The Palestians, no longer under Hamas, is experience some economic growth and some form of peace. There has been a lot less violence between Israel and Palestine over the last year or so and I personally am pretty hopeful for some sort of peace agreement.

And while most of the world is as chaotic as ever, Israel seems to be relatively peaceful. Of course it’s a pretty precarious peace here, but there haven’t been any commotion since I’ve been here, so that’s pretty good.

I think this is partly why I don’t like to read the news. I guess it’s good to know what’s going on in the world, but when the world is as messed up as it is, it just makes me depressed.

Israel V: The Northern Coast

July 18th, 2009

Wow so I wanted to write about my adventures in Israel every week but that totally has not happened. I’m thinking that this is probably a good thing, because it means I’ve been too busy to find time to write in the past couple of weeks. So, what exactly have I been up to? Unfortunately, since I’ve been up to a lot, this post will only cover really one experience.

When we left off I had just recovered from being sick and miserable, and was starting to have a good time with the friends that I’ve made here. I was planning a trip to the Sea of Galilee and I was really excited to see more of Israel.

I didn’t end up going to the Sea of Galilee, but instead went north to Rosh Hanikra and Akko. Rosh Hanikra is a kind of “natural wonder” right on the border between Israel and Lebanon. The Mediterranean Sea has carved these grottos into the cliff wall and you can walk into them and explore these really cool caves (check out some of my pictures pictures). Akko is also a pretty interesting place; it’s a town at the across the bay from Haifa that was used as a kind of fortress during the crusader period. It was also occupied by the Ottomans, so there are a bunch of artifacts from that period too (pictures).

I took a day off of work, because it’s much easier to travel on weekdays than on weekends here, and took a train from Tel Aviv University to Naharriya, one of the northern-most coastal towns. From here I needed to take a bus to Rosh Hanikra, which is about 10km north of Naharriya. Naturally, I didn’t really know how to do this, so I asked a couple of people and they told me to wait at this bus stop. I waited and waited, and buses passed by, but none of them were going to Rosh Hanikra. Eventually, this taxi stopped and asked me where I was going and said he’d take me to Rosh Hanikra. The taxi, although it looked like a normal taxi, was some kind of shared-taxi. He stopped at all of the bus stops and picked up one or two people, and dropped them wherever they wanted to go. I guess it was good because it was much cheaper than a regular taxi.

Anyway, he dropped me at Rosh Hanikra and I went to go ask about admission tickets (It’s kind of a touristy place so of course you have to pay to see the grottos). You take this cable car down from the top of the cliff to where the grottos are and on the cable car I met a bunch of South African high schoolers, who were on some sort of summer trip through Israel. Their “chaperone” was a college student so I hung out with him and his group for a bit before parting ways and wandering around on my own through the grottos. Of course the grottos were really impressive, but it’s hard to capture that in words, so check out my pictures instead.

After wandering around, I took the cable car back up to the top of the cliff and asked about how to get back to Naharriya. Everyone I asked didn’t really seem to know how, so I kind of started freaking out, but one person told me to go wait at the bus stop and a bus would eventually show up. This was still pretty worrisome as I wanted to get to Akko in time for lunch, and it was already approaching noon. There were around 20 army girls waiting at the bus station, apparently also going back to Naharriya, so I started talking to them asking about how they were getting into town. They had called a sherut (a different kind of shared taxi) and invited me to come along with them. Long story short, I met a bunch of army girls and made it to Akko in time for lunch.

Akko was pretty interesting, but not as incredible as I imagined it would be. On the plus side, the markets were really cool, and there were some interesting artifacts from the crusader and ottoman times. I had heard that there was this amazing hummus place in the Akko market so that was my first stop for lunch. These hummus places are funny, because if your party doesn’t fill up a table, they’ll sit other people with you. I was by myself, so I was seated with a German girl and a Mexican guy, who are studying in England. So we ate lunch together and it was pretty fun, they were both interesting people and had some good stories. In retrospect I really like the policy of sitting you with other people because it makes you’re meal a lot more social. And of course, lunch was amazing; seriously it was probably the best hummus I’ve ever eaten, on par (or better than) Abu Hassan in Yafo.

After lunch, I wandered around the old city, saw the harbor, some of the mosques, the sea wall, and many of the tourist attractions. I also wandered around the market a bit and got to see all of the colorful spice stands and experience some amazing smells from food stands. Afterwards, I wanted to go see the Knight’s Hall (where the Templar’s lived during the Crusades) and ran into this French guy who was literally motorcycling the mediterranean (from Algeria back to France). We hung out together and saw the Knight’s Hall and walked through the market again and then parted ways but he was a really cool guy. As for all of the things I saw, just check out my pictures.

As I was leaving, I walked through the market one last time and met an Arab-Israeli who was trying to buy some fish. He also was really friendly and I walked with him and watched him bargain for fish (which was really entertaining). He invited me to lunch, but as I was still really full from my earlier meal and politely declined. Finally, on my way out I ran into the South African kids again and walked with them for a bit. Then I got on a train from Akko back to Tel Aviv University, took the bus back to my dorm and pretty much just passed out. It was a long, busy day, but totally worth it.

One thing I really liked about this day in particular (and most of my time in Israel) is that you meet a lot of random people and all of them are really interesting. Maybe it’s partly me becoming more engaging with strangers, but I find that here I interact with “strangers” a lot more and consequently get to see new perspectives and learn about cool places and things to do. Like on this trip, I met and hung out with the South Africans, shared a taxi with some Israeli soldiers, had lunch with the Mexican and German, explored Akko with the French guy, and wandered the market with an Arab-Israeli. I cannot imagine meeting such a diverse group of people at school, and this is definitely something I’m going to miss about being here.

Definitely this wouldn’t have happened had I been with a big group, which is part of why I really like being here by myself. I’ve made a completely new group of friends, and we’re a small enough group that we still interact with new people. I think if I were here with a bunch of friends then I wouldn’t have met a lot of the people that I did, and consequently I would have missed out on a bunch of great stories and experiences.

So despite planning to spend the day by myself, I met some great people and had good company throughout my trip. I ate good food, saw some pretty spectacular things, met interesting people, and of course, skipped a day of work!

So I took this day-trip the week after I wrote my last post. A lot of time has passed since then so I still owe you many more stories. Work has been keeping me pretty busy but I’ll see if I can find some time to write more during this coming week.

NP-Completeness in the workplace

July 7th, 2009

My algorithms professor used to tell his students (including me) this story to motivate studying NP-complete problems and reductions. The story goes something like this: say you’re working as a software developer and your boss gives you this project where you need to come up with a efficient way to do some computation. You go to your desk and think about for awhile and then suspect that there is no efficient way to do the computation. So you can’t just go back to your boss and say, “I think this problem is NP-Hard, so I give up,” you need to show your boss that it’s NP-Hard and this motivates the studying of reductions. It further motivates the study of approximation algorithms and other techniques to cope with NP-Completeness.

I’ve heard this story a couple of times from my professor, but I never really felt like this is something that would happen. How often is it that you are given a problem that doesn’t look intractable immediately, and how much convincing will your boss really need? For that matter how often, in a typical job, do you encounter intractable problems? Like how often does this really happen at a typically development job? Now I haven’t spent too much time in industry and really in my time I never ran into NP-Hard problems so I don’t really know. Although the story was motivational for an algorithms course, the situation used to seem implausible to me. That is until I was challenged to do this myself

This summer, I’m working in a computational genomics lab refining some algorithms for analyzing gene expression data. The core problem we are working on is NP-Hard, but in my refinements, I ran into another NP-Hard problem that I did not immediately see as intractable. I guess this story may be a little bit skewed because I am basically doing algorithms research but I still thought it was pretty cool.

My work deals with finding connected components in a large gene network (an undirected graph). To simplify things, we have a scoring function and want to find connected components with good scores and because of the scoring function this turns out to be NP-Hard, read this for the exact problem and why it’s NP-Hard (The exact problem is a generalization of set cover). Anyway, the basic way we do this is to pick a vertex with a “good” score (called the “seed”) and then expand the module from that node.

In our experimentation, we decided to see what would happen if we allowed seeds to be multiple disconnected nodes and grew our modules from these multi-node seeds. Basically, here you choose k nodes to be your seed and you grow the module out from these seeds, but it is not a requirement that the result is a connected component, in this case it is ok if the resulting module consists of k connected components. The motivation was that from a biological perspective, a disease may affect two or more gene pathways and we wanted our algorithms to be able to account for that, hence allowing multiple connected components.

I set out to implement this feature and was thinking about ways to create these multi-node seeds. I started out by filtering away all of the “bad” vertices and was left with a set of 30-50 good vertices but it was (and still is) not clear what’s a good way to combine these single seeds into multi-node seeds.

My first approach was to simply enumerate all of the combinations and then choose the ones where the individual vertices had good scores. This approach besides having exponential time complexity was bad in that the seeds we selected were all very similar. This was because the individual nodes that were very good were in all of the multi-node seeds we selected. Thus this approach was not satisfactory. Another problem with this approach was that we typically did not see disjoint connected components in the resulting solutions.

To remedy the last problem, we decided on a new approach where we select multi-node seeds where each individual vertex is far from the rest. We compute this by looking at the neighborhoods (typically only 1 level of breadth first search) around each seed and computing the intersection of these neighborhoods. So this seemed like a good approach and we were hopeful this would produce disjoint connected components in the results so I set out to implement this.

Lets restate the problem: Given a set of vertices in a graph, where each vertex has an associated set of elements from some universe U. The goal is to find k vertices that minimizes the intersection of the sets, or such that the intersection of these sets is the empty set.

I thought about this problem for awhile and was quite confident that it was intractable. Before talking to the grad student I’m working with about it, I wanted to have at least a sketch of a reduction in mind. My first reaction was that I should be able to reduce set cover to this problem, possibly by “negating” all of the sets or something. I messed with this for awhile and I didn’t really get anywhere.

I mentioned this problem to some other people in my lab and we all thought about it for awhile. Eventually one guy found that he could reduce independent set to this problem quite easily. If you instead go back to the gene network graph and consider each of the “elements of the universe” as neighbors of the vertices in the graph, then you immediately see that this is the independent set problem. If you treat each neighborhood as a node in a graph, and there is a edge between two neighborhoods if they share an element in common, then you can solve this problem by running an independent set solver on this graph. Conversely, given a graph, you can define a universe of elements and a set to each vertex such that vertices that are not connected share no elements and vertices that are connected share some elements. Then you can look for multi-node seeds and solve independent set. The way I explained the problem above, and the way that I thought about it made it difficult to see that it was independent set, but if you go back to the original problem, it’s easier to see.

Now that I new the problem was NP-Hard, I started thinking about ways to get good enough multi-node seeds and eventually settled on a greedy algorithm that grows the seed sets by adding the node with minimal intersection. The resulting seeds are decent and we did start seeing solutions with multiple disjoint components which is exactly what we were looking for. Despite being faced with a “challenging” problem, we did end up finding a good working solution that gave us the results that we wanted.

This little anecdote, although quite silly in retrospect (because the reduction was pretty straightforward), really showed me that what I learned about reductions and NP-Completeness is important from a practical perspective. It also showed me that my professors story was actually quite plausible. Plus it was another situation where I got to apply things that I learned, and I always find that really cool.

Dynamic Programming in Haskell and why DP is useful

June 28th, 2009

I’m continuing to learn Haskell in my free time and as I like to learn by doing, I’ve been implementing common algorithms in the language. Last week I tackled depth first search and found it to be surprisingly challenging and consequently really educational. Today, I decided to try my hand at dynamic programming. I chose dynamic programming because these algorithms hinge on memoization, and I wanted to figure out how to do that in a purely functional way. Also, I thought that I could do some performance analysis, as implementing recursive, un-memoized algorithms to solve these problems seemed to be quite easy.

The first algorithm I tried to implement was the standard edit distance algorithm. It’s a classic DP algorithm and although no longer used much in practice, it is the basis for several modern techniques used in sequence alignment, so I figured it would be an interesting exercise. I spent an hour or so thinking about how to approach this and even started programming a solution, but I quickly got bogged down in maintaining a matrix of subproblem solutions. I found implementing all of these matrix operations to be quite mundane and decided to start with something a bit less tedious.

So I chose another classic DP problem, Longest Increasing Subsequence. Briefly, the problem formulation is: Given a list of (lets say) positive, unique integers, find the length of the longest subsequence such that the subsequence is strictly increasing. For example, given [1, 4, 2, 3, 9], the longest increasing subsequence would be [1, 2, 3, 9]. Read the wikipedia article for more details. I chose this problem because you don’t have to maintain a matrix of subproblem solutions, but rather a list, and the problem is quite a bit simpler, allowing me to focus on the algorithmic elements.

So, I started out by implementing a naive un-memoized algorithm, then went on to implement the dynamic programming algorithm that’s based on the one on the wikipedia page, but in Haskell it doesn’t really look the same at all. I implemented both so that I could do run some performance metrics which I’ll discuss about after presenting my code. First some auxiliary stuff:

return the item in the list indexed by i –
listGet :: [Int] -> Int -> Int
listGet lst i =
    if i == 0
       then (head lst)
            else
                listGet (tail lst) (i-1)

return the index of item i in the list
getIndex :: [Int] -> Int -> Int
getIndex lst i =
    if (head lst) == i
       then 0
            else
                1 + (getIndex (tail lst) i)

– given a list of integers, return the largest value in the list
listMax :: [Int] -> Int
listMax [] = 0
listMax lst =
    if (length lst)==1
    then (head lst)
    else max (head lst) (listMax (tail lst))

I admit I should have used arrays for this but I kind of started using lists before having to implement these and was too lazy to change it. I would change it now but I’m worried about breaking things right before posting and since I’m probably (read definitely) never going to look at this again it really doesn’t matter to me.

Anyway, so we have three functions that are all pretty straightforward. One essentially treats a list as an array and gets the i-th element from the array. Another does the reverse, it gets the index of an item in the array. Finally the third computes the max over a list.

My first algorithm is a naive approach that is just recursive with no memoization:

– compute the longest increasing subsequence starting from each index in the list
longestIncreasingSubsequence :: [Int] -> Int -> [Int]
longestIncreasingSubsequence [] _ = []
longestIncreasingSubsequence lst i =
    if (length lst) == i+1
       then [1]
            else
                [(1+(listMax
                (map (\x -> head (longestIncreasingSubsequence lst (getIndex lst x)))
                         (filter (\x -> (listGet lst i) < x)
                                 (filter (\x -> (getIndex lst x) > i) lst)))))]
                         ++ (longestIncreasingSubsequence lst (i+1))

– compute the longest increasing subsequence of the list
lis2 :: [Int] -> Int
lis2 lst =
    listMax (longestIncreasingSubsequence lst 0)

Ok so longestIncreasingSubsequence is a bit confusing and the comment is kind of inaccurate. If you call it with i == 0, then it’ll return a list where each j-th element is the value of the longest increasing subsequence that starts from and contains the jth item of lst. It basically computes the solution to all of the subproblems, but it doesn’t compute them by memoization. It’s similar if you start with i != 0 but the list only has a suffix of the subproblems. It computes these results by following the standard recursive formula: look at each j>i such that l[i]

This was a good exercise but not really what I wanted to do. So I tested that a bit and made sure it was correct, and proceeded to think about the dynamic programming algorithm. I found it a bit difficult to actually memoize the solutions, but here's what I came up with:

– given an array of subproblem, find the largest increasing subsequence starting at i –
dplis :: [Int] -> Int -> Array Int Int -> Int
dplis lst int arr =
    listMax
    ((map (\x -> 1 + arr ! (getIndex lst x))
          (filter (\x -> (listGet lst int) < x)
                  (filter (\x -> (getIndex lst x) > int) lst))) ++ [1])

– compute largest increasing subsequence starting at int –
getDPScores :: [Int] -> Int -> Array Int Int
getDPScores lst int =
    if (length lst) == int+1
       then array (0, ((length lst)-1)) ([(i, 0) | i <- [0..((length lst)-2)]]++[(((length lst)-1), 1)])
            else
                let arr = getDPScores lst (int+1) in
                arr // [(int, dplis lst int arr)]

– compute largest increasing subsequence of list
lis :: [Int] -> Int
lis lst =
    listMax (elems (getDPScores lst 0))

So I hope the functions make sense. dplis just does lookups in the array parameter to compute the longest increasing subsequence starting at and including i. getDPScores builds the array of subproblem solutions, starting from the smallest subproblems and relying on dplis. Finally lis is just a wrapper for calling getDPScores with the right parameters.

One noteworthy thing is the flow of the program. It’s tail recursive. So the first thing we do is get to the bottom of the recursion of getDPScores, then we really do our computation as we work our way out of that recursion stack. Each call to getDPScores after returning from recursion still has to call dplis. Of course there are a lot of tail recursive algorithm, but I didn’t really expect dynamic programming to naturally flow like this. Typically dynamic programming is done iteratively, but I suppose we can’t do that here.

So those are my two implementation, now on to performance stuff. I decided to experiment with this just to see how amazing dynamic programming really is. If you notice the first algorithm I implemented is definitely exponential (in the worst case, imagine a ascending list of length n, then each call has to spawn O(n) smaller calls). I think it’s O(n^n) but I haven’t thought too much about it so please correct me if I’m wrong. Anyway, the second algorithm is O(n^2), you just recurse straight down and pop all the way back up. At each recursive call you may have to look at O(n) neighbors and there are only n recursive calls.

Given the asymptotic running times, I expect the second algorithm to be far superior to the first in actual running time. Just for fun I decided to run this experiment and plot my results. I generated random lists of sizes ranging from 5 to 70 in increments of 5 and ran each algorithm on the same lists. Here are the results:

exponential vs quadratic times

exponential vs quadratic times

They are pretty much what I expected. Notice that the y axis is log-scaled, so that for input length 70, the exponential algorithm took around 10 minutes (600 seconds) where the dynamic programming one takes well under a second. Really amazing how dynamic programming helps that much. I had originally intended to run on larger input sizes but I simply couldn’t as the exponential algorithm wouldn’t terminate.

As I did in my last Haskell post, I did a quick search for other people who had looked at dynamic programming in Haskell. I found this article which walks through an implementation of the knapsack algorithm and it appears to be much more elegant than mine. Of course you should remember that this I’m still a Haskell newbie so I don’t have mastery over most of the features and I really don’t expect myself to be able to implement elegant solutions. However, he does mention that dynamic programming is quite natural in functional languages and that you can take advantage of lazy evaluation, which I can understand. Also the knapsack algorithm uses a matrix of subproblems, so it’s slightly different and more interesting.

I also found this article that implements the Needleman/Wunsch algorithm for global sequence alignment, another classic dynamic programming algorithm. This was also an interesting read, and he used some language features that I haven’t used and barely knew existed so it was definitely educational. Also, this algorithm also uses a matrix of subproblem solutions so I’m thinking that my next experiment will be with more one of these dynamic programming algorithms.

So, In conclusion, I’m feeling more confident in my functional thinking and my Haskell programming. In my previous post, Jedai commented that I shouldn’t consider arrays as cheating, and so I have adopted them into my repertoire here. Further, as proof of the importance of algorithm design and asymptotic efficiency, I generated some performance numbers to show how essential it is to have efficient algorithms.

Anyway… there’s a leak in my ceiling and water is dripping all over the place so, until next time!

Israel IV: Ups and Downs

June 26th, 2009

It’s been awhile since I wrote about Israel and what I’m up to so here goes…

If you read my last post about Israel, you’d know that I played in an ultimate tournament, met a bunch of people through ultimate and have been having a generally good time. So that was two weeks ago (the week of June 14-18th) and yeah that week was pretty awesome. I hung out a lot with people I had met through ultimate and had a lot of stuff to do.

That wednesday, I hung out with my friend Steve, and in part of our hanging out we went to a Falafel stand in town. Now this is ordinarily uneventful news, except on Thursday night, I went to play ultimate with a local team and got dropped off pretty close to my house, but I was so weak that I could barely make it up to my room. Fortunately, the next day was Friday (the weekend), so I spent that day pretty much sleeping. I think I slept like 18+ hours on Friday alone.

Turns out I got food poisoning. I’m pretty sure it was from the falafel, because that was the only out-of-the-ordinary thing I had eaten that whole week. So Friday I was really miserable but managed to sleep a lot and I felt quite a bit better by the next day. On Saturday, a family that I know here offered to pick me up and let me get better at their house. They called a doctor who, although he barely spoke english, managed to diagnose me and prescribed me some antibiotics. I spent the day and night at my friends’ house and they really took good care of me.

I took Sunday off, but I was feeling much better and moved back to my room. I decided to go to work on Monday. From then on the week was fairly uneventful, work had its share of ups and downs; there were times when I didn’t really have anything to do, and there were times when there was tons of interesting work to do. This coming week, Igor (the grad student who I am working with) and Professor Shamir will be away at a conference, so they gave me a bunch of stuff to do that should keep me busy.

Other than that, during the week, I really didn’t do much. I was still a bit shaken up from being sick and didn’t want to push myself and get worse. I pretty much hung out in my dormitory and kept myself busy.

Yesterday though, I went into town with Jake, my neighbor who is pretty much doing the same thing as I am in Israel. We wandered around for awhile and found a place to eat when my friend Jesse (from ultimate) called and said he was having a barbeque and invited us to stop by. It turned out the place we were eating was directly under his apartment so we ate and headed up to his place. The barbeque was awesome, I met a bunch of naturally english speaking med students, ate some really good food, and had a great time hanging out on Jesse’s roof.

And today, I went into Jaffo with two of my friends, went to an amazing hummus place, wandered around the flea market and then went to the beach. The beach was relaxing and I’m disappointed that I haven’t been spending more time there. I guess it wouldn’t be as much fun if you go by yourself, but now that I have friends to hang out with, I think I will be taking advantage of the beach that’s 10 minutes away!

So the last couple of days have been pretty jam-packed and fun, but the week preceding was horrible. And of course things are exaggerated because I was sick, but I think I’m going to have my ups and downs here. There’ll be times when I have a bunch of stuff to do and will barely realize I’m away from home, and there’ll be times when I’m just dying to fly back to the states. There seems to be an inverse correlation between how social I am and how much I want to go home. When I was sick and couldn’t hang out with people, I really just wanted summer to end and to go home. But these past couple of days, I’ve been kind of sad that I only have a couple of weeks left here.

And that’s the last couple of weeks. Up next, I’m working tomorrow so that I can take Wednesday off and go travel to the the Sea of Galilee. And I’m also planning a trip to Petra in Jordan, which is supposed to be amazing. Stay tuned!