Archive for August, 2007

Perl XML Parsing

Friday, August 31st, 2007

Originally, I built the framework for this blog entirely on PHP, using XML to store my articles, comments, and all other data (see this and this for a detailed explanation of the original structure). Unfortunately, this design didn’t lend itself to scalability or to adding functionality to the XML parsing. In contrast, perl has a DOM-based parser that makes parsing XML nice and easy. I decided to have a go at writing an XML parser in perl with identical functionality to the PHP one I’m currently using, thinking that it’d be a lot easier to enhance my blog.

I’ll start off by briefly explaining both parsers. PHP’s is handler-based, meaning that the parser starts at the top of the document, and whenever it reaches a tag, it calls either a start tag handler or an end tag handler. When it reaches text between tags, it calls a data handler. This works well for crawling a file top down, but it’s not easy to use if you’re looking for specific tags scattered throughout the file. You have to write dummy handlers that don’t do anything and keep switching handlers when you come across the tag you’re looking for. It’s also awkward for counting the instances of a specific tag, because you may have to use global variables (generally a bad practice that should be avoided if possible) as counters. What’s more, each handler function is essentially just an if/elseif ladder that does something depending on which tag it’s analyzing, so adding functionality meant changing every handler to recognize a new tag or to change behavior at a certain tag. In my case, I had about 12 different handlers, things were really redundant, and adding functionality was enough of a pain that I just decided not to do it (not the right solution). Using the PHP parser led to really sloppy code and made it hard to add features.
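To make the handler style concrete, here is a minimal sketch in Python rather than PHP. Python’s xml.parsers.expat module exposes the same kind of start, end, and data callbacks. The feed layout and the tag names (feed, story, title) are made up for illustration and are not the blog’s real format. Notice how much state tracking it takes just to pull out the titles.

```python
import xml.parsers.expat

# Hypothetical feed; the real article format is not shown in this post.
SAMPLE = (
    "<feed>"
    "<story><title>First</title></story>"
    "<story><title>Second</title></story>"
    "</feed>"
)


def collect_titles(xml_text):
    """Collect the text of every <title> using start/end/data callbacks."""
    titles = []
    # Shared state the three handlers need in order to cooperate.
    state = {"in_title": False, "buffer": []}

    def start(name, attrs):
        # Called at every opening tag; only <title> is interesting here.
        if name == "title":
            state["in_title"] = True
            state["buffer"] = []

    def end(name):
        # Called at every closing tag.
        if name == "title":
            titles.append("".join(state["buffer"]))
            state["in_title"] = False

    def data(text):
        # Called for text between tags, possibly in several chunks.
        if state["in_title"]:
            state["buffer"].append(text)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = data
    parser.Parse(xml_text, True)
    return titles


print(collect_titles(SAMPLE))
```

Every new tag of interest means another branch in each of the three handlers, which is the if/elseif ladder problem described above.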

In contrast, the perl module XML::DOM is a DOM-based parser; it works a lot like javascript parsing of HTML pages. You can call methods like “getElementsByTagName”, “getAttributes”, and “getChildNodes” on a node to extract information from that node. It’s pretty simple, easy to use, and the online documentation at CPAN is very useful.
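For comparison, here is a rough sketch of the DOM style, again in Python. It uses xml.dom.minidom, which offers the same getElementsByTagName idea as XML::DOM; it is not the Perl module itself. The feed layout and the id attribute are assumptions made up for the example.

```python
from xml.dom import minidom

# Hypothetical feed; tag and attribute names are illustrative only.
SAMPLE = (
    '<feed>'
    '<story id="1"><title>First</title></story>'
    '<story id="2"><title>Second</title></story>'
    '</feed>'
)


def story_titles(xml_text):
    """Return (id, title) pairs by asking the tree for the nodes directly."""
    doc = minidom.parseString(xml_text)
    result = []
    for story in doc.getElementsByTagName("story"):
        # Look up the first <title> inside this particular story.
        title = story.getElementsByTagName("title")[0]
        result.append((story.getAttribute("id"), title.firstChild.data))
    return result


print(story_titles(SAMPLE))
```

There are no callbacks and no shared state: the whole document is in memory as a tree, so finding scattered tags is a direct query.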

So why switch over from PHP to Perl when it involves hours of reprogramming already working code? For one, I want to add functionality. If I kept the parser in PHP, this would take a lot of time and cause me plenty of frustration. Now that things are in perl, adding a feature is as straightforward as writing a simple subroutine. What’s more, the perl parser eliminates all redundancies: I’ve modularized everything into subroutines so that I never have repeated code in my parser. My perl code is also a lot cleaner. I don’t have long if ladders; instead I have a hash that maps tag names to subroutines. Everything in the perl parser just seems a lot cleaner than the PHP one, and I don’t regret spending a couple of hours to fix up my code.

One of the cooler aspects of the perl XML parser that I’ve written is that I’m using mutual recursion very liberally. I have a couple of main functions: one for generating the html for multiple stories, one for generating html for just one story, one for getting all the titles of the stories in a given feed, and one for retrieving the latest story. The first two functions call a private parse_story method, which goes into the mutual recursion. Each tag name in the story is a key in a hash that maps tags to subroutines. I iterate through all the child tags and execute the matching subroutine for each one. Each of these subroutines spits out some html and then tries to parse all of its children again, recursively. With this recursion, I only need one subroutine per valid tag name, and the XML is parsed much like a recursive algorithm walks a tree structure. To add support for a tag, I just have to add it to the hash table and build its subroutine. With this mutual recursion, the parsing code is much simpler and a lot easier to build upon.
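The dispatch-table idea can be sketched as follows. This is a Python illustration of the technique and not the actual perl parser: the tag names (story, title, para, bold), the html they produce, and the choice to skip unknown tags are all assumptions. Only the name parse_story comes from the description above.

```python
from html import escape
from xml.dom import minidom


def render_children(node):
    """Render every child of node by dispatching on its tag name."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(escape(child.data))
        elif child.nodeType == child.ELEMENT_NODE:
            handler = HANDLERS.get(child.tagName)
            if handler is not None:  # unknown tags are skipped in this sketch
                parts.append(handler(child))
    return "".join(parts)


# One small function per tag. Each emits its own html and then calls
# render_children, which may call back into any of these functions:
# that round trip is the mutual recursion.
def render_story(node):
    return '<div class="story">' + render_children(node) + "</div>"


def render_title(node):
    return "<h2>" + render_children(node) + "</h2>"


def render_para(node):
    return "<p>" + render_children(node) + "</p>"


def render_bold(node):
    return "<b>" + render_children(node) + "</b>"


# The hash that maps tag names to subroutines.
HANDLERS = {
    "story": render_story,
    "title": render_title,
    "para": render_para,
    "bold": render_bold,
}


def parse_story(xml_text):
    """Parse one story document and return its html."""
    return render_children(minidom.parseString(xml_text))


print(parse_story("<story><title>Hi</title><para>A <bold>b</bold> c</para></story>"))
```

Supporting a new tag in this sketch takes one new function and one new entry in HANDLERS; nothing else changes.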

As you may have noticed, all of my served pages are still in PHP, so how am I calling this perl parser from php? I’m using PHP’s shell_exec function, which executes a shell command and returns a string of all the output from that command. The perl parser prints a string of html, and my web pages echo it out to the client. I don’t think this is optimal in terms of performance, because every request has PHP start a separate perl process, but I didn’t want to rebuild my entire framework in perl, so it’s a hacked solution.
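The same run-a-command-and-capture-its-output pattern looks roughly like this in Python. It is only an analogue of shell_exec: subprocess.run with an argument list does not go through a shell, and the child command here is a stand-in that prints a fixed snippet, not the real parser script.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command and return everything it wrote to standard output."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout


# Stand-in for the parser script: a child process that prints some html.
html = run_and_capture([sys.executable, "-c", "print('<p>Hello</p>')"])
print(html, end="")
```

Each call starts a new process, which is where the performance cost mentioned above comes from.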

So perl’s the way to go for XML parsing. It’s relatively easy, and allows for maintainable code (unlike PHP’s parser). Unfortunately, perl’s not nearly as popular for web scripting as PHP is, so if I wanted to distribute this, I’d probably have to use a PHP parser. Anyway, I’ve had the perl parser running on the site for about two weeks now, and I haven’t received complaints or seen any bugs, so things seem to be working well.

Last.fm and the music revolution

Thursday, August 30th, 2007

Last.fm, a London-based company that started a couple of years ago (I think), has pretty much revolutionized music for a lot of youngsters. Their website is essentially a social networking web application organized around music and the music people listen to.

Last.fm records and stores information about all the music that you listen to on your computer or portable music device and makes that information available online to other users. They use pretty interesting software to interface with a variety of media libraries and devices, and they’re constantly extending to support others. The “last.fm” application interfaces with your last.fm account and encompasses all of the “scrobbling” (collecting of song data). It also allows the user to listen to tailored radio stations that play only the music that the user is looking for. I’ve found that their radio stations work just as well as Pandora.

On top of all these interesting music features, their website provides all the standard social networking stuff. They have hundreds (maybe thousands) of groups, forums, messaging, public posting, profiles, blogging, and pretty much everything you could want in a social networking website. What’s interesting is that I think they’ve really thought through the social networking side of things and done it really well (at least a lot better than myspace). You don’t see a lot of flaming in discussion threads, and there’s very little vulgar content, but they still do a great job of connecting people with each other. They’ve clearly done something right on the social networking side of things.

I’ve read a couple of articles about social networking, and I know some of the common pitfalls and whatnot, but I haven’t been able to figure out how Last.fm works so nicely. Almost all content is universally readable and writable, and their user base is mostly teenagers, who are usually the problem demographic in social networks. What’s more, they don’t have any noticeable moderation of content. Everything seems to just work.

Moving away from the technical side of things, Last.fm really has some useful tools. They provide event listings (mostly concerts) that allow users to physically meet each other and socialize. They also have a really useful “related artist” feature that lists other artists that are musically similar to a given artist. I almost always use this feature to browse for new music that I may want to get. There are a lot more interesting things that Last.fm offers, but I can’t say I’m a very active user (apart from listening to a lot of music), so I don’t really know much about them.

I’ve used Last.fm for a couple of years now, and I’ve watched them grow steadily, adding features and increasing membership. They’ve done a good job, adding features that relate to their core goal rather than expanding into other sectors (which I think a lot of growing companies tend to do). All this time, they’ve been very successful. I highly recommend their service to anyone who listens to a lot of music.

Album Review: Paramore – Riot

Saturday, August 25th, 2007

Before getting this album, I’d only heard Paramore a couple of times on last.fm radio. I enjoyed what I heard there, but because I didn’t have any actual music from the band, I never really got into the group. Last week, two of my friends independently told me to get “Riot,” so I finally picked it up. Rather than plagiarize a biography of the band, see their wikipedia entry.

My first listen of “Riot” was similar to every other new album I listen to. It didn’t seem to be anything special. Songs were relatively catchy, but it sounded like a typical pop-punk album. I didn’t expect it to be as good as I think it is now, but as I do like pop-punk, I listened to it again… and again. Now after several listens, it’s definitely my favorite album, and I’m even recommending it to my friends (who don’t even listen to rock).

Not to be chauvinistic or anything, but there aren’t many good female rock vocalists singing in modern groups. However, I really like Hayley’s voice. She has an edgy sound that goes really well with the instrumentation and the general effect that the band’s trying to get. In this context, I actually think that Hayley does a better job than a male singer could. Unfortunately, the lyrics are pretty generic. Hayley sings about the same things as every other pop-punk band, and there isn’t much metaphor or poeticism to make things original.

Instrumentally, the album’s pretty good. Drumming is impressive, with nice riffs and fills. The drumming also adds a lot to the swells and movement of each song. Guitar work is pretty typical of pop-punk: lots of power chords, repetition, and very little soloing. Unlike other pop-punk bands though, the vocals don’t entirely overpower the guitar part, so you can actually hear the guitar riffs. Even though it’s pretty simple, the guitar augments the edgy effect the band is trying to achieve, and is a bit unique in that you can actually hear it.

Overall, everything comes together very nicely. Regardless of how generic each individual instrument (vocals included) is, the end result is awesome. Part of it may be due to Hayley’s singing, which I really like, or it could be that I’m a fan of most pop-punk bands, but I’m really digging this band right now. I do think that it may get old pretty soon, but I still highly recommend it.

My rating: 7/10

Recommended Tracks: “Misery Business”, “Let the Flames Begin”

Summer’s end

Thursday, August 23rd, 2007

School starts in 5 days, and I’m really excited to move in and go back. Summer was really awesome, and for the first time in my life, I’m actually content enough with my summer to look forward to classes starting. These last couple of months have been jam packed with stuff for me to do; activities ranging from working, coding, and writing, to running, watching movies, and surfing. Looking back, this summer was not only fun, but it was also really productive.

Okay, boring stuff first. My summer was really productive. I had a full time internship where I learned so much (see What I learned at Tellme). Not only that, I started, finished, and worked on a lot of my own projects. I started this blog, which I’m happy to say I’ve maintained pretty well for the past month or so. I spent a lot of time thinking about design ideas for WeNote, which unfortunately I don’t think I’m going to actually carry out just because I don’t see much of a future in it. I wrote a lot of much smaller scripts that I’ve started using and on top of all of that, I’ve gotten a lot more comfortable with everything involved in programming. The fact that I learned so much this summer really added to my overall experience and to be perfectly honest, made my summer complete.

Now that that’s out of the way, on to the fun stuff. So all the fun things I did: playing guitar, running, playing soccer (lots of soccer), playing basketball, watching movies, going swimming, going surfing, going to the beach, hanging out with friends, and meeting a lot of new people. Truly, all of my experiences this summer were incredible. I loved playing guitar with my friends whenever we got the chance. We played soccer consistently throughout the summer, and every pick-up game was still fun, unique, and exhausting. I went surfing for the first time with some of my co-workers, and it was amazing. Of course I enjoy spending time with my high-school friends, mini-golfing, getting dessert, or just hanging out; it’s always fun. Finally, I was really happy to meet so many new, cool people from work, from soccer, and from other really random places. Even though I didn’t do that much really interesting stuff, I definitely enjoyed all the not-so-interesting stuff that I did this summer.

Even with everything that I did, I still regret not doing anything really special. One of my friends just came back from hiking Half Dome in Yosemite, and she said it was an amazing, unforgettable experience. Apart from the surfing trip, a lot of my summer is pretty forgettable, regardless of how fun it was. And it wouldn’t have been difficult to plan a memorable experience; I was just too lazy or too preoccupied with everything else that was going on.

On the whole though, my summer was amazing. Still, good things need to come to an end, and I’m excited for school to start next week.

Perl vs. PHP

Wednesday, August 22nd, 2007

I learned PHP in January and February of this year, using it for a couple of websites that I’ve written (including this blog). Over the course of several projects, I’ve become pretty familiar with the language, the online documentation, and all the ins and outs of PHP. I also first touched perl at the beginning of this year, but only for a short little project. This summer, most of the programming I’ve done at work has been in perl, so I’ve gotten quite familiar with it as well. Now that I know both of these pretty well, I have a little more flexibility in designing and implementing my ideas. That being said, every time I start a project, I have to think about which language is ideal. By now I’ve gotten pretty good at this, and I know when to use perl over PHP and vice versa.

For web scripting, I prefer PHP. I really like that PHP can be embedded into html. It’s really quick and easy to write some pretty powerful web pages with PHP, and I’m a lot more comfortable with web forms in php over perl. I also have an incentive to write web sites in php, because my local server doesn’t actually execute any perl scripts called by the browser. So for most of my web stuff, I use PHP.

On the other hand, shell scripts are a lot easier to write in Perl. With perl, it’s really easy to interact with the shell and execute shell commands, and I also like that you can easily pass in arguments when you call a perl script from the command line. What’s more, with the default filehandles in perl, it’s easy to take input and write a more interactive shell script. Although both languages can be run from the shell, I just find it a lot easier in perl.

I also prefer perl for text-processing, mostly because it’s a lot easier to use regular expressions in perl. The “=~” syntax is very simple, and a lot more readable than the “preg” functions in php. I also find it a lot easier to work with files; the perl filehandle makes reading and writing files really clean. Again, PHP has all the same functionality as perl; it’s just a lot easier in perl, which is why I prefer it for text-processing.

Along the same lines, XML parsing is a lot easier to do in perl. Actually, that’s the reason why I decided to write this article. I used to parse the xml feeds for this site using PHP’s xml parser, but have recently switched to a perl parser that I wrote yesterday. Because perl has such extensive libraries (i.e. CPAN), there’s less need to re-invent the wheel in perl than there is in PHP. PHP is a pretty powerful language, but because perl has a huge developer community that contributes to CPAN, a lot of functionality is constantly being added to Perl, while PHP isn’t such a dynamic language. In that respect, I prefer to use perl for a lot of things.

So why use php at all? For one thing, php is a higher level language. Most of my code remains pretty clean and tidy, and I don’t have to worry about a lot of minor issues, like variable declarations, that I do have to worry about in perl. I also like how php function declarations include the input parameters, whereas in perl they’re passed in to the special @_ array. PHP is quite a bit simpler to use than perl, so I prefer it when I don’t need the additional power of perl.

Lately, I’ve taken to using perl over PHP because I’ve been using perl a lot more and I find that I can do everything that I want to in perl. In PHP, I can still do almost everything that I need to, but sometimes it’s a little syntactically awkward and messy, which is why I prefer perl. I’ve started to figure out which language is ideal for the task at hand. As I learn more languages (I’ve started looking at python), I’ll naturally be better equipped with solutions to a given problem, but it’ll be harder to decide which language to use. Personally, I see this as a good thing, because it’ll be a lot easier to complete my tasks if I choose the correct tools.