Archive for the ‘projects’ Category

Perl XML Parsing

Friday, August 31st, 2007

Originally, I built the framework for this blog entirely on PHP, using XML to store my articles, comments, and all other data (see this and this for a detailed explanation of the original structure). Unfortunately, this design didn’t lend itself to scalability or to adding functionality in terms of XML parsing. In contrast, perl has a DOM-based parser that makes parsing xml nice and easy. I decided to have a go at writing an xml parser in perl that has identical functionality to the php one I’m currently using, thinking that it’d be a lot easier to enhance my blog.

I’ll start off by briefly explaining both parsers. PHP’s is handler-based, meaning that the parser object starts at the top of the document, and whenever it gets to a tag, it calls either a start tag handler or an end tag handler. When the parser gets to text between tags, it calls a data handler. The parser is good for crawling a file top down, but it’s not that easy to use if you’re looking for specific tags scattered throughout the file. You have to have dummy handlers that don’t do anything and keep switching the handlers when you come across the tag that you’re looking for. It’s also not very good for counting the instances of a specific tag because you may have to use global variables (generally a bad practice that should be avoided if possible) as counters. What’s more, each handler function is essentially just an if/elseif ladder that does something depending on what tag it’s analyzing. Adding functionality meant changing every handler to recognize a certain tag, or to change behavior when at a certain tag. In my case, I had about 12 different handlers, things were really redundant, and adding functionality was enough of a pain that I just decided not to do it (not the right solution). Using the PHP parser led to really sloppy code and made it hard to add features to the parser.
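To make the handler-based style concrete, here is a minimal sketch in Python rather than PHP, using the standard library’s expat bindings, which follow the same start/end/data callback pattern. The tag names and the SAMPLE document are made up for illustration, and the counter lives in a local dictionary instead of a global variable, which is one way around the problem mentioned above.

```python
import xml.parsers.expat

# Hypothetical feed used only for this illustration.
SAMPLE = "<feed><story><title>One</title></story><story><title>Two</title></story></feed>"

def count_tags(xml_text, wanted):
    """Count occurrences of the tag `wanted` and collect the text inside each one."""
    # Shared state for the three callbacks (kept local rather than global).
    state = {"count": 0, "inside": False, "texts": []}

    def start(name, attrs):
        # Called at every opening tag, whether we care about it or not.
        if name == wanted:
            state["count"] += 1
            state["inside"] = True
            state["texts"].append("")

    def end(name):
        # Called at every closing tag.
        if name == wanted:
            state["inside"] = False

    def data(text):
        # Called for text between tags; may fire several times per element.
        if state["inside"]:
            state["texts"][-1] += text

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = data
    parser.Parse(xml_text, True)
    return state["count"], state["texts"]
```

Calling count_tags(SAMPLE, "title") walks the whole document and fires a handler at every tag, even though only the title tags matter. With many tags to recognize, each handler grows into the if/elseif ladder described above.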

In contrast, the perl module XML::DOM is a DOM-based parser; it works a lot like javascript parsing of HTML pages. You can call methods like “getElementsByTagName”, “getAttributes”, and “getChildren” on any node to extract information from that node. It’s pretty simple and easy to use, and the online documentation at CPAN is very useful.
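For comparison, here is roughly what the DOM style looks like. This is a sketch in Python’s xml.dom.minidom rather than Perl’s XML::DOM, so the method names are minidom’s, and the feed layout (story elements with an id attribute and a title child) is assumed for illustration.

```python
from xml.dom import minidom

# Hypothetical feed layout, assumed for this example.
SAMPLE = """<feed>
  <story id="1"><title>One</title></story>
  <story id="2"><title>Two</title></story>
</feed>"""

def story_titles(xml_text):
    """Return (id, title) pairs by asking the parsed tree for the nodes we want."""
    doc = minidom.parseString(xml_text)
    titles = []
    for story in doc.getElementsByTagName("story"):
        # Jump straight to the child we care about; no handlers to switch.
        title = story.getElementsByTagName("title")[0]
        titles.append((story.getAttribute("id"), title.firstChild.data))
    return titles
```

Here the whole document is parsed into a tree first, and the code then asks for specific nodes directly instead of reacting to every tag as it streams past.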

So why switch over from PHP to Perl when it involves hours of reprogramming already working code? For one, I want to add functionality. If I kept the parser in PHP, this would take a lot of time and cause me plenty of frustration. Now that things are in perl, adding a feature is as straightforward as writing a simple subroutine. What’s more, the perl parser eliminates all redundancies: I’ve modularized everything into subroutines so that I never have repeated code in my parser. My perl code is also a lot cleaner. I don’t have long if ladders; instead I have a hash that maps tag names to subroutines. Everything in the perl parser just seems a lot cleaner than the PHP one, and I don’t regret spending a couple of hours to fix up my code.

One of the cooler aspects of the perl XML Parser that I’ve written is that I’m using mutual recursion very liberally. I have a couple of main functions: one for generating the html for multiple stories, one for generating html for just one story, one for getting all the titles of the stories in a given feed, and one for retrieving the latest story. The first two functions call a private parse_story method, which goes into the mutual recursion. Each tag name in the story is a key in a hash that maps tags to subroutines. I iterate through all the child tags, and execute the corresponding subroutine for each one. Each of these subroutines spits out some html and then tries to parse all the children again, recursively. With this recursion, I only need one subroutine per valid tag name, and the XML is parsed similarly to performing recursive algorithms on a tree structure. To add support for a tag, I just have to add it to the hash table, and build its subroutine. With this mutual recursion, the parsing code is much simpler and a lot easier to build upon.
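The hash-of-subroutines idea can be sketched as follows. This is a Python illustration, not the actual Perl code: the tag names (title, para, bold), the HTML they produce, and the helper render_children are all assumptions; only the name parse_story comes from the description above.

```python
from html import escape
from xml.dom import minidom

def render_children(node):
    """Render every child of `node`, dispatching element tags through HANDLERS."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            # Plain text is escaped and passed through.
            parts.append(escape(child.data))
        elif child.nodeType == child.ELEMENT_NODE:
            handler = HANDLERS.get(child.tagName)
            if handler is None:
                raise ValueError("unsupported tag: " + child.tagName)
            # Each handler calls back into render_children: mutual recursion.
            parts.append(handler(child))
    return "".join(parts)

def render_title(node):
    return "<h2>" + render_children(node) + "</h2>"

def render_para(node):
    return "<p>" + render_children(node) + "</p>"

def render_bold(node):
    return "<strong>" + render_children(node) + "</strong>"

# The dispatch table: supporting a new tag means one new entry and one new function.
HANDLERS = {"title": render_title, "para": render_para, "bold": render_bold}

def parse_story(xml_text):
    """Turn one story document into an HTML string."""
    story = minidom.parseString(xml_text).documentElement
    return render_children(story)
```

Adding, say, an italic tag in this sketch would take a three-line render function plus one more key in HANDLERS, with no changes to any existing handler.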

If you notice, all of my served pages are still in PHP, so how am I calling this perl parser from php? I’m using PHP’s shell_exec function, which executes a shell command and returns a string of all the output from that shell command. The perl parser returns a string of html that my web pages echo out to the client. I don’t think this is optimal in terms of performance because I have both php and perl running simultaneously, but I didn’t want to build my entire framework in perl, so it’s a hacked solution.
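The same run-a-command-and-capture-its-output pattern looks like this in Python. It is only an analogy to PHP’s shell_exec: the child command here is a stand-in (a tiny Python one-liner that prints some html), not the real perl parser, and run_and_capture is a made-up helper name.

```python
import subprocess
import sys

def run_and_capture(argv):
    """Run a command and return everything it wrote to standard output."""
    # check=True raises CalledProcessError if the child exits with an error.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout

# Stand-in for the parser script: a child process that prints an html fragment.
CHILD = [sys.executable, "-c", "print('<p>hello</p>')"]
```

The calling page would then just echo run_and_capture(CHILD) to the client. Every request pays for starting a second interpreter, which is the performance cost mentioned above.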

So perl’s the way to go for XML parsing. It’s relatively easy, and allows for maintainable code (unlike PHP’s parser). Unfortunately, perl’s not nearly as popular for web scripting as PHP is, so if I wanted to distribute this, I’d probably have to use a PHP parser. Anyway, I’ve had the perl parser running on the site for about two weeks now, and I haven’t received complaints or seen any bugs, so things seem to be working well.

Perl vs. PHP

Wednesday, August 22nd, 2007

I learned PHP in January and February of this year, using it for a couple of websites that I’ve written (including this blog). Over the course of several projects, I’ve become pretty familiar with the language, the online documentation, and all the ins and outs of PHP. I also first touched perl at the beginning of this year, but only for a short little project. This summer, most of the programming I’ve done at work has been in perl, so I’ve gotten quite familiar with it as well. Now that I know both of these pretty well, I have a little more flexibility in designing and implementing my ideas. That being said, every time I start a project, I have to think about which language is ideal. Now, I’ve gotten pretty good at this, and I know when to use perl over PHP and vice versa.

For web scripting, I prefer PHP. I really like that PHP can be embedded into html. It’s really quick and easy to write some pretty powerful web pages with PHP, and I’m a lot more comfortable with web forms in php over perl. I also have an incentive to write web sites in php, because my local server doesn’t actually execute any perl scripts called by the browser. So for most of my web stuff, I use PHP.

On the other hand, shell scripts are a lot easier to write in Perl. With perl, it’s really easy to interact with the shell and execute shell commands, and I also like that you can easily pass in arguments when you call a perl script from the command line. What’s more, with the default file handles in perl, it’s easy to take inputs and have a more interactive shell script in perl. Although both languages have the capabilities to run through the shell, I just find it a lot easier in perl.

I also prefer perl for text-processing, mostly because it’s a lot easier to use regular expressions in perl. The “=~” syntax is very simple, and a lot more readable than the “preg” functions in php. I also find it a lot easier to work with files; the perl file-handle data type makes reading and writing to files really clean. Again, PHP has all the same functionality as perl, it’s just a lot easier in perl, which is why I prefer it for text-processing.

Along the same lines, XML parsing is a lot easier to do in perl. Actually that’s the reason why I decided to write this article. I used to parse the xml feeds for this site using PHP’s xml parser, but have recently switched to a perl parser that I wrote yesterday. Because perl has such extensive libraries (i.e. CPAN) there’s less need to re-invent the wheel in perl than there is in PHP. PHP is a pretty powerful language, but because perl has a huge developer community that contributes to CPAN, a lot of functionality is constantly being added to Perl, while PHP isn’t such a dynamic language. In that respect, I prefer to use perl for a lot of things.

So why use php at all? For one thing, php is a higher level language. Most of my code remains pretty clean and tidy, and I don’t have to worry about a lot of minor issues, like variable declarations, that I do have to worry about in perl. I also like how php function declarations include the input parameters, whereas in perl they’re passed in to the special @_ array. PHP is quite a bit simpler to use than perl, so I prefer it when I don’t need the additional power of perl.

Recently, I’ve taken to using perl over PHP because I’ve been using perl a lot more and I find that I can do everything that I want to in perl. In PHP, I still can do almost everything that I need to, but sometimes it’s a little syntactically awkward and messy, which is why I prefer perl. I’ve started to figure out which language is ideal for the task at hand. As I learn more languages (I’ve started looking at python), I’ll naturally be better equipped with solutions to a given problem, but it’ll be harder to decide which language to use. Personally, I see this as a good thing, because it’ll be a lot easier to complete my tasks if I choose the correct tools.

What I Learned at Tellme: Part 1 (Related to Programming)

Sunday, August 19th, 2007

I just finished my summer internship at Tellme yesterday, and it was an amazing experience. Not only did I have a really fun time working there, but I definitely learned a lot and I also got to see the infrastructure of a moderately sized company. I was an applications developer intern, so I ended up doing a lot of coding, but everything I did there was very different from any of the coding projects I’ve worked on for fun or for school. A lot of the differences were essential to running the company, but some of them made things run a lot slower. Most of them, however, were really good programming practices that I’m beginning to pick up and notice in my own coding. Here are some of the things that I learned from my projects, the employees, and of course, my fellow interns:

1. Consistent Coding Style: For the most part, code is pretty hard to read. You probably know that if you’ve ever wanted to add to something you wrote a couple of months ago, and then figured it’d be easier for you to start from scratch than to decipher your code and augment it. When working on bigger projects though, it’s a huge waste of time to start over, so you need to make your code as readable as possible. It’s even more important on team projects when other people will be reading your code and using it or adding to it. On these projects, it’s essential to have a consistent and defined coding style that’s used by all members of the team.

One of my projects at Tellme involved abstracting some preexisting code, so naturally I had to read through all of it. There was one file that was about 2500 lines long, incredibly dense, and very unintelligible.

It took me about two full days to go through this one file and figure out what was going on, and this was largely because the coding style sucked. Anyway, to make a long story short, having a consistent coding style really adds to the productivity of a team, especially on large projects.

2. If it’s not tested, it’s broken: I was amazed at the level of testing that went on at Tellme. Even things like content changes to their corporate website would be reviewed by a pair of developers and then by a QA engineer. It was a standard practice to review these minor changes on four or five different browsers, which I found to be a bit excessive, but also very thorough. It was a marked difference from the level of testing that goes into my own projects. On my projects, I used to just test until it worked a couple of times, but after seeing how production-grade software is tested, I’ve started to write testing suites and actually test my projects.

3. Production Quality: As I spent time at Tellme working on their projects, I realized that everything I’ve ever coded hasn’t been suitable for production at all. In fact a lot of what I do is take an idea to proof of concept, and then drop it. I never spend time on exception handling, error checking, parameter validation, documentation, or even optimization, but all of these are essential to making software actually usable. It’s interesting to see that programmers end up spending most of their time on these things, rather than the things that I end up doing. They code a lot, but very few of them are actually designing new things. A lot of the time they work on making things robust, and actually usable.

4. Bug Fixing: Most of working on a development team is spent fixing bugs. It’s something I don’t really look forward to when I think about getting a job. I’d much rather be actually working on something new and innovative than fixing bugs for a career. Pretty much the first half of my internship was fixing bugs on Tellme’s website, and that was actually really boring. Granted it was important and needed to be done, but the work wasn’t very intellectually stimulating at all. I guess it goes with making production quality software, but it’s pretty monotonous to be fixing bugs all day.

Speaking of bug fixes, the other interesting thing I noticed was that Tellme would release software with bugs in it all of the time. Apparently it’s a pretty common practice for companies to do this so that they can meet their deadlines. It makes sense to me that they need to release software by a given date, so they purposely don’t address some bugs, but it’s also very odd for me, because whenever I see a bug in my software, I immediately fix it. I guess that’s one of the differences between a side project that you work on in your free time, and a project that you work on for a company.

5. Programming Overhead: This is one of the things that makes me not really want to work for anything but a startup, but the developers at Tellme are actually only writing code for maybe 25% of the time they’re working. So much time is spent communicating, maintaining servers, interacting with huge code bases, and all sorts of other stuff, that they don’t actually spend that much time coding. A lot of what they do is necessary: people need to communicate on team projects, servers need to be maintained, and when you have a lot of code, you need to have some way to store it (i.e. version control systems like cvs or svn). These just aren’t things I really want to be doing for a career, at least not right now. I want to spend my time designing innovative technologies, and writing really cool software, rather than on maintenance. Seeing this at Tellme (which is already a pretty small company) made me really want to work for a start-up, where I wouldn’t have to waste so much time on overhead.

There definitely are more things, which is why I’ve titled this part 1, but I’m having some serious writer’s block right now, so I’ll get to the other stuff later. I also plan to write about the more general lessons (not related to programming) that I learned at Tellme. A lot of the things I didn’t like aren’t really specific to Tellme, but are more consequences of working at a company with a lot of people. These are all things that are essential to company infrastructure, but annoying to deal with as a developer. I’m sure I’ll get used to all of these things when I actually get a job, but as a student, they’re things that make me not really want to get a real job. Overall though, I really enjoyed my time at Tellme, and it was an amazing learning experience. I highly recommend getting an internship, and I’d be thrilled to get another one next summer.

File-Renaming Script

Wednesday, August 8th, 2007

Let me preface this by saying that I’m really uptight about my files (especially my media), how they’re named, and where they’re stored. All my music is organized by artist, then by album, and all files are in the form title – artist.mp3. It gets to be a bit of a chore to rename any new songs that I obtain, but I’ve been doing it for so many years that I never thought about having something rename them for me. If you can’t tell from the title of this post, I wrote something to do just that.

A brief outline of the idea: I wanted something that would rename any songs that I downloaded to fit a certain template of my specification (it’s not a very commonly used template). I usually download albums, where all the songs are in the same original template, so rather than rename all the files by hand, I wanted something that would rename all the files with just one command. What I have right now is a perl script that you call from the directory that holds all the song files. It asks you some questions about how the files are currently named and how you want them to be named, then it changes all the file names for you. The interface right now is pretty basic, but I just wanted to get the project functional at first. I’ll probably be making a nicer web interface for this in the next couple of days.

I started coding something to do this yesterday, and I’ve got a decently working template. Unfortunately it relies on a Unix-based OS (it’s running on my mac), so it won’t be compatible with Windows. It takes advantage of perl’s very straightforward interaction with the shell and uses the shell’s mv command to actually rename the files. The harder part was actually coming up with the new name for the file based on some pretty generic user inputs. The script currently asks for: the ordering of the naming elements in the file, how those elements are delimited, how spaces are delimited, and the file extension. Then it asks how you want the elements to be ordered (and which elements you want in the name) and how you want them to be delimited.

From all the input data, I use the shell’s ‘ls’ command to get all the file names, then I split them by the element delimiter (one of the arguments). A simple regex converts all the space delimiters to regular spaces, and then I map the ordering of the original file to the desired ordering of the new file. From there, I just join this with the desired delimiter, and add the file extension back on. Then I can just mv from the old file name to the newly built file name and that changes the filename. It’s that simple.
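The name-building step can be sketched like this. It’s a Python illustration of the steps just described, not the perl script itself: the function name build_new_name, its parameter names, and the sample file name are all made up, and a plain string replace stands in for the regex.

```python
def build_new_name(filename, old_order, new_order, old_delim, new_delim, space_char, ext):
    """Build the new file name from the old one using the user's answers.

    old_order / new_order are lists of element names, e.g. ["artist", "title"].
    """
    # Strip the extension so it doesn't end up inside the last element.
    stem = filename[: -len(ext)] if ext and filename.endswith(ext) else filename
    # Split the old name into its elements.
    parts = stem.split(old_delim)
    if len(parts) != len(old_order):
        raise ValueError("unexpected number of elements in: " + filename)
    # Turn the space delimiter (e.g. "_") back into regular spaces.
    parts = [part.replace(space_char, " ").strip() for part in parts]
    # Map element names to values, then read them back in the desired order.
    fields = dict(zip(old_order, parts))
    return new_delim.join(fields[name] for name in new_order) + ext
```

For example, build_new_name("Some_Artist-Some_Song.mp3", ["artist", "title"], ["title", "artist"], "-", " - ", "_", ".mp3") gives "Some Song - Some Artist.mp3". Leaving an element out of new_order drops it from the new name.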

Of course, it won’t end up being that simple. I need to add in a lot of error checking and input validation even before I start improving the interface. I also want it so that you just need to pass in 2 arguments for initial format and desired format and the script figures the rest of it out. Unfortunately I’m not exactly sure how I’d go about doing this. In terms of error checking, I don’t want to rename any files if one of the files fails, so I have to figure out how I can go about this (I haven’t really thought about it and I don’t know off the top of my head). I also want to make the outputs look a lot nicer, and as usual I need to document the code a lot better. Then I can get started on that web interface.
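One possible way to get the all-or-nothing behavior, sketched in Python: work out every new name first, check the whole batch, and only then start renaming, undoing any renames already made if one fails partway. This is just one approach under simple assumptions (a single directory, no swapping of names between files), and rename_all is a hypothetical helper, not part of the script.

```python
import os

def rename_all(directory, renames):
    """Rename files in `directory` according to `renames` (old name -> new name).

    Raises before touching anything if the batch looks unsafe.
    """
    targets = list(renames.values())
    if len(set(targets)) != len(targets):
        raise ValueError("two files would get the same new name")
    # Phase 1: validate everything up front.
    for old, new in renames.items():
        if not os.path.isfile(os.path.join(directory, old)):
            raise FileNotFoundError(old)
        if new != old and os.path.exists(os.path.join(directory, new)):
            raise FileExistsError(new)
    # Phase 2: rename, remembering what was done so it can be undone.
    done = []
    try:
        for old, new in renames.items():
            os.rename(os.path.join(directory, old), os.path.join(directory, new))
            done.append((old, new))
    except OSError:
        # Roll back the renames that already succeeded, newest first.
        for old, new in reversed(done):
            os.rename(os.path.join(directory, new), os.path.join(directory, old))
        raise
```

The validation pass catches the common failures (a missing file, a name clash) before anything is renamed, and the rollback covers errors that only show up during the rename itself.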

This is a pretty quick project that I don’t intend to spend much more time on, but it’s gotten me more familiar with the shell, with using perl for scripting (rather than for web stuff), and also with regexes and string manipulation. Aside from me learning more, the real motivation is that it’s a script I would use a lot, as I do download a lot of music and I want to have it all named properly. Assuming that I make the interface to the script a lot nicer, it’s a script that I and hopefully some other people would actually use.

Web Design: 7 Lessons I Learned from WeNote

Saturday, August 4th, 2007

Since January, I’ve been working on and off on a website/application called WeNote that I hope to release for the last time in December of this year. Now that I’m pretty much done with the structure of this blog, I’ll be talking about that project a lot more in here. Anyway, this was my first time making a production quality application, so naturally I made a lot of mistakes. I’ve learned a lot from these mistakes and now feel a lot more capable of making something that users will actually go to and be interested in. I’ve already released two versions of the website, both of which have been met with brutal critiques from my friends and very little site traffic, so I shut the site down and am rebuilding it from scratch. Although there were also a lot of back-end mistakes that I made, here are 7 of the Web Design-related lessons that most web designers will have to learn at some point.

Version one of WeNote was absolutely terrible. The layouts of pages were not thought out at all and were inconsistent across pages, the color scheme was terrible, content on pages was poorly written, and the site just didn’t look appealing. In my defense, I didn’t spend much time working on the user experience because I was just learning PHP. I focused a lot on the back-end, which even then didn’t turn out that well. Version two was a significant improvement from version one in terms of functionality, but visually it was still pretty bad. When I released this one, I told all of my friends to use it, and they all made accounts, but I was getting no traffic. I attribute most of this failure to lack of a good user experience, because the back-end was pretty functional. With that introduction, here’s the list.

1. User Experience is really important: This one seems obvious, but it still needs to be stressed. Without an amazing user experience, users will not come to your site, regardless of how functional it is. This is a big problem for me, because I’m less interested in layout design and graphic design and more interested in the back-end code for the site. I think a lot of hackers prefer to work on interesting code (i.e. back-end) than on the stylistic needs of the site, but both aspects are essential to any good website. Every time I showed off WeNote to one of my friends, he always said something like, “The UI sucks.” I really should have listened to him and spent some time on the UI, but it was really easy for me to procrastinate. As a result, for the third release of WeNote, I’m actually building an HTML mock-up of the site before I even start on the back-end, so that the site has a decent user experience.

2. Attention to Detail: This one I actually learned from building a different website, but it’s incredibly important. When styling a website, every single detail matters. My friend and I co-operated on a website called uclinked a couple of months ago. We didn’t spend much time on the site (it was a pretty simple idea), but everyone who I showed the site to noticed a smudge in our banner image. It really took away from the site because that was the first thing everyone noticed, and it immediately gave them the impression that the site was poorly made. On my mock-ups for Version 3 of WeNote, I’ve spent a lot of time making sure that colors blend well, that images and borders line up and are consistent, and that every element is exactly where I want it and looks exactly how I want it to. It’s time consuming and tedious, but it’s essential to building a successful site.

3. Navigation: This is actually related to functionality of the site, but it contributes a lot to the user experience. When a user visits your site, you want them to be able to find what they want easily, so you need to have a very clear, simple navigation system. Navigation areas aren’t the places to show off your creativity. These guys have to be easy to find and easy to use, otherwise people will get frustrated with your site, and they’ll never come back. Don’t try new navigation techniques on your sites. People expect to find navigation elements in certain places, usually right below the banner, or on the left side of the page, and if they don’t find them where they expect, you have a problem. You can be creative with your buttons and any animation you want in your nav bars, but make sure people can find them and use them very easily.

4. CSS can’t do it all: It’s time to face the facts: you’re going to have to use some images to spice up your site. As amazing as CSS is, it’s not going to make your site look good on its own. You’ll need some Photoshop-ped or GIMP-ed images around your pages to complete the user interface. The problem here is that it takes a long time to learn how to use good image processing software, and a lot of hackers would rather be coding than messing around with GIMP. Unfortunately, it’s something that has to be done, and it really makes your site look better. With plain CSS, you can make your site functional and usable, but it definitely won’t be very cool looking.

5. Textual Content: If you have a lot of text on your web pages, it’s important that every word is carefully selected. You don’t want too much text (unless you’re writing articles), but your text must convey your message to the users. For WeNote, I had text on my home page that explained what WeNote was and why someone would want to use it, but it was pretty wordy, and not even very explicit. It might be a good idea to get someone who writes really well to come up with blurbs for you if you’re not very good at it (like me). It’s also important to place text where people expect it, and to make sure that the font is well sized and easily readable.

6. Design for your audience: When styling a page, take some time to think about what the page is supposed to be, and what your users will expect. For example, there’s a pretty standard template for online newspapers with Old-English-y font for the header, a navigation bar for the sections of the paper, and recent stories on the front page. You can also think about what your users like, and theme some of your images, fonts, colors, etc. around what they like. If your page is more artsy, you may want your header, and buttons to reflect that blurring into the other elements. Maybe you could use some really nice nature picture as part of your banner. At the same time, don’t be too cheesy or too exclusive. You don’t want to intimidate possible users with your design, so keep your theme a little general. This is something I haven’t exactly figured out yet, but I’m working on it.

7. Design for all browsers: Internet Explorer, Mozilla Firefox, and Apple Safari all behave slightly differently when it comes to CSS and images. It’s terrible that things aren’t standardized, but as a website designer, you need to be sure that regardless of browser, your users have a great experience on your page. Test your site on all browsers to make sure that it works properly and that it looks good. Have separate style sheets to load based on what browser is being used (Javascript can detect the browser and load the appropriate style sheet). One of my friends made me a banner for WeNote in photoshop, and it looks great in Firefox on my Macbook Pro, but in Safari, the colors are purple, rather than blue, and the banner doesn’t fit with the rest of the page. As a result, I’m going to have to change my banner image, even though what my friend made was really cool. Compatibility with different browsers is essential to attracting a wide user base.

So those are some of the things I learned in the last 8 months of web design. I still consider myself a novice at it, so I’ll be learning a lot in the future and I’ll write about some of that too. I hope this is useful to anyone interested in designing a website.