Posts Tagged ‘iTunes’

voiTunes Released!

Friday, November 2nd, 2007

I’ve been working on voiTunes on and off for a couple of months, and I’m finally done! I’m proud to say that it has impressed quite a few of my friends, and I’m ready to release it to the rest of the world (well, just the Mac users). You can download it here. It only works on Mac OS X, and it may or may not work on Leopard (I have yet to test it there). If you have Windows, I’m sorry, but you’ll have to wait until I port it (if I ever get around to it).

A few notes: I really enjoyed working on the widget. It turned out to be not too difficult, but I wasn’t able to add all the functionality I wanted, mostly because I wanted to have something concrete released this week. I have almost-working code that adds a couple more features, but it’ll take me several more hours to integrate it nicely into the working widget. As a result, functionality is limited, but it still works, and works decently well. Recognition isn’t that great, but that isn’t exactly my fault, as I’m using a recognition engine from Carnegie Mellon.

Please post comments, criticisms (be harsh!), or any other feedback.

Voice Recognition Media Library

Thursday, September 20th, 2007

It’s time to finally let the cat out of the bag. I announced about a month ago that I was working on a really cool project and now I’m finally going to talk about it. From the title, it’s pretty obvious that the project involves voice recognition and media libraries. Well, here we go!

I worked at a company called Tellme this summer, and all of their technology revolves around voice recognition. Anyway, after working on voice applications for them, I thought it’d be really cool to extend iTunes to take voice commands. I talked to some of my fellow interns about third party voice recognition software that I could use, and got started.

First, let me talk about some of the software I’m using. My voice recognition tool is called Sphinx. It’s an open source Java package from Carnegie Mellon that’s really easy to extend. Sphinx was a bit difficult to install, but once it’s set up, it comes with a lot of demos that you can look at and build on. I just took their “hello world” demo and added a bunch of stuff to it to get my media library recognition to work. If you’re at all interested (and because you’re still reading, I take it that you are), I strongly recommend checking out their software, playing with it, and extending it to build your own applications.

On the other side of the application, I’m using a Perl module to control iTunes. The module just has functions that output some AppleScript, so unfortunately this project can only be installed on OS X right now, but it’s a really simple, easy-to-use interface to AppleScript and therefore to iTunes. This is what I really love about Perl: there are so many libraries that you can do pretty much whatever you want just by installing a couple of modules on your computer. I’m also using Perl for XML parsing and other file handling, while I’m using Java for the user interface and the interaction with Sphinx.

Ok, so what exactly does my prototype do? Essentially, I have a Java program that starts up the Sphinx recognizer and waits for the user to say something like “itunes play”, “itunes pause”, “itunes next”, or “itunes select”. It then processes what it heard and sends the matching command to iTunes. “Play”, “pause”, and “next” are self-explanatory, but “select” is a little more intricate. When the user says “itunes select,” the program prompts for an artist name and then for a track name by that artist, and then it searches the iTunes library and plays that specific track. That’s the program from a high level.
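To make the command step a little more concrete, here is a minimal sketch of how a recognized phrase could be turned into a command token. This is not the actual voiTunes code: the class name `CommandParser` and the token strings are assumptions made for illustration, and the recognized text is assumed to arrive as a plain string.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class CommandParser {
    // Hypothetical command tokens; the post does not say what strings
    // the real program emits for each spoken command.
    private static final Map<String, String> COMMANDS = Map.of(
            "itunes play", "play",
            "itunes pause", "pause",
            "itunes next", "next",
            "itunes select", "select");

    /** Normalizes a recognized utterance and looks up its command, if any. */
    public static Optional<String> parse(String utterance) {
        if (utterance == null) {
            return Optional.empty();
        }
        // Lowercase, trim, and collapse runs of whitespace before the lookup.
        String key = utterance.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
        return Optional.ofNullable(COMMANDS.get(key));
    }

    public static void main(String[] args) {
        System.out.println(parse("iTunes  Play"));
        System.out.println(parse("hello world"));
    }
}
```

Anything that is not one of the four phrases comes back empty, so the caller can simply keep listening.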

Looking deeper down, the Java code is just a loop that waits for recognition, but I use different grammars each time. The main procedure just has a grammar that accepts the “iTunes” commands. Each command outputs a different string, which I send as the argument to a Perl script. The Perl script then uses the AppleScript library to send a command to iTunes.
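The hand-off from Java to Perl could look roughly like the sketch below. It uses the standard `ProcessBuilder` class; the script name `itunes_control.pl` and the class name `ITunesBridge` are placeholders I made up, since the post does not name the real script.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ITunesBridge {
    // Hypothetical script name; the real one is not given in the post.
    static final String SCRIPT = "itunes_control.pl";

    /** Builds the argument list: perl, the script, the command, then any extras. */
    public static List<String> buildCommand(String command, String... extraArgs) {
        List<String> argv = new ArrayList<>();
        argv.add("perl");
        argv.add(SCRIPT);
        argv.add(command);
        for (String arg : extraArgs) {
            argv.add(arg);
        }
        return argv;
    }

    /** Runs the script and returns its exit code (needs perl and the script on disk). */
    public static int send(String command, String... extraArgs)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(command, extraArgs))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the argument list; nothing is executed here.
        System.out.println(buildCommand("select", "The Beatles", "Hey Jude"));
    }
}
```

Passing the arguments as a list rather than one concatenated string means artist and track names with spaces or quotes reach the script intact, with no shell quoting to worry about.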

Prior to loading the recognizer, I use Perl’s XML libraries to parse “iTunes Music Library.xml,” the file that stores all of iTunes’ track information. Then when the user says, “itunes select,” I dynamically generate a grammar composed of a list of all the artists in your music library. Then when you say an artist name, I dynamically generate a grammar of all the song titles. Once I have the artist name and the song name, I pass the information into the Perl script, which again sends the commands to iTunes. On the whole, the project doesn’t seem too complicated, but there’s quite a bit of code involved, and it’s definitely more challenging than any of the other projects I’ve worked on.
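As an illustration of the dynamic grammar idea, here is a small sketch that turns a list of names into a JSGF grammar, the grammar format the Sphinx-4 demos use. The class name `GrammarBuilder`, the grammar and rule names, and the normalization rules are all assumptions for this example, not the code from the project.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class GrammarBuilder {
    /** Lowercases a name and replaces characters that would break a JSGF rule. */
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9' ]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    /** Builds a one-rule JSGF grammar whose alternatives are the given names. */
    public static String build(String grammarName, String ruleName, Collection<String> names) {
        // LinkedHashSet drops duplicates while keeping the library's order.
        Set<String> alternatives = new LinkedHashSet<>();
        for (String name : names) {
            String cleaned = normalize(name);
            if (!cleaned.isEmpty()) {
                alternatives.add(cleaned);
            }
        }
        if (alternatives.isEmpty()) {
            throw new IllegalArgumentException("no usable names for grammar " + grammarName);
        }
        return "#JSGF V1.0;\n\n"
                + "grammar " + grammarName + ";\n\n"
                + "public <" + ruleName + "> = " + String.join(" | ", alternatives) + ";\n";
    }

    public static void main(String[] args) {
        System.out.print(build("artists", "artist",
                Arrays.asList("The Beatles", "AC/DC", "the beatles")));
    }
}
```

The same method could be called a second time with track titles once an artist has been recognized. Note that a grammar only helps if the recognizer's dictionary knows how to pronounce every word in it.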

My current prototype is fully operational as I just described, but it’s certainly not complete. Before I actually publicize and maybe distribute my software, I want to fully integrate it into iTunes as a plug-in. I also need to improve recognition, because my ultimate goal is to just leave the plug-in on, and it should know when it should be listening and when it shouldn’t. Finally, I also want to let the system learn artist and track names, because as of now, it only understands legitimate English words, and a lot of artists’ names aren’t English words. I want to build a feature that allows you to train the system when you first run it, so that it learns these names and recognizes them on later runs.

Of course, all of my aspirations and additions are a lot more challenging than just getting my prototype running, but I really think this project is cool and that it’s worth spending my time on. It would be amazing to show off to my friends, peers, and, of course, recruiters. What’s more important, though, is that I’ll learn a lot about building a stable, user-friendly product that integrates a lot of different technologies. I hope to make this a project that I take from start to finish, spending time on all aspects of product development. It’s a challenge, but it’s been fun so far, and it’ll be so worth it when it’s complete.