Posts Tagged ‘ideas’

WordPress Automated Tagging

Sunday, November 4th, 2007

First of all, I’m going to be switching this blog over to WordPress very soon. My own framework is just too much maintenance, and adding the features I want takes more time than it should. With WordPress everything will look a lot nicer and work a lot better, and it’ll just make things easier for me. I’ve gotten to the point where I’d rather focus on writing than on site maintenance and debugging.

The other main reason for doing this is that I’d like to make an add-on for WordPress that tries to automatically generate reasonable tags for the writer’s entry. I’m switching over so that I can test this on WordPress. I would do it on my own framework, except I haven’t really finished my tagging system and I don’t plan to, so I have nowhere to test. It’ll be a lot easier to build on a more robust system like WordPress, which has already taken care of the details, so that I can focus on the project at hand.

I came up with the idea for this project after talking to a company called Metaweb at an internship fair on campus. They’ve built an online queryable database of the world’s information (or some of the world’s information) so that developers can build applications on top of it and take advantage of all of the nicely structured data. The database is Freebase. It’s fun to play around with, but I also see it as pretty useful for building applications that require information from external sources (like the automated tagger).

So I was playing around with Freebase, and I thought that it would be really cool if I could automate tag generation on my blog (mostly because I usually forget to do it anyway). With this structured data I thought that maybe I could take advantage of the tagging that takes place on Freebase, and all I’d have to do is find out which tags to take from there. I could just look for keywords in the post, send queries to Freebase, look at the significant tags, and suggest some of the tags for the post. Sounds pretty straightforward.

The general plan of attack is: keep a postings file (just a word -> entry relationship) of all the significant words in all the posts of this user. Then when a new post is published, look at its significant words and find the ones that seem like good candidates for tags (via some text-search weighting scheme like TF-IDF). Then for each of these candidates, look at its Freebase entry and pull out the tags Freebase has for that word. Suggest these tags (or some subset of them) as tags for the article, and let the user choose which of the suggested tags to keep.
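To make the first half of that plan a bit more concrete, here is a minimal sketch of a postings file plus TF-IDF scoring in Python. Everything in it is an assumption for illustration: the `PostingsIndex` class, the tiny stopword list, and the smoothed IDF formula are my own choices, not part of WordPress or Freebase. The Freebase lookup step is deliberately left out, since I don't want to guess at its query API.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list; a real one would be longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "i", "for", "on", "my", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]


class PostingsIndex:
    """Maps each word to the set of post ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> {post_id, ...}
        self.num_posts = 0

    def add_post(self, post_id, text):
        """Record every distinct word of an already-published post."""
        self.num_posts += 1
        for word in set(tokenize(text)):
            self.postings[word].add(post_id)

    def candidate_words(self, text, top_n=5):
        """Rank the words of a new post by TF-IDF against earlier posts."""
        counts = Counter(tokenize(text))
        total = sum(counts.values())
        scores = {}
        for word, count in counts.items():
            tf = count / total
            df = len(self.postings.get(word, ()))
            # Smoothed IDF so words never seen before do not divide by zero.
            idf = math.log((1 + self.num_posts) / (1 + df)) + 1
            scores[word] = tf * idf
        # Highest score first; ties broken alphabetically for stable output.
        return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


index = PostingsIndex()
index.add_post(1, "I went hiking and took photos of the mountain")
index.add_post(2, "My photos of the city at night")
index.add_post(3, "Cooking pasta for dinner")

new_post = "Freebase is a database. I queried Freebase for photos."
print(index.candidate_words(new_post, top_n=3))
```

Words that are frequent in the new post but rare in earlier posts float to the top, and those are the ones I would send on to Freebase as queries.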

Obviously this is a pretty high-level plan of attack, and I’ve been told that it’s not going to be that easy, but I certainly think it would be cool. Also, it occurred to me that it doesn’t have to be Freebase that I’m querying. Maybe I could just look at previous posts from this blogger that contain a similar set of candidate words. There are definitely variations of this that may end up being better solutions, but I’ll play around with all of that when I actually start working on it.
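The "previous posts" variation could be sketched roughly like this. It assumes each earlier post is stored as a pair of (candidate words, tags), compares candidate-word sets with Jaccard similarity, and lets similar posts vote for their tags. The function names, the similarity measure, and the 0.2 threshold are all hypothetical choices of mine, not a tested design.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two word collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_from_history(new_words, history, min_similarity=0.2, top_n=3):
    """Suggest tags from earlier posts whose candidate words overlap.

    history is a list of (candidate_words, tags) pairs, one per earlier post.
    Each sufficiently similar post votes for its tags, weighted by similarity.
    """
    votes = Counter()
    for words, tags in history:
        similarity = jaccard(new_words, words)
        if similarity >= min_similarity:
            for tag in tags:
                votes[tag] += similarity
    # Strongest votes first; ties broken alphabetically.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]


history = [
    (["camera", "lens", "photo"], ["photography"]),
    (["pasta", "sauce"], ["cooking"]),
    (["camera", "tripod", "night"], ["photography", "gear"]),
]
print(suggest_from_history(["camera", "lens", "tripod"], history))
```

Because the suggestions only ever come from tags the blogger has already used, this variation would also tend to reuse existing tags instead of inventing near-duplicates.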

I see this as useful because it not only allows my tags to be thought up for me (which is very convenient), but it could also keep my tags consistent with each other (I won’t have separate tags for “photos” and “photography”, for example). That would drastically improve the readability of a blog, as there would be fewer tags and stories would be categorized better. In these respects I think it’s a pretty worthwhile project to take on.

Also, I think it’s more challenging than the stuff that I usually do. It deals with text search algorithms, efficient and appropriate data storage, and a fair amount of artificial intelligence (as in “is this word significant based on what this user has previously written about and what he’s writing about in this article?”). Fortunately, I’m currently taking a databases course where I’ve already learned about text search and, of course, data storage. I’m also planning to take an AI course next semester where I may learn some concepts that I can put to use in this project. Again, I haven’t really spent much time on the project yet, apart from downloading WordPress and going through some of the code, and I may never get around to it if something more important comes up, but I think it’s a pretty cool project that I would enjoy working on. Hopefully I’ll find some time to get it done soon.

What project ideas have revenue-generating potential?

Monday, September 3rd, 2007

I’m very fortunate to be living in Silicon Valley. Both my parents work in highly technical areas so I can bounce ideas and implementations off of them. They give me really good feedback about anything, both technical and not. What’s more, many of their friends are equally technical and entrepreneurial, so I’m in a great place to bloom as an entrepreneur. Recently I met one of my friend’s uncles, and I started talking to him about entrepreneurship and all that good stuff. Turns out he’s a venture capitalist, so he gave me a lot of advice, and input from the point of view of a VC. I’ve only known him for about a month, but I think I’ve already learned quite a lot from him.

I had a chat with him earlier today; we talked first about school, work, and music, and then I asked him for some feedback on an idea that my friend and I just started working on. I won’t go into detail on the idea, but both my friend and I were pretty excited to work on it; it seemed interesting and also pretty useful to the general public. Anyway, I explained the idea, and needless to say, he didn’t particularly like it. Yet he didn’t just rag on it as most people do; he gave me really good reasons why he didn’t like it, reasons that I can apply to other ideas.

The first point he made was that any idea that you want to profit from needs to have some “hard problem” to it. Sounds pretty obvious, but in retrospect, most of my ideas have nothing really hard to them. They’re just a lot of mindless coding, with some tricky areas. His reasoning was that if there isn’t anything challenging to what you do, competitors will just imitate you, and no one wants more competition. You need something that really separates your product from everyone else’s, something that’s not easily reproduced. If your product solves a problem better than anyone else can, then you’ll probably be successful. Our idea, unfortunately, isn’t that hard to implement.

I agree with this statement from a business standpoint, but not having a challenging problem doesn’t mean that you shouldn’t implement your idea. It’s always great to contribute to open source projects and build free services; just don’t expect to generate revenue from them. I still want to build this project that my friend and I are working on, because I think it’s a cool idea. Even if it’s not challenging, it’s still something fun to work on and it’ll be a good learning experience. I’m not at the stage in my life when I need to be making a lot of money (it’s always good, of course), so I’ll still work on the ideas that I think are cool, regardless of how challenging they are. On the other hand, things that aren’t challenging aren’t nearly as fun and stimulating as things that are. I think that should be the real deterrent from working on certain ideas: not that others can replicate them, but that they aren’t as fun.

The second point he made was about building off of someone else’s infrastructure (let’s call them company A). If you’re doing this, you’d better add something really unique to it that makes it significantly challenging for the owner of the infrastructure to imitate. If your product is trivial here, as soon as you become remotely popular, company A will devote some resources to imitating you, and they’ll do things a lot better than you ever can. While you only have access to their APIs, they can configure everything about the base infrastructure to optimize for what you’re doing and probably beat you out. This goes along with the challenging problem bit, but it becomes even more important when you’re using someone else’s work.

In thinking about this, I’ve become even more turned off from using external infrastructure. With a lot of my projects, I like to do as much as possible by myself. Rather than rely on other people’s code and APIs, I prefer to build things myself, customizing them to suit my needs. I like this for two reasons: I build a more optimized application, and I learn a lot more by designing everything. The lesson here is: don’t expect to make a lot of money off of mash-ups. Don’t get me wrong, many mash-ups are really cool, but they’re not going to generate a lot of revenue.

We also talked about generating revenue from an idea. He affirmed my belief that a lot of money can be made with advertisements, but advised me not to worry so much about income and to focus on building something good that attracts visitors or users. We started talking about this site (it’s my only project with some hope of generating income right now), and he encouraged me to just keep writing and focus on publicizing it and building subscriptions. What I took from this is that you shouldn’t worry about generating revenue at the outset, but instead concentrate on building something really good.

Since talking to him, I’ve been reassessing my current projects and deciding whether I should keep working on them or not. Some of them I’ll definitely scrap; they don’t sound like much fun and I don’t think they’ll make me any money. Others, I think I’ll continue. At this point, none of my ideas really have anything “challenging” about them, but I’m still a pretty young student with a lot to learn. I still have plenty of time.