Word cloud for WordPress

Don’t mind the title. I set it so people can find this post through a search. If you’re not interested in how the cloud was generated, you can skip the following paragraph.

I spent a couple of hours generating a word cloud for WordPress. I first exported the blog contents to an XML file. I then tried http://www.wordie.net, but the XML tags were all over the place. So I wrote an XSL stylesheet to extract the XML data into a less verbose XML file that contains only posts and comments. I then used IBM Word Cloud to generate the word cloud. I fiddled with the configuration: I set the number of words to 100, and I had to create my own list of stop words. It included English stop words, many HTML-specific tags, font, alignment, and related information, commonly occurring links, and common words we use in the blog that are not relevant for the word cloud (e.g. “going”). If you want the list of stop words and the stylesheet, just ask.
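The extract-and-filter steps above can also be sketched in Python instead of XSL. This is a rough illustration under stated assumptions, not the stylesheet used for this post: it assumes post bodies sit in elements whose local name is `encoded` and comments in elements named `comment_content`, as WordPress export files typically have them, and the function names (`extract_text`, `top_words`) and the regular expressions are made up for the example. It only produces word counts; drawing the cloud is still left to a tool such as IBM Word Cloud.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")   # crude HTML tag stripper, good enough for a sketch
WORD_RE = re.compile(r"[a-z']+")  # ASCII words only (assumption: English text)


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_text(xml_string):
    """Collect the text of posts and comments from an export file.

    Assumes post bodies are in '<...:encoded>' elements and comments in
    '<...:comment_content>' elements; adjust the names for your export.
    """
    root = ET.fromstring(xml_string)
    chunks = []
    for elem in root.iter():
        if local_name(elem.tag) in ("encoded", "comment_content") and elem.text:
            chunks.append(elem.text)
    return "\n".join(chunks)


def top_words(text, stop_words, limit=100):
    """Strip HTML, drop stop words, and return the most frequent words."""
    plain = unescape(TAG_RE.sub(" ", text))
    words = WORD_RE.findall(plain.lower())
    counts = Counter(w for w in words if len(w) > 1 and w not in stop_words)
    return counts.most_common(limit)
```

Calling `top_words(extract_text(xml_string), stop_words)` with your own stop word set gives a list of `(word, count)` pairs, capped at 100 by default to match the setting mentioned above. Stripping tags before counting removes most of the HTML noise, so the stop word list can focus on ordinary words.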

Click on the image to see a bigger view. I’m quite pleased that we spend most of our time here thinking, talking about Ethiopia, hoping, wishing, reminiscing, and referring to each other. There is a strong word in the bottom left corner: “hate”. There’s also “ass” and “sex”. I wonder where these came from 😀


1 Comment

  1. tibebe
    Apr 17, 2010 @ 15:52:48

    double damn is all i got to say to the text before the image.

    hell is included but i can’t see any heaven.

    and about the ass, sex, girls and what not, guilty as charged!

