A few weeks ago Louise Ferguson wrote about how her City Of Bits feed was being categorised by bloglines users.
This got me thinking. It wouldn't be too hard to write a scraper that took all of the publicly available subscription and folder information from bloglines, did some analysis on it and produced some reports on how people were categorising feeds on bloglines. This would fit in well with my other data liberation projects.
Four weeks later, after a few late nights, I have got the first results from my analysis, and I have made a very important discovery.
Bloglines users appear to like knitting!
After producing a list of the top 100 folder names subscribed to on bloglines, I found the usual suspects at the top: "Blogs, news, tech, people, politics" etc. But then, at number 37, I found a folder called "Knitting" that had been used for 2,085 feeds.
This is only the first delivery of bloglines stats and I will be producing a lot more stuff in the future.
The data I have scraped covers the period from 1st July 2003 to 8th April 2004.
I have got data on 32,415 public subscribers and their 1,059,140 public subscriptions.
I have looked at the data on subscriber activity as well.
The chart below shows how many days subscribers are active for. I am defining activity as being the process of subscribing to feeds.
6,880 (21%) of all public subscribers were only active for one day. This means that these people subscribed to one or more feeds on a specific day and then never subscribed to any more feeds again. Of course they may have been reading those feeds since, but I can't get any data on reading activity.
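The activity measure above can be sketched in a few lines of Python. This is only an illustration and not the code behind the chart: the sample rows, the `(subscriber, date)` layout and the function names are all made up, and it assumes "days active" means the number of distinct days on which someone subscribed to at least one feed.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical sample rows: (subscriber id, date a feed was subscribed to).
subscriptions = [
    ("alice", date(2003, 7, 1)),
    ("alice", date(2003, 7, 1)),
    ("alice", date(2003, 9, 14)),
    ("bob", date(2004, 1, 5)),
    ("bob", date(2004, 1, 5)),
    ("carol", date(2004, 2, 2)),
]

def active_days(rows):
    """Map each subscriber to the number of distinct days they subscribed on."""
    days = defaultdict(set)
    for subscriber, day in rows:
        days[subscriber].add(day)
    return {subscriber: len(dates) for subscriber, dates in days.items()}

def activity_histogram(rows):
    """Count how many subscribers were active for 1 day, 2 days, and so on."""
    return Counter(active_days(rows).values())

histogram = activity_histogram(subscriptions)
one_day = histogram[1]
total = sum(histogram.values())
print(f"{one_day} of {total} subscribers ({one_day / total:.0%}) were active for one day only")
```

Several subscriptions made on the same day count as a single active day, which is why "bob" in the sample is a one-day subscriber despite having two subscriptions.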
324,319 (31%) of public subscriptions are not in a folder.
734,821 (69%) of public subscriptions are in a folder.
Users have created 29,279 differently named folders.
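For anyone wondering how figures like these fall out of the scraped data, here is a rough Python sketch. The rows, the feed ids and the `folder_stats` function are invented for the example, and it assumes folder names are compared exactly as typed (so "Blogs" and "blogs" would count as two different folders).

```python
from collections import Counter

# Hypothetical rows: (feed id, folder name), with None for a subscription
# that has not been put in a folder.
rows = [
    ("feed-a", "Blogs"),
    ("feed-a", None),
    ("feed-b", "Knitting"),
    ("feed-c", "Knitting"),
    ("feed-d", "Blogs"),
    ("feed-e", "Blogs"),
    ("feed-f", None),
]

def folder_stats(rows, top_n=100):
    """Summarise folder usage: filed vs unfiled counts, distinct names, top folders."""
    folder_counts = Counter(folder for _, folder in rows if folder is not None)
    in_folder = sum(folder_counts.values())
    return {
        "in_folder": in_folder,
        "not_in_folder": len(rows) - in_folder,
        "distinct_folders": len(folder_counts),
        "top": folder_counts.most_common(top_n),
    }

stats = folder_stats(rows)
print(stats["in_folder"], "subscriptions in a folder")
print(stats["not_in_folder"], "subscriptions not in a folder")
print(stats["distinct_folders"], "differently named folders")
for name, count in stats["top"]:
    print(f"{name}: {count}")
```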
This chart shows the relationship between number of subscriptions and number of folders.
And of course, don't forget to look at the list of the top 100 folders. (I still have to do something with the other 29,179 folders.)
I will be producing charts that show what subscriptions are in a particular folder, so the chart for the "Knitting" folder will look a bit like this:
There will also be reports for specific feeds showing which folders they appear in:
Here is one for Louise Ferguson (City Of Bits):
(The top entry is for people who have subscribed to it without categorising it. The number is the number of times it has been categorised in that folder.)
Tom Smith (OTHER Blog):
John Rhodes (Webword):
Lou Rosenfeld (Bloug):
Peter Merholz (Peterme):
Edward Tufte (Ask ET):
Nick Finck (Digital Web):
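A per-feed report of this kind could be put together roughly as follows. Again this is a sketch with made-up rows, feed ids and names (`folders_for_feed`, `UNFILED`), not the code used for the reports: it lists the uncategorised count first and then the folders, most used first.

```python
from collections import Counter

UNFILED = "(no folder)"  # label for uncategorised subscriptions; an assumption

# Hypothetical rows: (feed id, folder name), with None for no folder.
rows = [
    ("feed-a", "Usability"),
    ("feed-a", None),
    ("feed-a", "Usability"),
    ("feed-a", "IA"),
    ("feed-b", "Blogs"),
    ("feed-a", None),
    ("feed-a", None),
]

def folders_for_feed(rows, feed):
    """Return (folder, count) pairs for one feed, uncategorised entry first."""
    counts = Counter(folder for feed_id, folder in rows if feed_id == feed)
    unfiled = counts.pop(None, 0)
    return [(UNFILED, unfiled)] + counts.most_common()

report = folders_for_feed(rows, "feed-a")
for folder, count in report:
    print(f"{folder}: {count}")
```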
And I may even have some time to generate some unfashionable tag clouds as well.
And just in case anyone is still reading, I have come up with an alternative to "Folksonomy", as I really don't like the term very much. After a few weeks' thought, the alternative I have come up with is "Usersaurus". It works for me on many levels.
(Oh, and I also need to sort out the 'duplicated' feed issue, where one site feed has more than one bloglines ID.)
See Also: del.icio.us references