Experiment with Delicious and Python

Once in a while, I look at my Delicious bookmarks to get an idea of what I have been up to recently. The ‘Current Interests’ tool was written with exactly that in mind.

I began to wonder if my bookmarks can give me an idea of trends in technology and my interest in them. So I quickly wrote a Python script to give me the top tags in each year and here are the results.
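The script itself is not reproduced in this excerpt. A minimal sketch of the idea, assuming each bookmark is a dict with an ISO-style `time` string and a space-separated `tags` string (the field names and the sample data are made up for illustration), might look like this:

```python
from collections import Counter, defaultdict

def top_tags_by_year(bookmarks, n=3):
    """Return {year: [(tag, count), ...]} with the n most used tags per year."""
    counts = defaultdict(Counter)
    for bookmark in bookmarks:
        # Assumption: timestamps look like "2007-05-01T10:00:00Z".
        year = bookmark["time"][:4]
        counts[year].update(bookmark["tags"].split())
    return {year: counter.most_common(n) for year, counter in sorted(counts.items())}

# Made-up sample data, for illustration only.
sample = [
    {"time": "2006-03-08T09:00:00Z", "tags": "rdf semanticweb"},
    {"time": "2006-07-01T09:00:00Z", "tags": "rdf owl"},
    {"time": "2007-02-11T09:00:00Z", "tags": "comet python"},
]
print(top_tags_by_year(sample, n=1))
```

Grouping by year first and counting afterwards keeps the per-year rankings independent of each other, which is what makes a trend visible.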

Have I stopped blogging?

“Have you stopped blogging?”, people ask me. I don't have a definite answer. True, it has been a long time since I blogged. Although I have been quite active online, as is apparent here, I somehow couldn't blog about anything for the last 3 months!

During the 3 months since my last blog entry, I came across quite a lot of things that I found interesting and would normally have blogged about. However, I got into a vicious circle: I thought it was not worth blogging about them after such a long gap, which only lengthened the gap and made it more and more difficult to blog about anything.

Ok, so what's keeping me interested?

  • Lotus Connections, especially the idea of Activities. This has been an eye-opener regarding the way I organize information in my systems.
  • The emergence of a new web pattern of “server pushing information” to the browser, commonly referred to as Comet.
  • ProjectZero, which is IBM's answer to rapid and 'Zero' obstacle development of Web oriented applications.
  • TiddlyWiki – I wonder why I did not come across this before! It is absolutely fabulous and the idea of a single page self-contained wiki is just too good to believe and sometimes scary. 🙂

Also, of late, I started getting interested in analysis of web activities.

A day of effort, some hacking on the Firefox history, and a tool called RapidMiner helped me gain some insights into my browsing habits, which I had never thought about before. I noticed a pattern in the way I come across new topics. I also learnt how I get to certain frequently accessed sites and what I can do to get to certain information quicker than before. Finally, I realized that the new del.icio.us Firefox extension has helped me improve my browsing habits and made my bookmarks more valuable.

It is really interesting to see what other 'inferences' are possible with the data that is already available! Since data today is openly available in a wide variety of open formats, it is possible to fetch all of it, feed it to an analyzer, get some interesting insights, and use them to make your web journey more fruitful. The Flickr Cluster experiments are just the tip of the iceberg!

Some projects/tools related to this are APML, ManyEyes.

Ok, I have written about a wide variety of topics that I am currently finding interesting.

So, finally, back to the question I started off with. Have I stopped blogging? The answer definitely has got to be a 'No'!

Public bookmarks, Private tags

Has anyone come across a bookmarking site that allows us to make bookmarks public, but attach private tags to them?

Here's the use-case:
Tags, in my opinion, are small bits of information that we attach to the entity under consideration. Now the entity itself may be public, but not the information that I attach to it.

To be more specific, let me tell you where I felt the need for this. When chatting with , I came across several books in Amazon. I started bookmarking these in delicious and attaching the tag bibliophile to them.

Then came a thought. How about storing information like whether I have read the book or not, what the number of the book is (I number all my books), and other information that might not be worth making public or that, for some reason, does not seem appropriate to share?

In terms of implementation, I guess it is quite simple. You need to track, for each tag, whether it is private or not (an extra field in the database) and then display the tags accordingly in the UI.
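As a rough sketch of that extra field, assuming a simple SQLite table (the table name, column names, URL and tags below are all hypothetical), the private flag and the display rule could look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags ("
    " bookmark_url TEXT NOT NULL,"
    " tag TEXT NOT NULL,"
    " is_private INTEGER NOT NULL DEFAULT 0)"  # the extra field
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?)",
    [
        ("http://example.com/some-book", "bibliophile", 0),
        ("http://example.com/some-book", "read", 1),
        ("http://example.com/some-book", "book-42", 1),
    ],
)

def visible_tags(conn, url, viewer_is_owner):
    """Return all tags for the owner, and only the public tags for anyone else."""
    query = "SELECT tag FROM tags WHERE bookmark_url = ?"
    if not viewer_is_owner:
        query += " AND is_private = 0"
    query += " ORDER BY tag"
    return [row[0] for row in conn.execute(query, (url,))]

print(visible_tags(conn, "http://example.com/some-book", viewer_is_owner=False))
print(visible_tags(conn, "http://example.com/some-book", viewer_is_owner=True))
```

The bookmark stays public either way; only the query that feeds the UI changes depending on who is looking.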

Flock – a Web 2.0 browser?

I have been trying Flock for about a month now, and I am hooked on it.

What I liked:

  1. Blogging support – I am making this blog entry from within Flock. Also there is Technorati publishing support and all that.
  2. Flickr support – you get to know if someone adds new photos and you get to see them in a neat view.
  3. Delicious support – one of my favorite features here. There is a neat sync between your local bookmarks and your delicious bookmarks. You just click on the 'star' next to the address bar and you get a popup where you indicate whether the bookmark should be posted to del.icio.us.
  4. An improved search bar – there is live Yahoo search, local search history and the usual search engine support.
  5. Performance – somehow seems better than Firefox. Dunno why? 😐 (However, see hate point 2)
  6. Web snippets – I don't use this much, but there is a snippets bar, where you can copy snippets of your interest.
  7. News – This is where you get to manage your RSS feeds. But I don't use this either, not better than Blogbridge. 🙂

What I hated:

  1. I sometimes feel they should have built an extension on top of Firefox rather than a separate browser. Some extensions might not work in Flock, so developers have to support Flock separately. This is not good.
  2. Sometimes a background process runs for a long time and results in an 'Unresponsive script' warning. This freezes the browser for a while.

Overall, I strongly recommend Flock for people who use the utilities mentioned and have been craving their integration.

Blogged with Flock

Semantic Crawler – an update

This is in continuation of my blog entry on Semantic Grabbers. I did some experiments after consultation with . Thanks for the inputs.

My intention was to get a set of related words given a single word as input. I wanted to make use of the <rdf:Bag> tag that Delicious provides.

The idea that I had in mind was to start off by seeing the number of occurrences of each tag in the <rdf:Bag> of all links and then to use this to decide which tag to analyze next. The more frequent the occurrence of a tag, the more likely it is to be chosen next.

For example, if I see that RDF occurs most frequently in the links, I select it as my next tag for analysis. I keep updating this list with more tags and their frequencies as I crawl through the tags.
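The selection rule described above can be sketched as follows. This is only an illustration: `fetch_tag_bags` is a hypothetical callable standing in for whatever downloads a tag's feed and returns the tags of each link, and the small `feeds` dict is made-up data.

```python
from collections import Counter

def crawl_tags(root_tag, fetch_tag_bags, max_steps=10):
    """Greedy crawl: repeatedly analyze the most frequent tag not yet visited.

    fetch_tag_bags(tag) is a hypothetical callable returning one list of tags
    per link found in the feed for `tag` (the contents of each <rdf:Bag>).
    """
    frequency = Counter()
    visited = []
    current = root_tag
    for _ in range(max_steps):
        visited.append(current)
        for bag in fetch_tag_bags(current):
            frequency.update(bag)
        candidates = [(count, tag) for tag, count in frequency.items() if tag not in visited]
        if not candidates:
            break
        # Highest count wins; ties are broken alphabetically for determinism.
        current = min(candidates, key=lambda item: (-item[0], item[1]))[1]
    return visited, frequency

# Made-up feeds, for illustration only.
feeds = {
    "semanticweb": [["semanticweb", "rdf", "owl"], ["semanticweb", "rdf"]],
    "rdf": [["rdf", "owl", "xml"], ["rdf", "owl"]],
    "owl": [["owl", "ontology"]],
}
visited, frequency = crawl_tags("semanticweb", lambda tag: feeds.get(tag, []), max_steps=3)
print(visited)
```

Because the next tag is chosen purely by raw frequency, any tag that appears on many links will be picked early, whether or not it has much to do with the starting word.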

Here's the problem I faced: very generic words like tech, development, tutorial etc. are likely to be used in more links than others, so the crawler was misled. The selected tag becomes more and more irrelevant as the crawling proceeds.

There are some solutions that I have in mind.
1. Provide weightage in comparison with the root word (i.e. the given word).
2. Do a study of 'all' the tags for the entire list possibly including the description as well and then see the relationships. (This emerged after my discussion with .)
3. Provide more than one word as input and use these words to determine the set of related words.
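One possible reading of the first idea, and only an assumption on my part, is to score each tag by the share of its occurrences that appear together with the root word, so that generic tags used everywhere score low. The function name and the sample bags are made up:

```python
from collections import Counter

def relevance_to_root(root_tag, bags):
    """Score each tag by the share of its occurrences that co-occur with root_tag."""
    total = Counter()
    with_root = Counter()
    for bag in bags:
        tags = set(bag)
        total.update(tags)
        if root_tag in tags:
            with_root.update(tags)
    return {tag: with_root[tag] / total[tag] for tag in total if tag != root_tag}

# Made-up tag bags, for illustration only.
bags = [
    ["rdf", "owl", "tutorial"],
    ["rdf", "owl"],
    ["python", "tutorial"],
    ["css", "tutorial"],
]
scores = relevance_to_root("rdf", bags)
print(sorted(scores.items(), key=lambda item: -item[1]))
```

Here a generic tag like tutorial ends up below owl even though it occurs on more links overall, which is the effect the weighting is meant to have.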

Determining relationships between words is not easy in folksonomies because of the lack of contextual information. However, it surely is a rich set of information that needs to be exploited.

The result will be available here for a few days.

A semantic grabber

So what's a semantic grabber? If you do a Google search, you get, umm, '0' results (as of 08-March-2006).

So this definitely is not the word used in the wild. So what's it then?

Well, the story began like this. I started off experimenting with the evolving pub-sub model, wherein you give a list of keywords and get the latest feeds based on them. I was trying to come up with an optimum filter that would give me really crisp information. This is a tough job, especially in the as yet semantically immature WWW.

My first requirement was to get a good list of keywords. For example, I would like to know all keywords related to semantic-web. I know words like RDF, OWL, RDQL etc are related to semantic-web. But I want a bigger list. (Does this remind you of Google sets?)

Where can I get a list of keywords? I turned to Delicious. If you are a Web 2.0 geek, you would definitely be aware of the rdf:Bag tag, where you get the list of all tags for a particular link.

For example, an rss page for the tag 'rss' has a link which has the following tags:

<taxo:topics>
  <rdf:Bag>
    <rdf:li resource="http://del.icio.us/tag/rss"/>
    <rdf:li resource="http://del.icio.us/tag/atom"/>
    <rdf:li resource="http://del.icio.us/tag/validator"/>
  </rdf:Bag>
</taxo:topics>

So you know that rss, atom and validator are some 'related' keywords. Of course, there is no context here, so there could be possibilities of people tagging http://www.google.com/ as 'irc'. (This is true. I have seen people tag Google as IRC). But if you consider a weightage for tag relationships, then soon you can come up with a model where you get to see tag clusters.
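A minimal sketch of how such a snippet could be turned into weighted tag relationships: parse each rdf:Bag, then count how often two tags appear on the same link. The function names are made up, the namespace declarations are added so the fragment parses on its own, and the pair count is just one simple choice of weight.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TAXO = "http://purl.org/rss/1.0/modules/taxonomy/"

def tags_in_bags(xml_text):
    """Yield one list of tag names per <rdf:Bag> found in xml_text."""
    root = ET.fromstring(xml_text.strip())
    for bag in root.iter(f"{{{RDF}}}Bag"):
        tags = []
        for li in bag.iter(f"{{{RDF}}}li"):
            # Accept the attribute with or without the rdf: prefix.
            url = li.get("resource") or li.get(f"{{{RDF}}}resource")
            if url:
                tags.append(url.rstrip("/").rsplit("/", 1)[-1])
        yield tags

def cooccurrence_weights(bags):
    """Count how often each unordered pair of tags appears on the same link."""
    weights = Counter()
    for bag in bags:
        for pair in combinations(sorted(set(bag)), 2):
            weights[pair] += 1
    return weights

sample = f"""
<taxo:topics xmlns:taxo="{TAXO}" xmlns:rdf="{RDF}">
  <rdf:Bag>
    <rdf:li resource="http://del.icio.us/tag/rss"/>
    <rdf:li resource="http://del.icio.us/tag/atom"/>
    <rdf:li resource="http://del.icio.us/tag/validator"/>
  </rdf:Bag>
</taxo:topics>
"""
bags = list(tags_in_bags(sample))
weights = cooccurrence_weights(bags)
print(bags)
print(weights.most_common())
```

With many links, pairs that keep showing up together accumulate a high weight, while one-off taggings (like Google as 'irc') stay near 1 and can be filtered out with a threshold.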

Ok, now back to the topic on Semantic grabbers. The idea came to my mind when I thought of writing a crawler that crawls on Delicious RSS feeds and tries to find out tag clusters. So this crawler is not interested in links, but is actually interested in data that resides in the links. That clearly distinguishes it from a normal HTTP grabber, which blindly follows links and grabs pages.

Soon, with the evolution of RDF, I guess there will be more such crawlers on the web (what are agents?) and people are already talking about how we can crawl such a web. This is my first attempt at it.

So ditch Google sets (if at all you have tried it) and use a 'semantic grabber'. 😉

Web 2.0 service aggregation tools

Recently I noticed a new trend in the Web 2.0 aggregation tools. These are tools which combine other web 2.0 services in one place and provide a way to host a single page containing all your services. The most common services provided by these aggregating tools are combining delicious, flickr, blogspot and rss feeds in one place.

Examples of such tools are:
Suprglu
Squidoo
Peoplefeeds

Here are my pages:
My Suprglu page
My Semantic Web page @ Squidoo
My Peoplefeeds page
(I had signed up for Squidoo long ago and got a chance to check out their public beta offering.)

I found an inherent problem in these services.

What I tried to do is to set up a page which contains feeds of my interest based on various other tag search results. In particular, I wanted it to aggregate feeds from delicious, Technorati, Google blog search, Yahoo news search, Feedster, Icerocket etc. I wanted search results for:
(semanticweb OR semantic-web OR semweb OR sw OR semantic_web) AND (owl OR rdf OR rdfs OR ontology OR ontologies OR taxonomy OR rdql OR SPARQL OR w3c OR metadata OR semantic OR semantics OR knowledge)
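If the search engines cannot run such a query, one fallback is to filter entries locally. The sketch below treats the query as two groups of tags and requires at least one match from each; the set and function names are made up, and matching is done on a link's tags only, not on its text.

```python
TOPIC_TAGS = {"semanticweb", "semantic-web", "semweb", "sw", "semantic_web"}
DETAIL_TAGS = {
    "owl", "rdf", "rdfs", "ontology", "ontologies", "taxonomy", "rdql",
    "sparql", "w3c", "metadata", "semantic", "semantics", "knowledge",
}

def matches_query(tags):
    """True if the tags satisfy (any topic tag) AND (any detail tag)."""
    normalized = {tag.lower() for tag in tags}
    return bool(normalized & TOPIC_TAGS) and bool(normalized & DETAIL_TAGS)

print(matches_query(["semweb", "SPARQL"]))
print(matches_query(["semweb", "photos"]))
```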

These are the problems I faced:
* Most tag search engines are not intelligent enough to provide RSS feeds for such searches.
* The page is not intelligent enough to remove duplicate links. For example, suppose I have a page bookmarked in delicious and tagged with both semantic-web and rdf; that link then shows up in both tag searches. So if I combine the tag search results, the page shows up twice.
* Most of the service providers do not have an option to turn off non-English pages. So many Japanese and French (or Latin?!) pages turn up in the results.
* I want a hierarchy. I should be able to create a group “Semantic web” which contains feed results for the search query given above and another group, say “Web 2.0” which has a similar query. I should be able to relate the results of “Semantic web” group with those of “Web 2.0”.
* The ability to view feeds using different views – “Technical” and “Non-technical” or “Office related” or “Non office related”.
* Finally, there should be a theme. I would like to read my “Technical feeds” once a day and “Comics” once a week. How do I separate them?
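The duplicate-link problem from the list above is the easiest one to handle on the client side. A minimal sketch, assuming each feed has already been parsed into a list of dicts with a `link` key (the field name, function name and sample entries are made up):

```python
def merge_without_duplicates(*feeds):
    """Merge feed entries in order, keeping only the first entry for each link."""
    seen = set()
    merged = []
    for feed in feeds:
        for entry in feed:
            # Treat links that differ only by a trailing slash as the same page.
            key = entry["link"].rstrip("/")
            if key not in seen:
                seen.add(key)
                merged.append(entry)
    return merged

# Made-up results of two tag searches, for illustration only.
semantic_web_feed = [{"link": "http://example.com/rdf-primer", "title": "RDF primer"}]
rdf_feed = [
    {"link": "http://example.com/rdf-primer/", "title": "RDF primer"},
    {"link": "http://example.com/owl-guide", "title": "OWL guide"},
]
merged = merge_without_duplicates(semantic_web_feed, rdf_feed)
print([entry["link"] for entry in merged])
```

This only removes exact repeats of the same URL; the same article reached through different URLs would still show up twice.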

I am still looking for a solution.