Public bookmarks, Private tags

Has anyone come across a bookmarking site that allows us to make bookmarks public, but attach private tags to them?

Here's the use-case:
Tags, in my opinion, are small bits of information that we attach to the entity under consideration. The entity itself may be public, but the information that I attach to it need not be.

To be more specific, let me tell you where I felt the need for this. When chatting with , I came across several books in Amazon. I started bookmarking these in delicious and attaching the tag bibliophile to them.

Then came a thought: how about storing information such as whether I have read the book, what its number is (I number all my books), and other details that are not worth making public or that, for some reason, do not seem appropriate to share?

In terms of implementation, I guess it is quite simple. You need to track whether each tag is private or not (an extra field in the database) and then display the tags accordingly in the UI.
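Here is a minimal sketch of that idea, assuming a single SQLite table. The table name `bookmark_tags`, its columns, and the helper functions are all hypothetical and not taken from any real bookmarking site; the only point is the extra `is_private` field and the filter applied at display time.

```python
import sqlite3


def create_schema(conn):
    # One row per (bookmark, owner, tag); is_private is the "extra field".
    conn.execute(
        "CREATE TABLE bookmark_tags ("
        " url TEXT NOT NULL,"
        " owner TEXT NOT NULL,"
        " tag TEXT NOT NULL,"
        " is_private INTEGER NOT NULL DEFAULT 0)"
    )


def add_tag(conn, url, owner, tag, private=False):
    conn.execute(
        "INSERT INTO bookmark_tags (url, owner, tag, is_private)"
        " VALUES (?, ?, ?, ?)",
        (url, owner, tag, int(private)),
    )


def visible_tags(conn, url, viewer):
    # Public tags are shown to everyone; private tags only to their owner.
    rows = conn.execute(
        "SELECT tag FROM bookmark_tags"
        " WHERE url = ? AND (is_private = 0 OR owner = ?)"
        " ORDER BY tag",
        (url, viewer),
    )
    return [row[0] for row in rows]
```

With this, the bookmark and its bibliophile tag stay public, while a tag such as "read" is returned only when the viewer is the person who attached it.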

Thoughts about Google Notebook, Google Co-op and people tagging in the enterprise

Time advances, so does technology. So although a lot of ideas are hovering in my mind and I have been updating myself with the happenings in the software world, I somehow could not find time to compose a blog entry and share my views. Work has kept me busy like never before.

So let me try and consolidate everything into one entry here:

First and foremost, Google. Whew! These guys never stop (Yahoo, wake up!).

Google released the Google Notebook some time back. I have been trying this for about a week now and it is quite satisfactory.

Let me start with the pros and then go to the cons.

The tool is a quickie. Clip it and click on Add Note and you are done. It cannot be simpler (unless they provide some keyboard shortcut like Ctrl-Shift-C to copy and paste in Google Notebook). You can add your own notes or edit existing ones. You can clip images too! The search is there as always (almost taken for granted when it is Google 🙂 ).

It also allows us to make private notes or make notebooks public.

And now to the cons…

The first is a security issue. As some people have pointed out, the ease of use of this tool may tempt users to clip private data from intranets and store it on Google's servers. And Google has the right to index it.

There is absolutely no meta-data attachment. No tagging! :O (How can people forget tagging in the Web 2.0 world?!)

It is not easy to relate articles. The best way to do this is to create a new section and put everything under it, but this will tire you soon.

There is no export feature. This is a big threat. You start clipping things and you are tied to Google possibly forever!

Ok, we now proceed to the next application Google released -> Google Co-op.

Google Co-op allows users to customize the search results that Google generates (does that sound like Eurekster Swicki?).

The interesting feature here is the extensibility that Google provides in specifying topics of interest, the keywords, links etc.

And what does Google get in return? Lots of meta-information. How nice would it be if people gave you a list of words that fall into a particular category? Google will definitely relish this!

With the hopes that Google does not turn bad, let us enjoy the cool features that they provide and the competition that they face. Competition enables innovation and that is good news for end users.

Some other things that I heard recently: People tagging in the enterprise. This reminds me of a discussion that I had with my mentor some time back.

Let us suppose that I have a list of contacts in my Sametime list. How will I categorize these people? By their teams? Well, maybe so.

But someday, I would want to send a mail to all people who are active in some particular community. Or I would want to know the set of people who I have contacted for a particular purpose, which is not necessarily related to their present team. Now is it possible for me to get this view of the users?

People tagging is all about this. Here is a paper from IBM that talks about people tagging in the enterprise.

The concept is simple, but extremely powerful. The idea is to tag people, the way you tag links in a bookmarking tool. Once you do that, you can find all people who belong to a particular tag.
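To make this concrete, here is a small sketch of tagging people the way you tag links. The class `PeopleTags` and its methods are made up for illustration; this is not how Sametime or the IBM paper models it, just an in-memory index from tag to people.

```python
from collections import defaultdict


class PeopleTags:
    """In-memory index from a tag to the people carrying it."""

    def __init__(self):
        self._people_by_tag = defaultdict(set)

    def tag(self, person, *tags):
        # Tags are compared case-insensitively.
        for t in tags:
            self._people_by_tag[t.lower()].add(person)

    def people(self, tag):
        # Everyone who belongs to the given tag, regardless of team.
        return sorted(self._people_by_tag.get(tag.lower(), set()))
```

A contact can carry a team tag and any number of community or purpose tags, so the mailing list for a community is simply `people("some-community")`.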

Tagging is central to almost all resources today and will soon form part of the filesystem. (Heard of semantic filesystems?) The line between the services provided by the operating system and the services provided on the internet will blur, resulting in the emergence of the first generation of Web O/Ses. Soon, Web O/Ses will be THE O/Ses.

A parting thought. Today I saw an alert in my mailbox that talked about the next generation web. Wonder where this article is from? Deccan Herald! I don't know how many people noticed it, but this is news that the semantic web is catching on. The article talked about how Google threw up unexpected results for (mostly technical) words that had more than one meaning and how the semantic web can help solve this.

Whoa. Enough for today. 🙂

Semantic Crawler – an update

This is in continuation of my blog entry on Semantic Grabbers. I did some experiments after consultation with . Thanks for the inputs.

My intention was to get a set of related words given a single word as input. I wanted to make use of the <rdf:Bag> tag that Delicious provides.

The idea that I had in mind was to start off by seeing the number of occurrences of each tag in the <rdf:Bag> of all links and then to use this to decide which tag to analyze next. The more frequent the occurrence of a tag, the more likely it is to be chosen next.

For example, suppose I see that RDF occurs most frequently in the links, then I select that as my next tag for analysis. I keep updating this list with more tags and their frequency as I crawl through the tags.
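The selection rule above can be sketched roughly as follows. This is not the actual crawler: `fetch_bags` is a hypothetical callback that returns, for a tag, one list of tags per link (the contents of each <rdf:Bag>), so the sketch can be run against in-memory data instead of the live feeds.

```python
from collections import Counter


def crawl_tags(root, fetch_bags, max_steps=5):
    """Greedy crawl: always analyze the most frequent unvisited tag next."""
    counts = Counter()   # running frequency of every tag seen so far
    visited = []         # tags analyzed, in the order they were chosen
    current = root
    for _ in range(max_steps):
        visited.append(current)
        for bag in fetch_bags(current):
            counts.update(bag)
        candidates = [(n, t) for t, n in counts.items() if t not in visited]
        if not candidates:
            break
        # Highest count wins; ties are broken alphabetically.
        current = min(candidates, key=lambda c: (-c[0], c[1]))[1]
    return visited, counts
```

Because the counts are never normalized, a tag that is merely popular everywhere can win the selection just as easily as one that is genuinely related to the starting word.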

Here's the problem I faced: very generic words like tech, development, tutorial etc. are likely to be used in more links than others. So the crawler was misled. The selected tag becomes more and more irrelevant as the crawling proceeds.

There are some solutions that I have in mind.
1. Weight each tag by its relationship to the root word (i.e. the given word).
2. Do a study of 'all' the tags for the entire list, possibly including the description as well, and then see the relationships. (This emerged after my discussion with .)
3. Provide more than one word as input and use these words to determine the set of related words.
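One possible reading of the first solution, purely as an assumption on my part: score every tag by the fraction of its occurrences that appear together with the root word. The function name `relatedness` and the scoring formula are hypothetical; other weightings are equally plausible.

```python
from collections import Counter


def relatedness(root, bags):
    """Score each tag by the share of its occurrences that include the root."""
    total = Counter()      # number of bags in which a tag appears at all
    with_root = Counter()  # number of bags in which it appears with the root
    for bag in bags:
        tags = set(bag)
        total.update(tags)
        if root in tags:
            with_root.update(tags)
    return {t: with_root[t] / total[t] for t in total if t != root}
```

A generic word like tech shows up in plenty of bags that have nothing to do with the root, so its score drops, while a tag that almost always accompanies the root stays close to 1.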

Determining relationships between words is not easy in folksonomies because of the lack of contextual information. However, a folksonomy is surely a rich set of information waiting to be exploited.

The result will be available here for a few days.

A semantic grabber

So what's a semantic grabber? If you do a Google search, you get, umm, '0' results (as of 08-March-2006).

So this definitely is not the word used in the wild. So what's it then?

Well, the story began like this. I started off experimenting with the evolving pub-sub model, wherein you give a list of keywords and you get the latest feeds based on those keywords. I was trying to come up with an optimum filter that would give me really crisp information. This is a tough job, especially in the as yet semantically immature WWW.

My first requirement was to get a good list of keywords. For example, I would like to know all keywords related to semantic-web. I know words like RDF, OWL, RDQL etc are related to semantic-web. But I want a bigger list. (Does this remind you of Google sets?)

Where can I get a list of keywords? I turned to Delicious. If you are a Web 2.0 geek, you would definitely be aware of the rdf:Bag tag, where you get the list of all tags for a particular link.

For example, an rss page for the tag 'rss' has a link which has the following tags:

    <rdf:li resource=""/>
    <rdf:li resource=""/>
    <rdf:li resource=""/>

So you know that rss, atom and validator are some 'related' keywords. Of course, there is no context here, so there is always the possibility of people tagging a link with something unrelated like 'irc'. (This is true. I have seen people tag Google as IRC.) But if you consider a weightage for tag relationships, then soon you can come up with a model where you get to see tag clusters.
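A rough sketch of such a weightage, assuming the tags of each link have already been pulled out of its rdf:Bag into a plain list. The function names and the `min_weight` threshold are hypothetical; the weight of a relationship here is simply how many links carry both tags.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(bags):
    """Count, for every pair of tags, how many links carry both."""
    pairs = Counter()
    for bag in bags:
        for a, b in combinations(sorted(set(bag)), 2):
            pairs[(a, b)] += 1
    return pairs


def related(tag, pairs, min_weight=2):
    """Tags that co-occur with `tag` on at least `min_weight` links."""
    out = {}
    for (a, b), weight in pairs.items():
        if weight >= min_weight and tag in (a, b):
            out[b if a == tag else a] = weight
    return out
```

A stray tag such as 'irc' co-occurs with rss only once in a while, so it falls below the threshold, whereas atom keeps turning up and survives.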

Ok, now back to the topic of semantic grabbers. The idea came to my mind when I thought of writing a crawler that crawls Delicious RSS feeds and tries to find tag clusters. So this crawler is not interested in the links themselves, but in the data that resides in them. That clearly distinguishes it from a normal HTTP grabber, which blindly follows links and grabs pages.

Soon, with the evolution of RDF, I guess there will be more such crawlers on the web (what are agents?) and people are already talking about how we can crawl such a web. This is my first attempt at it.

So ditch Google sets (if at all you have tried it) and use a 'semantic grabber'. 😉

The evolution of the pub-sub model on the web

Recently, I have seen a new trend emerging on the web. Until quite recently, we had people publishing their information as RSS feeds and others subscribing to it. This was the first step towards the pub-sub (publish subscribe) model.

Then came tagging and people started publishing 'relevant' tags along with the feed entries. This has helped in the emergence of a new trend, wherein I am able to track not just websites, but information pertinent to certain keywords (or tags).

A major advantage of this is that I don't have to subscribe to RSS feeds. Instead, I just subscribe to a set of keywords (optionally combined using a regular expression) and then get information based on it. I have been trying this for quite some time now and have been getting wonderful results.
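As an illustration of subscribing to keywords rather than feeds, here is a small sketch. The entry format (a dict with a "tags" list) is an assumption made for the example and is not the format of any particular feed service.

```python
import re


def matching_entries(entries, pattern):
    """Keep the entries that have at least one tag fully matching the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [e for e in entries if any(rx.fullmatch(t) for t in e["tags"])]
```

A subscription such as `rdf|owl|semantic[-_ ]?web` then picks up any entry tagged rdf, owl or semantic-web, whichever site it was published on.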

In fact, this is how founders of websites are able to track the popularity of their tool by just subscribing to the keyword that relates to their website. The moment someone tags their blog entry with this tag, it arrives in the feed readers of the founders and they are quick to comment and 'show interest'. Here's more information and an example of how the founder of a website tracked my blog entry within a single day and here's another.

Hoping that tagging is not misused (remember what happened to <meta>?), we have a new way of tracking relevant information.

Key-Value Tagging

The act of tagging consists of labelling objects with keywords [Wikipedia].
Tagging, the way it works now, is attaching separate keywords to
objects. Although we might attach multiple keywords to the same
object, the words are independent of each other (Don't argue that the
words are related in the sense of tag clusters. Let me get to the

In its present form, tagging has no doubt created a revolution. But
would it not be more useful if tagging were in the form of key-value
pairs as well? I should have the option of tagging objects either with
single words (as it works now) or with key-value pairs.

How would this help? I had written about Problems with Podcasts
some time back. Now consider a model in which I
could not only have skip-points which mention where a particular topic
starts, but also what these topics are and my own comments on it.

If you compare a single podcast to a set of blog entries, 'key-value'
tagging could be compared to comments to a single blog entry. It would
look somewhat like this:

 <comment>This is where the speaker talks about Google's WebOS initiative.</comment>

Although this can be done using XML so easily, an end user would not
like writing XML code. So a simple interface could be provided where
the user writes the time and the comment and this is clubbed with the
podcast and can be accessed anywhere on the web. Further, the user
could add any information, for example, the name of the speaker
(for example, speaker=Gautham) or the location where the podcast was
created (for example, location=Bangalore).
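A minimal sketch of what such tagging could look like behind that simple interface. Everything here (`parse_tag`, `TaggedObject`, storing skip-points as seconds plus a comment) is hypothetical and only meant to show plain tags and key-value tags living side by side.

```python
def parse_tag(text):
    """Split 'key=value' into (key, value); a plain tag becomes (tag, None)."""
    key, sep, value = text.partition("=")
    if sep:
        return key.strip(), value.strip()
    return text.strip(), None


class TaggedObject:
    """Any object (here, a podcast URL) with plain and key-value tags."""

    def __init__(self, url):
        self.url = url
        self.tags = []          # list of (key, value) pairs; value may be None
        self.skip_points = []   # list of (seconds, comment) pairs

    def add_tag(self, text):
        self.tags.append(parse_tag(text))

    def add_skip_point(self, seconds, comment):
        self.skip_points.append((seconds, comment))
        self.skip_points.sort()

    def values(self, key):
        # All values recorded for a key; nothing about keys is pre-defined.
        return [v for k, v in self.tags if k == key and v is not None]
```

The user only ever types something like speaker=Gautham or a time and a comment; how it is stored (XML, RDF or a database row) stays out of sight.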

And just like tags, nothing is pre-defined. The user can add just about
any 'key-value' tags to any object. Again, as I keep mentioning, RDF
has solutions to these. But 'Keep It Stupidly Simple' is how the web
works. So be it. 🙂

I have been talking about Tag evolution here.

Tag evolution

Tagging has been one of my recent fields of interest. The concept of attaching words to objects opens up a lot of possibilities, although it is quite simple and straightforward.

While there are people who say that tagging is not useful or is too time-consuming [1] [2], I feel this is just the beginning in information/knowledge management. I feel there should be one solution that fits all. If you don't like tagging, don't tag. If you want tagging, use it. And if you want more than that, have more (this is yet to come, but there are people working on this).

There are many other tools/technologies being developed in the semantic world that help this cause, but their 'complexity' has resulted in lower adoption. So the golden rule seems to be 'If it is for the web, KISS'.

Tagging in its present form has a lot of cons and so it is evolving naturally. Here is my first snapshot of the latest developments in this field.

Tag clouds
Tag clusters
– Explanation of flickr clusters [1] [2]
Tag tagging
About the concept of tagging
Tools built using tags

Analysis-Paralysis and Information overload

I had this interesting thought today.

How many times has it happened to you that you come up with a brilliant idea and then, after a lot of research, you realize that someone else is already working on it and is way ahead?

But what I felt is that if this continues, you will always be in a state of Analysis-paralysis. Information overload only makes the problem more intense. (Wanna know more about Anti-patterns?)

It is better, therefore, to get into ACTION! This is probably the reason why RSS is a huge success, and so is tagging. While there are groups which design standards, there are groups which actually jump into the playground and implement things. Someday the two groups will converge.

And why did I have this thought? Well, tagging is evolving and you will soon hear about “Tag clusters”. While you might feel that this is normal, the clusters are responsible for giving a context to tags. Now this is where Semantic web concepts help.