What would a semantic desktop look like?

For a moment, let's forget the question and look at a totally different question – what does your desktop look like? Stop reading this for a moment, minimize all windows and have a look at your desktop.

You would see a set of icons (files), right?

Ok, let's say you repeat this experiment later, say a month from now. What would it look like? A similar set of files?

Ok, now how different is it from your current desktop?

If I am right, there is a very strong probability that your desktop reflects your current interests. If you use your computer to listen to songs, then you see some directories containing songs, maybe a bunch of players, etc. And if you are a game buff, you will probably see a list of shortcuts to games.

What I am getting at is that the desktop is contextual: at any time, I can roughly tell how you use your system just by looking at your desktop. Although in terms of implementation the desktop is just another folder, the way people use it is quite different.

Once in a while you spend time reorganizing things, moving things away from your cluttered desktop (and this happens just like in real life 🙂 ). Why do you do that? The reason is not just that your desktop is cluttered, but also that your interests have changed over time.

One more small observation: if you use more than one system, you will notice that the desktop reflects different things on each of them.

Ok, now to the question of a semantic desktop. My idea of a semantic desktop is an intelligent desktop that knows what you are currently interested in and shows you what you like at the moment. As your interests change and it sees irrelevant things lying around, it silently archives them (or at least moves them off the 'desktop').
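To make the 'silently archives' idea a little more concrete, here is a minimal Python sketch. It assumes that a file's last-access time is a good enough signal of whether you are still interested in it, and it moves stale items into an archive folder. The function name, the archive location and the idle threshold are my own assumptions, not features of any existing desktop.

```python
import os
import shutil
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400

def archive_stale_items(desktop: Path, archive: Path, max_idle_days: float,
                        now: Optional[float] = None) -> List[str]:
    """Move desktop entries not accessed for max_idle_days into `archive`.

    Last-access time (st_atime) is only a crude stand-in for "no longer
    interesting"; some file systems update it lazily or not at all.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(desktop.iterdir()):  # snapshot the listing before moving anything
        if entry.resolve() == archive.resolve():
            continue  # never archive the archive folder itself
        if os.stat(entry).st_atime < cutoff:
            shutil.move(str(entry), str(archive / entry.name))
            moved.append(entry.name)
    return moved
```

Calling it with your Desktop folder, an "Archive" folder inside it and a threshold of 30 would move anything untouched for a month. A real semantic desktop would of course need much better signals of interest than access time.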

It might also show information about the latest music if you are a music buff, or maybe a game you might like to play if you are into games.

You might now wonder how this is different from the widgets that we use on our desktops. Well, there are some differences.

Widgets 'float' on top of your desktop. In other words, they are not part of your desktop, and I see them as a workaround for the lack of support from Operating Systems (or the Desktop Environment) for a true semantic desktop.

My idea of a semantic desktop is that of a 'dynamically changing background'. The background image should also have other information embedded in it and this should be real-time. The information could be your contacts, local news, a music player, what not!

The closest thing I can think of right now is setting a 'start page' as your desktop background. Start pages include Google IG, My Yahoo, My MSN or Goowy.

Of course, we are not even close to what I see as a real semantic desktop. But I guess we are making major advances in this direction, and the day is not far off when my information will be fully organized and 'information-on-demand' becomes a reality.

So, to wind up: how can this happen? Being a semantic web enthusiast, I naturally expect the semantic web to solve the problem of providing data and applying views over data stores. Then operating systems (or desktop environments) need to support such an environment. For example, the synchronization interval in the start-page example above is 1 day! That is not even close to 'real-time'. The widgets that we see today should not float around; they should be truly embedded in the desktop, pop up when we need to interact with them, and otherwise stay in place purely as providers of information.

Now someone may say that widgets can be configured to not float around. For example, Yahoo Widgets has this option of Konspose mode or moving the widgets to a lower layer.

However, I feel there are vital differences between widgets and semantic desktops.

1. I expect the widgets not to interfere with the rest of my icons. So an “Auto arrange” should arrange the icons and widgets in such a way that they don't overlap.
2. A double click on a widget should activate the widget and I should be able to interact with it. Clicking on an empty space should embed the widget back into the desktop.
3. I should be able to resize, minimize/restore widgets at will. Each widget should have a set of configurable properties that I can set through a right-click.
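Point 1 above ("Auto arrange" without overlaps) is essentially a packing problem. Here is a minimal first-fit sketch in Python, assuming the desktop is a grid of icon-sized cells, that an icon takes one cell and that each widget takes a rectangle of cells. This is just one simple way to do it, not how any existing desktop environment arranges icons.

```python
from typing import Dict, List, Tuple

def auto_arrange(items: List[Tuple[str, int, int]], rows: int, cols: int
                 ) -> Dict[str, Tuple[int, int]]:
    """Place (name, width, height) items on a rows x cols grid without overlap.

    First fit: scan column by column from the top-left, the way desktop icons
    are usually stacked, and take the first free rectangle that fits.
    Returns name -> (row, col) of each item's top-left cell.
    """
    occupied = [[False] * cols for _ in range(rows)]

    def fits(r: int, c: int, w: int, h: int) -> bool:
        return all(not occupied[r + dr][c + dc] for dr in range(h) for dc in range(w))

    placed: Dict[str, Tuple[int, int]] = {}
    for name, w, h in items:
        spot = next(((r, c) for c in range(cols - w + 1)
                     for r in range(rows - h + 1) if fits(r, c, w, h)), None)
        if spot is None:
            raise ValueError(f"no room left for {name!r}")
        r, c = spot
        for dr in range(h):
            for dc in range(w):
                occupied[r + dr][c + dc] = True
        placed[name] = spot
    return placed
```

Placing widgets before icons (largest first) usually leaves fewer awkward gaps, since single-cell icons can fill whatever space remains.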

This clearly indicates that widgets require more support from the Operating System (or the Desktop Environment).

Let me be optimistic and expect some support soon.

Semantic web and privacy issues

Whoa! This is something interesting.

On one side are the people who talk about doing interesting analysis of information on the web, and on the other side are the people who talk about its potential threat to privacy.

Well, I am talking about collecting data from various sources and then doing interesting analysis on it. This data could be about facts, things or 'people'.

Entity analytics is not something new to the Semantic Web. There is work going on in Identity Resolution (who is who), Relationship Resolution (who knows who) and Anonymous Resolution (who is who and who knows who, anonymously). This is really important because it helps organizations combat fraud and threats.
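As a rough illustration of the 'anonymously' part, here is a minimal Python sketch: each party normalizes its identifiers and shares only a keyed hash (HMAC-SHA256) of them, so matches can be found without exchanging names in the clear. This is my own simplification, not how any particular entity-analytics product works; the shared key and the normalization rule (lowercase, collapse whitespace) are assumptions, and exact hashing will miss typos and nicknames.

```python
import hashlib
import hmac
from typing import Iterable, Set

def anonymize(identifier: str, key: bytes) -> str:
    """Normalize an identifier (case, whitespace) and replace it with a keyed
    hash, so parties can compare values without exchanging them in clear."""
    normalized = " ".join(identifier.lower().split())
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

def shared_identities(side_a: Iterable[str], side_b: Iterable[str],
                      key: bytes) -> Set[str]:
    """Digests present on both sides; each side only ever reveals digests."""
    digests_a = {anonymize(v, key) for v in side_a}
    digests_b = {anonymize(v, key) for v in side_b}
    return digests_a & digests_b
```

The key matters: without it, anyone could hash a list of likely names and test them against the shared digests.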

But the concern raised in this BBC article cannot be ignored. The most striking statement in it, made by Hugh Glaser of Southampton University about the web, is: “All of this data is public data already. The problem comes when it is processed”.

You had better leave the needle in the haystack. Don't try to analyze where I was last Friday!

Ok, so what is the solution? Role-based security at the data source level is one thing I can think of: build security into the core of the system, so that no data can get out unless the requester has the proper access permissions.
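Here is a minimal Python sketch of what 'building security into the core' could look like at the data source: each field carries the set of roles allowed to read it, and the source itself filters every read. The class name, the field-level granularity and the deny-by-default rule are my assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class GuardedSource:
    """A data source that enforces field-level, role-based read permissions.

    `policy` maps a field name to the roles allowed to read it; any field not
    listed in the policy is denied by default.
    """
    policy: Dict[str, Set[str]]
    rows: List[dict] = field(default_factory=list)

    def read(self, roles: Set[str]) -> List[dict]:
        allowed = {name for name, permitted in self.policy.items() if permitted & roles}
        return [{k: v for k, v in row.items() if k in allowed} for row in self.rows]
```

Because filtering happens inside the source, an analysis tool never sees a field it is not entitled to, no matter how cleverly it combines data afterwards.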

Another solution is to make sure users 'mark' data as available for analysis and, if so, for what kind of analysis. Using data for sampling, with individuals kept totally anonymous, might not be so bad.
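A minimal Python sketch of the 'mark data for analysis' idea: each record carries the kinds of analysis its owner has allowed, only opted-in records are used, and direct identifiers are stripped before sampling. The field names are hypothetical, and dropping names and e-mail addresses is not full anonymity, since combinations of other fields (age, city and so on) can still identify people.

```python
import random
from typing import Iterable, List, Optional

# Hypothetical field names: which keys count as direct identifiers, and where
# each record stores the kinds of analysis its owner has opted in to.
DIRECT_IDENTIFIERS = {"name", "email"}
CONSENT_FIELD = "allowed_analysis"

def sample_for_analysis(records: Iterable[dict], purpose: str, k: int,
                        seed: Optional[int] = None) -> List[dict]:
    """Return up to k randomly chosen records whose owners allowed `purpose`,
    with direct identifiers and the consent marker removed."""
    stripped = [
        {key: value for key, value in r.items()
         if key not in DIRECT_IDENTIFIERS and key != CONSENT_FIELD}
        for r in records
        if purpose in r.get(CONSENT_FIELD, ())
    ]
    rng = random.Random(seed)
    return rng.sample(stripped, min(k, len(stripped)))
```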

Well, these are some solutions that I feel might be considered for this problem. Time will tell.

Wikipedia in RDF

This blog entry is not so much about what Wikipedia in RDF is, but about the problems I faced in using it.

When I initially read about the Wikipedia in RDF initiative, I was excited. Imagine being able to download the meta information of ALL the articles of Wikipedia and then being able to query it, analyze it and do anything that you would want to do with it.

I dutifully downloaded the gzipped RDF/XML dump. The compressed file is 397 MB and the uncompressed size is supposed to be 3.7 GB (supposed to be, because I did not have enough space on a single partition to extract all of it. I initially doubted whether XP supports files of this size, but found a page saying that on NTFS partitions the maximum file size is the size of the volume).

Ok, now come a host of problems. I ran my experiments on a machine with 256 MB of RAM. I guess the processor is not bad; it is a 1.7 GHz Celeron.

To analyze this file, I first had to extract it. I partially extracted the archive (about 800 MB) and then tried to open it in my text editor, SciTE. I was disappointed: the file did not open. I then tried Wordpad (I did not dare try Notepad!), Vim (for Windows), Edit (from cmd.exe) and Mozilla Firefox.

The best response came from Edit (I am not surprised; I have done some tests before and found that Edit is the best text editor on Windows!), which clearly said that it cannot handle files of that size and would show only the first 65000-odd lines. Decent. At least I get to view 65000 lines!

The second-best response was from Mozilla Firefox, though with some problems. Firefox tried to parse the file, since it was RDF, so I changed the extension to .txt to avoid parsing and tried again. Firefox immediately started loading the file and took up about 150 MB of memory before it stopped working.

Vim was bad too. 🙁 The file just did not open and Vim made an abnormal exit.

So I am left with a host of problems before I can start playing with this file.

Is there any text editor that I can use to open this file? I guess there should be SOME editor that does caching and is written specially to load huge files.
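Whatever editor that turns out to be, the trick it needs is simple: never read the whole file, only the part you are looking at. A minimal Python sketch of that idea (the function names are my own):

```python
from itertools import islice
from typing import List

def head(path: str, n: int = 65000) -> List[str]:
    """First n lines, read lazily: memory depends on n, not on the file size."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return list(islice(f, n))

def read_window(path: str, offset: int, size: int) -> bytes:
    """`size` bytes starting at byte `offset`, like one screen of a pager."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

Scrolling then becomes a matter of calling read_window with a new offset, which costs the same whether the file is 800 MB or 3.7 GB.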

Ok, now on to the second problem. I am thinking of doing some analysis using this RDF document. To do that I need to parse the RDF/XML, and a parser that 'loads' the entire file into memory will not cope with this size. I guess I should use a FileChannel to memory-map the file and a pull parser to parse it.
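The FileChannel plus pull parser plan is Java. Here is the same streaming idea sketched in Python instead, using xml.etree.ElementTree.iterparse, which hands you elements as they are parsed rather than building the whole tree. It also reads straight out of the .gz file, which would sidestep the partition-space problem. The sketch only counts rdf:Description elements; how the Wikipedia dump actually structures its records is an assumption I have not checked.

```python
import gzip
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DESCRIPTION = f"{{{RDF_NS}}}Description"

def count_descriptions(source) -> int:
    """Count rdf:Description elements while streaming through RDF/XML.

    `source` is a file name or a binary file object. Everything parsed so far
    is discarded after each description, so memory use stays roughly flat
    instead of growing with the document.
    """
    count = 0
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # first event: start of the root (rdf:RDF) element
    for event, elem in events:
        if event == "end" and elem.tag == DESCRIPTION:
            count += 1
            root.clear()  # drop the children built so far
    return count

def count_descriptions_gz(path: str) -> int:
    """Stream straight out of the .gz, so nothing is extracted to disk."""
    with gzip.open(path, "rb") as f:
        return count_descriptions(f)
```

Any real analysis would replace the counter with whatever you want to extract from each description before it is cleared.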

I have not tried this, but I am cent per cent sure that I will face problems. Size does matter!

Wish me luck. 🙂

A semantic grabber

So what's a semantic grabber? If you do a Google search, you get, umm, '0' results (as of 08-March-2006).

So this is definitely not a term used in the wild. What is it, then?

Well, the story began like this. I started experimenting with the evolving pub-sub model, in which you give a list of keywords and get the latest feeds matching them. I was trying to come up with an optimum filter that would give me really crisp information. This is a tough job, especially on the as yet semantically immature WWW.

My first requirement was to get a good list of keywords. For example, I would like to know all keywords related to semantic-web. I know words like RDF, OWL, RDQL etc are related to semantic-web. But I want a bigger list. (Does this remind you of Google sets?)

Where can I get a list of keywords? I turned to Delicious. If you are a Web 2.0 geek, you are surely aware of the rdf:Bag element in Delicious RSS feeds, which lists all the tags for a particular link.

For example, the RSS feed for the tag 'rss' contains a link with the following tags:

<taxo:topics>
  <rdf:Bag>
    <rdf:li resource="http://del.icio.us/tag/rss"/>
    <rdf:li resource="http://del.icio.us/tag/atom"/>
    <rdf:li resource="http://del.icio.us/tag/validator"/>
  </rdf:Bag>
</taxo:topics>
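Pulling the tags out of such a bag is straightforward with Python's standard XML parser. A minimal sketch, assuming the item is wrapped with the usual RDF and RSS 1.0 taxonomy namespace declarations (the snippet above omits them) and that each tag name is the last path segment of its URL:

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import unquote, urlparse

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "taxo": "http://purl.org/rss/1.0/modules/taxonomy/",
}

def tags_from_item(xml_text: str) -> List[str]:
    """Extract tag names from the taxo:topics/rdf:Bag of one feed item."""
    root = ET.fromstring(xml_text)
    tags = []
    for li in root.iterfind(".//taxo:topics/rdf:Bag/rdf:li", NS):
        # The snippet uses a bare `resource` attribute; also accept rdf:resource.
        url = li.get("resource") or li.get(f"{{{NS['rdf']}}}resource")
        if url:
            path = urlparse(url).path.rstrip("/")
            tags.append(unquote(path.rsplit("/", 1)[-1]))
    return tags

SAMPLE = f"""<item xmlns:rdf="{NS['rdf']}" xmlns:taxo="{NS['taxo']}">
  <taxo:topics>
    <rdf:Bag>
      <rdf:li resource="http://del.icio.us/tag/rss"/>
      <rdf:li resource="http://del.icio.us/tag/atom"/>
      <rdf:li resource="http://del.icio.us/tag/validator"/>
    </rdf:Bag>
  </taxo:topics>
</item>"""

print(tags_from_item(SAMPLE))  # ['rss', 'atom', 'validator']
```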

So you know that rss, atom and validator are 'related' keywords. Of course, there is no context here, so people may well tag http://www.google.com/ as 'irc'. (This is true. I have seen people tag Google as IRC.) But if you assign weights to tag relationships, you can soon come up with a model in which tag clusters emerge.
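The weighting idea can be sketched in a few lines of Python: count how often each pair of tags appears on the same bookmark, drop pairs below a threshold (which filters one-off noise like Google tagged as 'irc'), and treat the connected groups that remain as clusters. The threshold of 2 and the connected-components approach are my assumptions, one simple choice among many clustering methods.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set

def tag_clusters(bookmarks: Iterable[List[str]], min_weight: int = 2) -> List[List[str]]:
    """Group tags that co-occur on at least `min_weight` bookmarks.

    Each pair of tags on the same bookmark adds 1 to that pair's weight.
    Weak pairs are dropped, and the clusters are the connected components
    of the remaining graph.
    """
    weights: Counter = Counter()
    for tags in bookmarks:
        for pair in combinations(sorted(set(tags)), 2):
            weights[pair] += 1

    graph: Dict[str, Set[str]] = {}
    for (a, b), w in weights.items():
        if w >= min_weight:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)

    clusters, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters

bookmarks = [["rss", "atom", "validator"], ["rss", "atom"], ["atom", "rss", "feeds"],
             ["google", "irc"], ["rdf", "owl"], ["owl", "rdf", "rdql"]]
print(tag_clusters(bookmarks))  # [['atom', 'rss'], ['owl', 'rdf']]
```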

Ok, now back to semantic grabbers. The idea came to me when I thought of writing a crawler that crawls Delicious RSS feeds and tries to find tag clusters. This crawler is not interested in links, but in the data that resides in the links. That clearly distinguishes it from a normal HTTP grabber, which blindly follows links and grabs pages.
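Here is a minimal Python sketch of such a crawler, assuming a hypothetical fetch_bags(tag) helper that downloads the feed for one tag and returns the tag bag of each item in it. Note that it queues tags it finds in the data, never hyperlinks.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

def grab(seed: str,
         fetch_bags: Callable[[str], Iterable[List[str]]],
         max_tags: int = 50) -> Dict[str, List[List[str]]]:
    """Breadth-first 'semantic grab' over tags.

    fetch_bags(tag) is a hypothetical helper returning the tag bags of every
    item in that tag's feed. The crawler records them and queues the tags it
    finds in the data; it never follows hyperlinks.
    """
    collected: Dict[str, List[List[str]]] = {}
    queue = deque([seed])
    while queue and len(collected) < max_tags:
        tag = queue.popleft()
        if tag in collected:
            continue
        bags = [list(bag) for bag in fetch_bags(tag)]
        collected[tag] = bags
        queue.extend(t for bag in bags for t in bag if t not in collected)
    return collected
```

The max_tags limit matters, because tag graphs are densely connected and an unbounded crawl would try to visit every tag on the site.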

Soon, with the evolution of RDF, I guess there will be more such crawlers on the web (what are agents?) and people are already talking about how we can crawl such a web. This is my first attempt at it.

So ditch Google sets (if at all you have tried it) and use a 'semantic grabber'. 😉

A general update of many things

This blog entry has no specific title. I wish to talk about a lot of things here:

First and foremost, I have updated my website. It now reflects a lot of things and is dynamically updated from RSS feeds. Thanks to RSS-to-Javascript for a cool utility.

From now on, all the items that I search for (which I used to include in Khoj) will be reflected on my homepage.

All the comments that I make on countless other websites will also be dynamically updated on my web page. (Thanks to this blog entry for the cool hack.)

Let me tell you what this hack is all about:
You might be commenting on a number of websites and wonder how you can bring all those comments together in one place. This hack helps you do this.
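The hack itself is not described here, so the following is only a guess at the general idea, sketched in Python: take the RSS feeds of the sites you comment on and merge them into one list, newest first. It assumes RSS 2.0 feeds whose pubDate values carry a time zone; fetching the feeds over HTTP is left out.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from typing import Iterable, List

def merge_feeds(feeds: Iterable[str], limit: int = 10) -> List[dict]:
    """Merge several RSS 2.0 feeds into one list, newest item first.

    Items without a pubDate are skipped, since they cannot be ordered.
    Dates are assumed to include a time zone, so they compare correctly.
    """
    items = []
    for xml_text in feeds:
        channel = ET.fromstring(xml_text).find("channel")
        site = channel.findtext("title", default="")
        for item in channel.iter("item"):
            pub_date = item.findtext("pubDate")
            if not pub_date:
                continue
            items.append({
                "site": site,
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "date": parsedate_to_datetime(pub_date),
            })
    items.sort(key=lambda i: i["date"], reverse=True)
    return items[:limit]
```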

This hack reminds me of the UNIX tools philosophy, where you are given small tools and it is up to your creativity to join them together and do wonderful things. The Semantic Web and Web 2.0 revolution has just started, and this is just a taste of it! You will see more service combinations in the near future.

Semantic Web -> Single data source -> The future of search -> Google base

It has been just 2 weeks since we were discussing, “What will happen to search engines like Google when the concept of a single data source comes in?”

The concept of a single data source would mean that no data would exist in static pages. All the data would reside in some storage unit, and pages would be created (if at all required) at run time based on the users' interests.
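A minimal Python sketch of the 'single data source' idea, assuming a store of tagged records and a user described only by a list of interests (both hypothetical): the page does not exist until it is requested, and it is built from whatever records match.

```python
from html import escape
from typing import Iterable

# Hypothetical records in a single data store; nothing here is a static page.
STORE = [
    {"title": "New album reviews", "tags": ["music"]},
    {"title": "Patch notes & tips", "tags": ["games"]},
    {"title": "Concert dates", "tags": ["music", "events"]},
]

def render_page(store: Iterable[dict], interests: Iterable[str]) -> str:
    """Build an HTML fragment at request time from records matching the user's interests."""
    wanted = set(interests)
    rows = [f"<li>{escape(r['title'])}</li>" for r in store if wanted & set(r["tags"])]
    return "<ul>" + "".join(rows) + "</ul>"
```

This is exactly the kind of page a crawler that only indexes static pages never gets to see.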

The existing search engines work on static pages. How well would this work in Web 2.0? Suppose the only pages that existed in the Internet were dynamic pages, what can the search engines index?

Enter Google… Enter Google Base.

I should have thought of it before. As some “Google 1 hour video” says, Google will never give up. They think way ahead of others!

People are spreading rumors about Google Base. Here is what Slashdot has to say; the comments are interesting as well.

Google stepped in and made an official announcement too.

People at Google are not fools! They know that once the world moves towards the Semantic Web and Web 2.0, the amount of static content is going to drop drastically. Search engines would then no longer be able to boast of having indexed 8 billion pages, and if they did, it would seem seriously out of fashion. (Google has in fact stopped putting that number on their home page; why they did this is a different story altogether!)

It seems like Google says, “How can we solve this problem? Ask people to send data to us? Yeah, why not?! Why should we go around and ask people for data? Let us ask them to publish it here. We want all info. We have the capacity to store it all here. Make your data dynamic and we'll instantly show the world the data that you created.” (You publish, we subscribe! Inverse RSS, huh?)

Smart!!!

Now the question is whether they are really moving towards the semantic web. I think they are. I have not had a chance to see Google Base yet, but assuming the rumors about it are true, Google is using a “name=value” kind of structure in Google Base, which is a basic prerequisite for representing facts in the Semantic Web.
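To show why a "name=value" structure is so close to facts, here is a minimal Python sketch that turns such pairs into subject/predicate/object triples and prints them as N-Triples. I do not know Google Base's actual format; the input layout, the http://example.org/attr/ vocabulary and the function names are hypothetical.

```python
from typing import List, Tuple
from urllib.parse import quote

Triple = Tuple[str, str, str]

def to_triples(subject: str, text: str,
               vocab: str = "http://example.org/attr/") -> List[Triple]:
    """Turn 'name=value' lines into (subject, predicate, object) triples."""
    triples = []
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        name, value = (part.strip() for part in line.split("=", 1))
        if name:
            triples.append((subject, vocab + quote(name), value))
    return triples

def to_ntriples(triples: List[Triple]) -> str:
    """Serialize with IRI subjects/predicates and plain string literals."""
    def literal(s: str) -> str:
        return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return "\n".join(f'<{s}> <{p}> "{literal(o)}" .' for s, p, o in triples)
```

Once every attribute is a triple, "just publish it wherever you want in a definite syntax" becomes realistic: any consumer can merge triples from many sources without knowing the page layout they came from.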

This could mean that Google would then say, “Just publish it wherever you want in a definite syntax, and we will take it from there”. The only difference between this way of indexing and the present way is that in the new method, Google is able to interpret the content in a much better way as the data is structured.

Analysis-Paralysis and Information overload

I had this interesting thought today.

How many times has it happened to you that you come up with a brilliant idea and then, after a lot of research, realize that someone else is already working on it and is way ahead?

But I felt that if this continues, you will always be in a state of analysis-paralysis. Information overload makes the problem even more intense. (Wanna know more about Anti-patterns?)

It is better, therefore, to get into ACTION! This is probably why RSS is a huge success, and so is tagging. While some groups design standards, others jump into the playground and actually implement things. Someday the two groups converge.

And why did I have this thought? Well, tagging is evolving and you will soon hear about “Tag clusters”. While you might feel that this is normal, the clusters are responsible for giving a context to tags. Now this is where Semantic web concepts help.

2 days, 2 experiences

September 2, 2005:

As usual, after my office work, I improve my knowledge in my fields of interest. I am trying out the newly discovered Clusty and hit upon something very interesting.

This site is called KurzweilAI.net. It is hard to describe, but here are some of the things it deals with:

* The Singularity.
* Living Forever.
* Will Machines Become Conscious?
* How to Build a Brain.
* Visions of the Future.

If you are not already excited, then you better not continue reading this blog. But if you are, then I suggest you start off with this:

Chapter 1: The Evolution of Mind in the Twenty-First Century.

I am tempted to tell you what is in there, but I won't, or else you will criticize me for spoiling it for you once you read it.

September 3, 2005:

Early morning, I find myself attending a workshop at the Le Meridien hotel, Sankey Road. This workshop is about:
Model Driven and Service Oriented Development using Eclipse, J2EE and Web Services.

The workshop was led by Shridhar Iyengar, a distinguished engineer from IBM. It was one of the best workshops I have ever attended. It was jointly organized by Rotary Bangalore West, IBM and OMG.

What I especially liked is that this workshop dealt with upcoming trends in software development. It talked about concepts like Modeling, Metadata, Service Oriented Development, Model Driven Architecture, Reverse Engineering, Reusable Assets etc.

I came to the office today and found more interesting stuff:
Microsoft suing Google and Ballmer using offensive words against Google, making comments like Google will disappear within 5 years etc! (Is this more because of desperation?!)