Categories
World Wide Web

Google Blog Search for my page

I could not find a "Google Blog Search for your site" option, so I ended up writing one myself:

(For some reason it is not working in LJ. Do they disable scripting?)

You can check it in my blogspot page.


All your data is ours, but, but wait, what about privacy? contd…2

Context for this article:
1. All your data is ours, but, but wait, what about privacy?
2. All your data is ours, but, but wait, what about privacy? contd…

Guess what? It has been barely two weeks since I blogged about it, and here comes a tool that does it.

Omnidrive does it all. The creators say that it will be ubiquitous and unrestrictive and, above all, that users will have their own private encrypted storage area. It also allows you to share files or publish them if you want to. (But is it going to be free?)

It is not yet open for use. However, you can sign up for beta testing if you are interested.


Web 2.0 service aggregation tools

Recently I noticed a new trend: Web 2.0 aggregation tools. These tools combine other Web 2.0 services and let you host a single page containing all your services. Most commonly they bring your delicious, flickr, blogspot and RSS feeds together in one place.

Examples of such tools are:
Suprglu
Squidoo
Peoplefeeds

Here are my pages:
My Suprglu page
My Semantic Web page @ Squidoo
My Peoplefeeds page
(I had signed up for Squidoo a long time ago and got a chance to check out their public beta offering.)

I found an inherent problem in these services.

I tried to set up a page containing feeds of interest to me, built from tag search results on various services. In particular, I wanted it to aggregate feeds from delicious, Technorati, Google blog search, Yahoo news search, Feedster, Icerocket etc. I wanted search results for:
(semanticweb OR semantic-web OR semweb OR sw OR semantic_web) AND (owl OR rdf OR rdfs OR ontology OR ontologies OR taxonomy OR rdql OR SPARQL OR w3c OR metadata OR semantic OR semantics OR knowledge)
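To make the intent of that query concrete, here is a minimal sketch of how it could be evaluated against the tags of a single bookmark or post. The function name and the lower-casing of tags are my assumptions; none of the services above is known to work this way.

```python
# The two OR-groups from the query above (stored lower-case).
TOPIC_TAGS = {"semanticweb", "semantic-web", "semweb", "sw", "semantic_web"}
DETAIL_TAGS = {
    "owl", "rdf", "rdfs", "ontology", "ontologies", "taxonomy", "rdql",
    "sparql", "w3c", "metadata", "semantic", "semantics", "knowledge",
}


def matches_query(tags):
    """Return True if the tags hit at least one term in each OR-group.

    The two groups are combined with AND, exactly as in the query.
    Tag comparison is case-insensitive (an assumption).
    """
    normalized = {tag.lower() for tag in tags}
    return bool(normalized & TOPIC_TAGS) and bool(normalized & DETAIL_TAGS)
```

For example, an item tagged "semantic-web" and "RDF" matches, while one tagged only "rdf" does not.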

These are the problems I faced:
* Most tag search engines are not intelligent enough to provide RSS feeds for such searches.
* The page is not intelligent enough to remove duplicate links. For example, suppose I have a page bookmarked in delicious with the tags semantic-web and rdf. That link shows up in both tag searches, so if I combine the tag search results, the page shows up twice.
* Most of the service providers do not have an option to turn off non-English pages. So many Japanese and French (or Latin?!) pages turn up in the results.
* I want a hierarchy. I should be able to create a group “Semantic web” which contains feed results for the search query given above and another group, say “Web 2.0” which has a similar query. I should be able to relate the results of “Semantic web” group with those of “Web 2.0”.
* The ability to view feeds using different views – “Technical” and “Non-technical” or “Office related” or “Non office related”.
* Finally, there should be a theme. I would like to read my “Technical feeds” once a day and “Comics” once a week. How do I separate them?
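The duplicate-link problem in the list above is the easiest one to sketch. Below is a minimal example, assuming each feed has already been fetched and reduced to a list of (title, link) pairs. The function name and the trailing-slash normalization are my own choices, not something these tools offer.

```python
def merge_feeds(*feeds):
    """Merge lists of (title, link) entries, keeping the first entry per link.

    Links are compared after stripping a trailing slash, which is only a
    crude normalization; real URLs may need more care.
    """
    seen = set()
    merged = []
    for feed in feeds:
        for title, link in feed:
            key = link.rstrip("/")
            if key not in seen:
                seen.add(key)
                merged.append((title, link))
    return merged
```

With this, a page bookmarked under both semantic-web and rdf would appear only once in the combined result, in the position where it was first seen.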

I am still looking for a solution.


Any relation between President Bush’s speech and Semantic Web?

This blog entry is about a news item which says that a university professor originally wrote President Bush's speech. How was this found out, and what are its implications?

Well, look at this article: National Strategy for victory in Iraq.

Now download the PDF document and view the properties of the PDF document. Do you see “feaver_p”?

Here's the explanation that the NYTimes article gives:

The role of Dr. Feaver in preparing the strategy document came to light through a quirk of technology. In a portion of the document usually hidden from public view but accessible with a few keystrokes, the plan posted on the White House Web site showed the document's originator, or “author” in the software's designation, to be “feaver-p.”

This has raised concerns about metadata harming the privacy of people.
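The author field described above can also be read without opening a PDF viewer. Here is a rough sketch that scans the raw bytes of a file for the /Author entry of the document information dictionary. It only works when that entry is stored uncompressed as a plain literal string, and it does not decode escape sequences. Anything else needs a proper PDF library. The function name is hypothetical.

```python
import re

# Matches "/Author (some text)", allowing backslash-escaped characters
# inside the parentheses.
_AUTHOR_PATTERN = re.compile(rb"/Author\s*\(((?:\\.|[^\\)])*)\)", re.DOTALL)


def pdf_author(raw):
    """Return the /Author string found in raw PDF bytes, or None.

    This is a crude byte scan, not a PDF parser.
    """
    match = _AUTHOR_PATTERN.search(raw)
    if match is None:
        return None
    return match.group(1).decode("latin-1")
```

You would call it with the bytes of the downloaded file, for example the result of reading the file in binary mode.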

Some more interesting analysis here and here (These are the articles which relate to metadata over the semantic web).

Ethics is going to be a really hot field soon. 🙂


All your data is ours, but, but wait, what about privacy? contd…

I had recently blogged about privacy concerns with regard to storing data online. And this is what I found today: Do you trust Google?

Among the various things that the article mentions I found these interesting:

* Google working with scientists to make available data related to human genomes. (Now who is going to gift me Google Story?)

* Google providing personal data based on RFID tags.

What is Google up to?!


Speech recognition -> Podcasts -> Podzinger

Podzinger is just what I was looking for! Podzinger uses speech recognition technologies to actually try and figure out the words in a podcast and then helps us to search within podcasts! Although not quite 100% accurate, it is quite impressive.

This can actually be used in a number of ways:

* Just search for keywords the way you do a normal search and get the podcasts of your choice. Podzinger actually provides RSS alerts for these keywords and so you get podcasts on the fly delivered to your favorite reader.

* I had recently written about the Problems with podcasts, where I had mentioned:

…there is an inherent problem with podcasts. They are not searchable. A typical podcast, for example, Slashdot Review contains many different news items. In this example, Slashdot review contains all the important stories published in Slashdot in that day.

In RSS, suppose I am not interested in reading a particular news item, I can just skip and read the next one.

Podzinger helps us with this.

Usually podcast publishers provide you with a description, which tells you what the podcast contains. Just use this to search in Podzinger and you can magically be transferred to the exact location where that particular item starts.
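I do not know how Podzinger works internally, but the idea of jumping to the right spot can be sketched once speech recognition has produced a transcript. The example below assumes a transcript in the form of (seconds, word) pairs; that format and the function name are hypothetical.

```python
def find_offsets(transcript, phrase):
    """Return the start times (in seconds) where the phrase is spoken.

    transcript is a list of (seconds, word) pairs in spoken order.
    Matching is case-insensitive and requires the words to be consecutive.
    """
    words = phrase.lower().split()
    if not words:
        return []
    spoken = [word.lower() for _, word in transcript]
    hits = []
    for i in range(len(spoken) - len(words) + 1):
        if spoken[i:i + len(words)] == words:
            hits.append(transcript[i][0])
    return hits
```

A player could then seek to the first returned offset instead of starting the podcast from the beginning.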

One problem however: Podzinger works only with IE 5.0+ with RealPlayer. (However for the sake of using this utility you can definitely go back and use that browser. 🙂 )

If you care about podcasts, you definitely should give it a try!


All your data is ours, but, but wait, what about privacy?

It started with Gmail, as far as I can remember. Google provided 1 GB of space and people thought, why not store everything online? As I have already said a zillion times, this is what the single data source concept is all about. And now it is back with a bang, with Google Base.

But a thought struck me today.

How can we rely on people who we don't even know? What is the guarantee that Google will not misuse our data? You might say, “What will Google do with MY data?”, but think again. The world becomes so restricted because of the absence of trust. You are not ready to store your confidential files or your private files in the same place. That 100 billion dollar idea that you wrote last night? Are you ready to store it in an online data-source?

The solution?

It would be better if Google (or anyone, for that matter) provided the same service without knowing what data we store.

The idea is simple.

Encrypt all data as soon as it is created, using a key that depends on the user who created the data. Decrypt it only when you need it. A mediator between the client interface and the server is responsible for the encryption and decryption. The mediator of course sits on the client side.
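To show the shape of such a mediator, here is a toy sketch using only the Python standard library. It derives a key from the user's passphrase and XORs the data with a keystream built from HMAC-SHA256. This homemade cipher is an assumption made purely for illustration: it has not been vetted and does not detect tampering, so a real mediator should use an established encryption library instead. All function names are hypothetical.

```python
import hashlib
import hmac
import os


def derive_key(passphrase, salt):
    """Derive a 32-byte key from the user's passphrase (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, 100_000)


def _keystream(key, nonce, length):
    """Toy keystream: HMAC-SHA256 over nonce plus a running counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key, plaintext):
    """Runs on the client before upload; returns nonce followed by ciphertext."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(key, blob):
    """Runs on the client after download; reverses encrypt()."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))
```

The server only ever stores the output of encrypt(), so it cannot read the data unless it also learns the passphrase.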

And in the world of semantic web services, you can expect companies to encrypt all the data that they generate. So it is ok if you store your confidential files or the vision document of your company in the same single data source that you use to publish your photos to the public! (This seems like a horror story now, but it is perfectly valid.) Accidental leaks will not be a problem.

You don't have to be bothered about whether someone will be accessing that data, or if someone misuses it. All copies made of the document will be a waste as people just cannot make sense of it.

Security features like encryption and digital signatures are going to be a very important piece in technological evolution in the years to come. You can bet on it!


RSS hacking – some observations

I tried simulating the situation that I had mentioned in my previous blog entry on Gmail forwarding and service interoperability – an interesting observation.

I first opened a new account in Reader1 (I don't want to name it) and then subscribed to my blog's RSS feed using it. Then, using Reader2, I subscribed to Reader1's RSS feed. Finally, I subscribed to Reader2's RSS feed using Reader1.

Nothing happened again. Reason?

The RSS 2.0 specification says that there should be one 'channel' element within the root 'rss' element. 'title', 'link' and 'description' are mandatory for the 'channel', which can contain any number of 'item' elements. Within an 'item', every element is optional, as long as at least one of 'title' or 'description' is present.

Usually, every item in a feed includes a 'pubDate' element, although it is not mandatory. Most items also include a 'guid', which is a globally unique identifier. The 'guid' identifies the item uniquely, and 'pubDate' can be used along with 'link' to hint at a duplicate entry. So readers usually identify duplicate entries, and a loop will not occur.

However, there is something that can still be experimented with:

Since neither 'guid' nor 'pubDate' is mandatory, and since you cannot uniquely identify an item using 'title', 'link' or 'description' alone (at least I could not see any mention of this in the spec), we can create an environment where we can show that the infinite loop can occur in principle.
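Here is a minimal sketch of the duplicate check described above, using Python's standard XML parser. It is my guess at what a reader might do, not the behaviour of any particular reader: prefer 'guid', fall back to 'link' plus 'pubDate', and treat an item with neither as always new. That last case is exactly where a loop could start. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def item_key(item):
    """Return a hashable identity for an RSS item, or None if it has none."""
    guid = item.findtext("guid")
    if guid:
        return ("guid", guid.strip())
    link = item.findtext("link")
    pub_date = item.findtext("pubDate")
    if link and pub_date:
        return ("link+pubDate", link.strip(), pub_date.strip())
    return None  # only title/description available: cannot be identified


def new_items(feed_xml, seen):
    """Return the items in feed_xml not seen before, updating the seen set."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        key = item_key(item)
        if key is None or key not in seen:
            fresh.append(item)
        if key is not None:
            seen.add(key)
    return fresh
```

Polling the same feed twice with the same seen set returns the identifiable items only once, while items without 'guid' or 'pubDate' come back on every poll.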

Two things before I wind up:

First: There is a solution to the infinite loop problem in RSS, although it is not obvious at first sight.
Second: This problem is something that we need to seriously consider now (at this stage of web evolution), or else it could be a major design flaw that will require ugly patches later on (remember IPv4?). And this is where a formal, standards-based approach always helps.


Gmail forwarding and service interoperability – an interesting observation

Ever seen the Gmail forwarding feature? Gmail can automatically forward your mail from one account to another.

It just occurred to me (and would occur to any hacker): what if I forward mail to some account and then from that account forward it back to this one?

Guess what? Nothing happens! Gmail has taken care of that.

We had a similar problem when we were discussing service interoperability in Ananyeah. I guess it is easier to take care of this in Gmail as it is only mail. What if there are other services?

Let me give you an example with other services. It is possible to subscribe to a blog and get the feed delivered to our reader. Let us call this first reader Reader1. Now assume Reader1 provides the option of publishing an RSS feed of what it has collected. If I subscribe to this feed using another feed reader, Reader2, and then subscribe to Reader2's feed using Reader1, what is bound to happen? Time to check it out and start experimenting. (And if you did not understand this concept, don't worry. You will hear about it soon.)
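Before trying it on real readers, the setup above can be simulated in a few lines. This is a deliberately simplified model under my own assumptions: each reader republishes everything it holds, Reader1 polls the blog and Reader2, Reader2 polls Reader1, and a naive reader treats every polled item as new. The names are hypothetical.

```python
def sync(own, incoming, dedupe):
    """Return the reader's items after one poll of its subscriptions."""
    if not dedupe:
        return own + incoming  # naive reader: everything polled is "new"
    seen = set(own)
    merged = list(own)
    for item in incoming:
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged


def simulate(rounds, dedupe):
    """Return (len(reader1), len(reader2)) after each polling round."""
    blog = ["post-1", "post-2"]
    reader1, reader2 = [], []
    sizes = []
    for _ in range(rounds):
        reader1 = sync(reader1, blog + reader2, dedupe)  # Reader1 <- blog, Reader2
        reader2 = sync(reader2, reader1, dedupe)         # Reader2 <- Reader1
        sizes.append((len(reader1), len(reader2)))
    return sizes
```

With dedupe switched on, both readers settle at two items. With it switched off, the item counts keep growing every round, which is the loop in miniature.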


Key-Value Tagging

The act of tagging consists of labelling objects with keywords [Wikipedia].
Tagging, the way it works now, means attaching separate keywords to
objects. Although we might attach multiple keywords to the same
object, the words are independent of each other. (Don't argue that the
words are related in the sense of tag clusters. Let me get to the
point.)

No wonder tagging, in its present form, has created a revolution. But
would it not be more useful if tagging were also possible in the form of
key-value pairs? I should have the option of tagging objects either with
single words (as it works now) or with key-value pairs.

How would this help? I had written about Problems with Podcasts
some time back. Now consider a model in which I
could have not only skip-points which mention where a particular topic
starts, but also what these topics are and my own comments on them.

If you compare a single podcast to a set of blog entries, 'key-value'
tagging could be compared to comments to a single blog entry. It would
look somewhat like this:

<skippoint>
 <time>0.24.29</time>
 <comment>This is where the speaker talks about Google's WebOS initiative.</comment>
</skippoint>

Although this can be done using XML so easily, an end user would not
like writing XML code. So a simple interface could be provided where
the user writes the time and the comment and this is clubbed with the
podcast and can be accessed anywhere on the web. Further, the user
could add any information, for example, the name of the speaker
(example, speaker=Gautham) or the location where the podcast was
created (example, location=Bangalore).
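The simple interface described above would mostly be a small translation step. Here is a sketch that turns a time, a comment and a few 'key=value' strings into the skippoint XML. The 'tag' element with a 'key' attribute is my own invention for illustration, as are the function names; the format above only defines 'time' and 'comment'.

```python
import xml.etree.ElementTree as ET


def parse_tag(text):
    """Split 'key=value' into (key, value); a plain word becomes (word, None)."""
    key, separator, value = text.partition("=")
    return (key.strip(), value.strip() if separator else None)


def skippoint_xml(time, comment, tags=()):
    """Build a <skippoint> element as a string from user-entered fields."""
    point = ET.Element("skippoint")
    ET.SubElement(point, "time").text = time
    ET.SubElement(point, "comment").text = comment
    for raw in tags:
        key, value = parse_tag(raw)
        # Hypothetical extension: one <tag key="..."> element per key-value tag.
        tag = ET.SubElement(point, "tag", {"key": key})
        tag.text = value
    return ET.tostring(point, encoding="unicode")
```

The user only types the time, the comment and tags such as speaker=Gautham; the XML never has to be written by hand.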

And just like tags, nothing is pre-defined. The user can add just about
any 'key-value' tags to any object. Again, as I keep mentioning, RDF
has solutions to these. But 'Keep It Stupidly Simple' is how the web
works. So be it. 🙂

I have been talking about Tag evolution here.