Categories
World Wide Web

Why it is the way it is – an analysis of TimBL’s original proposal for the WWW

Ever wonder why hyperlinks on the World Wide Web (WWW) are unidirectional? Why are links not typed? Why are they many-to-one and not many-to-many? Why do browsers have the restrictions they have today? Why is the web the way it is?

A lot of the answers to these questions are hidden somewhere deep in the web itself. Having come across several technical issues with the web, I began to wonder what the initial creators of the web perceived it to be. What was running in their minds when they came up with the idea of the web?

I started tracing back through history to the very beginning of the WWW. That’s how I came across the ‘original proposal of the WWW’.

So here are some of my notes on the paper:
(Content in italics is from the paper.)

Use cases for the WWW

The initial use cases for the WWW were related to project management – communicating project ideas, storing technical details for later retrieval, finding out who wrote a piece of code, fetching all documents related to the current task. Most of the proposal revolves around a system that allows multi-user hypertext access and is non-centralized and non-hierarchical.

Relationship to relational databases

Linked information systems have entities and relationships. There are, however, many differences between such a system and an “Entity Relationship” database system. For one thing, the information stored in a linked system is largely comment for human readers. For another, nodes do not have strict types which define exactly what relationships they may have. Nodes of similar type do not all have to be stored in the same place.

What does this mean?
We do have entities and relationships, but there are no fixed rules. Entities don’t need to have types and any two entities can be related to each other. There is also no restriction on where the entities are stored.
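
To make the contrast concrete, here is a toy sketch (the names and data are made up, not from the proposal): an entity-relationship design fixes the types and allowed relationships up front, while a linked information system lets any node point to any other node with a free-form label.

```python
# A toy sketch of the difference, using made-up names and data.

# Entity-Relationship style: types and allowed relationships are fixed up front,
# e.g. a 'people' table, a 'projects' table and a join table between them.
people   = {1: "Tim", 2: "Robert"}
projects = {10: "Hypertext system"}
works_on = [(1, 10), (2, 10)]          # only person -> project links are allowed

# Linked-information style: nodes are untyped, and any node may link to any
# other node with a free-form, human-readable label.
nodes = {
    "tim":      "Tim Berners-Lee",
    "enquire":  "A personal hypertext notebook",
    "proposal": "Information Management: A Proposal",
}
links = [
    ("tim", "wrote", "proposal"),
    ("proposal", "describes", "enquire"),
    ("enquire", "see also", "tim"),    # cycles and odd relations are fine
]
```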

Hypertext

The key ideas around Hypertext were put down by Vannevar Bush in 1945 in the form of the Memex. There were several attempts by people to implement Hypertext and also Hypermedia (linking images, video, etc.). Ted Nelson coined the word Hypertext in 1965 and subsequently also coined the term Hypermedia. The first implementation of Hypertext in some form seems to be from Doug Engelbart in 1968. The buzz around Hypertext picked up during the late 1980s – there was a dedicated Usenet newsgroup, a series of conferences starting with Hypertext ’87, several ACM papers, workshops, etc. All this happened even before the WWW was born. There were several commercial products too, like HyperCard from Apple.

TimBL had also tried his hand at building a hypertext system, which he called Enquire. TimBL claims to have built it as early as 1980, although the first mention of Enquire seems to be in this proposal, made in 1989.

When I started researching HyperCard’s features, I realized one thing. These products are easily 20 years old, and technology has changed a lot in that time. It is really hard to imagine what many of these products looked like: either the source is not available in its entirety, or it is tough to compile. This reminds me of what Grady Booch said about having an archive of source code, similar to the archives of books, videos, music and web pages.

Anyway, the most important difference I see between Enquire and HyperCard is that Enquire was more of a ‘programmer’s playtool’, while HyperCard was targeted towards end users.

So while HyperCard had ‘fancy graphics’, Enquire had typed links and allowed multi-user access.

WWW requirements

About the requirements that TimBL put down for the WWW:
* Remote access across networks, Heterogeneity, Non-Centralisation – These are now taken for granted. The WWW is ubiquitous, it never breaks as a system, and it can be accessed from just about any Internet-aware device.
* Access to existing data – This was one of the reasons why the WWW became popular. It was easy to get existing data onto the web with minimal effort.
* Private links –
One must be able to add one’s own private links to and from public information. One must also be able to annotate links, as well as nodes, privately.
Frankly, I am not sure what TimBL means by private links ‘from’ public information.
* Bells and Whistles – Graphical access to the web was considered optional.
* Data analysis – This is one thing that has not taken off. (A toy sketch of what such analysis might look like follows after this list.)
It is possible to search, for example, for anomalies such as undocumented software or divisions which contain no people. It is possible to generate lists of people or devices for other purposes, such as mailing lists of people to be informed of changes.
It is also possible to look at the topology of an organisation or a project, and draw conclusions about how it should be managed, and how it could evolve. This is particularly useful when the database becomes very large, and groups of projects, for example, so interwoven as to make it difficult to see the wood for the trees.

The Semantic Web is showing this promise.
* Live links – These are what we now call ‘dynamic pages’, and most popular pages on the web are ‘live’ in that sense.
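
As a toy illustration of the ‘data analysis’ requirement (the data, names and relations below are entirely made up, not how TimBL’s system worked), one could treat people, groups and software modules as nodes in a single link graph and query it for anomalies such as undocumented software or divisions with no people:

```python
# Made-up link graph: (source, relation, target) triples.
links = [
    ("alice",  "works-in", "online-group"),
    ("alice",  "wrote",    "tracker-module"),
    ("bob",    "works-in", "online-group"),
    ("tracker-module", "part-of", "accelerator-project"),
    ("legacy-module",  "part-of", "accelerator-project"),   # nobody wrote this
    ("controls-group", "part-of", "cern"),                   # nobody works here
]

nodes = {n for src, rel, target in links for n in (src, target)}

# Undocumented software: modules that belong to a project but have no author.
written = {target for src, rel, target in links if rel == "wrote"}
modules = {src for src, rel, target in links
           if rel == "part-of" and src.endswith("module")}
print("undocumented software:", modules - written)

# Divisions which contain no people.
groups  = {n for n in nodes if n.endswith("group")}
staffed = {target for src, rel, target in links if rel == "works-in"}
print("empty divisions:", groups - staffed)
```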

The implementation

Much of the academic research is into the human interface side of browsing through a complex information space. Problems addressed are those of making navigation easy, and avoiding a feeling of being “lost in hyperspace”. Whilst the results of the research are interesting, many users at CERN will be accessing the system using primitive terminals, and so advanced window styles are not so important for us now.

As I read this, I get the feeling that TimBL was not thinking of making the WWW a ‘public’ web that would be used by just about everyone, where even a non-techie could build a page of content and hook it onto the web. Usability seemed to be of the least importance.

The only way in which sufficient flexibility can be incorporated is to separate the information storage software from the information display software, with a well defined interface between them.

This division also is important in order to allow the heterogeneity which is required at CERN (and would be a boon for the world in general).

A client/server split at this level also makes multi-access more easy, in that a single server process can service many clients, avoiding the problems of simultaneous access to one database by many different users.
‘Information display software’ – now that’s what the browser is! This split is also what created the need for HTTP, the HTTP server and HTML.
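
Here is a minimal sketch of that split, assuming nothing beyond a reachable URL (example.org is just a placeholder): the storage side only serves documents over the well-defined interface (HTTP), and the ‘information display software’ decides what to do with them.

```python
# A minimal sketch of the storage/display split. example.org is a placeholder;
# the server only stores and serves documents, and rendering is the client's job.
from urllib.request import urlopen

def fetch(url: str) -> str:
    """Talk to the 'information storage software' over the well-defined
    interface (HTTP) and return the raw document."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

def display(document: str) -> None:
    """A stand-in for the 'information display software': a real browser
    would parse the HTML and lay it out; here we just print the source."""
    print(document[:500])

if __name__ == "__main__":
    display(fetch("http://example.org/"))
```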

Conclusion

Do we still visualize the web as just content linked via Hypertext? How do we accommodate social networking and the whole realm of Web 2.0 developments?

The web has surely come a long way!

(Note: Draft content – subject to change)

Categories
General

Who do we believe?

As information becomes cheaper every day and we get access to more and more of it, I see one problem. Certain ‘well known theories’ are being proved untrue, and some ‘facts’ turn out to describe things that never really happened. Yet these are things that we studied during our schooling as ‘facts’.

On one hand, this is a good thing: it makes you question everything you read or hear rather than just accept things blindly. But on the other hand, it makes you wonder: well, then, what do we believe?

Wikipedia is a classic example of the arguments around information accuracy. Do you trust Wikipedia? Take a controversial article – say Scientology, Crop Circles, or the Nazca Lines. Would you believe what Wikipedia has to say? Well, isn’t there a slight possibility that the accepted theory is wrong, especially when there are mathematicians, archaeologists, physicists and historians who subscribe to either side of these controversies?

What if a vast majority of people believe something that is simply not true? Wasn’t the Earth once believed to be at the centre of the solar system, with the Sun revolving around it?

Here are some things that I came across in recent days:
1. The theory of evolution versus the theory of Intelligent Design.
2. The Sphinx mystery – is the Sphinx older than initially thought, and does it have connections to Mars?
3. The Aryan invasion theory – did it really happen?
4. Is global warming a myth?
5. Aliens and UFOs – has anyone really spotted them?
6. Man landing on the moon.

Well, the list is endless. If you look for information on any of these, you will find tons of material that can convince you either way.

Not all of us are mathematicians or theoretical physicists, nor do we have the time to verify every single ‘fact’ we come across.

So the question is: how do we decide what to believe, and whom do we trust?

Categories
Technology

Digitized information and its effect on history

First a bit of background on this post.

It was New Year’s time, and I was busy preparing my resolutions for the coming year. Now, I like a systematic method of capturing how I fared, and I keep trying out new organizing tools to do so. Over time I have switched from simple notes (plain text documents) to sophisticated, personal XML formats.

I decided to use FreeMind as my organizer for this year. To make sure I wouldn’t lose time experimenting with this capturing methodology later, I tried it out in the last week of December.

A week's experimentation and I was convinced. It was working out well. In the back of my mind, I was also thinking of the need to take some kind of backup of this data, lest I lose ALL my information because of some stupid mistake.

The stupid mistake was bound to happen sooner than expected. Some wrong keystroke and the file size was reduced to zero – in other words, the data in the file was wiped out. I tried using data recovery software to scan my filesystem and recover some copy of the file, but it did not work, and my week’s data was lost!

It was hard to believe that my data had been there just a moment ago and now it was gone. This made me think about how vulnerable our data is. By coincidence, that same day I listened to a podcast in which the narrator explained the vulnerability of digital data.

Imagine that all life is suddenly wiped out because of some form of ‘doomsday’, that life regenerates, and that these new beings have no concept of ‘digital information’. Imagine that they are now doing research similar to present-day ‘archaeology’. What would they come up with?

“Just a few hundred years ago, it is believed, there were some intelligent life forms on the earth. We have recovered evidence that these people were very intelligent and that different people had different roles. We have recovered some highly symmetrically shaped objects, and it is believed that a lot of people (yeah, they are referring to software engineers) spent their entire lives with these objects. Carbon dating shows that these objects are very recent, so these people are believed to have been doing some very intelligent work, although this has not been proven. Not much inscribed or written data is available from this era, so we believe that they had a sophisticated method of communication.”

Digital data is so different. We can have multiple copies of it, but then it is all up to interpretation: the pits on a CD may represent how my food looked on my US trip, but only my computer knows how to interpret that information. Updates mean that new data simply overwrites previous data, which is quite different from the traditional pen-and-paper method of capturing information. I don’t, for example, have a way of looking at how an idea evolved from the very beginning to the present. The death of a person can also wipe out so much information about them residing in various data stores – mailboxes, bookmarking sites, etc. – that hints at their personality.

The bottom line is that digital data is vulnerable to loss, so remember this whenever you are churning out data. If something is precious, move it out of the digital world and into the real world. Data in digital form, no matter how many backups you take, is always vulnerable to loss.

Just my 2 cents.