rdf – Travel Photography and Technology «buzypi.in»

This post is an analysis of an early document on Hypertext Design Issues.

The key questions discussed in that document concern the design of hypertext links: whether links should be monodirectional or bidirectional, whether they should be typed, and so on.

These discussions took place in the early days of the web. It is interesting to see how things have evolved since the design was made.

Let’s first get some facts right:
Hypertext links today:

Are Two-ended
Are Monodirectional
Have one link per anchor
Are Untyped
Contain no ancillary information
Don’t have preview information

What are the implications of this design?

Hyperlinks are not multi-ended. A single link cannot point to multiple destinations. There are, however, cases where one-to-many, many-to-one and many-to-many ‘links’ might make sense. These kinds of connections among information nodes are what RDF/OWL help achieve.
On bidirectional links, the document notes an advantage: often, when a link is made between two nodes, it is made in one direction in the mind of its author, but another reader may be more interested in the reverse link.
Bloggers want to track the pages that have linked to their posts. Google's index allows us to look up links to a particular page, and linkback mechanisms have evolved in the blogging world to serve precisely this purpose. In general, however, we never know who has linked to our page.
It may be useful to have bidirectional links from the point of view of managing data. For example: if a document is destroyed or moved, one is aware of what dangling links will be created, and can possibly fix them.
This problem has not yet been solved. Since links are monodirectional, the linked-to page has no record of who points at it, so when that page changes, moves or is destroyed there is no way to find the resulting dangling links from its side and clean them up.
About anchors having one or more links: this is still debatable. There are some utilities that turn every word into a hyperlink and let you run a host of ‘commands’ on the word. For example: perform a Google search for the word, look it up on dictionary.com, map it (if it is a city) or look it up in Wikipedia. However, I am not a big fan of these utilities, since I feel they clutter the screen and their context detection is not yet great.
Typed links: I feel this is the single most important thing missing from hyperlinks on the WWW. While making types mandatory would have complicated matters, a standard way to give links ‘types’ should have been provided. Anyway, it is the way it is. So how are people solving this issue? Microformats and RDFa are the two approaches I know of. The data is mostly read silently by browsers and tools, and users are usually unaware that it is in the page at all. In other words, the user interface for typed links is still not great.
Meta information associated with links: interesting! I am aware of Wikipedia articles recording the date on which a linked page was last visited, but as far as I know this is updated manually.
Preview information: Snap solves this very issue.
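
To make the gaps above a little more concrete, here is a minimal sketch in Java of what typed, multi-ended links that can be followed in both directions might look like as a data structure. It is only an illustration, not how the web or RDF is actually implemented: the class LinkGraph, its method names, the link types and the page names are all made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative store of typed links, indexed from both ends. */
public class LinkGraph {

    /** One typed link. The field names are hypothetical, not taken from any standard. */
    public static final class Link {
        public final String source;
        public final String type;
        public final String target;

        Link(String source, String type, String target) {
            this.source = source;
            this.type = type;
            this.target = target;
        }

        @Override
        public String toString() {
            return source + " --" + type + "--> " + target;
        }
    }

    private final Map<String, List<Link>> outgoing = new HashMap<>();
    private final Map<String, List<Link>> incoming = new HashMap<>();

    /** Records a link and indexes it under both its source and its target. */
    public void add(String source, String type, String target) {
        Link link = new Link(source, type, target);
        outgoing.computeIfAbsent(source, k -> new ArrayList<>()).add(link);
        incoming.computeIfAbsent(target, k -> new ArrayList<>()).add(link);
    }

    /** Links starting at a page: the direction ordinary hyperlinks already give us. */
    public List<Link> linksFrom(String page) {
        return outgoing.getOrDefault(page, Collections.<Link>emptyList());
    }

    /** Links pointing at a page: these are the ones that dangle if the page goes away. */
    public List<Link> linksTo(String page) {
        return incoming.getOrDefault(page, Collections.<Link>emptyList());
    }

    public static void main(String[] args) {
        LinkGraph graph = new LinkGraph();
        // One source with several typed targets (one-to-many).
        graph.add("my-blog/hypertext-post", "cites", "design-issues-document");
        graph.add("my-blog/hypertext-post", "mentions", "wikipedia/RDF");
        // A second page linking to the same target (many-to-one).
        graph.add("other-blog/reply", "cites", "design-issues-document");

        for (Link link : graph.linksTo("design-issues-document")) {
            System.out.println(link);
        }
    }
}
```

The reverse index is what answers both the blogger's question (who links to me?) and the maintenance question (which links break if I move this page?). The catch is that on the real web nobody owns such an index for every page, which is why it falls to search engines and linkback protocols.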

The conclusion?
Well, it’s tough to say how optimal the design of hypertext on the WWW was. Introducing bidirectional or multi-ended links and typed links would definitely help the technical people out there, but the added complexity might have made it too hard for the web to flourish into what it is today.

This blog entry is not so much about what Wikipedia in RDF is as about the kind of problems I faced in using it.

When I initially read about the Wikipedia in RDF initiative, I was excited. Imagine being able to download the meta information of ALL the articles of Wikipedia and then being able to query it, analyze it and do anything that you would want to do with it.

I loyally downloaded the gzip archive of the RDF/XML format. The compressed file is 397 MB and the uncompressed size is supposed to be 3.7 GB. I say "supposed to be" because I did not have enough space in a single partition to extract the entire archive. I initially doubted whether XP supports files of this size, but I came across a page which said that on NTFS partitions the maximum file size is the size of the volume.

Ok, here comes a host of problems. I conducted my experiments on a system with 256 MB of RAM. I guess the processor is not bad; it is a 1.7 GHz Celeron.

In order to analyze this file, I should first extract it. I extracted part of the archive (about 800 MB) and then tried to open the result in my text editor, SciTE. I was disappointed. The file did not open. I then tried Wordpad (I did not dare to try Notepad!), Vim (for Windows), Edit (from cmd.exe) and Mozilla Firefox.

The best response I got was from Edit (I am not surprised. I have done some tests before and found that Edit is the best text editor in Windows!), which clearly said that it cannot handle files of that size and that it would show the first 65,000-odd lines. Decent. I at least get to view 65,000 lines!

The second best response was from Mozilla Firefox. I had some problems here. Firefox tried to parse the file, since it was in RDF. I changed the extension to txt so as to avoid parsing and tried again. Firefox immediately started loading the file. It occupied about 150MB of memory, just before it stopped working.

Vim was bad too. 🙁 The file just did not open and Vim made an abnormal exit.

So I am left with a host of problems before I can start playing with this file.

Is there any text editor that I can use to open this file? I guess there should be SOME editor that does caching and is written specially to load huge files.
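
Until such an editor turns up, one possible workaround is to peek at the beginning of the data straight from the gzip archive, without extracting it and without an editor at all. Below is a minimal sketch in Java using only the standard library. The class name GzipHead, the default file name and the line count are placeholders I made up, and I am assuming the dump is UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

/** Reads the first lines of a gzip-compressed text file without extracting it. */
public class GzipHead {

    /** Returns up to maxLines lines from the start of the compressed file. */
    public static List<String> head(Path gzipFile, int maxLines) throws IOException {
        List<String> lines = new ArrayList<>();
        // The stream is decompressed on the fly; only a small buffer is held in memory.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzipFile)),
                StandardCharsets.UTF_8))) { // assumption: the dump is UTF-8
            String line;
            while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name; pass the real path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "wikipedia.rdf.gz");
        for (String line : head(file, 100)) {
            System.out.println(line);
        }
    }
}
```

Since reading stops after the requested number of lines, neither the 3.7 GB of disk space nor much memory is needed just to see what the file looks like. One caveat: a line with no line breaks at all would still be read in full.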

Ok, now on to the second problem. I am thinking of doing some analysis on this RDF document. To do that, it seems I should be able to 'load' the entire file in memory (because the RDF has to be parsed as XML), or else I cannot use it. I guess I should use a FileChannel to memory-map the file and a pull parser to parse it.
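
Here is a rough sketch of the pull-parser half of that idea, using the StAX API (javax.xml.stream) that ships with Java. A pull parser hands over one XML event at a time, so as far as I can tell the file does not have to fit in memory, and the input can be read straight from the gzip stream rather than through a mapped FileChannel. The class name RdfElementCounter and the default file name are made up for illustration. The example only counts element names; it does not build an RDF graph.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.GZIPInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Streams through an XML document and counts how often each element occurs. */
public class RdfElementCounter {

    /** Counts start tags by local name (namespace prefixes are ignored). */
    public static Map<String, Integer> countElements(InputStream xml)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // DTDs and external entities are not needed for this walk, so switch them off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        Map<String, Integer> counts = new TreeMap<>();
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        try {
            // Pull one event at a time; nothing is kept except the running counts.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    counts.merge(reader.getLocalName(), 1, Integer::sum);
                }
            }
        } finally {
            reader.close();
        }
        return counts;
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        // Hypothetical default file name; pass the real path as the first argument.
        String file = args.length > 0 ? args[0] : "wikipedia.rdf.gz";
        try (InputStream in = new GZIPInputStream(Files.newInputStream(Paths.get(file)))) {
            for (Map.Entry<String, Integer> e : countElements(in).entrySet()) {
                System.out.println(e.getKey() + ": " + e.getValue());
            }
        }
    }
}
```

Memory use here depends on the number of distinct element names, not on the size of the file. Any analysis that needs the whole graph at once (say, following links between articles) would still need some kind of store on disk, which this sketch does not attempt.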

I have not tried this, but I am cent per cent sure that I will face problems. Size does matter!

Wish me luck. 🙂