Google Docs, ODF and Data Portability

Consider the following HTML, which displays a single line of text:


<style>
.paragraph-text {font-family: Arial; font-size: 11pt; font-weight: normal; text-decoration: none}
</style>
...
<p><span class="paragraph-text">Here is a test line</span></p>

Now suppose we see a developer write it this way:


<style>
.T1_1 {font-family: Arial; font-size: 11pt; font-weight: normal; text-decoration: none}
.T1_2 {font-family: Arial; font-size: 11pt; font-weight: normal; text-decoration: none}
.T1_3 {font-family: Arial; font-size: 11pt; font-weight: normal; text-decoration: none}
.T1_4 {font-family: Arial; font-size: 11pt; font-weight: normal; text-decoration: none}
.T1_5 {font-family: Arial; font-size: 11pt; font-weight: normal; text-decoration: none}
.T1_6 {font-family: Arial; font-size: 11pt; font-weight: normal; text-decoration: none}
.T1_7 {font-family: Arial; font-size: 11pt; font-weight: normal; text-decoration: none}
.T1_8 {font-family: Arial; font-size: 11pt; font-weight: normal; text-decoration: none}
.T1_9 {font-family: Arial; font-size: 11pt; font-weight: normal; text-decoration: none}
</style>
...
<p class="P1">
<span class="T1_1">Here</span>
<span class="T1_2"> </span>
<span class="T1_3">is</span>
<span class="T1_4"> </span>
<span class="T1_5">a</span>
<span class="T1_6"> </span>
<span class="T1_7">test</span>
<span class="T1_8"> </span>
<span class="T1_9">line</span>
</p>

What would you say about the quality of the markup above?

Now what if I told you that the people behind the best search engine do it this way? Don’t believe me? Try it out for yourself:

  1. Create a document in Google Docs
  2. Enter some text, for example: Here is a test line
  3. Export it as ODT
  4. Open the ODT file with an archive utility (an ODT file is a ZIP archive), then open content.xml and inspect its contents


<office:automatic-styles>
...
  <style:style style:name="T1_1" style:family="text" style:parent-style-name="Default_20_Paragraph_20_Font"><style:text-properties style:text-line-through-style="none" fo:font-style="normal" style:font-style-asian="normal" style:font-style-complex="normal" fo:color="#000000" style:font-name="Arial" fo:font-size="11pt" style:font-name-asian="Arial" style:font-size-asian="11pt" style:font-name-complex="Arial" style:font-size-complex="11pt" fo:font-weight="normal" style:font-weight-asian="normal" style:font-weight-complex="normal" style:text-underline-style="none"/></style:style>
  ... (repeated for the rest)
</office:automatic-styles>
...
<text:p text:style-name="P1">
  <text:span text:style-name="T1_1">Here</text:span>
  <text:span text:style-name="T1_2">
    <text:s/>
  </text:span>
  <text:span text:style-name="T1_3">is</text:span>
  <text:span text:style-name="T1_4">
    <text:s/>
  </text:span>
  <text:span text:style-name="T1_5">a</text:span>
  <text:span text:style-name="T1_6">
    <text:s/>
  </text:span>
  <text:span text:style-name="T1_7">test</text:span>
  <text:span text:style-name="T1_8">
    <text:s/>
  </text:span>
  <text:span text:style-name="T1_9">line</text:span>
</text:p>

Not only have they used a separate span for every word and every space, they have given each span its own style and then duplicated the entire style definition for each one! I couldn’t believe this. Although I was pretty sure LibreOffice would not do this, I wanted to verify it nonetheless, and here is what I saw:


<text:p text:style-name="Standard">Here is a test line</text:p>

(So it is possible to generate crisper markup)
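
As a rough illustration of what crisper output takes, adjacent runs of text that share identical formatting can be merged before any spans are written. The sketch below is hypothetical: `merge_runs` and the `(text, formatting)` tuples are names made up for this example, and it is not meant to describe how either Google Docs or LibreOffice works internally.

```python
from itertools import groupby


def merge_runs(runs):
    """Collapse adjacent (text, formatting) runs that share the same formatting."""
    merged = []
    # groupby only groups neighbours, so runs separated by a
    # differently formatted run are correctly kept apart.
    for formatting, group in groupby(runs, key=lambda run: run[1]):
        merged.append(("".join(text for text, _ in group), formatting))
    return merged


# Hypothetical formatting record mirroring the properties in the listings above.
arial = (("font-name", "Arial"), ("font-size", "11pt"), ("font-weight", "normal"))
runs = [(piece, arial) for piece in ["Here", " ", "is", " ", "a", " ", "test", " ", "line"]]

print(merge_runs(runs))
```

With the nine runs from the test line, a single run remains, which could then be written out as one paragraph without any spans.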

Wow!!! Am I missing something obvious here?

Imagine exporting a document that runs to several pages, with every word and every space carrying its own span and its own full style definition. No wonder LibreOffice seemed to take some time to import the document.
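
To get a feel for how much of an exported file is redundant, the inspection can be scripted. The following is a minimal sketch using only the Python standard library; the function names and the file name `export.odt` are placeholders chosen for this example. It counts the automatic styles in content.xml, how many of them are actually distinct once the style name is ignored, and how many spans the document contains.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Standard ODF namespace URIs.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def style_fingerprint(style):
    """Everything about a style except its name, in a hashable form."""
    name_attr = f"{{{STYLE_NS}}}name"
    attrs = tuple(sorted((k, v) for k, v in style.attrib.items() if k != name_attr))
    children = tuple((child.tag, tuple(sorted(child.attrib.items()))) for child in style)
    return attrs, children


def summarize_content_xml(xml_bytes):
    """Count automatic styles, distinct style definitions, and text spans."""
    root = ET.fromstring(xml_bytes)
    auto = root.find(f"{{{OFFICE_NS}}}automatic-styles")
    styles = [] if auto is None else auto.findall(f"{{{STYLE_NS}}}style")
    fingerprints = Counter(style_fingerprint(s) for s in styles)
    spans = sum(1 for _ in root.iter(f"{{{TEXT_NS}}}span"))
    return {"styles": len(styles), "distinct_styles": len(fingerprints), "spans": spans}


def summarize_odt(path):
    """An ODT file is a ZIP archive; content.xml holds the document body."""
    with zipfile.ZipFile(path) as odt:
        return summarize_content_xml(odt.read("content.xml"))
```

Calling `summarize_odt("export.odt")` on a file exported from each editor and comparing `styles` against `distinct_styles` shows how many style definitions are mere copies of one another.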

There are other minor issues with the exported document. For example, Google rewrites every link so that it redirects via Google’s servers. I can understand why this might be needed inside the web application, but does it make sense to export the document with these redirections still in place? The indentation of the bullets is also much larger than what is usually found in documents.
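
Until the export changes, the rewritten links can be cleaned up after the fact. The sketch below rests on an assumption that is not spelled out above: that a rewritten link has the shape `https://www.google.com/url?q=<original URL>&...`, with the original address carried in the `q` parameter. The function name `unwrap_redirect` is made up for this example, and anything that does not match the assumed shape is returned untouched.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_redirect(href):
    """Return the original target of a redirect link, or href unchanged.

    Assumes the redirect looks like https://www.google.com/url?q=<target>&...
    """
    parts = urlsplit(href)
    host = parts.hostname or ""
    if host in ("google.com", "www.google.com") and parts.path == "/url":
        targets = parse_qs(parts.query).get("q")
        if targets:
            # parse_qs has already percent-decoded the value.
            return targets[0]
    return href
```

Applied to every link attribute in content.xml, this would restore the addresses the author originally typed, provided the assumed URL shape holds.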

I use quite a few Google products. Of the companies out there, Google is the one whose products I am least hesitant to use, because of their support for data portability. But they need to do a better job of ensuring that the exported data is usable and does not take another Googler or a Google application to interpret it, which would defeat the purpose of data portability in the first place.

I hope they solve these issues soon.