Google Reader – Mark Until Current As Read

I am an ardent feed consumer. I easily have over 300 feeds in my Google Reader and read them whenever I get a chance. The feeds include technology blogs, photography blogs, local news, startup blogs, blogs by famous people, blogs that help me in my projects etc.

It’s just not possible for me to visit every feed category every day, so I frequently see some of these categories overflow with posts.

Now I know there are extensive blog posts which describe how to better manage feeds and to cut down on information overload. But as we all know there is no simple solution.

So here I was using Google Reader and just skimming through the posts when I came across this need.

Suppose a feed has about 100 unread posts. I have skimmed through half of them and read one in between that I thought was interesting. I am now left with quite a few posts above the one I read that I am not interested in reading, but I want to mark them as read so that I don’t need to see them again. Would it be possible to mark these as read, leaving the rest untouched?

The recent changes to Google Reader provide one option – Mark all entries older than a day, week or month as read. But this does not exactly serve the purpose.

I ended up hacking a Greasemonkey script to do exactly what I wanted.

Here is how the script behaves:

Just press Ctrl+Alt+Y and the script will mark all entries above the current read entry as ‘read’. Ctrl+Alt+I will mark all entries below the current entry as read – for people who read backwards. 🙂

Added benefits:

  • This also works with search results in Google Reader.
  • The script works with entire folders, so you can skim through all posts in a folder marking the ones you have skimmed as read.

How it works:
The script uses the CSS class names to determine which posts are unread above (or below) the current post. Once it obtains this list, it simulates a click on each of these posts and thereby marks them as read. Simple as that!
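
The selection step can be sketched roughly as follows. This is not the actual script: the class names 'read' and 'current' are assumptions (the real Google Reader markup may use different ones), and the entries are plain objects with a className property so that the sketch runs outside a browser. In a userscript they would be DOM elements, and each selected element would then receive a simulated click.

```javascript
// Hypothetical class names; the real Google Reader markup may differ.
var READ_CLASS = 'read';
var CURRENT_CLASS = 'current';

// True if a space-separated className string contains the given class.
function hasClass(entry, name) {
    return (' ' + entry.className + ' ').indexOf(' ' + name + ' ') !== -1;
}

// Return the unread entries above (direction 'above') or below ('below')
// the entry marked as current. Returns [] if there is no current entry.
function unreadRelativeToCurrent(entries, direction) {
    var currentIndex = -1;
    for (var i = 0; i < entries.length; i++) {
        if (hasClass(entries[i], CURRENT_CLASS)) {
            currentIndex = i;
            break;
        }
    }
    if (currentIndex === -1) {
        return [];
    }
    var candidates = direction === 'below'
        ? entries.slice(currentIndex + 1)
        : entries.slice(0, currentIndex);
    return candidates.filter(function (entry) {
        return !hasClass(entry, READ_CLASS);
    });
}

// Sample data standing in for the entries of a feed.
var sample = [
    { id: 'a', className: 'entry' },
    { id: 'b', className: 'entry read' },
    { id: 'c', className: 'entry read current' },
    { id: 'd', className: 'entry' }
];
var above = unreadRelativeToCurrent(sample, 'above');
var below = unreadRelativeToCurrent(sample, 'below');
```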

This script is part of the Better GReader extension and has been featured on Lifehacker.

In order to install the Google Reader – Mark Until Current As Read script, visit this site.

Downloading data using Greasemonkey – Part 2

So I finally found some time to continue my experiments with the data download from browser to the server.

This time my target was Orkut. I decided to write a simple script to extract my Orkut profile and then display a subset of these fields on my own site using my own formatting.

I did not write a Greasemonkey script this time, but just used Firebug to write JavaScript. Here is the browser-side script:

var arrayToExtract = new Array('listdark', 'listlight');

for(var z=0;z<arrayToExtract.length;z++){
   var elements = $$('.'+arrayToExtract[z]);   // Just got lucky here. $$ is available!
   for(var i=0;i<elements.length;i++){
       var item = elements[i].getElementsByTagName('p');
       if(item[0] == undefined)
           continue;
       postData(item[0].innerHTML);
       if(item[1] != undefined)   // some rows may have a label but no value
           postData(item[1].innerHTML);
   }
}

function postData(data){
   var scriptElement = document.createElement('script');

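   // Encode the values so that characters such as & and # survive the query string
   scriptElement.setAttribute('src','http://buzypi.in/backup?data='+encodeURIComponent(data)+'&file=orkut&date='+encodeURIComponent(Date()));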

   document.body.appendChild(scriptElement);

}

The script above posts the profile information piece by piece to the server, and the server captures each piece and appends it to a file. The server-side code is as follows:

<?php
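// $_REQUEST is a superglobal, so no 'global' declaration is needed.
// basename() keeps the client from writing outside the data directory.
$file_name = basename($_REQUEST['file']);
$data = $_REQUEST['data'];
$more = isset($_REQUEST['more']) ? $_REQUEST['more'] : 'false';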

$DIRECTORY = 'data';

$file_with_location = dirname(__FILE__).'/'.$DIRECTORY.'/'.$file_name;

$file_handle = fopen($file_with_location,'a');

fwrite($file_handle,$data);

if($more == "true")
   ;
else
   fwrite($file_handle,"\n");

$success_value = fclose($file_handle);

echo "/*";
if($success_value === TRUE){
   echo "Successfully appended: ".$data."<br/>";
   if($more == "true"){
      echo "Expect you to send more data";
   }
} else {
   echo "Failed to write data";
}

echo "*/";

?>

Guess what happened when I executed the script?

The data was appended to the file alright, but the ordering of the items was messed up in some places.

Here is a sample:

job description:
work phone:
I am a social networking application developer. I work on the Books iRead application in Ugenie. Our app is currently available in Facebook, Bebo, Orkut and Myspace.
career interests:
...

while the expected output was:

job description:
I am a social networking application developer. I work on the Books iRead application in Ugenie. Our app is currently available in Facebook, Bebo, Orkut and Myspace.
work phone:
career interests:
...

The job description content should have been received before ‘work phone’, but this was not the case.

So what is the solution?

There are 2 things I can think of:
1. Ensure that data posted is atomic.
2. Come up with a simple sliding window protocol arrangement between the browser and the server.

Solution 1 is not always feasible, because of the limits on GET URL size. In fact, we might need to split the body just so that it can be posted using GET requests. So the only solution that can take care of this is (2).
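
One way to picture option (2) is a receiver that buffers pieces arriving early and writes them out strictly in sequence order. The sketch below covers only this reordering half of a sliding window arrangement (no acknowledgements or retransmission), and it assumes the browser tags every piece with a sequence number. The names createReorderBuffer, receive and deliver are made up for illustration.

```javascript
// Receiver-side reorder buffer: pieces may arrive in any order, but are
// handed to `deliver` strictly by increasing sequence number.
function createReorderBuffer(deliver) {
    var nextExpected = 0;   // sequence number we are waiting for
    var pending = {};       // pieces that arrived early, keyed by sequence number

    return function receive(seq, data) {
        if (seq < nextExpected || pending.hasOwnProperty(seq)) {
            return; // duplicate, ignore it
        }
        pending[seq] = data;
        // Flush every consecutive piece that is now available.
        while (pending.hasOwnProperty(nextExpected)) {
            deliver(pending[nextExpected]);
            delete pending[nextExpected];
            nextExpected++;
        }
    };
}

var written = [];
var receive = createReorderBuffer(function (data) { written.push(data); });

// Simulate the out-of-order arrival from the sample above.
receive(0, 'job description:');
receive(2, 'work phone:');
receive(1, 'I am a social networking application developer.');
receive(3, 'career interests:');
```

With this arrangement, 'work phone:' is held back until the job description text has been delivered, so the file would be written in the expected order.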

I will post more entries as I progress. Meanwhile, if you have any better solution to the problem, comment here.

Downloading your data using Greasemonkey

Whenever I use some service over the web, I look for several things. Ease of use and customisability are important factors.

However, the most important thing I consider is vendor lock-in (or rather the lack of it). Let's say I am using a particular mail service (e.g., Gmail). If someday I find a better email service, would it be easy for me to switch to that service? How easy is it for me to transfer my data from my old service to my new service?

For services like mail, there are standard protocols for data access, so this is not an issue. However, for the more recent services, like blogging and micro-blogging, the most widely used way to access data is RSS or Atom over HTTP.

However, it's not the case that all services provide data as RSS (or XML or in any other parseable form). For example, suppose I make a list of movies I have watched in some Facebook application, or a list of restaurants I have visited. How do I download this list? If I cannot download it, does it mean I am tied to this application provider forever? What if I have added 200 movies in my original service and I come across another service that has a better interface and more features? I would want to switch to the new service without losing the data that I have invested time to enter in my original service.

In fact, recently when I tried to download all my Twitters, I realized that this feature has been disabled. You are not able to get your old Twitters in XML format.

So what do we do when a service does not provide data as XML and we need to somehow scrape that data and store it?

This is kind of related to my last blog entry.

So I started thinking of ways in which I could download my Twitters. The solution I thought of initially was using Rhino and John Resig's project (mentioned in my previous blog entry). However, I ran into parse issues like before. So I had to think of alternative ways.

Now I took advantage of the fact that Twitters are short (and not more than 140 characters).

The solution I came up with uses a combination of Greasemonkey and PHP on the server side:

Here is the GM script:
If you intend to use this, do remember to change the URL to post data to.

// ==UserScript==
// @name           Twitter Downloader
// @namespace      http://buzypi.in/
// @author         Gautham Pai
// @include        http://www.twitter.com/*
// @description    Post Twitters to a remote site
// ==/UserScript==

function twitterLoader (){
	var timeLine = document.getElementById('timeline');
	var spans = timeLine.getElementsByTagName('span');
	var url = 'http://buzypi.in/twitter.php';
	var twitters = new Array();
	for(var i=0;i<spans.length;i++){
		if(spans[i].className != 'entry-title entry-content'){
			continue;
		}
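		// encodeURIComponent handles non-ASCII text, which escape() does not encode in a form PHP decodes
		twitters.push(encodeURIComponent(spans[i].innerHTML));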
	}

	for(var i=0;i<twitters.length;i++){
		var last = 'false';
		if(i == twitters.length - 1)
			last = 'true';
		var scriptElement = document.createElement('script');
		scriptElement.setAttribute('src',url+'?last='+last+'&data='+twitters[i]);
		scriptElement.setAttribute('type','text/javascript');
		document.getElementsByTagName('head')[0].appendChild(scriptElement);
	}
}

window.addEventListener('load',twitterLoader,true);

The server side PHP code is:

<?php

global $_REQUEST;
$data = $_REQUEST['data'];
//Store data in the DB, CouchDB (or some other location)
$last = $_REQUEST['last'];
if($last == 'true'){
	echo "
	var divs = document.getElementsByTagName('div');
	var j= 0;
	for(j=0;j<divs.length;j++){
		if(divs[j].className == 'pagination')
		break;
	}
	var sectionLinks = divs[j].getElementsByTagName('a');
	var href = '';
	if(sectionLinks.length == 2)
		href = sectionLinks[1].href;
	else
		href = sectionLinks[0].href;
	var pageOf = function(u){ var m = /page.([0-9]+)/.exec(u); return m ? parseInt(m[1], 10) : 1; };
	var presentPage = pageOf(document.location.href);
	var nextPage = pageOf(href);
	if(nextPage < presentPage)
		alert('No more pages to parse');
	else {
		alert('Changing document location');
		document.location.href = href;
	}
	";
} else {
	echo "
	var recorder = 'true';
	";
}

?>

The GM script scrapes the twitters from a page and posts them to the server using <script> includes. The server stores the twitters in some data store. The server also checks whether the twitter posted was the last twitter on the page. If so, it sends back code to change to the next page.

Thus the script, when installed, will post twitters from the most recent to the oldest.

Ok, now how would this work with other services?

The pattern seems to be:
* Get the data elements from the present page – data elements could be movie details, restaurant details etc.
* Post data elements to the server.
** The posting might require splitting the content if the length is more than the maximum length of the GET request URL.
* Identify how you can move to the next page and when to move to the next page. Use this to hint the server to change to the next page.
* Write the server side logic to store data elements.
* Use the hint from the client to change to the next page when required.
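
The splitting step in the pattern above could look something like the sketch below. It is only an illustration: the function names, the 'seq' and 'more' parameter names, the example.com URL and the length limit are all assumptions, and a real limit depends on the browser and the server.

```javascript
// Split `text` into pieces whose URL-encoded form is at most
// `maxEncodedLength` characters (a limit chosen by the caller).
function splitForGet(text, maxEncodedLength) {
    var chunks = [];
    var current = '';
    var currentLength = 0;
    // Array.from iterates by code point, so surrogate pairs are never split.
    Array.from(text).forEach(function (ch) {
        var encodedLength = encodeURIComponent(ch).length;
        if (current !== '' && currentLength + encodedLength > maxEncodedLength) {
            chunks.push(current);
            current = '';
            currentLength = 0;
        }
        current += ch;
        currentLength += encodedLength;
    });
    if (current !== '') {
        chunks.push(current);
    }
    return chunks;
}

// Build one request URL per piece. `seq` gives the order of the piece and
// `more` tells the server whether further pieces follow (assumed names).
function buildChunkUrls(baseUrl, text, maxEncodedLength) {
    var chunks = splitForGet(text, maxEncodedLength);
    return chunks.map(function (chunk, index) {
        var more = index < chunks.length - 1 ? 'true' : 'false';
        return baseUrl + '?seq=' + index + '&more=' + more +
            '&data=' + encodeURIComponent(chunk);
    });
}

var urls = buildChunkUrls('http://example.com/backup', 'a b c d', 6);
```

Measuring the encoded length rather than the raw length matters because a single space or non-ASCII character grows to three or more characters once it is URL-encoded.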

The biggest advantage of this method is we make use of the browser to do authentication with the remote service and also to do the parsing of the HTML (which, as I mentioned in my previous post, browsers are best at).

Speed reading by hacking the column count in Firefox

Recently, I came across a Greasemonkey script for Wikipedia. The script helps us to view Wikipedia articles in multiple columns.

I found this to be useful and in fact saw that it improved my reading speed. In the last one week, I have referred to a lot of Wikipedia articles, and I am really addicted to this multi-column hack.

So now, when I am reading some article, if the article spans the entire width of the page, I open Firebug, 'Inspect' the element displaying the content under consideration and add:

-moz-column-count: 3;
-moz-column-gap: 50px;
font-family: Calibri;
font-size: 11px;

to the element.

And if I end up visiting this site frequently, then I can add a Greasemonkey script or a Userstyle for the page or set of pages.
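
A userscript for this could be as small as the sketch below. The '#content' selector is a placeholder (inspect the page to find the element that actually holds the article), and the helper name buildColumnCss is made up. The sketch emits the unprefixed column properties alongside the -moz- ones, and only touches the page when a document object is available.

```javascript
// Build a CSS rule that lays out `selector` in several columns.
// Both the -moz- prefixed and the unprefixed properties are emitted.
function buildColumnCss(selector, columnCount, gapPx) {
    return selector + ' {\n' +
        '  -moz-column-count: ' + columnCount + ';\n' +
        '  -moz-column-gap: ' + gapPx + 'px;\n' +
        '  column-count: ' + columnCount + ';\n' +
        '  column-gap: ' + gapPx + 'px;\n' +
        '}';
}

// '#content' is a placeholder selector, not taken from any real page.
var css = buildColumnCss('#content', 3, 50);

// Only touch the DOM when running inside a browser.
if (typeof document !== 'undefined') {
    var style = document.createElement('style');
    style.textContent = css;
    document.head.appendChild(style);
}
```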

The above screenshot shows a Wikipedia page as displayed in my browser.

So why is this so useful?
Some time back, when I was reading an article on usability, I learnt that reading speed depends on the width of the column. This is one of the reasons why you are able to read news articles faster in newspapers than online. Your eyes end up scanning the page mostly vertically, instead of making both horizontal and vertical movements. Rather than point to a single article, I would like to point you to the Google search for the study around this topic.

Some of the popular sites to which I have added this multi-column functionality are: Wikipedia, developerWorks and Javadocs.

Google Reader finally has search!

This was the one feature that I was missing in Google Reader. I tried Google Custom Search when I really needed it, but I was not quite happy with it, since it was showing up really old posts and there was no obvious way of viewing only 'relevant' posts or 'new' posts.

I have also tried a couple of Greasemonkey scripts. But I was not happy with the user-interface integration.

So finally today, I open Google Reader and see a tiny box on the top and wonder for a moment if it was some Greasemonkey script running. Then I make a search and am convinced it is not! I also make a search in Google News to make sure it is true. And yeah, here is the confirmation. This is perhaps the most long-awaited feature ever with regard to Google's applications.

The integration is just too good. Plus there is an option to search only within specific tags or subscriptions. There is auto-suggest in the drop-down of tags and subscriptions. And guess what, there is also a way to reach the result page directly. Just create a keyword bookmark for: http://www.google.com/reader/view/#search/%s/ and give it a keyword like grs (Google Reader Search) and then use your browser address bar to perform a search directly in Google Reader, for example, 'grs eclipse'.
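
To make the keyword bookmark idea concrete, here is a small sketch of what the browser roughly does with such a bookmark. It is an approximation for illustration, not the browser's actual code, and the function name expandKeywordBookmark is made up.

```javascript
// Expand a keyword bookmark: the text typed after the keyword replaces %s.
// Returns null when the typed text does not start with the keyword.
function expandKeywordBookmark(template, typed, keyword) {
    var prefix = keyword + ' ';
    if (typed.indexOf(prefix) !== 0) {
        return null;
    }
    var query = typed.slice(prefix.length);
    return template.replace('%s', encodeURIComponent(query));
}

var template = 'http://www.google.com/reader/view/#search/%s/';
var url = expandKeywordBookmark(template, 'grs eclipse', 'grs');
```

So typing 'grs eclipse' ends up at the search URL with 'eclipse' in place of %s.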

Suits me perfectly! Finally I feel like I am playing with an Atom store rather than a simple feed reader.