The BullsNCows Application in AngularJS

About five years ago, when I was dabbling with client-side technologies for the first time, I wrote the BullsNCows game in pure JavaScript.

My knowledge of JavaScript and client-side technologies has evolved quite a bit since then. So recently, when I was looking at this code, I thought, “This code could be better if we use AngularJS”.

So today I rewrote it in AngularJS, using Bootstrap for styling. The code is on GitHub in case you want to tinker with it, and the game is hosted online in case you just want to play.

So what’s changed?

  • All the code related to styling has been moved to Bootstrap. The alternate table coloring seems so much easier now!
  • The update of the table is done with AngularJS, so I don’t need to do complex DOM manipulation; it is a straightforward binding to the tr element of the table.
  • The data handling and the algorithm to compute bulls and cows are slightly improved. It used to be an O(n^2) algorithm. It’s now linear.
  • Elimination of globals – except the app object nothing is global now.
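
To illustrate the linear-time idea, here is a minimal sketch of one way to count bulls and cows in a single pass. This is not the code from the repository; the function name countBullsAndCows and the assumption that both inputs are equal-length digit strings are mine.

```javascript
// Count bulls and cows with one pass over the strings plus one pass
// over the (at most ten) distinct digits.
// Assumes secret and guess are digit strings of the same length.
function countBullsAndCows(secret, guess) {
  var bulls = 0;
  var secretCounts = {};  // digits of the secret not matched in place
  var guessCounts = {};   // digits of the guess not matched in place
  for (var i = 0; i < secret.length; i++) {
    if (secret[i] === guess[i]) {
      bulls++;
    } else {
      secretCounts[secret[i]] = (secretCounts[secret[i]] || 0) + 1;
      guessCounts[guess[i]] = (guessCounts[guess[i]] || 0) + 1;
    }
  }
  var cows = 0;
  for (var digit in guessCounts) {
    // A digit counts as a cow only as often as it appears on both sides.
    cows += Math.min(guessCounts[digit], secretCounts[digit] || 0);
  }
  return { bulls: bulls, cows: cows };
}
```

For example, countBullsAndCows('1234', '1243') gives 2 bulls and 2 cows. The earlier quadratic approach would compare every digit of the guess against every digit of the secret; keeping per-digit counts avoids the nested loop.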

Downloading data using Greasemonkey – Part 2

So I finally found some time to continue my experiments with the data download from browser to the server.

This time my target was Orkut. I decided to write a simple script to extract my Orkut profile and then display a subset of these fields on my own site using my own formatting.

I did not write a Greasemonkey script this time, but just used Firebug to write Javascript. Here is the browser side script:

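var arrayToExtract = ['listdark', 'listlight'];

for (var z = 0; z < arrayToExtract.length; z++) {
   var elements = $$('.' + arrayToExtract[z]);   // Just got lucky here. $$ is available!
   for (var i = 0; i < elements.length; i++) {
       var item = elements[i].getElementsByTagName('p');
       if (item.length < 2)   // skip rows without both a label and a value
           continue;
       postData(item[0].innerHTML);   // field label
       postData(item[1].innerHTML);   // field value
   }
}

// Send one piece of data to the server through a <script> include.
// The requests are asynchronous, so their arrival order is not guaranteed.
function postData(data) {
   var scriptElement = document.createElement('script');
   // Encode the values so that '&', '#' and spaces survive the trip in the URL.
   scriptElement.setAttribute('src', 'http://buzypi.in/backup?data=' + encodeURIComponent(data) +
       '&file=orkut&date=' + encodeURIComponent(Date()));
   document.body.appendChild(scriptElement);
}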

The script above posts the profile information one by one to the server and the server captures it and appends it in a file. The server side code is as follows:

<?php
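// basename() keeps the file name inside the data directory.
$file_name = basename($_REQUEST['file']);
$data = $_REQUEST['data'];
// 'more' is optional; treat a missing value as "no more data for this item".
$more = isset($_REQUEST['more']) ? $_REQUEST['more'] : 'false';

$DIRECTORY = 'data';

$file_with_location = dirname(__FILE__).'/'.$DIRECTORY.'/'.$file_name;

$file_handle = fopen($file_with_location,'a');

fwrite($file_handle,$data);

// End the line unless the client says more data for this item is coming.
if($more != "true")
   fwrite($file_handle,"\n");

$success_value = fclose($file_handle);

// The response is loaded as a script, so wrap it in a JavaScript comment.
echo "/*";
if($success_value === TRUE){
   // Strip any comment terminator so the data cannot close the comment early.
   echo "Successfully appended: ".str_replace('*/','',$data)."<br/>";
   if($more == "true"){
      echo "Expect you to send more data";
   }
} else {
   echo "Failed to write data";
}

echo "*/";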

?>

Guess what happened when I executed the script?

The data was appended to the file alright, but the ordering of the items was messed up in some places.

Here is a sample:

job description:
work phone:
I am a social networking application developer. I work on the Books iRead application in Ugenie. Our app is currently available in Facebook, Bebo, Orkut and Myspace.
career interests:
...

while the expected output was:

job description:
I am a social networking application developer. I work on the Books iRead application in Ugenie. Our app is currently available in Facebook, Bebo, Orkut and Myspace.
work phone:
career interests:
...

The job description content should have been received before ‘work phone’, but this was not the case.

So what is the solution?

There are 2 things I can think of:
1. Ensure that data posted is atomic.
2. Come up with a simple sliding window protocol arrangement between the browser and the server.

Solution 1 is not always feasible because of the limits on GET URL size. In fact, we might need to split the body just so that it can be posted using GETs. So the only solution that handles this is (2).
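
A full sliding window protocol is more than I have worked out yet, but the core of option 2 can be sketched: the browser tags each piece with a sequence number, and the receiver holds back anything that arrives early. The sketch below shows only the reordering logic, written in JavaScript for illustration (the server in this post is PHP). The names createReorderBuffer and receive, and the idea of a seq parameter, are hypothetical and not part of the scripts above.

```javascript
// Minimal reorder buffer: delivers chunks in sequence order,
// no matter what order they arrive in.
function createReorderBuffer(deliver) {
  var nextSeq = 0;    // sequence number we are waiting for
  var pending = {};   // chunks that arrived early, keyed by sequence number
  return {
    receive: function (seq, data) {
      pending[seq] = data;
      // Flush every chunk that is now in order.
      while (pending.hasOwnProperty(nextSeq)) {
        deliver(pending[nextSeq]);
        delete pending[nextSeq];
        nextSeq++;
      }
    }
  };
}

// Usage: chunks arrive out of order but are delivered in order.
var output = [];
var buffer = createReorderBuffer(function (data) { output.push(data); });
buffer.receive(1, 'work phone:');
buffer.receive(0, 'job description:');
buffer.receive(2, 'career interests:');
// output is now ['job description:', 'work phone:', 'career interests:']
```

A real version would also need acknowledgements and a way to handle pieces that never arrive, which is where the window comes in.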

I will post more entries as I progress. Meanwhile, if you have any better solution to the problem, comment here.

Downloading your data using Greasemonkey

Whenever I use some service over the web, I look for several things. Ease of use and customisability are important factors.

However, the most important thing I consider is vendor lock-in (or rather the lack of it). Let's say I am using a particular mail service (e.g., Gmail). If I someday find a better email service, would it be easy for me to switch to it? How easy is it for me to transfer my data from my old service to my new one?

For services like mail, there are standard protocols for data access, so this is not an issue. However, for more recent services, like blogging and micro-blogging, the most widely used way to get at your data is an RSS or Atom feed served over HTTP.

However, it's not the case that all services provide data as RSS (or XML or in any other parseable form). For example, suppose I make a list of movies I have watched, in some Facebook application, or a list of restaurants I visited, how do I download this list? If I cannot download it, does it mean I am tied to this application provider forever? What if I have added 200 movies in my original service and I come across another service that has better interface and more features and I want to switch to this new service but not lose the data that I have invested time to enter in my original service?

In fact, recently when I tried to download all my Twitters, I realized that this feature has been disabled. You are not able to get your old Twitters in XML format.

So what do we do when a service does not provide data as XML and we need to somehow scrape that data and store it?

This is kind of related to my last blog entry.

So I started thinking of ways in which I could download my Twitters. The solution I thought of initially was using Rhino and John Resig's project (mentioned in my previous blog entry). However, I ran into parse issues like before. So I had to think of alternative ways.

Now I took advantage of the fact that Twitters are short (and not more than 140 characters).

The solution I came up with uses a combination of Greasemonkey and PHP on the server side:

Here is the GM script:
If you intend to use this, do remember to change the URL to post data to.

// ==UserScript==
// @name           Twitter Downloader
// @namespace      http://buzypi.in/
// @author         Gautham Pai
// @include        http://www.twitter.com/*
// @description    Post Twitters to a remote site
// ==/UserScript==

function twitterLoader (){
	var timeLine = document.getElementById('timeline');
	if(!timeLine)
		return;	// nothing to scrape on this page
	var spans = timeLine.getElementsByTagName('span');
	var url = 'http://buzypi.in/twitter.php';
	var twitters = [];
	// Collect the text of every twitter on the page.
	for(var i=0;i<spans.length;i++){
		if(spans[i].className != 'entry-title entry-content'){
			continue;
		}
		// encodeURIComponent handles '&', '+' and non-ASCII text, which escape() does not.
		twitters.push(encodeURIComponent(spans[i].innerHTML));
	}

	// Post each twitter with a <script> include; flag the last one
	// so that the server can reply with the code to change pages.
	for(var i=0;i<twitters.length;i++){
		var last = 'false';
		if(i == twitters.length - 1)
			last = 'true';
		var scriptElement = document.createElement('script');
		scriptElement.setAttribute('src',url+'?last='+last+'&data='+twitters[i]);
		scriptElement.setAttribute('type','text/javascript');
		document.getElementsByTagName('head')[0].appendChild(scriptElement);
	}
}

window.addEventListener('load',twitterLoader,true);

The server side PHP code is:

<?php

$data = $_REQUEST['data'];
//Store data in the DB, CouchDB (or some other location)
$last = isset($_REQUEST['last']) ? $_REQUEST['last'] : 'false';
if($last == 'true'){
	// Send back JavaScript that moves the browser to the next page.
	echo "
	function pageOf(url){
		// Read the whole page number, not just its first digit.
		var match = /page=([0-9]+)/.exec(url);
		return match ? parseInt(match[1], 10) : 1;
	}
	var divs = document.getElementsByTagName('div');
	var j= 0;
	for(j=0;j<divs.length;j++){
		if(divs[j].className == 'pagination')
		break;
	}
	var sectionLinks = j < divs.length ? divs[j].getElementsByTagName('a') : [];
	var href = '';
	if(sectionLinks.length == 2)
		href = sectionLinks[1].href;
	else if(sectionLinks.length > 0)
		href = sectionLinks[0].href;
	var presentPage = pageOf(document.location.href);
	var nextPage = pageOf(href);
	if(href == '' || nextPage < presentPage)
		alert('No more pages to parse');
	else {
		alert('Changing document location');
		document.location.href = href;
	}
	";
} else {
	echo "
	var recorder = 'true';
	";
}

?>

The GM script scrapes the twitters from a page and posts them to the server using <script> includes. The server stores the twitters in some data store. The server also checks whether the twitter posted was the last one on the page. If so, it sends back code to change to the next page.

Thus the script, when installed, will post twitters from the most recent to the oldest.

Ok, now how would this work with other services?

The pattern seems to be:
* Get the data elements from the present page – data elements could be movie details, restaurant details etc.
* Post data elements to the server.
** The posting might require splitting the content if the length is more than the maximum length of the GET request URL.
* Identify how you can move to the next page and when to move to the next page. Use this to hint the server to change to the next page.
* Write the server side logic to store data elements.
* Use the hint from the client to change to the next page when required.
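
The splitting step in the list above could look something like the sketch below. It is only an illustration: the function name splitForGet is hypothetical, and maxEncodedLength is a budget you would pick yourself, not a documented limit of any browser or server.

```javascript
// Split text into pieces whose URL-encoded form fits within maxEncodedLength.
// maxEncodedLength is an assumed budget for the data parameter.
function splitForGet(text, maxEncodedLength) {
  var chunks = [];
  var current = '';
  var currentLength = 0;
  // Array.from iterates by code point, so surrogate pairs stay whole.
  var chars = Array.from(text);
  for (var i = 0; i < chars.length; i++) {
    var encodedLength = encodeURIComponent(chars[i]).length;
    // Start a new chunk when this character would push us over the budget.
    if (currentLength + encodedLength > maxEncodedLength && current !== '') {
      chunks.push(current);
      current = '';
      currentLength = 0;
    }
    current += chars[i];
    currentLength += encodedLength;
  }
  if (current !== '') chunks.push(current);
  return chunks;
}
```

Each chunk can then be posted with its own <script> include, passing encodeURIComponent(chunk) as the data parameter. Measuring the encoded length matters because a single space or non-ASCII character takes three or more characters in the URL.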

The biggest advantage of this method is we make use of the browser to do authentication with the remote service and also to do the parsing of the HTML (which, as I mentioned in my previous post, browsers are best at).

HTML parsing and Rhino

About a year back I was working on a personal project in IBM. This was a clone of YubNub for the IBM intranet.

For those of you who don’t know YubNub, it is a simple but powerful tool, which allows you to define keywords to reach pages. One of the popular examples is gim which will take you to the Google Image Search results page for the keywords that you entered.

When I built this YubNub clone, I had plans to introduce the feature of defining commands to get data from specific portions of a page. For example, you would be able to fetch the telephone number of a person using a command like: telephone followed by the person's name. The way this works is by scraping the content off a page containing the telephone number at a specific section in the person’s profile page.

But wouldn’t it be cool to provide the flexibility to the user to define what to fetch from a page on the Intranet? You can ask the user to define what content to fetch from a page when he creates the command.

Look at the YubNub create command interface. The basic information asked in the page is:

  • Name of the command
  • URL
  • Description

Now imagine having an extra text-field which asks you to enter the XPath to the content that you want to scrape from the resultant page.

In simple words, you are saying: fetch this page, then get this specific portion of the page, and give me only that content. You could perhaps pipe that content to some other command or play with it in umpteen ways. I haven’t followed YubNub of late, but I am sure there are many commands in YubNub which have similar functionality.

Although this is possible in principle, I faced one major issue. The server had to do the page fetch and then the page scraping. Now although there are very good XML parsers out there, there is no good ‘XML’ parser for HTML. And XPath does not work unless the page is XML.

Most pages on the Internet are HTML (or XHTML), and although transforming them to XML sounds straightforward, anyone who has tried it will see that this is not a simple solution. When you try to parse an XHTML page (even popular pages out there) you will run into issues like ‘entity not defined’ or ‘matching element not found’. Although there are tools like Tidy or TagSoup, you are not guaranteed that the output of such tools is well-formed XML.

On the other hand, browsers are extremely flexible in the way they handle HTML. Traversing the HTML DOM is really simple, and many a time you don’t even realize that your browser has silently corrected tens of errors in the page. You can get to any specific portion of the page using HTML DOM functions or using libraries like jQuery.

So what I was looking for, was some tool which had the flexibility of the browser’s HTML handling, but at the same time was able to function on the server.

As if by coincidence, I ran into this post from John Resig (the person popular for jQuery). John describes one of his projects on bringing the browser environment to Rhino. He also gives an example of how to scrape content from a web page and send the result to a file.

Wow! This is exactly what I had been looking for. Since Rhino can be embedded in Java, all you would need to do is to make a call to the JS function to scrape content and then pass the content back to Java and continue with your processing.

Although I don’t work on the project anymore, I see a need for this functionality in many other places. For example, some time back I was looking for a simple tool to fetch Tiddlers from TiddlyWiki and convert them into a simple HTML page. This would help in supporting those browsers which don’t have JavaScript enabled. I tried some of the tools out there, but most of them failed. So I planned to write my own. And lo, I came across this same issue. TiddlyWiki content is in HTML, and this content is not easy to parse using XML parsers (which is perhaps why many of those tools failed). So how about using Rhino and John’s project to scrape content from the wiki and send it to a file in a different format?

The project looks very promising. I should follow it closely.

Bulls and cows and the Javascript challenge

About 2 years back, I had conducted an experiment with the Bulls and Cows game[1] [2]. I now wanted to see what the 'human average' for the game is. So I wanted to build a small Facebook application to add the social aspect to the game and conduct my experiments.

But before I continued, I had to solve a major problem.

If I continue to make it a Javascript game, as is hosted here, I need to ensure that the random number generated by the browser is secure and not manipulated or found out by the player using illegal ways.

Anyone who knows a bit of Javascript and is used to looking at code using Firebug will soon be able to 'guess' the number in one step:

Yeah, that's right. I store the random number generated in a variable randomNo. And you can find out the value using Firebug. Now this is fine, as long as it is not a competition and you play the game because you actually like it and not because you are winning a million dollars. But what if this game was being played for money?

So my next attempt was to think of storing an MD5 of the number and then matching it with the MD5 of the number entered by the player. This works well as long as the random number is generated on the server side and only the MD5 is sent to the client.
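
Here is a minimal sketch of that hash comparison, written for Node.js so that it can use the built-in crypto module instead of the MD5 function from the page. The helper names md5Hex, isCorrectGuess and bruteForce are hypothetical. The last function illustrates a caveat: with only about ten thousand candidate numbers, anyone holding the hash can recover the number by trying them all, so the hash alone hides very little.

```javascript
var crypto = require('crypto');  // Node.js built-in module

// Hex-encoded MD5 of a string.
function md5Hex(text) {
  return crypto.createHash('md5').update(text).digest('hex');
}

// The client only ever holds the hash of the secret number.
function isCorrectGuess(secretHash, guess) {
  return md5Hex(String(guess)) === secretHash;
}

// Caveat: a small number space can be searched exhaustively.
function bruteForce(secretHash, maxNumber) {
  for (var n = 0; n <= maxNumber; n++) {
    if (md5Hex(String(n)) === secretHash) return n;
  }
  return null;  // no candidate in the range matched
}
```

With secretHash = md5Hex('7839'), isCorrectGuess(secretHash, '7839') is true, and bruteForce(secretHash, 10000) finds 7839 after a few thousand hashes.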

Can the random number and its MD5 be generated on the client side without the user being able to 'debug' and get the random number?

My first attempt towards this was the following piece of code:

function getRandomNo(){
	var md5OfRandomNo = MD5(Math.floor(Math.random()*10001)+'');
	return md5OfRandomNo;
}

But unfortunately:

and you step into the function and:

🙁

Right now, I am still not able to find a fool-proof way to generate the random number on the client side. Is there a solution?

Ok, let's say the number is securely generated in some way (client or server) and we only store the MD5 value on the client. Now, there is a second problem:

What if the player just changes the random number altogether?

>>> randomNo
"948f847055c6bf156997ce9fb59919be"
>>> randomNo = MD5('7839')
"ca91c5464e73d3066825362c3093a45f"

We need to maintain a session and include some verification code to ensure that the MD5 was not manipulated.
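
One possible shape for that verification code, assuming a server is involved after all: the server signs the hash together with a session id using a key that never leaves the server, and rejects any hash whose signature does not match. This is a sketch under those assumptions, written for Node.js; SERVER_SECRET, sign and isUntampered are hypothetical names.

```javascript
var crypto = require('crypto');  // Node.js built-in module

// Placeholder key; it must stay on the server and never reach the browser.
var SERVER_SECRET = 'replace-with-a-real-secret';

// Sign the hash of the secret number for one session.
function sign(sessionId, secretHash) {
  return crypto.createHmac('sha256', SERVER_SECRET)
    .update(sessionId + ':' + secretHash)
    .digest('hex');
}

// Check that the hash sent back by the client is the one that was signed.
function isUntampered(sessionId, secretHash, signature) {
  var expected = Buffer.from(sign(sessionId, secretHash), 'hex');
  var actual = Buffer.from(String(signature), 'hex');
  // timingSafeEqual requires buffers of equal length.
  return expected.length === actual.length &&
    crypto.timingSafeEqual(expected, actual);
}
```

If the player swaps randomNo for the MD5 of a number of their choosing, the signature issued for the original hash no longer verifies.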

Is there a solution for this if we want to write the entire game using only Javascript? Are there any other issues other than the 2 described?

Groovy – a cool JVM based scripting language

Ever thought you are so addicted to Java that although the world is talking about moving to functional languages, you just cannot see yourself leaving the JVM?

Well, as Jerry put it, Groovy could be your solution. You can still use the JVM, but instead of Java, use Groovy as your programming language.

Are there any compelling reasons to move to Groovy or even away from Java to any of the scripting languages on the JVM?

Well, I wouldn't consider myself to be an expert in Java, or any programming language, but from the little experience that I have, I should say, there are some reasons I can think of.

Some time back, we had this requirement. We were doing a project on Eclipse and we had to make changes to a basic Eclipse object because an assumption made by the Eclipse architecture was not true for us. Were we trying to break the architecture of Eclipse? Maybe. Why were we trying to do it? Well, technological coolness!

Anyway, the point is, this is not possible without actually downloading the source and then making the modification and asking users to replace the original JAR containing that object with ours. Or we need to ask the Eclipse community to accept our change and it might not be a compelling reason for the community. Is there a simpler solution?

How would it be if it was possible to modify this object during runtime and not require anyone to replace the JARs?

Well, although I have not tried it, I am sure it is quite easy to do this with a functional language. I have done some similar changes to Dojo using the 'prototype' property.
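
To show what I mean by the 'prototype' approach, here is a small JavaScript sketch. It is not the Dojo change itself; Greeter is a made-up stand-in for a library class whose source we cannot edit.

```javascript
// A stand-in for a library class we cannot edit at the source level.
function Greeter(name) {
  this.name = name;
}
Greeter.prototype.greet = function () {
  return 'Hello, ' + this.name;
};

// An object created before the patch is applied.
var existing = new Greeter('Eclipse');

// Replace the method at runtime, keeping a reference to the original.
var originalGreet = Greeter.prototype.greet;
Greeter.prototype.greet = function () {
  return originalGreet.call(this) + '!';
};

// Objects created before the patch pick up the new behaviour too:
// existing.greet() now returns 'Hello, Eclipse!'
```

Nothing had to be recompiled or repackaged, which is exactly what we could not do with the Eclipse JAR.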

There are simpler reasons than that. Java sometimes can get really painful. Remember how painful it is to simply replace backslashes in Java? Remember how painful it is to obtain a substring from the 2nd character at the beginning to the last but one character at the end? Remember how painful it is to write a string of HTML to the response from a servlet? Can setters/getters be simpler? Can XML document creation be simpler?

With the world talking about byte-code modification, metaprogramming and scripting languages on the JVM, Java is becoming more of a platform language than a core programming language. It is more like what C/C++ was about 10 years back and what assembly was before that.

So back to my question on why Groovy.
The Groovy guys have answered it elegantly. Here are the reasons that I found to be the most compelling:
1. It has functional aspects.
2. The syntax is neat and consistent.
3. It is Java based. It is easy to move back to Java if I ever need to. Groovy is completely interoperable with Java and reuses a lot of the Java semantics. If you really need to get this, you should see this tutorial.

So go ahead and give it a try! Who knows, you may soon start to wonder why you were doing Java programming all these days!

Firefox 2 experiments

Previous related entries:
Firefox extensions – my picks
Firefox extensions – my picks II (a web developer's heaven)
Microsummaries – a new feature in Firefox 2

Firefox 2 was released recently. This week I got a chance to dabble with it. Integrated spell-check is a new feature that I liked.

I cannot stop talking about Microsummaries. So I continued my experiments and I found some useful extensions.

Here they are:
Microsummary Generator Builder
XPath Checker

Also for Web developers, here are some more useful extensions (other than the ones that I have mentioned in my previous blogs)

UrlParams
Execute JS

Also I found these extensions very useful:
All-in-One Sidebar

So here's how it looks now:

Firefox extensions – my picks II (a web developer’s heaven)

Are you into development of scripts using JS and DOM? Do you do AJAXian scripting (using XMLHttp etc)? Then you might have realized the pain of not being able to debug them when faced with problems. A set of 'alert's is not always elegant.

So here's what you do:
1. First things first : Use Firefox
2. Install these extensions.

* Venkman Javascript Debugger : https://addons.mozilla.org/extensions/moreinfo.php?id=216&application=firefox
If you start using this, you will forget that you are using a browser. You will see how the browser is changing into something like an IDE. More about it here.

* Firebug : https://addons.mozilla.org/extensions/moreinfo.php?id=1843&application=firefox
This extension shows you errors that occur while a page is rendering. Click on the line number and it takes you to the appropriate line in the appropriate file. You can also inspect values in the HTML DOM by clicking Inspect Element and then clicking the item you want to inspect, and you can view the requests and responses of XMLHttp requests.

* Hypertext DOM browser : https://addons.mozilla.org/extensions/moreinfo.php?id=1584&application=firefox
Yet another cool extension that helps you view the DOM values. This is more 'navigable' than the one that Firebug provides. It also has an evaluator, which helps you to navigate to the appropriate node and then start inspecting the values.

HTML scripting has never been as much fun before!