So I finally found some time to continue my experiments with sending data from the browser to the server.
This time my target was Orkut. I decided to write a simple script that extracts my Orkut profile, so that I could display a subset of its fields on my own site with my own formatting.
I did not write a Greasemonkey script this time; I just ran JavaScript from the Firebug console. Here is the browser-side script:
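```javascript
// CSS classes used on the rows of the Orkut profile table.
var arrayToExtract = ['listdark', 'listlight'];
for (var z = 0; z < arrayToExtract.length; z++) {
  var elements = $$('.' + arrayToExtract[z]); // Just got lucky here. $$ is available!
  for (var i = 0; i < elements.length; i++) {
    var item = elements[i].getElementsByTagName('p');
    if (item.length < 2) continue; // skip rows without a label/value pair
    postData(item[0].innerHTML); // field label, e.g. "job description:"
    postData(item[1].innerHTML); // field value
  }
}

// Sends one value to the server by injecting a <script> tag (a cross-domain GET).
// Both values are URL-encoded so that &, #, + and spaces survive the trip.
function postData(data) {
  var scriptElement = document.createElement('script');
  scriptElement.setAttribute('src', 'http://buzypi.in/backup?data=' + encodeURIComponent(data) +
      '&file=orkut&date=' + encodeURIComponent(Date()));
  document.body.appendChild(scriptElement);
}
```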
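Here is why the profile text should go through encodeURIComponent before it is placed in the query string. The small Node.js check below is a sketch: the sample text is invented, and the WHATWG URL parser stands in for the server's query parsing. An unescaped & starts a new parameter, and # starts the URL fragment, which the browser never sends. The + that appears in a Date() string such as "GMT+0530" has a similar problem, because PHP decodes a raw + in a query string as a space.

```javascript
// Base URL taken from the script above; the profile text below is a made-up example.
const BASE = 'http://buzypi.in/backup';

// Builds the request URL the way postData() does and returns what a server
// would read back from the query string.
function queryAsServerSeesIt(value, encode) {
  const piece = encode ? encodeURIComponent(value) : value;
  const url = new URL(BASE + '?data=' + piece + '&file=orkut');
  return { data: url.searchParams.get('data'), file: url.searchParams.get('file') };
}

const sample = 'R&D lead, team #2';
console.log(queryAsServerSeesIt(sample, false)); // { data: 'R', file: null }
console.log(queryAsServerSeesIt(sample, true));  // { data: 'R&D lead, team #2', file: 'orkut' }
```

Without encoding, the server would record only "R" and would not even receive the file name, because "&file=orkut" ends up inside the fragment.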
The script above sends the profile fields to the server one at a time, and the server appends each one to a file. The server-side code is as follows:
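```php
<?php
// basename() keeps the target file inside the data directory.
$file_name = basename($_REQUEST['file']);
$data = $_REQUEST['data'];
// more=true means further chunks of the same value will follow, so no newline yet.
$more = isset($_REQUEST['more']) ? $_REQUEST['more'] : '';
$DIRECTORY = 'data';
$file_with_location = dirname(__FILE__) . '/' . $DIRECTORY . '/' . $file_name;

$file_handle = fopen($file_with_location, 'a');
fwrite($file_handle, $data);
if ($more != 'true') {
    fwrite($file_handle, "\n"); // one value per line
}
$success_value = fclose($file_handle);

// The response is loaded as a <script>, so the status message is wrapped
// in a comment to keep it valid (and inert) JavaScript.
header('Content-Type: application/javascript');
echo "/*";
if ($success_value === TRUE) {
    echo "Successfully appended: " . $data . "<br/>";
    if ($more == 'true') {
        echo "Expect you to send more data";
    }
} else {
    echo "Failed to write data";
}
echo "*/";
?>
```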
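The `more` parameter lets a client split one value across several requests. Every chunk except the last is sent with more=true, so the server appends it without a newline. Below is a sketch of the client side of that scheme in JavaScript. The function names and the length limit are hypothetical. Chunks are cut on code-point boundaries so that encodeURIComponent never sees half of a surrogate pair.

```javascript
// Splits `data` into pieces whose URL-encoded length is at most maxEncodedLength.
// for...of walks code points, so surrogate pairs are never split
// (encodeURIComponent throws a URIError on a lone surrogate).
function chunkForGet(data, maxEncodedLength) {
  const chunks = [];
  let current = '';
  let currentLength = 0;
  for (const ch of data) {
    const length = encodeURIComponent(ch).length;
    if (length > maxEncodedLength) {
      throw new RangeError('maxEncodedLength is smaller than one encoded character');
    }
    if (currentLength + length > maxEncodedLength) {
      chunks.push(current);
      current = '';
      currentLength = 0;
    }
    current += ch;
    currentLength += length;
  }
  chunks.push(current); // last (possibly only, possibly empty) chunk
  return chunks;
}

// Builds one backup URL per chunk; all but the last carry more=true.
function buildChunkUrls(baseUrl, file, data, maxEncodedLength) {
  const chunks = chunkForGet(data, maxEncodedLength);
  return chunks.map((chunk, i) => {
    const more = i < chunks.length - 1 ? 'true' : 'false';
    return baseUrl + '?file=' + encodeURIComponent(file) +
      '&more=' + more + '&data=' + encodeURIComponent(chunk);
  });
}

// Three URLs: data=abc and data=def with more=true, then data=g with more=false.
console.log(buildChunkUrls('http://buzypi.in/backup', 'orkut', 'abcdefg', 3));
```

Each chunk is a separate request. This reassembles the value correctly only if the chunks reach the server in the order they were sent.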
Guess what happened when I executed the script?
The data was appended to the file all right, but in some places the items were out of order.
Here is a sample:
job description: work phone: I am a social networking application developer. I work on the Books iRead application in Ugenie. Our app is currently available in Facebook, Bebo, Orkut and Myspace. career interests: ...
while the expected output was:
job description: I am a social networking application developer. I work on the Books iRead application in Ugenie. Our app is currently available in Facebook, Bebo, Orkut and Myspace. work phone: career interests: ...
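The job description text should have been received before ‘work phone’, but it was not. Each call to postData() injects a separate script element, and dynamically inserted scripts are fetched independently (and possibly in parallel). So nothing guarantees that the requests reach the server, and get appended, in the order they were created.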
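The effect can be reproduced without a browser. In the Node.js sketch below, each request is modeled as a timer with its own latency, and the "server" appends items in the order the timers fire. The latencies are hypothetical, chosen to reproduce the sample above.

```javascript
// Each item is "sent" as an independent request whose latency is delaysMs[i].
// The server appends items in arrival order, like the PHP script does.
function simulateUnorderedSends(items, delaysMs) {
  const log = [];
  const sends = items.map((item, i) =>
    new Promise((resolve) => {
      setTimeout(() => {
        log.push(item); // whatever arrives first is appended first
        resolve();
      }, delaysMs[i]);
    })
  );
  return Promise.all(sends).then(() => log);
}

simulateUnorderedSends(
  ['job description:', 'I am a social networking application developer.', 'work phone:', 'career interests:'],
  [5, 30, 10, 40]
).then((log) => console.log(log.join(' | ')));
// job description: | work phone: | I am a social networking application developer. | career interests:
```

The items were created in the right order. The value simply took longer to arrive than the next label.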
So what is the solution?
There are two approaches I can think of:
1. Ensure that each piece of data is posted atomically, in a single request.
2. Set up a simple sliding window protocol between the browser and the server.
Solution 1 is not always feasible because of the limits on URL length for GET requests. In fact, a long value may have to be split across several GETs just so that it can be sent at all, and those pieces must then arrive in order too. So the only solution that covers this case is (2).
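Here is a minimal sketch of what (2) could look like, in JavaScript. It assumes that each request carries a sequence number and that the server's reply carries a cumulative acknowledgement. Neither exists in the scripts above, and the class names are hypothetical. The receiver buffers items that arrive early and appends them only in sequence order. The sender keeps at most windowSize unacknowledged items in flight. The sketch also assumes no request is ever lost, so there are no timeouts or retransmissions, which a real version would need.

```javascript
// Receiver side (what the server would do): buffer early arrivals and
// append strictly in sequence order.
class ReorderingReceiver {
  constructor() {
    this.nextSeq = 0;          // lowest sequence number not yet appended
    this.pending = new Map();  // seq -> data for items that arrived early
    this.output = [];          // stands in for the appended file
  }

  // Returns a cumulative ack: every item with seq < the returned value is stored.
  receive(seq, data) {
    if (seq >= this.nextSeq && !this.pending.has(seq)) {
      this.pending.set(seq, data); // duplicates and old items are ignored
    }
    while (this.pending.has(this.nextSeq)) {
      this.output.push(this.pending.get(this.nextSeq));
      this.pending.delete(this.nextSeq);
      this.nextSeq++;
    }
    return this.nextSeq;
  }
}

// Sender side (what the browser would do): at most windowSize items in flight.
class WindowedSender {
  constructor(items, windowSize) {
    this.items = items;
    this.windowSize = windowSize;
    this.base = 0;        // oldest unacknowledged seq
    this.nextToSend = 0;  // next seq that has never been sent
  }

  // Returns the [seq, data] pairs that may be sent now without overflowing the window.
  sendable() {
    const batch = [];
    while (this.nextToSend < this.items.length &&
           this.nextToSend < this.base + this.windowSize) {
      batch.push([this.nextToSend, this.items[this.nextToSend]]);
      this.nextToSend++;
    }
    return batch;
  }

  // Slides the window forward on a cumulative ack.
  onAck(ack) {
    if (ack > this.base) this.base = ack;
  }

  done() {
    return this.base === this.items.length;
  }
}

// Simulated run: the network delivers each batch in reverse order.
const sender = new WindowedSender(
  ['job description:', 'I am a social networking application developer.', 'work phone:', 'career interests:'], 2);
const receiver = new ReorderingReceiver();
while (!sender.done()) {
  for (const [seq, data] of sender.sendable().reverse()) {
    sender.onAck(receiver.receive(seq, data));
  }
}
console.log(receiver.output.join(' | '));
// job description: | I am a social networking application developer. | work phone: | career interests:
```

In the browser, the acknowledgement could come back in the injected script's response, for example as a call to a callback function, and the sender would call onAck() from there. The window size trades throughput against how much the server has to buffer.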
I will post more entries as I progress. Meanwhile, if you have a better solution to the problem, please leave a comment.