Tuesday, July 3, 2012

Customizing Google Docs Forms - Part 2

(This is the second part. Part 1.)

Google Docs Forms - Exploring the Front-End

Let's recap. What we are about to do in this part is to save a Google Docs Form entirely (HTML+CSS), write PHP scripts which will replace Google's own back-end, and set the form to work with these scripts.

This is the form in question:
https://docs.google.com/spreadsheet/viewform?formkey=dDJ5VDB3aURJSVN3NThjdmE2M1pLbGc6MQ#gid=0

Note that it's actually a link to https://docs.google.com/spreadsheet/viewform with two parameters in its URL - formkey and gid. Navigating directly to the viewform script returns an error; we can't really read this from Google. Which is why I decided to use PHP for this job - it just reminded me of how PHP behaves, always responding to a GET request with generated HTML.

This is a questionnaire with 50 questions. Working with Chrome, you can hit the F12 key to show the developer window. From the Console tab, you can enter this code to check all radio buttons to 1:
_radios = document.getElementsByClassName('ss-q-radio');for(i = 0; i < _radios.length; i++){ _radio = _radios[i]; if (_radio.value == "1") { _radio.checked = true; }}
(this post is written after I examined the page's source-code, which is how I got this code written.)

Nevertheless, let's not do that right away; go to the bottom of the page and submit the form as it is. You'll get to an error screen:

https://docs.google.com/spreadsheet/formResponse?formkey=dDJ5VDB3aURJSVN3NThjdmE2M1pLbGc6MQ&ifq

which is the same screen you'll get when you submit the form correctly.

So now, it's clear the two scripts are working here: viewform and formResponse. The first displays the form initially; the second validates the input, and either displays the form with the necessary error marks or simply displays a success message. Google being our model for good design, this is the template we are about to follow.

A third and unseen factor here is the SQL table. In Google Docs, creating a form automatically creates the linked Google Docs Spreadsheet which contains all the entries of people who filled the form. This is a very user-friendly view of the data. Unfortunately for our researcher, I will not supply such a nice interface. I can use the Google Docs spreadsheet, though

Into the Form's Source

Viewing the source-code of our form, a couple of points pop out.

First of all, for an automatically-generated page, the code is very need. It contains a single HTML <form>, refers to only one CSS stylesheet, and contains just one snippet of Javascript code, which is here below.


      (function() {var divs = document.getElementById('ss-form').getElementsByTagName('div');var numDivs = divs.length;for (var j = 0; j < numDivs; j++) {if (divs[j].className == 'errorbox-bad') {divs[j].lastChild.firstChild.lastChild.focus();return;}}for (var i = 0; i < numDivs; i++) {var div = divs[i];if (div.className == 'ss-form-entry' &&div.firstChild &&div.firstChild.className == 'ss-q-title') {div.lastChild.focus();return;}}})();
 What this snippet does, is to focus the page on either the title of the Form or on the highest errorbox (returned by formResponse when a required field is missing). It is placed near the end of the page's <body> and is designated to run as soon as the rest of the page has been parsed.

These kind of things are exactly the reason why I chose to scrape everything from Google Docs. I've done a couple of relatively small projects regarding web design and have come very quickly to the conclusion, that it's very hard to write HTML and CSS which display well on all browsers and on all platforms. I take this code from Google since I trust them. If anyone get do it right - I believe it is them. When I go to the Form page from Chrome, I believe I get the same page I'd get when I'd browse from either IE or Firefox; and it will display just right on all three.

To sum everything up, the design will should be as follows:

  • viewform - will display the initial form. While Google's version takes the formkey parameter to identify the form, we can serve static HTML. The <form> element's action should be our formResponse script.
  • formResponse - given the form data by the POST method, this page should:
    • If all required fields have been filled: save the data to the SQL. Display a link or re-direct to Turk's assignment approval URL...
    • Otherwise, if some inputs are missing: display the questionnaire again. If we choose to highlight the problematic parts (just as Google Docs does), that means some tampering with the returned HTML is required - we can't just serve static content. The effort is very close to generating the entire page content dynamically according to the questionnaire's structure. Perhaps giving up on this question-specific error highlighting is the wise thing to do right now...
A small note I forgot to mention on the previous post. I've mentioned the mandatory reason for writing our own code is to hook up with Turk. But now that we have control over the form using Javascript et al., we can do some other cool things. For instance - a Youtube video is being showed to our test subjects. We can use the Youtube Javascript API to make the form's submit button active only after the movie has been displayed. Also, we can verify the time it took our test subjects to complete the form, which is useful for making sure they didn't do it too fast. That's a nice benefit.

So what's next?

  1. Write a PHP page which returns a form, which redirects its data to a second PHP page.
  2. Write the second PHP page which can observe the data sent by the first and decided if it's OK or not.
  3. If data is missing, have the second PHP page display an error message.
  4. If the data is OK, write it to an SQL table and display a link to Turk's confirmation URL.
So this is it - this is the point where I should really start writing some code. The default location for WampServer's index page is C:\wamp\www and this is exactly where I am heading.

Wait for the next post for some code examples.

Customizing Google Docs Forms - Part 1

This post deals with implementing a Google Docs Forms -like system on a local server. This allows customizing it with Javascript in order to get it working with Mechanical Turk. This also means that this tutorial is irrelevant if you do not have (or do not need) access to web and SQL servers.

What we wanted...

...was to use Google Docs for a research with subjects from Amazon's Mechanical Turk (Turk from now on). While a Google Docs Form is very rigid and cannot be extended or even toyed with by Javascript (due to the "Cross-Domain Error"), we didn't think we'll need any enhancements.

Oh, how wrong we were!

How Mechanical Turk works (the interesting parts)


  1. You create your website: this is the task you want people to perform. Naturally, this website has a link.
  2. You post this task on Turk (supplying the link); people can now see it.
  3. What people (providers) see when they enter the task is a two-frames page: the top frame is Turk's header (banners etc). The bottom frame is your task - the page your link points to.
  4. When people finish your task, Turk recognizes this and adds their data to your management interface. That allows you to pay them quickly via Turk's interface (the alternative being that they'd supply you with their paypal account details and you'd have to do all the arrangements in order to pay them - nothing you'd like very much...
Sounds easy, eh? We can imagine how 1,2 and 3 are accomplished. But how exactly does 4 work?

Oh.

When you supply Turk with your link, Turk posts it with an additional parameter: assignmentId.  That is, if your link points to

http://www.foo.com/experiment.php

Then the bottom frame's src is

http://www.foo.com/experiment.php?assignmentId=<whatever>

This is done in an intelligent way - if your page is 

http://www.foo.com/experiment.php?bar=baz

Then the frame's src will be 

http://www.foo.com/experiment.php?bar=baz&assignmentId=<whatever>



Your page should parse this assignmentId param from the page's URL, so that when the task is finished, the worker's browser will navigate to

http://workersandbox.mturk.com/mturk/externalSubmit?finished=true&assignmentId=<whatever>

And this lets Turk know the assignment has been completed, allowing point number 4 from the above list to actually take place.

Note that the notification is per assignmentId - which is some combination of the worker and your task. That means you can only know that a certain performance of the task has been completed. You cannot know which worker did it; workers' privacy is protected. Of course you can define in advance that Turk will only display your task to workers of a certain criteria (age, sex, location...) - but workers can lie when they register into Turk so it's not really effective.

So what does that mean?

The summary of the former section is this. If you'd like to use Turk, your task's page should be able to parse a parameter off it's URL. That means Javascript. And since we can't incorporate Javascript into a Google Docs Form, we have to find a way around.

The solution: steal everything we can from Google Docs, complete the rest with PHP

  1. Create a Form using Google Docs.
  2. Collect HTML+CSS. Use on own website.
  3. Redirect form action to self-written PHP which writes results to SQL server.
This will be done on a Windows 7 machine, using Notepad++ for editing, WampServer for deploying, and Google Chrome for testing. Any additional programming will done using Python 2.x .

Advantages

  1. Saves us from writing effective + cross-browsers HTML+CSS (faith in Google required).
  2. Allows form compser (=non-programming MSc student) to edit the form in Google's comfortable UI; we'll just drink the code from there, saving the headache of writing a nice front-end.

Disadvantages

  1. Replacing Google Docs's code with ours; never a good idea...
  2. Requires a stable web and SQL servers. This is something we wanted to avoid as this experiment takes place in a University, which means access from outside is limited and bureaucracy follows. Oh well...

I'll describe how all of that was done in the next post.