Panda Quality: javascript

Showing posts with label javascript. Show all posts

Sunday, July 29, 2012

Integrating LimeSurvey with Amazon Mechanical Turk (or anything that requires parsing URL GET params with JavaScript)

I resolved from writing my own suverys framework into using LimeSurvey which seemed to be the best freely available hosting. It is written in PHP and JQuery and requires a server for a host - which is something I could supply.

This solution was tested with:

WampServer 2.2;
LimeSurvey 1.9.2;
Browser/OS:

On Windows 7: Internet Explorer 9; FireFox 13; Google Chrome 20.0.1132.57 m.

Installing, configuring and start using LimeSurvey

I built it from scratch.

Thus, for WAMP I used WampServer: http://www.wampserver.com/en/

Downloaded Livesurvey from here: http://www.limesurvey.org/en/stable-release

Unzipped, moved the limesurvey directory to the WampServer www directory.

Browsed to http://localhost/limesurvey . Continued with the installation from there.

When LimeSurvey is installed, login as an admin. The l/p values are stored in the config.php file in LimeSurveuy's root directory. The defaults are admin/password.

Now. Why are we here?

I would like LimeSurvey to

Read the assignmentId parameter of the URL. This is a part of the URL added by Mechanical Turk to any HIT page it refers to.
Store the assignmentId as one of the answers to the questionnaire. (for further inspection before I actually pay people).
Refer workers who finished the questionnaire to a confirmation link, which adds them to the list of people I should (consider) paying to.

The Steps

1) Change LimeSurvey's Global Settings \ Security \ Filter HTML for XSS to No.

2) Create a new Survey, QuestionsGroup, Question. Go to the edit question screen. The question type should be set to "Short free text" (unless you need something different). Pick a Question Code now. It can be anything not used again in your survey. In the example below, the Question Code is 'assignmentId' (without the quotes of course).

3) Click on the Source button. This will disable rich-text editing, and will take anything you'd print in as
raw code to be added into the questionnaire.

4) After you clicked on the Source button, paste the following code:

<script type="text/javascript">
// Adapted by guy4261 from: http://papermashup.com/read-url-get-variables-withjavascript/
function getUrlVars() {
var vars = new Array();
var parts = window.location.href.replace(/[?&]+([^=&]+)=([^&]*)/gi,
function(m,key,value) {
vars[key] = value;
});
return vars;
}

//$(document).ready(function() {
$('#{SGQ}').val(getUrlVars()["assignmentId"]);
//});
</script>
<style type="text/css">
#question{assignmentId.qid}{
display: none;
}
</style>

5) Go to the Survey's Edit survey text elements page. Scroll down to the End URL field. Enter the following URL:
(For Debug:) http://www.google.com/search?q={assignmentId}
(For Release:) http://workersandbox.mturk.com/mturk/externalSubmit?finished=true&assignmentId={assignmentId}

Further Customization Notes

4.1) Note that you need a specific value. In my case it's assignmentId (assigned by Mechanical Turk); in a different system that could've been a completely different parameter(s).

4.2) After the script tag is closed, comes the style tag. By choosing #question1 and making it invisible, we prevent the user from changing the assigned value. The original tutorials suggest hiding a question by tampering with the template.css file, but at least in Chrome this works just as well. However, the selector should be #questionQQ where QQ is the question ID. In the example shown here, while the SGQA (http://docs.limesurvey.org/SGQA+identifier) is 22456X1X1, the question's ID is simply 1. Supposedly, question codes can replace this values starting with LimeSurvey 1.92, but this is what I have installed and I just couldn't make it work.

5) In the End URL, use the macro {questionCode} (in our case: {assignmentId}) to include the collected assignmentId in the reference URL to Mechanical Turk's confirmation page. There is probably a nicer way to make it work with JavaScript auto-forwarding etc. but that's beyond the scope of this tutorial. This also means that the assignmentId is saved along with the survey's results. However, from what I understood, this is useful to avoid paying people who did not feel the questionnaire properly.

Explanations of the Solutions' Steps

4.1) You should be aware that LimeSurvey uses lots of JQuery, hence the $ selectors in this code.
4.2) Note that in the original suggestions across the LimeSurvey forum, the original suggestion was to run
the code as part of a custom document.ready function.
4.3) The function that parses the URL variables can be replaced; I took this one by papermashup: http://papermashup.com/read-url-get-variables-withjavascript/
4.4) {SGQ} is a the LimeSurvey macro for the current question. $('#{SGQ}').val('FOO') simply populates the answer with FOO.

Thursday, July 5, 2012

Customizing Google Docs Forms - Part 3

(This is the third part. Part 1. Part 2.)

Time For Some Code

Save the Google Docs Form page (viewform) to a local file on c:\wamp\www\viewform.php .

Changed the form action to point to a local page. viewpost.php:

<form action="viewpost.php?assignmentId=dog" method="POST" id="ss-form">

When I browse to http://localhost/viewform.php , I get a page looking exactly like my Google Docs Form, thanks to the fact that the CSS files was saved with the page. I created a page viewpost.php with the following code:

<?php
print("<pre>");
print_r($_POST);
print_r($_GET);
print("</pre>");
?>

Finally, I entered the form on localhost and submitted it. Here is the output:

Array
(
    [entry_68_single_yodawg] => 
    [entry_63_single] => 
    [pageNumber] => 0
    [backupCache] => 
    [submit] => Submit
)

Array
(
    [assignmentId] => dog
)

What can we learn from this?

PHP makes accessing POST or GET (URL) parameters very easy. As easy as $_POST['param-name'].
For every input element in the original form, it's name attribute is used as the parameter name.
For radio button groups in which no selection was made, the matching parameter does not appear in the $_POST variable.
And after some validations against the code - dots are not allowed in the parameter names, so they are converted to underscores in the $_POST array. "entry_63_single" in $_POST is originally "entry.63.single" in the HTML.

So what now?

First, we need to define a protocol for describing a questionnaire. In Google Docs, you get a WYSIWYG editor and things are very easy. In this project's case, I am not interested in creating such an editor. My employers did not allocate such amounts of money for this project - and something simpler would suffice. I am thinking about an Excel spreadsheet which they'd write according to my guidelines. I will convert the spreadsheet into a TSV file which Python will handle.

We'll use this protocol to generate the questionnaire. Again, that means two php pages:

One just for displaying the form. That's not so hard, we've sort of got it done.

The second is for validating the form, insert the data into the SQL server if it's good or display it with error boxes if necessary. Both should have their HTML elements tagged in accordance with Google Docs' CSS file so they would look good. (The distances I'd walk to avoid styling on my own are immense, I assure you.)

Another thing we'll need to generate is SQL code for defining the table properly. Creating the database could also be nice but that's less important for me right now. Defining the table, though, is crucial.

The display, validation , insertion and table definition are all dependant on the protocol. Therefore, my solution is to iterate over the protocol-based input file (probably a TSV adapted from excel) and for each row, parse it and generate PHP code for each of these targets.

On that note, creating an interface for vieweing or downloading the results can be nice. wamp server comes with phpmyadmin installed, but something more friendly for the researchers is always a pleasure.

Viewing

The viewer was the easiest part to make and you can find it here: https://gist.github.com/3049801

It's mostly based on this post: http://www.anyexample.com/programming/php/php_mysql_example__display_table_as_html.xml

This php script

Pulls the table data as a TSV, independantly of the table's schema.
Makes your browser automatically download it.
Can get dbname and table as GET parameters (in the URL).

It works when the connection details (server address, username, password) are correctly set; and when the mysql server itself has permissions for that user over the requested table.

Since it's already 02:25, I think I'll go to sleep now... Later this week I'll be working on creating prepared SQL statements and binding them with the incoming POST parameters.

Happy hacking!

Tuesday, July 3, 2012

Customizing Google Docs Forms - Part 2

(This is the second part. Part 1.)

Google Docs Forms - Exploring the Front-End

Let's recap. What we are about to do in this part is to save a Google Docs Form entirely (HTML+CSS), write PHP scripts which will replace Google's own back-end, and set the form to work with these scripts.

This is the form in question:
https://docs.google.com/spreadsheet/viewform?formkey=dDJ5VDB3aURJSVN3NThjdmE2M1pLbGc6MQ#gid=0

Note that it's actually a link to https://docs.google.com/spreadsheet/viewform with two parameters in its URL - formkey and gid. Navigating directly to the viewform script returns an error; we can't really read this from Google. Which is why I decided to use PHP for this job - it just reminded me of how PHP behaves, always responding to a GET request with generated HTML.

This is a questionnaire with 50 questions. Working with Chrome, you can hit the F12 key to show the developer window. From the Console tab, you can enter this code to check all radio buttons to 1:

_radios = document.getElementsByClassName('ss-q-radio');for(i = 0; i < _radios.length; i++){ _radio = _radios[i]; if (_radio.value == "1") { _radio.checked = true; }}

(this post is written after I examined the page's source-code, which is how I got this code written.)

Nevertheless, let's not do that right away; go to the bottom of the page and submit the form as it is. You'll get to an error screen:

https://docs.google.com/spreadsheet/formResponse?formkey=dDJ5VDB3aURJSVN3NThjdmE2M1pLbGc6MQ&ifq

which is the same screen you'll get when you submit the form correctly.

So now, it's clear the two scripts are working here: viewform and formResponse. The first displays the form initially; the second validates the input, and either displays the form with the necessary error marks or simply displays a success message. Google being our model for good design, this is the template we are about to follow.

A third and unseen factor here is the SQL table. In Google Docs, creating a form automatically creates the linked Google Docs Spreadsheet which contains all the entries of people who filled the form. This is a very user-friendly view of the data. Unfortunately for our researcher, I will not supply such a nice interface. I can use the Google Docs spreadsheet, though

Into the Form's Source

Viewing the source-code of our form, a couple of points pop out.

First of all, for an automatically-generated page, the code is very need. It contains a single HTML <form>, refers to only one CSS stylesheet, and contains just one snippet of Javascript code, which is here below.

(function() {var divs = document.getElementById('ss-form').getElementsByTagName('div');var numDivs = divs.length;for (var j = 0; j < numDivs; j++) {if (divs[j].className == 'errorbox-bad') {divs[j].lastChild.firstChild.lastChild.focus();return;}}for (var i = 0; i < numDivs; i++) {var div = divs[i];if (div.className == 'ss-form-entry' &&div.firstChild &&div.firstChild.className == 'ss-q-title') {div.lastChild.focus();return;}}})();

What this snippet does, is to focus the page on either the title of the Form or on the highest errorbox (returned by formResponse when a required field is missing). It is placed near the end of the page's <body> and is designated to run as soon as the rest of the page has been parsed.

These kind of things are exactly the reason why I chose to scrape everything from Google Docs. I've done a couple of relatively small projects regarding web design and have come very quickly to the conclusion, that it's very hard to write HTML and CSS which display well on all browsers and on all platforms. I take this code from Google since I trust them. If anyone get do it right - I believe it is them. When I go to the Form page from Chrome, I believe I get the same page I'd get when I'd browse from either IE or Firefox; and it will display just right on all three.

To sum everything up, the design will should be as follows:

viewform - will display the initial form. While Google's version takes the formkey parameter to identify the form, we can serve static HTML. The <form> element's action should be our formResponse script.
formResponse - given the form data by the POST method, this page should:

If all required fields have been filled: save the data to the SQL. Display a link or re-direct to Turk's assignment approval URL...
Otherwise, if some inputs are missing: display the questionnaire again. If we choose to highlight the problematic parts (just as Google Docs does), that means some tampering with the returned HTML is required - we can't just serve static content. The effort is very close to generating the entire page content dynamically according to the questionnaire's structure. Perhaps giving up on this question-specific error highlighting is the wise thing to do right now...

A small note I forgot to mention on the previous post. I've mentioned the mandatory reason for writing our own code is to hook up with Turk. But now that we have control over the form using Javascript et al., we can do some other cool things. For instance - a Youtube video is being showed to our test subjects. We can use the Youtube Javascript API to make the form's submit button active only after the movie has been displayed. Also, we can verify the time it took our test subjects to complete the form, which is useful for making sure they didn't do it too fast. That's a nice benefit.

So what's next?

Write a PHP page which returns a form, which redirects its data to a second PHP page.
Write the second PHP page which can observe the data sent by the first and decided if it's OK or not.
If data is missing, have the second PHP page display an error message.
If the data is OK, write it to an SQL table and display a link to Turk's confirmation URL.

So this is it - this is the point where I should really start writing some code. The default location for WampServer's index page is C:\wamp\www and this is exactly where I am heading.

Wait for the next post for some code examples.

Customizing Google Docs Forms - Part 1

This post deals with implementing a Google Docs Forms -like system on a local server. This allows customizing it with Javascript in order to get it working with Mechanical Turk. This also means that this tutorial is irrelevant if you do not have (or do not need) access to web and SQL servers.

What we wanted...

...was to use Google Docs for a research with subjects from Amazon's Mechanical Turk (Turk from now on). While a Google Docs Form is very rigid and cannot be extended or even toyed with by Javascript (due to the "Cross-Domain Error"), we didn't think we'll need any enhancements.

Oh, how wrong we were!

How Mechanical Turk works (the interesting parts)

You create your website: this is the task you want people to perform. Naturally, this website has a link.
You post this task on Turk (supplying the link); people can now see it.
What people (providers) see when they enter the task is a two-frames page: the top frame is Turk's header (banners etc). The bottom frame is your task - the page your link points to.
When people finish your task, Turk recognizes this and adds their data to your management interface. That allows you to pay them quickly via Turk's interface (the alternative being that they'd supply you with their paypal account details and you'd have to do all the arrangements in order to pay them - nothing you'd like very much...

Sounds easy, eh? We can imagine how 1,2 and 3 are accomplished. But how exactly does 4 work?

Oh.

When you supply Turk with your link, Turk posts it with an additional parameter: assignmentId. That is, if your link points to

http://www.foo.com/experiment.php

Then the bottom frame's src is

http://www.foo.com/experiment.php?assignmentId=<whatever>

This is done in an intelligent way - if your page is

http://www.foo.com/experiment.php?bar=baz

Then the frame's src will be

http://www.foo.com/experiment.php?bar=baz&assignmentId=<whatever>

Your page should parse this assignmentId param from the page's URL, so that when the task is finished, the worker's browser will navigate to

http://workersandbox.mturk.com/mturk/externalSubmit?finished=true&assignmentId=<whatever>

And this lets Turk know the assignment has been completed, allowing point number 4 from the above list to actually take place.

Note that the notification is per assignmentId - which is some combination of the worker and your task. That means you can only know that a certain performance of the task has been completed. You cannot know which worker did it; workers' privacy is protected. Of course you can define in advance that Turk will only display your task to workers of a certain criteria (age, sex, location...) - but workers can lie when they register into Turk so it's not really effective.

So what does that mean?

The summary of the former section is this. If you'd like to use Turk, your task's page should be able to parse a parameter off it's URL. That means Javascript. And since we can't incorporate Javascript into a Google Docs Form, we have to find a way around.

The solution: steal everything we can from Google Docs, complete the rest with PHP

Create a Form using Google Docs.
Collect HTML+CSS. Use on own website.
Redirect form action to self-written PHP which writes results to SQL server.

This will be done on a Windows 7 machine, using Notepad++ for editing, WampServer for deploying, and Google Chrome for testing. Any additional programming will done using Python 2.x .

Advantages

Saves us from writing effective + cross-browsers HTML+CSS (faith in Google required).
Allows form compser (=non-programming MSc student) to edit the form in Google's comfortable UI; we'll just drink the code from there, saving the headache of writing a nice front-end.

Disadvantages

Replacing Google Docs's code with ours; never a good idea...
Requires a stable web and SQL servers. This is something we wanted to avoid as this experiment takes place in a University, which means access from outside is limited and bureaucracy follows. Oh well...

I'll describe how all of that was done in the next post.