Monday, May 18, 2015

Learning Torch7 - Part 2: The Basic Tutorial

From this part of the page:
https://github.com/torch/torch7/wiki/Cheatsheet#newbies

I got to the tutorials part:
https://github.com/torch/torch7/wiki/Cheatsheet#tutorials-demos-by-category

And to the first tutorial:
http://code.madbits.com/wiki/doku.php

And to its first part, about the basics:
http://code.madbits.com/wiki/doku.php?id=tutorial_basics

The tutorial is straightforward, but ends with a request to normalize the mnist train data.

Here is the key line I arrived at after a very painful realization (it was line 4 of my script):

train.data = train.data:type(torch.getdefaulttensortype())


It appears that the data is loaded by torch not into tables or torch tensors but into something called "userdata". This is the type assigned to things coming from the dark depths of C / C++ interop with Lua. This userdata had one particularly interesting feature: its contents were forced (through wrapping) into the range [0, 256). Storing -26, for instance, would result in the value 256 - 26 = 230 appearing in the data. So casting was the first step toward getting my sanity back.
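A quick way to see the wrap-around for yourself. Every Torch tensor is reported by Lua's type() as userdata, and the wrapping suggests the images arrive as a ByteTensor (my reading, not something the tutorial states):

require 'torch'

t = torch.ByteTensor(1)
t[1] = -26      -- a byte can only hold [0, 256), so the value wraps
print(t[1])     -- prints 230, i.e. 256 - 26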

Once cast back to the default tensor type you can use :mean(), :std() and the other tensor methods, which keep this code short and quick.

By the way - this normalization is called "z-score": http://en.wikipedia.org/wiki/Standard_score
I didn't quite catch the right way to describe the whole act in proper English, but that's what we are doing here (subtracting the mean and dividing by the standard deviation).
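For reference, a minimal sketch of the whole z-score normalization, assuming train is the table the tutorial's MNIST loader returns, with the images in train.data:

require 'torch'

-- `train` is assumed to hold the MNIST training set loaded earlier
train.data = train.data:type(torch.getdefaulttensortype()) -- cast the userdata / ByteTensor
local mean = train.data:mean()
local std = train.data:std()
train.data:add(-mean) -- subtract the mean...
train.data:div(std)   -- ...and divide by the standard deviation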

More useful things learned along the way:

for k,v in pairs(o) do -- or pairs(getmetatable(o))
   print(k, v)
end

is the equivalent of Python's dir(o).

I also started working with this IDE: http://studio.zerobrane.com/

Moving on with the tutorials, the supervised learning will be my next post's subject.

Learning Torch7. Part 1

(part 1, aka: "The First Circle")

My entry point is this:
https://github.com/torch/torch7/wiki/Cheatsheet#newbies

I've read this page cover-to-cover and obtained a machine on which Torch was already installed. Installing it sounds like a nightmare, and I wonder whether the performance inside a Docker container would be the same, which would turn the installation into a mere bad dream. So: reading is step 1, installation is step 2.

Step 3, according to the Newbies training course, is to Learn Lua in 15 Minutes:
http://tylerneylon.com/a/learn-lua/

Notable comments:
  • nil is the local None/null/void/undefined.
  • do/end wrap blocks (just like { and } would in other languages)
  • There are no ++ or += operators, so  n = n + 1  is the way to go.
  • == and ~= are the equality/inequality tests.
  • .. is string concatenation.
  • Anything undefined evaluates to nil. (So you can type 'lkgjadsgjdas' into the interpreter without getting an error).
  • Only nil and false are considered false. 0 is true!
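A quick snippet exercising a few of these gotchas (plain Lua, nothing Torch-specific):

if 0 then print('0 is truthy!') end   -- prints: only nil and false are falsy
local n = 1
n = n + 1                             -- no ++ or += in Lua
print(n == 2, n ~= 3)                 -- true    true
print('giga' .. 'watts')              -- gigawatts
print(lkgjadsgjdas)                   -- nil: undefined globals are not an error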
How to create tables:
-- Literal notation for any (non-nil) value as key:
t = {key1 = 'value1', key2 = false}

u = {['@!#'] = 'qbert', [{}] = 1729, [6.28] = 'tau'}
print(u[6.28])  -- prints "tau"

Matching keys within tables:

-- Key matching is basically by value for numbers
-- and strings, but by identity for tables.
a = u['@!#']  -- Now a = 'qbert'.
b = u[{}]     -- We might expect 1729, but it's nil:
-- b = nil since the lookup fails. It fails
-- because the key we used is not the same object
-- as the one used to store the original value. So
-- strings & numbers are more portable keys.

Python dir() equivalent
print(_G) ~~ dir()
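A caveat: in the stock Lua interpreter print(_G) only prints the table's address (Torch's th REPL pretty-prints tables, as far as I can tell). To actually list the global names you can reuse the pairs loop:

for k in pairs(_G) do print(k) end   -- dir()-style listing of globals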

Using tables as lists/arrays:
-- List literals implicitly set up int keys:
v = {'value1', 'value2', 1.21, 'gigawatts'}
for i = 1, #v do  -- #v is the size of v for lists.
  print(v[i])     -- Indices start at 1 !! SO CRAZY!
end
-- A 'list' is not a real type. v is just a table
-- with consecutive integer keys, treated as a list.

The other meaningful part of the tutorial, worth reading thoroughly, is the section on metatables.
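As a teaser, the core mechanism in one tiny example, the __index fallback (my own illustration, not from the tutorial):

defaults = {color = 'red'}
t = setmetatable({}, {__index = defaults})
print(t.color)  -- 'red', resolved through the metatable
t.color = 'blue'
print(t.color)  -- 'blue', now found directly in t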