# Deep Learning & Neural Networks
## Project 2 - Binary Encodings

In this project we will implement an autoencoder for the numbers 1-8, where each number is given as a one-hot encoding, e.g. $(1,0,0,0,0,0,0,0)$ for $1$, $(0,1,0,0,0,0,0,0)$ for $2$, and so on.

We will try to learn a 3-dimensional representation of this data. Three bits are exactly enough to label eight numbers, so the natural code is binary: $(0, 0, 0)$ for $1$, $(0, 0, 1)$ for $2$, and so on. We will see whether the neural network can work this out for itself. We expect it to come close, although it may assign the binary codes to the numbers in a different order.
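To make the two encodings concrete, here is a small sketch that lists, for each number 1-8, its one-hot input (a row of the $8 \times 8$ identity matrix) and the 3-bit binary code we hope the network discovers. The helper names `one_hot` and `binary_code` are illustrative only and are not part of the notebook.

```python
# Illustrative helpers (hypothetical names, not part of the notebook).

def one_hot(n, size=8):
    """One-hot vector for the number n in 1..size: a 1 in position n-1."""
    return [1 if i == n - 1 else 0 for i in range(size)]

def binary_code(n, bits=3):
    """The number n-1 written as `bits` binary digits, most significant first."""
    return [((n - 1) >> (bits - 1 - k)) & 1 for k in range(bits)]

# One line per number: its one-hot input and the 3-bit code we hope to recover.
# Any permutation of these 8 codes would serve the autoencoder equally well.
for n in range(1, 9):
    print(n, one_hot(n), binary_code(n))
```

The eight codes are all distinct, which is why three dimensions suffice in principle.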

### Setup

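We load the necessary libraries, which are just TensorFlow, NumPy and Matplotlib. The code uses the TensorFlow 1.x graph API (`tf.Session`, `tf.train`), which TensorFlow 2 still provides as `tf.compat.v1`.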

In [None]:
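# Load TensorFlow's 1.x-style graph API (provided as tf.compat.v1 in TensorFlow 2 and late 1.x releases)
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()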
# Load numpy - adds MATLAB/Julia-style math to Python
import numpy as np
# Load matplotlib for plotting
%matplotlib inline
import matplotlib.pyplot as plt

The data is just an $8 \times 8$ identity matrix. Using matplotlib we can also plot a heatmap of it.

In [None]:
# Create the one-hot encodings
# Symmetric matrix, so doesn't really matter
# But for sanity, we'll think of row = number
data = np.eye(8).astype('float32')
# Plot it
plt.matshow(data, cmap=plt.cm.gray)

### Building the TensorFlow graph for our autoencoder
Let's create our computation graph.
#### The encoder

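Our encoder is a fully-connected layer followed by a sigmoid, with no bias term. Mathematically,

$$y = \sigma(W x)$$

where $W \in \mathbb{R}^{3 \times 8}$ is the weight matrix and $y \in \mathbb{R}^3$ is the code. In the code below, each row of `data` is one input $x^\top$, so `tf.matmul(data, enc_weight)` computes $(Wx)^\top$ for all eight inputs at once and `enc_weight` stores $W^\top$.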
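Because the inputs are one-hot, the product $Wx$ simply selects one column of $W$: the code for number $i$ is $\sigma$ applied to column $i$ of $W$. The pure-Python sketch below checks this for the number 3, assuming $W$ is a $3 \times 8$ matrix stored as a list of rows and initialized uniformly in $[-1, 1]$ like `enc_weight`.

```python
import math
import random

INPUT_DIM, CODE_DIM = 8, 3

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def encode(W, x):
    """y = sigma(W x), with W given as CODE_DIM rows of INPUT_DIM entries."""
    return [sigmoid(sum(w_k * x_k for w_k, x_k in zip(row, x))) for row in W]

random.seed(0)
W = [[random.uniform(-1.0, 1.0) for _ in range(INPUT_DIM)] for _ in range(CODE_DIM)]

# One-hot input for the number 3 (a 1 at 0-based index 2).
x3 = [1.0 if i == 2 else 0.0 for i in range(INPUT_DIM)]
y3 = encode(W, x3)

# W x3 is exactly column 2 of W, so each code entry is sigma of that column's entry.
print(y3)
print([sigmoid(W[k][2]) for k in range(CODE_DIM)])
```

So training the encoder amounts to placing one 3-dimensional point per number; the sigmoid then squashes each point into the unit cube, whose corners are the binary codes.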

In [None]:
INPUT_DIM = 8
CODE_DIM = 3

enc_weight = tf.Variable(tf.random_uniform([INPUT_DIM,CODE_DIM], -1.0, +1.0))
enc_input  = tf.matmul(data, enc_weight)
enc_output = tf.nn.sigmoid(enc_input)

#### The decoder

This is a fully-connected layer followed by a softmax, again with no bias term.

One could tie the weights and reuse the encoder matrix transposed (`tf.transpose(enc_weight)` in the code), but here we give the decoder its own weight matrix $V \in \mathbb{R}^{8 \times 3}$. Mathematically it is

$$x' = \mathrm{softmax}(V y)$$

where $x'$ is the reconstructed output. Because the code multiplies row vectors from the right, `dec_weight` holds $V^\top$.

In [None]:
# Separate (untied) decoder weights; the tied alternative would be
# dec_weight = tf.transpose(enc_weight)
dec_weight = tf.Variable(tf.random_uniform([CODE_DIM, INPUT_DIM], -1.0, +1.0))
dec_input  = tf.matmul(enc_output, dec_weight)
dec_output = tf.nn.softmax(dec_input)

Putting the two halves together, our autoencoder is the function

$$x' = \mathrm{softmax}(V \sigma(W x))$$

and to train it we need to solve the non-linear, non-convex optimization problem

$$\min_{W \in \mathbb{R}^{3 \times 8},\; V \in \mathbb{R}^{8 \times 3}} \sum_{i=1}^{8} \left\lVert \mathrm{softmax}(V \sigma(W e_i)) - e_i \right\rVert_2^2$$

where $e_i$ is the one-hot encoding of the number $i$. We will solve it with gradient descent, as usual.
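Here is a dependency-free sketch of the whole forward pass $x' = \mathrm{softmax}(V \sigma(W x))$, written to show that the output is always a probability vector (positive entries summing to 1), which is why a softmax is a natural fit for reconstructing one-hot inputs. The decoder matrix is a parameter, so the same function covers both an untied $V$ and the weight-tied choice $V = W^\top$; the random initialization is an assumption mirroring `tf.random_uniform(..., -1.0, +1.0)`.

```python
import math
import random

INPUT_DIM, CODE_DIM = 8, 3

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def softmax(z):
    m = max(z)                        # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def matvec(M, v):
    """Matrix (list of rows) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def autoencode(W, V, x):
    """x' = softmax(V sigma(W x)); W is 3x8 (encoder), V is 8x3 (decoder)."""
    y = [sigmoid(a) for a in matvec(W, x)]
    return softmax(matvec(V, y))

random.seed(1)
W = [[random.uniform(-1, 1) for _ in range(INPUT_DIM)] for _ in range(CODE_DIM)]
V_untied = [[random.uniform(-1, 1) for _ in range(CODE_DIM)] for _ in range(INPUT_DIM)]
V_tied = transpose(W)                 # the weight-tied variant uses V = W^T

x1 = [1.0] + [0.0] * (INPUT_DIM - 1)  # one-hot input for the number 1
for V in (V_untied, V_tied):
    out = autoencode(W, V, x1)
    print([round(p, 3) for p in out], sum(out))
```

With random weights the output is spread over all eight entries; training has to push almost all of the mass onto the entry that matches the input.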

#### The training operation

The error we want to minimize is the sum of squared differences between the data and the decoder's output, taken over all eight inputs; this is exactly the objective function of the optimization problem above. In TensorFlow:

In [None]:
error = tf.reduce_sum(tf.square(data - dec_output))
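To get a feel for the numbers printed during training, the sketch below computes the same quantity in pure Python (`squared_error` is a hypothetical helper, not part of the notebook). A useful baseline: a network that ignores its input and outputs $1/8$ in every entry scores $8 \cdot \left((7/8)^2 + 7 \cdot (1/8)^2\right) = 7$, while a perfect reconstruction scores $0$.

```python
def squared_error(data, recon):
    """Sum over every entry of (data - recon)^2, like tf.reduce_sum(tf.square(data - recon))."""
    return sum((d - r) ** 2
               for drow, rrow in zip(data, recon)
               for d, r in zip(drow, rrow))

n = 8
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
uniform = [[1.0 / n] * n for _ in range(n)]   # "know-nothing" output

print(squared_error(identity, uniform))   # 7.0
print(squared_error(identity, identity))  # 0.0
```

An error well below 7 therefore means the network is actually using its input.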

As in the previous exercise, we declare a gradient descent optimizer (learning rate 0.1), create the training operation, and run the variable initialization step.

In [None]:
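# Plain gradient descent with learning rate 0.1
optimizer = tf.train.GradientDescentOptimizer(0.1)
# One run of `train` performs one gradient step on all trainable variables
train = optimizer.minimize(error)
# initialize_all_variables() is deprecated; global_variables_initializer() replaces it
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)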
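`optimizer.minimize(error)` creates an operation that computes the gradient of `error` with respect to every trainable variable and applies $\theta \leftarrow \theta - 0.1 \, \partial E / \partial \theta$. The scalar sketch below shows that same update rule on a toy one-parameter error $E(w) = (w - 3)^2$, chosen purely for illustration; each step shrinks the distance to the minimum by a factor of $1 - 0.1 \cdot 2 = 0.8$.

```python
def gradient_descent(grad, w0, lr=0.1, steps=50):
    """Plain gradient descent: repeatedly apply w <- w - lr * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Toy error E(w) = (w - 3)^2, so dE/dw = 2 (w - 3) and the minimum is at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)   # close to 3 (off by about 3 * 0.8**50)
```

In the autoencoder the same rule is applied to every entry of `enc_weight` and `dec_weight` at once, with TensorFlow computing the gradients automatically.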

What output do we get right now, with the network initialized to random weights? Let's plot it with matplotlib again:

In [None]:
plt.matshow(sess.run(dec_output), cmap=plt.cm.gray)

This is clearly not the identity matrix we want.

Let's now train the autoencoder so it does roughly what it's supposed to. Again, we run the `train` operation (one gradient descent step) many times through our `Session` object, recording the error at each step:

In [None]:
# Run some gradient steps, recording the error at each one
errors = []
N_STEPS = 5000
for step in range(N_STEPS):
    cur_error, _ = sess.run((error, train))
    errors.append(cur_error)
    if step % 100 == 0:
        print(step, cur_error)
# Plot the training curve
plt.plot(range(N_STEPS), errors, 'b-')
plt.xlabel('step')
plt.ylabel('error')
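As an independent cross-check of what the TensorFlow loop computes, here is a dependency-free sketch that trains the same model with hand-derived backpropagation: sigmoid encoder $W$ ($3 \times 8$), softmax decoder $V$ ($8 \times 3$, untied, like the `dec_weight` variable the code trains), summed squared error, and full-batch gradient descent with learning rate 0.1. This is not the notebook's method (TensorFlow derives these gradients automatically), and the seed and step count are arbitrary assumptions.

```python
import math
import random

INPUT_DIM, CODE_DIM, LR = 8, 3, 0.1

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def step(W, V, data, lr):
    """One full-batch gradient step on E = sum_x ||softmax(V sigma(W x)) - x||^2.
    Updates W (CODE_DIM x INPUT_DIM) and V (INPUT_DIM x CODE_DIM) in place and
    returns the error measured before the update."""
    gW = [[0.0] * INPUT_DIM for _ in range(CODE_DIM)]
    gV = [[0.0] * CODE_DIM for _ in range(INPUT_DIM)]
    total = 0.0
    for x in data:
        # Forward pass
        y = [sigmoid(sum(W[l][m] * x[m] for m in range(INPUT_DIM))) for l in range(CODE_DIM)]
        p = softmax([sum(V[k][l] * y[l] for l in range(CODE_DIM)) for k in range(INPUT_DIM)])
        total += sum((pk - xk) ** 2 for pk, xk in zip(p, x))
        # Backward pass: dE/dp, then through the softmax, V, the sigmoid, and W
        g = [2.0 * (pk - xk) for pk, xk in zip(p, x)]
        gp = sum(gk * pk for gk, pk in zip(g, p))
        dz = [pk * (gk - gp) for pk, gk in zip(p, g)]          # softmax Jacobian
        dy = [sum(V[k][l] * dz[k] for k in range(INPUT_DIM)) for l in range(CODE_DIM)]
        da = [dy[l] * y[l] * (1.0 - y[l]) for l in range(CODE_DIM)]  # sigmoid'
        for k in range(INPUT_DIM):
            for l in range(CODE_DIM):
                gV[k][l] += dz[k] * y[l]
        for l in range(CODE_DIM):
            for m in range(INPUT_DIM):
                gW[l][m] += da[l] * x[m]
    # Gradient descent update
    for l in range(CODE_DIM):
        for m in range(INPUT_DIM):
            W[l][m] -= lr * gW[l][m]
    for k in range(INPUT_DIM):
        for l in range(CODE_DIM):
            V[k][l] -= lr * gV[k][l]
    return total

random.seed(0)
W = [[random.uniform(-1.0, 1.0) for _ in range(INPUT_DIM)] for _ in range(CODE_DIM)]
V = [[random.uniform(-1.0, 1.0) for _ in range(CODE_DIM)] for _ in range(INPUT_DIM)]
data = [[1.0 if i == j else 0.0 for j in range(INPUT_DIM)] for i in range(INPUT_DIM)]

errors = [step(W, V, data, LR) for _ in range(1000)]
print(errors[0], errors[-1])
```

The printed errors should fall over training in the same way as the TensorFlow curve, though the exact values depend on the random initialization.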

How well does our autoencoder reproduce the input now?

In [None]:
plt.matshow(sess.run(dec_output), cmap=plt.cm.gray)

Almost perfectly. And what does the code look like? We can plot the intermediate layer of our network by evaluating `enc_output` in the `Session`:

In [None]:
plt.matshow(sess.run(enc_output), cmap=plt.cm.gray)

We can round the codes to 0/1, read each one as a 3-bit binary number, and list the unique values. If the network has found a binary code, there should be 8 of them, or close to it.

In [None]:
# Round each code to 0/1 and read it as a 3-bit binary number, e.g. [1, 0, 1] -> 5
codes = np.round(sess.run(enc_output)).astype(int)
np.unique([int(''.join(map(str, row)), 2) for row in codes])
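The conversion in the last cell can also be done arithmetically, reading the bits most significant first. The sketch below applies it to a made-up set of rounded codes (hypothetical values, not output of this notebook) that form a permuted binary code; fewer than 8 distinct values would mean some numbers share a code and cannot be told apart.

```python
def code_to_int(bits):
    """Read a list of 0/1 digits (most significant first) as an integer."""
    value = 0
    for b in bits:
        value = 2 * value + b
    return value

# Hypothetical rounded codes for the inputs 1..8 (illustration only).
rounded = [[1, 0, 1], [0, 0, 0], [1, 1, 0], [0, 1, 1],
           [1, 1, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
ints = [code_to_int(r) for r in rounded]
print(ints)             # [5, 0, 6, 3, 7, 1, 4, 2]
print(len(set(ints)))   # 8 distinct codes: a valid (permuted) binary code
```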