Introduction

This notebook follows Martin Gorner's session on deep learning (youtube, slide deck, google blog). It's very accessible, even for beginners, and I encourage you to watch it.

Based on his presentation, I demonstrate how to create a simple 1-layer neural network to recognize handwritten digits from 28x28 pixel images in the MNIST dataset.

In [1]:
# Show matplotlib output within the notebook
%matplotlib inline
In [2]:
# Required packages are tensorflow, numpy, and matplotlib
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
import matplotlib.pyplot as plt
In [3]:
# Set pseudorandom generator seeds to help reproducibility
# Note: random.seed() only seeds Python's own generator; TensorFlow ops such as
# tf.truncated_normal() draw from TensorFlow's generator, which is seeded separately
import random
random.seed(0)
tf.set_random_seed(0)
In [4]:
# Download the mnist dataset and save it to MNIST_data
# Initialize an mnist object with image labels converted into one-hot encoding (5 is [0, 0, 0, 0, 0, 1, 0, 0, 0, 0])
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
In [5]:
# Test if mnist loaded
plt.imshow(mnist.train.next_batch(1)[0].reshape(28,28), cmap='gray')
Out[5]:
<matplotlib.image.AxesImage at 0x18196905f8>

Initialize the input placeholder, the weights, and the biases

In [6]:
# Create a placeholder tensor for the samples. The batch size is None because it is
# unknown at graph-construction time.
# Dimensions of the grayscale image: (28, 28)
# Number of channels: 1 (grayscale)

# In the video the expected input was a [None, 28, 28, 1] tensor because images are 28x28 pixels,
# but by default, input_data.read_data_sets() already flattens each image into a single row of 784 values (28*28 = 784)

# X = tf.placeholder(tf.float32, [None, 28, 28, 1])  # 28x28 input
X = tf.placeholder(tf.float32, [None, 784])          # flattened 784-value input
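The flattening that read_data_sets() performs can be illustrated with a small NumPy sketch (the fake image below is just for illustration):

```python
import numpy as np

# A fake 28x28 "image": flattening it gives the 784-value row that X expects
img = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)
flat = img.reshape(-1)
print(flat.shape)                                  # (784,)
print(np.array_equal(flat.reshape(28, 28), img))   # True: no information is lost
```

Reshaping is lossless, so the network can still learn from the flattened rows; it just no longer sees the 2D neighborhood structure.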
In [7]:
# Create a tensor for the weights
# The 28x28 pixel image will become a 784 element linear vector
# The layer will have 10 neurons

# W = tf.Variable(tf.zeros([784, 10]))  # in the video, initialized as zero, here I initialize using random floats
W = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
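tf.truncated_normal() samples a normal distribution but re-draws any value that falls more than 2 standard deviations from the mean. A NumPy sketch of that behavior (the helper name is mine, not a TensorFlow API):

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_normal(shape, stddev=0.1):
    # Draw from a normal distribution, then re-draw any sample that lies more
    # than 2 standard deviations from the mean (mimicking tf.truncated_normal)
    x = rng.normal(0.0, stddev, size=shape)
    mask = np.abs(x) > 2 * stddev
    while mask.any():
        x[mask] = rng.normal(0.0, stddev, size=mask.sum())
        mask = np.abs(x) > 2 * stddev
    return x

W0 = truncated_normal((784, 10))
print(W0.shape)                     # (784, 10)
print(np.abs(W0).max() <= 0.2)      # True: no weight lies beyond 2 stddev
```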
In [8]:
# Create a tensor for the biases
# Matrix computation L = X.W + b, where "+" means broadcast
# There is one bias per neuron (10 in total); broadcasting adds this same bias vector to every image in the batch
b = tf.Variable(tf.zeros([10]))
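The broadcast in X.W + b can be checked with a toy NumPy sketch (small shapes and lowercase names chosen for illustration, so the TensorFlow tensors above are not clobbered):

```python
import numpy as np

# Toy shapes: 4 samples of 3 features each, a layer of 2 neurons
x = np.ones((4, 3))            # minibatch of 4 flattened "images"
w = np.full((3, 2), 0.5)       # weights
bias = np.array([1.0, -1.0])   # one bias per neuron

# bias has shape (2,) and is broadcast across all 4 rows of x.w
out = x @ w + bias
print(out.shape)   # (4, 2)
print(out[0])      # [2.5, 0.5]: each row of x.w is [1.5, 1.5], plus bias [1.0, -1.0]
```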

Create the model

In [9]:
# Model
# Y = softmax(X.W + b)
# 
# Variable  Explanation, tensor shape in []
# --------  -------------------------------
# Y       : predictions, Y[100,10]
# softmax : activation function and will be applied line-by-line
# X       : image tensor, X[100, 784], minibatches of 100
# W       : weights, W[784,10], "." between X and W means matrix multiply
# b       : biases, b[10]

# In the video, tf.reshape was called because the X tensor was in the shape [100, 28, 28, 1]
# and needed to be flattened to [100, 784]; the -1 lets TensorFlow infer the batch dimension
# Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)
Y = tf.nn.softmax(tf.matmul(X, W) + b)

# Placeholder for correct answers in one-hot encoding
# These are known values to train with. Here, we use the label of each image
Y_ = tf.placeholder(tf.float32, [None, 10])
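What tf.nn.softmax does line-by-line can be sketched in NumPy. Subtracting the row maximum before exponentiating is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(logits):
    # Shift each row by its max for numerical stability, then normalize
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

probs = softmax(np.array([[2.0, 1.0, 0.1]]))
print(probs)              # the largest logit gets the largest probability
print(probs.sum(axis=1))  # [1.]: each row is a probability distribution
```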

Train using gradient descent

In [10]:
# Loss function
# We use cross-entropy as a measure of how far our predictions are from the known values
# Note: tf.reduce_sum sums over the whole minibatch, so this value scales with the batch size
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))  # from the video

# Below is from the tutorial https://www.tensorflow.org/versions/r1.1/get_started/mnist/beginners
# tf.reduce_mean makes the cross-entropy value robust to changes in batch size.
# This means that you can keep the learning rate the same even if the batch size changes.
# cross_entropy = tf.reduce_mean(-tf.reduce_sum(Y_ * tf.log(Y), reduction_indices=[1]))

# To train the neural network, we want to minimize cross-entropy between our predictions and the known values
# We use stochastic gradient descent to help us find the minimum

# To make sure we actually get close to the minimum, and not constantly overshoot it,
# we scale the gradient by a factor called the learning rate.
# Try experimenting by using different learning rates like 0.1, 0.03, 0.0005
optimizer = tf.train.GradientDescentOptimizer(0.003)

# The objective of the optimizer is to minimize the cross entropy
train_step = optimizer.minimize(cross_entropy)
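For a single image, the cross-entropy can be worked out by hand. Because Y_ is one-hot, every term of the sum vanishes except the log-probability the model assigned to the true class (the numbers below are made up):

```python
import numpy as np

# One image, true digit is 2 (one-hot), and a made-up predicted distribution
y_true = np.zeros(10)
y_true[2] = 1.0
y_pred = np.full(10, 0.05)
y_pred[2] = 0.55   # probabilities sum to 1

# -sum(Y_ * log(Y)) reduces to -log(probability of the true class)
ce = -np.sum(y_true * np.log(y_pred))
print(ce)   # -log(0.55), about 0.598
```

The loss is 0 only when the model assigns probability 1 to the true class, and grows without bound as that probability approaches 0, which is what gradient descent pushes against.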

Success metrics

In [11]:
# This part is optional and is not part of training the neural network
# It is solely for reporting statistics to track progress

# Check whether the position of the highest value is the same in the prediction and in the label
# Remember that we are using one-hot encoding for both, so we use tf.argmax to find those positions
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))

# % of correct answers found in the batch
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))
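The same accuracy computation in NumPy, on made-up predictions (3 classes instead of 10 for brevity):

```python
import numpy as np

# Made-up predicted probabilities and one-hot labels for 4 images
preds  = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.5, 0.3],
                   [0.1, 0.2, 0.7],
                   [0.6, 0.3, 0.1]])
labels = np.array([[1, 0, 0],
                   [0, 0, 1],   # the model's highest value is class 1: wrong
                   [0, 0, 1],
                   [1, 0, 0]])

# Compare argmax positions, cast booleans to floats, and average
correct = np.argmax(preds, axis=1) == np.argmax(labels, axis=1)
acc = correct.astype(np.float32).mean()
print(acc)   # 3 of 4 correct -> 0.75
```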

Start training using TensorFlow

In [16]:
# Initialize all the variables declared previously (placeholders are fed at run time and need no initialization)
# Remember that tensorflow does not immediately execute commands, but instead builds a graph representation first
# This part creates a representation of the initialization process

# init = tf.initialize_all_variables()  # This method is now deprecated
init = tf.global_variables_initializer()
In [17]:
# To actually execute commands, we have to create a tensorflow session
sess = tf.Session()

# Pass init to actually initialize
sess.run(init)
In [18]:
# This part is not in the video.
# I use these lists to collect statistics to report later, similar to Martin's real-time charts in the video

# Statistics using training data
train_accuracy = []
train_cross_entropy = []

# Using testing data, which the neural network has never seen before
test_accuracy = []
test_cross_entropy = []
In [19]:
# The full MNIST training set has 60,000 images, but read_data_sets() reserves
# 5,000 of them for validation, so mnist.train holds 55,000 images
# Looping 10000 times and retrieving 100 images per iteration covers the training set
# many times over: one epoch (a full pass over the training set) is about 550 iterations
iterations = 10000
batch_size = 100

for i in range(1, iterations+1):
    # Load batch of images and correct answers (labels)
    batch_X, batch_Y = mnist.train.next_batch(batch_size)
    
    # Train using train_step
    # Remember to pass data to the placeholders X and Y_ by using a dictionary
    # X is the training data in a [100, 784] tensor and Y_ is the correct answers in a [100, 10] tensor
    train_data = {X: batch_X, Y_: batch_Y}
    sess.run(train_step, feed_dict=train_data)
    
    # Report statistics and append to lists
    # Evaluating accuracy and cross_entropy does not train anything;
    # we run them here only to retrieve statistics after each round of training
    a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)
    train_accuracy.append(a)
    train_cross_entropy.append(c)
    
    # Measure success on data that the model has never seen before, aka the test set
    if i % 100 == 0:
        test_data = {X: mnist.test.images, Y_: mnist.test.labels}
        a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)
        test_accuracy.append(a)
        test_cross_entropy.append(c)

        # Print every 1000 iterations
        if i % 1000 == 0:
            print(i, a, c)
1000 0.9174 2983.4639
2000 0.9177 2882.19
3000 0.92 2809.4119
4000 0.9215 2855.806
5000 0.923 2729.1729
6000 0.9191 2809.4275
7000 0.9253 2714.0452
8000 0.922 2713.6929
9000 0.9232 2739.6128
10000 0.925 2701.851
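The epoch arithmetic for the loop above can be sanity-checked directly (the 55,000 figure assumes the loader's default 5,000-image validation split):

```python
# How many full passes over the training set does the loop make?
train_images = 55000   # mnist.train size after the validation split
iterations = 10000
batch_size = 100

epochs = iterations * batch_size / train_images
print(round(epochs, 1))   # about 18.2 full passes over the training set
```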

Plot accuracy

In [38]:
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(15,10))

# Test accuracy was recorded every 100 iterations, so space its x-values accordingly
test_x_pts = np.arange(99, len(train_accuracy), 100)

# Plot training accuracy and test accuracy
ax1.plot(train_accuracy, alpha=1, linewidth=0.1)
ax1.plot(test_x_pts, test_accuracy, alpha=1, linewidth=2)
ax1.grid(linestyle='-', color='#cccccc')
ax1.set_ylabel('fraction of correct answers in minibatch')
ax1.set_xlim(-100, 10100)

# Zoomed-in version
ax2.plot(train_accuracy, alpha=1, linewidth=0.1)
ax2.plot(test_x_pts, test_accuracy, alpha=1, linewidth=2)
ax2.grid(linestyle='-', color='#cccccc')
ax2.set_ylabel('fraction of correct answers in minibatch')

ax2.set_ylim(0.85, 1.0)
Out[38]:
(0.85, 1.0)

Plot cross entropy

In [44]:
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(15,10))

test_x_pts = np.arange(99, len(train_accuracy), 100)

# The test set has 10,000 images while each training batch has 100, and cross_entropy
# sums over all images, so divide the test values by 100 to put both curves on the same scale
ax1.plot(train_cross_entropy, alpha=1, linewidth=0.1)
ax1.plot(test_x_pts, np.array(test_cross_entropy)/100, alpha=1, linewidth=2)
ax1.grid(linestyle='-', color='#cccccc')
ax1.set_ylabel('cross-entropy per minibatch of 100 images')
ax1.set_xlim(-100, 10100)

ax2.plot(train_cross_entropy, alpha=1, linewidth=0.1)
ax2.plot(test_x_pts, np.array(test_cross_entropy)/100, alpha=1, linewidth=2)
ax2.grid(linestyle='-', color='#cccccc')
ax2.set_ylabel('cross-entropy per minibatch of 100 images')
ax2.set_ylim(0, 70)
Out[44]:
(0, 70)

Conclusions

We have trained a simple 1-layer neural network using TensorFlow. After 10000 iterations with 100 images per batch (roughly 18 passes over the 55,000-image training set), our simple network has achieved an accuracy of about 92%!

Now, try increasing the training length and see if this affects the accuracy.