## Introduction¶

This notebook follows Martin Gorner's session on deep learning (YouTube, slide deck, Google blog). It's very accessible, even for beginners, and I encourage you to watch it.

Based on his presentation, I demonstrate how to build a simple 1-layer neural network that recognizes handwritten digits from 28x28-pixel images in the MNIST dataset.

```
# Show matplotlib output within the notebook
%matplotlib inline
```

```
# Required packages are tensorflow, numpy, and matplotlib
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
import matplotlib.pyplot as plt
```

```
# Set pseudorandom generator seeds to help reproducibility
# Python's random.seed() does not affect NumPy or TensorFlow,
# so each library's own seed is set as well
import random
random.seed(0)
np.random.seed(0)
tf.set_random_seed(0)
```

```
# Download the mnist dataset and save it to MNIST_data
# Initialize an mnist object with image labels converted into one-hot encoding (5 is [0, 0, 0, 0, 0, 1, 0, 0, 0, 0])
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
```
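To see what the `one_hot=True` flag produces, here is a minimal NumPy sketch (independent of the TensorFlow loader) that converts a digit label into the same encoding:

```python
import numpy as np

def to_one_hot(label, num_classes=10):
    """Return a vector of zeros with a 1 at position `label`."""
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

# The label 5 becomes [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(to_one_hot(5))
```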

```
# Test if mnist loaded
plt.imshow(mnist.train.next_batch(1)[0].reshape(28,28), cmap='gray')
```

## Initialize the variables and the biases¶

```
# Create a placeholder tensor for the input images
# The first dimension is the batch size (None because it is unknown at this time)
# In the video the expected input was a [None, 28, 28, 1] tensor:
# 28x28-pixel grayscale images with 1 channel
# But by default, input_data.read_data_sets() already flattens each image
# into a single row of 784 values (because 28*28 = 784)
# X = tf.placeholder(tf.float32, [None, 28, 28, 1]) # image-shaped input
X = tf.placeholder(tf.float32, [None, 784]) # flattened input
```
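The flattening that `read_data_sets()` performs can be illustrated with NumPy: a 28x28 array reshapes to a 784-element row, and reshaping back recovers the original image (which is what we did above to pass it to `plt.imshow`).

```python
import numpy as np

image = np.arange(28 * 28).reshape(28, 28)  # stand-in for a grayscale image
flat = image.reshape(784)                   # what read_data_sets() stores per image
restored = flat.reshape(28, 28)             # what plt.imshow needs

print(flat.shape, restored.shape)
```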

```
# Create a tensor for the weights
# Each 28x28-pixel image is a flattened 784-element vector
# The layer has 10 neurons, one per digit class
# W = tf.Variable(tf.zeros([784, 10])) # in the video, initialized as zero, here I initialize using random floats
W = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
```
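`tf.truncated_normal` draws from a normal distribution and redraws any sample that falls more than two standard deviations from the mean. A NumPy sketch of the idea (a simple rejection loop, not TensorFlow's actual implementation):

```python
import numpy as np

rng = np.random.RandomState(0)

def truncated_normal(shape, stddev=0.1):
    """Sample a normal distribution, resampling values outside 2 standard deviations."""
    samples = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(samples) > 2 * stddev
    while out_of_range.any():
        samples[out_of_range] = rng.normal(0.0, stddev, size=out_of_range.sum())
        out_of_range = np.abs(samples) > 2 * stddev
    return samples

W = truncated_normal((784, 10))
print(W.shape)  # (784, 10), all values within [-0.2, 0.2]
```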

```
# Create a tensor for the biases
# Matrix computation: L = X.W + b, where "+" broadcasts
# The 10-element bias vector is broadcast across the batch,
# so every row of X.W receives the same 10 biases
b = tf.Variable(tf.zeros([10]))
```
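How the `+` in `X.W + b` broadcasts can be checked with plain NumPy: adding a length-10 vector to a (100, 10) matrix adds the same 10 biases to every row.

```python
import numpy as np

XW = np.zeros((100, 10))   # stand-in for the X.W product of a 100-image batch
b = np.arange(10.0)        # stand-in bias vector

L = XW + b                 # b is broadcast across all 100 rows
print(L.shape)             # (100, 10); here every row equals b
```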

## Create the model¶

```
# Model
# Y = softmax(X.W + b)
#
# Variable Explanation, tensor shape in []
# -------- -------------------------------
# Y : predictions, Y[100, 10]
# softmax : activation function, applied row by row (one row per image)
# X : image tensor, X[100, 784], minibatches of 100
# W : weights, W[784, 10], "." between X and W means matrix multiply
# b : biases, b[10]
# In the video, tf.reshape was called because the X tensor was in the shape [100, 28, 28, 1]
# and needed to be reshaped to [100, 784]
# Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)
Y = tf.nn.softmax(tf.matmul(X, W) + b)
# Placeholder for correct answers in one-hot encoding
# These are known values to train with. Here, we use the label of each image
Y_ = tf.placeholder(tf.float32, [None, 10])
```
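Softmax "applied row by row" means each image's 10 scores become a probability distribution that sums to 1. A small NumPy sketch of what `tf.nn.softmax` computes:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax: exponentiate, then normalize each row to sum to 1."""
    # Subtracting the row max improves numerical stability without changing the result
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(logits)
print(probs.sum(axis=1))  # each row sums to 1
```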

## Train using gradient descent¶

```
# Loss function
# We use cross-entropy as a measure to compare our predictions with the known values
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y)) # from the video
# Below is from the tutorial https://www.tensorflow.org/versions/r1.1/get_started/mnist/beginners
# tf.reduce_mean makes the cross-entropy value robust to changes in batch size.
# This means that you can keep the learning rate the same even if the batch size changes.
# cross_entropy = tf.reduce_mean(-tf.reduce_sum(Y_ * tf.log(Y), reduction_indices=[1]))
# To train the neural network, we want to minimize cross-entropy between our predictions and the known values
# We use stochastic gradient descent to help us find the minimum
# To make sure we actually get close to the minimum, and not constantly overshoot it,
# we scale the gradient by a factor called the learning rate.
# Try experimenting by using different learning rates like 0.1, 0.03, 0.0005
optimizer = tf.train.GradientDescentOptimizer(0.003)
# The objective of the optimizer is to minimize the cross entropy
train_step = optimizer.minimize(cross_entropy)
```
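To make the loss concrete: because `Y_` is one-hot, the sum `-Σ Y_ * log(Y)` simply picks out `-log` of the probability the model assigned to the true class. A NumPy check with a hypothetical prediction:

```python
import numpy as np

Y_ = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])  # true label: 5
Y = np.full(10, 0.05)                           # hypothetical prediction vector
Y[5] = 0.55                                     # 55% probability on the true class

cross_entropy = -np.sum(Y_ * np.log(Y))         # equals -log(0.55)
print(cross_entropy)
```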

## Success metrics¶

```
# This part is optional and has nothing to do with training the neural network
# It is solely for reporting statistics to track progress
# Check whether the position of the highest value (the predicted digit) matches
# between the predictions and the labels
# Remember that we are using one-hot encoding for both, so we use tf.argmax to find the positions in the vectors
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_,1))
# % of correct answers found in the batch
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))
```
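The same accuracy computation in NumPy: `argmax` recovers the predicted and true digit from each row, and the mean of the element-wise comparison gives the fraction of correct answers.

```python
import numpy as np

Y = np.array([[0.1, 0.8, 0.1],   # predicts class 1
              [0.7, 0.2, 0.1],   # predicts class 0
              [0.2, 0.3, 0.5]])  # predicts class 2
Y_ = np.array([[0, 1, 0],        # true class 1 -> correct
               [0, 0, 1],        # true class 2 -> wrong
               [0, 0, 1]])       # true class 2 -> correct

is_correct = np.argmax(Y, axis=1) == np.argmax(Y_, axis=1)
accuracy = is_correct.astype(np.float32).mean()
print(accuracy)  # 2 of 3 correct
```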

## Start training using TensorFlow¶

```
# Initialize all the variables declared previously (placeholders receive their data at run time)
# Remember that tensorflow does not immediately execute commands, but instead builds a graph first
# This part creates a representation of the initialization process
# init = tf.initialize_all_variables() # This method is now deprecated
init = tf.global_variables_initializer()
```

```
# To actually execute commands, we have to create a tensorflow session
sess = tf.Session()
# Pass init to actually initialize
sess.run(init)
```

```
# This part is not in the video.
# I use these lists to collect statistics to report later, similar to Martin's real-time charts in the video
# Statistics using training data
train_accuracy = []
train_cross_entropy = []
# Using testing data, which the neural network has never seen before
test_accuracy = []
test_cross_entropy = []
```

```
# input_data.read_data_sets() keeps 55,000 of MNIST's 60,000 training images
# (the remaining 5,000 are reserved as a validation set)
# Looping 10000 times and retrieving 100 images at every iteration means
# we see 1,000,000 images, i.e. roughly 18 passes (epochs) over the training set
iterations = 10000
batch_size = 100
for i in range(1, iterations+1):
    # Load a batch of images and correct answers (labels)
    batch_X, batch_Y = mnist.train.next_batch(batch_size)
    # Train using train_step
    # Remember to pass data to the placeholders X and Y_ by using a dictionary
    # X is the training data in a [100, 784] tensor and Y_ is the correct answers in a [100, 10] tensor
    train_data = {X: batch_X, Y_: batch_Y}
    sess.run(train_step, feed_dict=train_data)
    # Report statistics and append to lists
    # We do not train on the accuracy or cross_entropy functions
    # We pass them to tensorflow in order to retrieve accuracy and cross-entropy values after 1 round of training
    a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)
    train_accuracy.append(a)
    train_cross_entropy.append(c)
    # Measure success on data that the model has never seen before, aka the test set
    if i % 100 == 0:
        test_data = {X: mnist.test.images, Y_: mnist.test.labels}
        a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)
        test_accuracy.append(a)
        test_cross_entropy.append(c)
    # Print every 1000 iterations
    if i % 1000 == 0:
        print(i, a, c)
```

## Plot accuracy¶

```
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(15,10))
# x positions of the test-set measurements (taken every 100 iterations)
test_x_pts = np.arange(99, len(train_accuracy), 100)
# Plot training accuracy and test accuracy
ax1.plot(train_accuracy, alpha=1, linewidth=0.1)
ax1.plot(test_x_pts, test_accuracy, alpha=1, linewidth=2)
ax1.grid(linestyle='-', color='#cccccc')
ax1.set_ylabel('fraction of correct answers in minibatch')
ax1.set_xlim(-100, 10100)
# Zoomed-in version
ax2.plot(train_accuracy, alpha=1, linewidth=0.1)
ax2.plot(test_x_pts, test_accuracy, alpha=1, linewidth=2)
ax2.grid(linestyle='-', color='#cccccc')
ax2.set_ylabel('fraction of correct answers in minibatch')
ax2.set_ylim(0.85, 1.0)

## Plot cross entropy¶

```
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(15,10))
test_x_pts = np.arange(99, len(train_accuracy), 100)
ax1.plot(train_cross_entropy, alpha=1, linewidth=0.1)
# The training loss is summed over 100 images per batch; the test loss is summed
# over all 10,000 test images, so divide it by 100 to put it on the same scale
ax1.plot(test_x_pts, np.array(test_cross_entropy)/100, alpha=1, linewidth=2)
ax1.grid(linestyle='-', color='#cccccc')
ax1.set_ylabel('cross-entropy per batch of 100 images')
ax1.set_xlim(-100, 10100)
ax2.plot(train_cross_entropy, alpha=1, linewidth=0.1)
ax2.plot(test_x_pts, np.array(test_cross_entropy)/100, alpha=1, linewidth=2)
ax2.grid(linestyle='-', color='#cccccc')
ax2.set_ylabel('cross-entropy per batch of 100 images')
ax2.set_ylim(0, 70)
```

## Conclusions¶

We have trained a simple 1-layer neural network using TensorFlow. After 10000 iterations with 100 images per batch (roughly 18 passes over the 55,000-image training set), our simple network achieves an accuracy of about 92%!

Now, try increasing the training length and see if this affects the accuracy.