Convolutional Neural Networks
The aim of this tutorial is to design a simple Convolutional Neural Network using TensorFlow. It sketches a startup model that serves the two following purposes:
- Define an organization for the network architecture, training, and evaluation phases.
- Provide a template framework for constructing larger and more complicated models.
A custom architecture is designed. The number of output units of the last layer equals the number of classes, because a Softmax has been implemented for the classification task. The implemented architecture is very similar to 'LeNet'__, although ours is implemented in a fully-convolutional fashion, i.e., there is no fully-connected layer: all fully-connected layers are transformed into corresponding convolutional layers. Please refer to 'this link'__ for further details about converting fully-connected layers to convolutional ones and vice versa.
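As a rough illustration of that conversion (the shapes, variable names, and layer sizes below are illustrative assumptions, not the repository code), a fully-connected layer acting on a 7x7x64 feature map can be replaced by a convolution whose kernel covers the whole map:

import tensorflow as tf

# Illustrative shapes only: a 7x7x64 feature map fed to 1024 units.
feature_map = tf.placeholder(tf.float32, [None, 7, 7, 64])

# Fully-connected formulation: flatten, then matmul.
flat = tf.reshape(feature_map, [-1, 7 * 7 * 64])
W_fc = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1024], stddev=0.1))
fc = tf.matmul(flat, W_fc)

# Equivalent fully-convolutional formulation: a 7x7 convolution with 1024
# filters and VALID padding yields a 1x1x1024 output per image, i.e., the
# same numbers as the matmul above.
W_conv = tf.Variable(tf.truncated_normal([7, 7, 64, 1024], stddev=0.1))
conv = tf.nn.conv2d(feature_map, W_conv, strides=[1, 1, 1, 1], padding='VALID')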
The source code is available in the code folder.
The dataset used in this tutorial is MNIST, probably the most famous dataset in computer vision because of its simplicity. The original dataset consists of 60000 training and 10000 test images, but different splits are possible. The one we use keeps the test set as-is and splits the training set into 55000 training images and 5000 validation images, in case cross-validation is desired for determining some hyper-parameters. Each image is 28x28x1 and represents a hand-written digit from 0 to 9. Since this tutorial is supposed to be ready-to-use, we provide the code to download and extract the MNIST data as a data object.
Thanks to TensorFlow, this code is already written and ready to use, and its source code is available in this repository. The code for downloading and extracting the MNIST dataset is as below:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# reshape=False keeps each image as a 28x28x1 array; one_hot=True encodes the labels as one-hot vectors.
mnist = input_data.read_data_sets("MNIST_data/", reshape=False, one_hot=True)
As is conventional, the gradient updates are performed on batches of the data. Moreover, Batch Normalization has been implemented for each convolutional layer; no Batch Normalization is done for the fully-connected layers. For all the convolutional and fully-connected layers, drop-out is used with the same parameter, although this parameter can be customized for each layer in the code. The traditional AdamOptimizer has been used as the optimizer.
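A minimal sketch of how one such block and the optimizer might be wired up is given below (TensorFlow 1.x style; the layer sizes, placeholder names, and learning rate are illustrative assumptions, not taken from the repository code):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])
y_ = tf.placeholder(tf.float32, [None, 10])
is_training = tf.placeholder(tf.bool)
keep_prob = tf.placeholder(tf.float32)

# One convolutional block: convolution -> batch normalization -> ReLU -> dropout.
net = tf.layers.conv2d(x, filters=32, kernel_size=5, padding='same')
net = tf.layers.batch_normalization(net, training=is_training)
net = tf.nn.relu(net)
net = tf.nn.dropout(net, keep_prob)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)

# A convolution covering the whole 14x14 map plays the role of the final
# "fully-connected" layer (fully-convolutional style).
net = tf.layers.conv2d(net, filters=10, kernel_size=14, padding='valid')
logits = tf.reshape(net, [-1, 10])

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))

# Batch normalization stores its moving-average updates in UPDATE_OPS, so the
# training step must depend on them.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)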
The Cross-entropy loss function has been chosen as the cost of the system. The definition is as follows:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
However, according to the `TensorFlow official documentation`__, this formulation is numerically unstable. One reason is the presence of the log: if the network produces a very bad prediction and the normalized prediction y becomes zero, the loss goes to infinity. Another issue is the explosion of the exponential: if the output of any of the neurons is large, then, since the softmax exponentiates the outputs, the numerator and denominator of the softmax operation can become very large. A trick is to add a constant to all of the unscaled outputs: adding -max{f_i : i=0,...,n} (the negative of the maximum output value) to every output leaves the softmax unchanged but keeps the exponentials bounded. For further reading refer to the `CNN for Visual Recognition`__ course by Stanford. Also please refer to `softmax_regression`__ for further details.
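The effect of this shift can be seen in a small NumPy example (illustrative only, not part of the tutorial code):

import numpy as np

f = np.array([123.0, 456.0, 789.0])  # large unscaled scores (logits)

# Naive softmax: np.exp(789) overflows, so the result is nan.
p_naive = np.exp(f) / np.sum(np.exp(f))

# Shift every score by -max(f): mathematically identical softmax, but the
# largest exponent is now exp(0) = 1, so nothing overflows.
f_shifted = f - np.max(f)
p_stable = np.exp(f_shifted) / np.sum(np.exp(f_shifted))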
Instead, tf.nn.softmax_cross_entropy_with_logits is applied to the un-normalized logits (i.e., it is called on tf.matmul(x, W) + b). This function computes the Softmax activation internally, which makes it more stable. It is worth taking a look at the TensorFlow source code for this function. There is also a traditional workaround for the instability, which is to add a small epsilon to the absolute value of y and use that value inside the log.
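A sketch of how this function is typically used is given below (the placeholder shapes and variable names are assumptions matching a simple softmax regression on MNIST, not necessarily the repository code):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Pass the raw (un-normalized) logits; the Softmax is applied internally.
logits = tf.matmul(x, W) + b
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))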