The goal of this project is to construct a convolutional neural network capable of classifying handwritten digits and letters from the EMNIST dataset. This dataset is commonly used as a benchmark for machine learning exercises and competitions, with some network architectures achieving reported accuracies as high as 97.9% (Tripathi, 2020). Additionally, it has become one of the standard test pieces for innovations in neural network design (Baldominos et al., 2019).
My goal for this project was not to set a new accuracy record for the dataset, but rather to construct a working neural network with satisfactory performance, which I defined as over 85% accuracy on the testing dataset, based on the accuracy ranges of published results (paperswithcode.com, 2017).
The EMNIST dataset is a collection of handwritten characters including letters and digits (NIST, 2019). It is based on the Modified National Institute of Standards and Technology (MNIST) database, which includes 60,000 handwritten digits used for training image recognition systems (LeCun et al., 1998). This dataset was then extended to 814,255 characters and digits to form the EMNIST dataset (NIST, 2019). However, the full dataset does not contain an equal number of samples for each character, and thus risks biasing a model toward the more common characters. I am training my algorithm on the balanced EMNIST dataset, which includes 131,600 characters from 47 balanced classes (Cohen et al., 2017). Using the balanced dataset means that there are an equal number of elements from each class, and thus the algorithm will not develop a tendency to identify more frequently occurring characters.
The dataset was split using a 5:1 training/validation split, with 5/6 of the data used to train the model and the remaining 1/6 used to validate the results after each epoch. Training was conducted with mini-batches, so that each weight update was computed from a small, randomly selected subsample of the training data rather than from the full training set.
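The split and batching described above can be sketched in plain Python. This is only an illustration, not the code used for this project: the function names are hypothetical, and the figure of 112,800 images assumes that the split was applied to the balanced training portion (131,600 characters minus the 18,800 held out for testing).

```python
import random

def split_indices(n_samples, val_fraction=1 / 6, seed=0):
    """Shuffle sample indices, then split them into training and validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]

def mini_batches(indices, batch_size, rng):
    """Yield randomly ordered mini-batches; the last batch may be smaller."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]

# Assumption: 112,800 images are available for training and validation.
N_TRAINING_IMAGES = 112_800

train_idx, val_idx = split_indices(N_TRAINING_IMAGES)
batches = list(mini_batches(train_idx, batch_size=64, rng=random.Random(1)))
print(len(train_idx), len(val_idx), len(batches))
```

Under these assumptions, one epoch with a batch size of 64 corresponds to one weight update per mini-batch, with the validation indices left untouched until the epoch ends.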
Batch Size | Learning Rate | Epochs | Testing Accuracy (%) | Avg Loss | Notes |
30 | 0.00001 | 60 | 83.0 | 0.0490 | basic structure as initially outlined |
40 | 0.00005 | 60 | 85.5 | 0.0537 | changed only the batch size and learning rate hyperparameters |
40 | 0.00005 | 60 | 50.7 | 3.8100 | added softmax (dim = -1) at the end |
40 | 0.00005 | 60 | 85.9 | 0.0547 | softmax back to ReLU; final layer reduced from 120 to 80 |
40 | 0.00005 | 60 | 87.0 | 0.0482 | added dropout at the end with p = 0.05 |
40 | 0.00005 | 60 | 59.5 | 0.2050 | RReLU changed to ReLU |
40 | 0.00005 | 60 | 59.2 | 0.1715 | removed dropout |
64 | 0.00005 | 40 | 86.0 | 0.0433 | RReLU restored; added a last layer of 80 to 47 |
64 | 0.00005 | 40 | 86.2 | 0.0435 | used SGD for optimization |
64 | 0.00005 | 40 | 86.9 | 0.0377 | dropout between fc 1 and 2 (300 to 160) |
64 | 0.00005 | 40 | 87.1 | 0.0377 | dropout inplace = true |
64 | 0.00005 | 40 | 87.2 | 0.0060 | dropout p = 0.1, back to Adam for optimization |
64 | 0.00005 | 40 | 86.0 | 0.0064 | added second dropout between 2 and 3 |
64 | 0.00005 | 60 | 86.7 | 0.0061 | increased to 60 epochs to let it keep training |
64 | 0.00005 | 60 | 87.6 | 0.0056 | removed dropout, kept 60 epochs |
64 | 0.00005 | 80 | 88.2 | 0.0053 | 80 epochs |
64 | 0.00005 | 120 | 88.4 | 0.0053 | 120 epochs |
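One possible explanation for the drop to 50.7% when softmax was added at the end is that softmax was then applied twice. This would be the case if the loss function already applies softmax internally, as PyTorch's CrossEntropyLoss does. The report does not state which loss function was used, so this is an assumption, and the logit values below are hypothetical. The sketch shows that under this assumption even a perfectly confident, correct prediction cannot push the loss toward zero.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Cross-entropy that applies softmax internally to its input scores."""
    return -math.log(softmax(scores)[target])

N_CLASSES = 47

# Hypothetical network output: very confident in the correct class 0.
confident_logits = [0.0] * N_CLASSES
confident_logits[0] = 20.0

# Raw logits passed to the loss: the loss can approach zero.
loss_raw = cross_entropy(confident_logits, target=0)

# Softmax applied by the network AND again inside the loss.
loss_double = cross_entropy(softmax(confident_logits), target=0)

# Bounds on the double-softmax loss for 47 classes.
floor = math.log(math.e + N_CLASSES - 1) - 1   # one-hot probabilities
ceiling = math.log(N_CLASSES)                  # uniform probabilities

print(loss_raw, loss_double, floor, ceiling)
```

With 47 classes the double-softmax loss is confined to a narrow band between roughly 2.89 and 3.85, which leaves the optimizer little room to distinguish good predictions from bad ones. Whether the 3.81 reported in the table is a per-sample mean is not stated, so the comparison with this band is suggestive only.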
Network Architecture
Layer type | In | Out | Size | Number of Weights |
Convolution | 1 | 20 | 28 x 28 | 1.57E+04 |
Convolution | 20 | 30 | 28 x 28 | 4.70E+05 |
maxpool | 30 | 30 | 14 x 14 | 1.38E+08 |
Convolution | 30 | 30 | 14 x 14 | 1.76E+05 |
Convolution | 30 | 10 | 14 x 14 | 5.88E+04 |
maxpool | 10 | 10 | 7 x 7 | 9.60E+05 |
flatten | 49 | 49 | 1 x L | 0 |
Linear | 490 | 300 | 1 x L | 147000 |
Linear | 300 | 160 | 1 x L | 48000 |
Linear | 160 | 80 | 1 x L | 12800 |
Linear | 80 | 47 | 1 x L | 3760 |
Total Weights | 1.40E+08 |
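As a cross-check on the table, the sketch below traces the tensor shape through the layers and counts learnable weights under the usual convention: a convolution has in × out × kernel × kernel weights, a fully connected layer has in × out weights, and pooling and flatten layers have none (biases are ignored). The kernel size is not stated in this report, so the value of 3 is purely an assumption, as is the use of padding that preserves the feature-map size. The convolution and pooling rows in the table appear to scale with feature-map area instead (for example 1 × 20 × 28 × 28 = 15,680), so only the Linear rows are expected to match.

```python
KERNEL = 3  # assumption: kernel size is not given in the report

# (layer kind, output channels) for the convolutional part of the network
FEATURE_STACK = [("conv", 20), ("conv", 30), ("pool", None),
                 ("conv", 30), ("conv", 10), ("pool", None)]
LINEAR_SIZES = [300, 160, 80, 47]

def trace(input_shape=(1, 28, 28), kernel=KERNEL):
    """Return the final feature-map shape and a per-layer weight count."""
    shape = input_shape
    counts = []
    for kind, out_channels in FEATURE_STACK:
        if kind == "conv":
            # Size-preserving convolution: only the channel count changes.
            counts.append(("conv", shape[0] * out_channels * kernel * kernel))
            shape = (out_channels, shape[1], shape[2])
        else:
            # 2 x 2 max pooling halves height and width and has no weights.
            counts.append(("pool", 0))
            shape = (shape[0], shape[1] // 2, shape[2] // 2)
    features = shape[0] * shape[1] * shape[2]  # flatten
    for out_features in LINEAR_SIZES:
        counts.append(("linear", features * out_features))
        features = out_features
    return shape, counts

final_shape, weight_counts = trace()
linear_counts = [n for kind, n in weight_counts if kind == "linear"]
print(final_shape, linear_counts, sum(n for _, n in weight_counts))
```

The trace ends with 10 feature maps of 7 x 7, which flatten to the 490 inputs of the first Linear layer, and the four Linear counts agree with the table.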
Overall, my network achieved a testing accuracy of 88.4% (16,619 of 18,800 test images), which is comparable with the accuracies of many benchmark networks, which range from 50.93% to 95.96% on the testing data (paperswithcode.com, 2017). While certainly not as accurate as the best of these, I would call this result above average. Additionally, the improvement in accuracy per epoch seemed to plateau at around 80 epochs, as can be seen in Figure 1. These results can still be improved, and I will continue to refine my design and implementation techniques through further iterations.
Figure 1. Training accuracy vs. epoch for the final network design
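The headline accuracy can be reproduced from the raw counts reported above. The short sketch below is illustrative only; the function name is hypothetical, and the chance level assumes all 47 classes are equally likely, as they are in the balanced dataset.

```python
def accuracy_percent(correct, total):
    """Percentage of correctly classified samples."""
    return 100 * correct / total

TARGET_PERCENT = 85.0           # satisfactory-performance threshold for this project
CHANCE_PERCENT = 100 / 47       # random guessing over 47 balanced classes

final_accuracy = accuracy_percent(16_619, 18_800)
print(f"accuracy: {final_accuracy:.1f}%")
print(f"meets target: {final_accuracy > TARGET_PERCENT}")
print(f"chance level: {CHANCE_PERCENT:.2f}%")
```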
Lydia, A. Agnes, and F. Sagayaraj Francis. “Adagrad - An Optimizer for Stochastic Gradient Descent.” Department of Computer Science and Engineering, Pondicherry Engineering College, May 2019.
Baldominos A, Saez Y, Isasi P. A Survey of Handwritten Character Recognition with MNIST and EMNIST. Applied Sciences. 2019; 9(15):3169. https://doi.org/10.3390/app9153169
Cohen, G., Afshar, S., Tapson, J., & van Schaik, A. (2017). EMNIST: an extension of MNIST to handwritten letters. Retrieved from http://arxiv.org/abs/1702.05373
“The EMNIST Dataset.” NIST, 28 Mar. 2019, https://www.nist.gov/itl/products-and-services/emnist-dataset.
Nielsen, Michael A. Neural Networks and Deep Learning, Determination Press, 2015, http://neuralnetworksanddeeplearning.com/.
Tripathi, Achintya. “EMNIST Letter Dataset 97.9%:ACC & VAL_ACC: 91.78%.” Kaggle, 16 Aug. 2020, https://www.kaggle.com/code/achintyatripathi/emnist-letter-dataset-97-9-acc-val-acc-91-78.
LeCun, Yann, et al. “The MNIST Database.” MNIST Handwritten Digit Database, Yann LeCun, Corinna Cortes and Chris Burges, Nov. 1998, http://yann.lecun.com/exdb/mnist/.
Paperswithcode.com. (2017). Papers with code - EMNIST-letters benchmark (image classification). EMNIST Benchmark Algorithms. Retrieved November 19, 2022, from https://paperswithcode.com/sota/image-classification-on-emnist-letters
Li, Fei-Fei. “Convolutional Neural Networks (CNNs / ConvNets).” CS231N Convolutional Neural Networks for Visual Recognition, Stanford University, Jan. 2022, https://cs231n.github.io/convolutional-networks/.
Xu, B., Wang, N., Chen, T., & Li, M. (2015). Empirical Evaluation of Rectified Activations in Convolutional Network. http://arxiv.org/abs/1505.00853