Up and Running with Keras: Deep Learning Digit Classification using CNN

Meraj Molla · Published in ITNEXT · Nov 24, 2018

As the Keras documentation says — “Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.” We will use TensorFlow as the backend, so you need to have both the Keras and TensorFlow libraries installed.

For complete installation instructions and for configuring TensorFlow as the backend of Keras, please follow the links here — https://keras.io/#installation and here — https://www.pyimagesearch.com/2016/11/14/installing-keras-with-tensorflow-backend/

Note that it’s possible to use a GPU for training deep learning models, but this is out of scope for this article. Interested readers can find instructions for setting up TensorFlow with GPU support here — https://keras.rstudio.com/reference/install_keras.html

In this article, we will develop a simple CNN (Convolutional Neural Network), also known as a convnet, to classify handwritten digits from grayscale images of size 28x28 pixels into their 10 categories (0 through 9). It’s a multi-class classification problem that we will solve with above 99% accuracy.

Loading dataset:

First, we load the famous MNIST dataset from the Keras datasets module using the code below —

from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Here the dataset is loaded and split into training and test images with their corresponding labels. MNIST comes with 70,000 samples: 60,000 for training and 10,000 for testing. We can examine the shape of the data as below —

train_images.shape

(60000, 28, 28)

and

test_images.shape

(10000, 28, 28)
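The labels can be checked in the same way; each label is a single integer between 0 and 9 —

train_labels.shape

(60000,)

test_labels.shape

(10000,)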

Preparing Model:

Next, we prepare our CNN model (also known as a convnet) with the code below —

from keras import models
from keras import layers

def make_classifier(optimizer):
    model = models.Sequential()

    # convolutional base: alternating convolution and max pooling layers
    model.add(layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu', padding='same', input_shape=(28, 28, 1)))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))

    model.add(layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))

    model.add(layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu', padding='same'))
    model.add(layers.Flatten())

    # classifier head: a Dense layer followed by a 10-way softmax output
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))

    # compile with the optimizer passed in as a parameter
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    return model

The model/network is basically a stack of Conv2D, MaxPooling2D, and Dense layers. We will go over the model configuration hyperparameters here, but explaining how a CNN works is beyond the scope of this article. For that, I suggest going over the nice article here — An Intuitive Explanation of Convolutional Neural Networks (https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/)

Here, the input shape of the convnet is (image_height, image_width, image_channels). In our case, since the images are grayscale, image_channels = 1, with pixel values between 0 and 255, so we train the network with input shape (28, 28, 1). We have used padding='same' in the Conv2D layers so that no dimensions are lost across the Conv2D layers while learning the feature maps. A 3x3 kernel_size is used to learn the feature maps, with 32 filters computed by the convolution in the first Conv2D layer. We have also used the 'relu' activation function.

We have also used MaxPooling2D layers. Max pooling consists of extracting windows from the input feature maps and outputting the max value of each channel. We used a window size of 2x2.
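As a quick illustration (a toy NumPy example, not part of the model code), 2x2 max pooling with stride 2 turns a 4x4 feature map into a 2x2 one by keeping only the largest value in each 2x2 window:

import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [3, 4, 1, 8]], dtype=np.float32)

# group into non-overlapping 2x2 blocks and keep the max of each block
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)

[[6. 4.]
 [7. 9.]]

This is the same halving we will see in the model summary, where the 28x28 feature maps become 14x14 and then 7x7.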

Our network also consists of a sequence of two Dense layers. The second (and last) layer is a 10-way ‘softmax’ layer, which means it will return an array of 10 probability scores. Each score will be the probability that the current digit image belongs to one of our 10 digit classes.

At the end, we have used —

— optimizer: rmsprop (passed as a parameter)

— loss function: categorical_crossentropy

— metrics: accuracy

Shape Conversion:

Our training and test images are stored in arrays of shape (60000, 28, 28) and (10000, 28, 28) of type uint8, with values in the [0, 255] interval. As required by the model, we transform them into float32 arrays with an extra channel dimension and values between 0 and 1.

train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255
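A quick optional check confirms the conversion did what we expect:

print(train_images.shape, train_images.dtype)

(60000, 28, 28, 1) float32

print(train_images.min(), train_images.max())

0.0 1.0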

Categorical Encoding:

We also apply categorical encoding to the labels, since categorical_crossentropy (our loss function) expects the labels to be categorically (one-hot) encoded.

from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
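To see what to_categorical does, here is a small toy example (the digits 0, 3, and 9 are arbitrary picks); each integer label becomes a one-hot vector of length 10:

print(to_categorical([0, 3, 9], num_classes=10))

[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]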

Model Summary:

Next, we prepare our model and print the model summary which looks as below —

model = make_classifier('rmsprop')
model.summary()

Fig: Model summary

We can see that each max pooling layer reduces the spatial dimensions to exactly half (28x28 → 14x14 → 7x7), while the Conv2D layers do not lose any dimensions, since we have used padding in our model.

Train Model:

Next, we train our model as follows —

history = model.fit(train_images, train_labels, epochs=10, batch_size=200)

Fig: Model Training
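If you also want to track performance on data the model is not trained on, fit accepts a validation_split argument; a variant of the call above (the 10% split here is an arbitrary choice) would be:

history = model.fit(train_images, train_labels, epochs=10, batch_size=200, validation_split=0.1)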

Loss vs. Accuracy Plot:

We plot training loss vs. accuracy from the model training history with the code below —

import matplotlib.pyplot as plt

history_dict = history.history
loss_values = history_dict['loss']
acc_values = history_dict['acc']

epochs = range(1, len(acc_values) + 1)

plt.plot(epochs, loss_values, 'bo', label='Training loss')
plt.plot(epochs, acc_values, 'b', label='Training accuracy')
plt.title('Training loss vs accuracy')
plt.xlabel('Epochs')
plt.ylabel('Loss/Accuracy')
plt.legend()
plt.show()

The plot looks as follows —

Fig: Training loss vs accuracy plot

We can see our loss going down with each epoch while accuracy increases.

Model Evaluation:

As the last step, we evaluate our model on the test data and find a test accuracy of 99.27% with a test loss of 0.03166.

test_loss, test_acc = model.evaluate(test_images, test_labels)
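To inspect individual predictions, you can take the argmax of the 10 softmax scores returned for each image. A short sketch (the first five test images are an arbitrary pick):

import numpy as np

predictions = model.predict(test_images[:5])
print(np.argmax(predictions, axis=1))      # predicted digit for each of the first 5 test images
print(np.argmax(test_labels[:5], axis=1))  # the corresponding true digits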

Batch Normalization and Dropout:

Batch Normalization and Dropout are techniques used to increase regularization and reduce overfitting. However, the author of this article has not found much benefit from using these techniques in this example.
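For readers who want to experiment anyway, here is a minimal sketch (not the model used above, just an illustration of where such layers could go):

from keras import models, layers

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(28, 28, 1)))
model.add(layers.BatchNormalization())  # normalize the activations of the previous layer
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dropout(0.5))          # randomly drop 50% of the units during training
model.add(layers.Dense(10, activation='softmax'))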

The full source code is available as a Jupyter Notebook here — https://github.com/imeraj/MachineLearning/blob/master/DeepLearnPython/MNIST_classifier.ipynb

References:

[1] Deep Learning with Python: https://www.manning.com/books/deep-learning-with-python

I am no expert in Machine Learning/Deep Learning, but I hope this article helped some of the readers. If you like this article, please follow me here or on Twitter, and don’t forget to clap ;)
