How to Make an Image Classifier in Python using Keras

Abdou Rockikz · 14 Sep 2019 · 9 min read · Updated Oct 2019 · Machine Learning · Computer Vision

Image classification refers to a process in computer vision that can classify an image according to its visual content. For example, an image classification algorithm can be designed to tell if an image contains a cat or a dog. While detecting an object is trivial for humans, robust image classification is still a challenge in computer vision applications.

In this tutorial, you will learn how to classify images from the CIFAR-10 dataset, which consists of airplanes, dogs, cats, and 7 other object categories.

We will preprocess the images and labels, then train a convolutional neural network on all the training samples. The images will need to be normalized and the labels one-hot encoded.

First, let's install the requirements for this project:

pip3 install keras numpy matplotlib tensorflow

Next, open up an empty Python file, call it train.py, and follow along. Importing Keras:

from keras.datasets import cifar10 # importing the dataset from keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.callbacks import ModelCheckpoint, TensorBoard
from keras.utils import to_categorical
import os

Hyperparameters

I have experimented with various parameters and found the following to be near optimal:

# hyper-parameters
batch_size = 64
# 10 categories of images (CIFAR-10)
num_classes = 10
# number of training epochs
epochs = 30

num_classes is simply the number of categories to classify; CIFAR-10 has 10 categories of images.

Understanding and Loading the CIFAR-10 Dataset

  • The dataset consists of 10 classes of images, with labels ranging from 0 to 9:
    • 0: airplane.
    • 1: automobile.
    • 2: bird.
    • 3: cat.
    • 4: deer.
    • 5: dog.
    • 6: frog.
    • 7: horse.
    • 8: ship.
    • 9: truck.
  • 50000 samples for training data, and 10000 samples for testing data.
  • Each sample is a 32x32x3 image (width and height of 32, with a depth of 3 for the RGB channels); a few samples are visualized below.
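To get a feel for the data before training, here is a small optional snippet (not part of train.py) that displays a few samples with Matplotlib:

from keras.datasets import cifar10
import matplotlib.pyplot as plt

# load the raw dataset just to look at a few samples
(X_train, y_train), _ = cifar10.load_data()
class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]
# plot the first 9 training images in a 3x3 grid with their labels
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(X_train[i])
    plt.title(class_names[int(y_train[i][0])])
    plt.axis("off")
plt.show()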

Let's load this:

def load_data():
    """This function loads CIFAR-10 dataset, normalized, and labels one-hot encoded"""
    # loading the CIFAR-10 dataset, splitted between train and test sets
    (X_train, y_train), (X_test, y_test) = cifar10.load_data()
    print("Training samples:", X_train.shape[0])
    print("Testing samples:", X_test.shape[0])
    print(f"Images shape: {X_train.shape[1:]}")
    # converting image labels to binary class matrices
    y_train = to_categorical(y_train, num_classes)
    y_test = to_categorical(y_test, num_classes)
    # convert to floats instead of int, so we can divide by 255
    X_train = X_train.astype("float32")
    X_test = X_test.astype("float32")
    X_train /= 255
    X_test /= 255
    return (X_train, y_train), (X_test, y_test)

This function loads the dataset (which is built into Keras), prints some statistics, and then:

  • One-hot encodes the labels using the to_categorical function: this step is important because it turns each label into a vector of 10 numbers, which matches the network's 10-unit softmax output. For example, a horse with the label 7 is encoded as [0, 0, 0, 0, 0, 0, 0, 1, 0, 0] (see the quick check below).
  • Normalizes the images: pixel values range from 0 to 255, which is a larger scale than neural networks train well on, so we rescale them to the range 0 to 1.
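As a quick check of the encoding step (not part of train.py), you can run to_categorical on a single label:

from keras.utils import to_categorical

# label 7 (horse) becomes a 10-element one-hot vector
print(to_categorical(7, 10))   # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]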

Constructing the Model

The following model will be used:

def create_model(input_shape):
    # building the model
    model = Sequential()
    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding="same", input_shape=input_shape))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # flattening the convolutions
    model.add(Flatten())
    # fully-connected layer
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation="softmax"))
    # print the summary of the model architecture
    model.summary()
    # compiling the model with categorical crossentropy loss and the Adam optimizer
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

That's three blocks of two ConvNet layers, each block followed by max pooling, with ReLU activations throughout, and then a fully connected layer with 1024 units. This is a relatively small model compared to state-of-the-art architectures such as ResNet50 or Xception. If you'd rather use models built by deep learning experts, you need transfer learning; check this tutorial.
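For reference, here is a rough, optional sketch (not part of this tutorial's code) of how a pretrained backbone from keras.applications could be plugged in instead of the hand-built ConvNet; the transfer learning tutorial linked above covers this properly:

from keras.applications import ResNet50
from keras.models import Sequential
from keras.layers import Flatten, Dense

# ImageNet-pretrained backbone without its classification head;
# note that 32x32 is at the lower limit of what ResNet50 accepts,
# so you may want to upscale the CIFAR-10 images first
base = ResNet50(weights="imagenet", include_top=False, input_shape=(32, 32, 3))
base.trainable = False  # freeze the pretrained weights

model = Sequential()
model.add(base)
model.add(Flatten())
model.add(Dense(256, activation="relu"))
model.add(Dense(10, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])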

Training the Model

Now, let's train the model:

if __name__ == "__main__":
    # load the data
    (X_train, y_train), (X_test, y_test) = load_data()
    # constructs the model
    model = create_model(input_shape=X_train.shape[1:])
    # some nice callbacks
    tensorboard = TensorBoard(log_dir="logs/cifar10-model-v1")
    checkpoint = ModelCheckpoint("results/cifar10-loss-{val_loss:.2f}-acc-{val_acc:.2f}.h5",
                                save_best_only=True,
                                verbose=1)
    # make sure results folder exist
    if not os.path.isdir("results"):
        os.mkdir("results")
    # train
    model.fit(X_train, y_train,
            batch_size=batch_size,
            epochs=epochs,
            validation_data=(X_test, y_test),
            callbacks=[tensorboard, checkpoint],
            shuffle=True)

After loading the data and creating the model, we use two callbacks to help with training: ModelCheckpoint saves the model every time the validation loss improves, and TensorBoard tracks the accuracy and loss at each epoch, giving us nice visualizations.
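If you would like training to stop automatically once the validation loss stops improving, you could optionally add Keras' EarlyStopping callback alongside the two above (not used in this tutorial's runs):

from keras.callbacks import EarlyStopping

# stop training if val_loss hasn't improved for 5 consecutive epochs
early_stopping = EarlyStopping(monitor="val_loss", patience=5, verbose=1)
# then pass it with the others: callbacks=[tensorboard, checkpoint, early_stopping]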

Run this; it will take several minutes to finish, depending on your CPU/GPU.

You'll get output similar to this:

Epoch 1/30
50000/50000 [==============================] - 26s 520us/step - loss: 1.6408 - acc: 0.3937 - val_loss: 1.2063 - val_acc: 0.5559

Epoch 00001: val_loss improved from inf to 1.20628, saving model to results/cifar10-loss-1.21-acc-0.56.h5
Epoch 2/30
50000/50000 [==============================] - 26s 525us/step - loss: 1.1885 - acc: 0.5716 - val_loss: 0.9982 - val_acc: 0.6473

All the way to the final epoch:

Epoch 29/30
50000/50000 [==============================] - 27s 534us/step - loss: 0.4225 - acc: 0.8539 - val_loss: 0.5863 - val_acc: 0.8093

Epoch 00029: val_loss did not improve from 0.56407
Epoch 30/30
50000/50000 [==============================] - 27s 549us/step - loss: 0.4205 - acc: 0.8517 - val_loss: 0.5782 - val_acc: 0.8154

Epoch 00030: val_loss did not improve from 0.56407

Now, to open TensorBoard, all you need to do is type this command in the terminal or command prompt from the current directory:

tensorboard --logdir="logs"

Open up a browser tab and go to localhost:6006; you'll be taken to TensorBoard. Here is my result:

[Figure: TensorBoard curves for validation loss and validation accuracy]

Clearly we are on the right track: the validation loss is decreasing, and the validation accuracy climbs all the way to about 81%. That's great!

Testing the Model

Once training is completed, you will see various model weights saved in the results folder as shown in the following figure:

[Figure: model checkpoints saved in the results folder]

Obviously, we will choose the best one (the checkpoint with the lowest validation loss or the highest accuracy), which in my case is the one loaded in test.py below, results/cifar10-loss-0.58-acc-0.81.h5.
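If you prefer to pick the best checkpoint programmatically instead of by eye, a small helper like this (an optional convenience, relying on the filename pattern defined in train.py) will do:

import glob

def best_checkpoint(folder="results"):
    # filenames look like results/cifar10-loss-0.58-acc-0.81.h5,
    # so the number right after "loss-" is the validation loss
    files = glob.glob(f"{folder}/cifar10-loss-*.h5")
    return min(files, key=lambda f: float(f.split("loss-")[1].split("-")[0]))

print(best_checkpoint())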

Open up a new Python file called test.py and follow along.

Importing necessary utilities:

from train import load_data
from keras.models import load_model
import matplotlib.pyplot as plt
import numpy as np

Let's make a Python dictionary that maps each integer value to its corresponding label in the dataset:

# CIFAR-10 classes
categories = {
    0: "airplane",
    1: "automobile",
    2: "bird",
    3: "cat",
    4: "deer",
    5: "dog",
    6: "frog",
    7: "horse",
    8: "ship",
    9: "truck"
}

Loading the test data and the model:

# load the testing set
(_, _), (X_test, y_test) = load_data()
# load the model with optimal weights
model = load_model("results/cifar10-loss-0.58-acc-0.81.h5")

Evaluation:

# evaluation
loss, accuracy = model.evaluate(X_test, y_test)
print("Test accuracy:", accuracy*100, "%")

Let's take a random image and make a prediction:

# get prediction for this image
sample_image = X_test[7500]
prediction = np.argmax(model.predict(sample_image.reshape(-1, *sample_image.shape))[0])
print(categories[prediction])

Remember when we one-hot encoded the labels? Here we need to reverse that, turning the 10-element output vector back into a single class index, which is exactly what np.argmax does. After that, we map the index through our dictionary to get the human-readable label.
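To make the reverse mapping concrete, here is a tiny standalone illustration (not part of test.py) of what np.argmax does to a one-hot vector:

import numpy as np

# the one-hot vector for "horse" (class 7), as produced by to_categorical
one_hot = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
print(np.argmax(one_hot))  # 7  ->  categories[7] == "horse"

Back in test.py, here is my output: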

10000/10000 [==============================] - 3s 331us/step
Test accuracy: 81.17999999999999 %
frog

The model says it's a frog, let's check it:

# show the image
plt.axis('off')
plt.imshow(sample_image)
plt.show()

Result:

[Figure: the sample test image]

A tiny little frog! The model was right!

Conclusion

Alright, we are done with this tutorial. 81% isn't bad for this little CNN; I highly encourage you to tweak the model or check out ResNet50, Xception, or other state-of-the-art models to get higher performance!

If you're not sure how to use these models, I have a tutorial on this: How to Use Transfer Learning for Image Classification using Keras in Python.

You may point out that these images are quite simple: a 32x32 grid isn't how the real world looks. Real images aren't that simple; they often contain many objects, complex patterns, and so on. As a result, it is common practice to use image segmentation methods, such as contour detection or K-Means clustering segmentation, before passing images to any classification technique.

Happy Training ♥
