How to Make an Image Classifier in Python using Tensorflow 2 and Keras

Building and training a model that classifies CIFAR-10 dataset images that were loaded using Tensorflow Datasets which consists of airplanes, dogs, cats and other 7 objects using Tensorflow 2 and Keras libraries in Python.
  · 10 min read · Updated aug 2020 · Machine Learning · Computer Vision


Image classification refers to a process in computer vision that can classify an image according to its visual content. For example, an image classification algorithm can be designed to tell if an image contains a cat or a dog. While detecting an object is trivial for humans, robust image classification is still a challenge in computer vision applications.

In this tutorial, you will learn how to successfully classify images in the CIFAR-10 dataset (which consists of airplanes, dogs, cats and other 7 objects) using Tensorflow in Python.

Note that there is a difference between image classification and object detection, image classification is about classifying an image to some category, like in this example, the input is an image and the output is a single class label (10 classes). Object detection is about detecting, classfiying and localizing objects in real-world images, one of the main algorithms are YOLO object detection.

We will preprocess the images and labels, then train a convolutional neural network on all the training samples. The images will need to be normalized and the labels need to be one-hot encoded.

First, let's install the requirements for this project:

pip3 install numpy matplotlib tensorflow==2.0.0 tensorflow_datasets

For instance, open up an empty python file and call it train.py and follow along. Importing Tensorflow:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.callbacks import TensorBoard
import tensorflow as tf
import tensorflow_datasets as tfds
import os

As you may expect, we'll be using tf.data API to load CIFAR-10 dataset.

Hyper Parameters

I have experimented with various parameters, and found this as an optimal ones:

# hyper-parameters
batch_size = 64
# 10 categories of images (CIFAR-10)
num_classes = 10
# number of training epochs
epochs = 30

num_classes just refers to the number of categories to classify, in this case CIFAR-10 has only 10 categories of images.

Understanding and Loading CIFAR-10 Dataset

  • The dataset consists of 10 classes of images which its labels ranging from 0 to 9:
    • 0: airplane.
    • 1: automobile.
    • 2: bird.
    • 3: cat.
    • 4: deer.
    • 5: dog.
    • 6: frog.
    • 7: horse.
    • 8: ship.
    • 9: truck.
  • 50000 samples for training data, and 10000 samples for testing data.
  • Each sample is an image of 32x32x3 pixels (width and height of 32 and 3 depth which are RGB values).

Let's load this:

def load_data():
    """
    This function loads CIFAR-10 dataset, and preprocess it
    """
    def preprocess_image(image, label):
        # convert [0, 255] range integers to [0, 1] range floats
        image = tf.image.convert_image_dtype(image, tf.float32)
        return image, label
    # loading the CIFAR-10 dataset, splitted between train and test sets
    ds_train, info = tfds.load("cifar10", with_info=True, split="train", as_supervised=True)
    ds_test = tfds.load("cifar10", split="test", as_supervised=True)
    # repeat dataset forever, shuffle, preprocess, split by batch
    ds_train = ds_train.repeat().shuffle(1024).map(preprocess_image).batch(batch_size)
    ds_test = ds_test.repeat().shuffle(1024).map(preprocess_image).batch(batch_size)
    return ds_train, ds_test, info

This function loads the dataset using Tensorflow Datasets module, we set with_info to True in order to get some information about this dataset, you can print it out and see what different fields and their values, we'll be using info for getting number of samples in training and testing sets.

After that, we:

  • Repeat the dataset forever using repeat() method, this will enable us to generate data samples repeatedly (we'll specify stopping condition in the training phase).
  • Shuffle it.
  • Normalize images to be between 0 and 1, this will help the neural network to train much faster, we used the map() method that accepts a callback function that takes the image and label as arguments, we simply used the built-in Tensorflow's convert_image_dtype() method that does that.
  • Finally, we batch our dataset by 64 samples using batch() function, so each time we generate new data points, it'll return 64 images and their 64 labels.

Constructing the Model

The following model will be used:

def create_model(input_shape):
    # building the model
    model = Sequential()
    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding="same", input_shape=input_shape))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # flattening the convolutions
    model.add(Flatten())
    # fully-connected layer
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation="softmax"))
    # print the summary of the model architecture
    model.summary()
    # training the model using adam optimizer
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

That's 3 layers of 2 ConvNets with a max pooling and ReLU activation function and then a fully connected with 1024 units. This is relatively a small model comparing to ResNet50 or Xception which are the state-of-the-art. If you feel to use models that made by deep learning experts, you need to use transfer learning.

Training the Model

Now, let's train the model:

if __name__ == "__main__":
    # load the data
    ds_train, ds_test, info = load_data()
    # constructs the model
    model = create_model(input_shape=info.features["image"].shape)
    # some nice callbacks
    logdir = os.path.join("logs", "cifar10-model-v1")
    tensorboard = TensorBoard(log_dir=logdir)
    # make sure results folder exist
    if not os.path.isdir("results"):
        os.mkdir("results")
    # train
    model.fit(ds_train, epochs=epochs, validation_data=ds_test, verbose=1,
              steps_per_epoch=info.splits["train"].num_examples // batch_size,
              validation_steps=info.splits["test"].num_examples // batch_size,
              callbacks=[tensorboard])
    # save the model to disk
    model.save("results/cifar10-model-v1.h5")

After loading the data and creating the model, I used Tensorboard that will be tracking the accuracy and loss in each epoch and providing us with nice visualization.

We will be using "results" folder to save our models, if you're not sure how you can handle files and directories in Python, check this tutorial.

Since ds_train and ds_test will generate data samples in batches repeatedly, we need to specify number of steps per epoch, and that's number of samples divided by the batch size, and it is the same for validation_steps as well.

Run this, it will take several minutes to complete training, depending on your CPU/GPU.

You'll get similar result to this:

Epoch 1/30
781/781 [==============================] - 20s 26ms/step - loss: 1.6503 - accuracy: 0.3905 - val_loss: 1.2835 - val_accuracy: 0.5238
Epoch 2/30
781/781 [==============================] - 16s 21ms/step - loss: 1.1847 - accuracy: 0.5750 - val_loss: 0.9773 - val_accuracy: 0.6542

All the way to the final epoch:

Epoch 29/30
781/781 [==============================] - 16s 21ms/step - loss: 0.4094 - accuracy: 0.8570 - val_loss: 0.5954 - val_accuracy: 0.8089
Epoch 30/30
781/781 [==============================] - 16s 21ms/step - loss: 0.4130 - accuracy: 0.8563 - val_loss: 0.6128 - val_accuracy: 0.8060

Now to open tensorboard, all you need to do is to type this command in the terminal or the command prompt in the current directory:

tensorboard --logdir="logs"

Open up a browser tab and type localhost:6006, you'll be redirected to tensorboard, here is my result:

Validation LossValidation Accuracy

Clearly we are on the right track, validation loss is decreasing, and the accuracy is increasing all the way to about 81%. That's great!

Testing the Model

Once training is completed, it'll save the final model and weights in results folder, in that way, we can train only once and make predictions whenever we desire.

Open up a new python file called test.py and follow along.

Importing necessary utilities:

from train import load_data, batch_size
from tensorflow.keras.models import load_model
import matplotlib.pyplot as plt
import numpy as np

Let's make a Python dictionary that maps each integer value to its corresponding label in the dataset:

# CIFAR-10 classes
categories = {
    0: "airplane",
    1: "automobile",
    2: "bird",
    3: "cat",
    4: "deer",
    5: "dog",
    6: "frog",
    7: "horse",
    8: "ship",
    9: "truck"
}

Loading the test data and the model:

# load the testing set
ds_train, ds_test, info = load_data()
# load the model with final model weights
model = load_model("results/cifar10-model-v1.h5")

Evaluation:

# evaluation
loss, accuracy = model.evaluate(X_test, y_test)
print("Test accuracy:", accuracy*100, "%")

Let's take a random image and make a prediction:

# get prediction for this image
data_sample = next(iter(ds_test))
sample_image = data_sample[0].numpy()[0]
sample_label = categories[data_sample[1].numpy()[0]]
prediction = np.argmax(model.predict(sample_image.reshape(-1, *sample_image.shape))[0])
print("Predicted label:", categories[prediction])
print("True label:", sample_label)

We've used next(iter(ds_test)) to get the next testing batch and then extracted the first image and label in that batch and made predictions on the model, here is the result:

156/156 [==============================] - 3s 20ms/step - loss: 0.6119 - accuracy: 0.8063
Test accuracy: 80.62900900840759 %
Predicted label: frog
True label: frog

The model says it's a frog, let's check it:

# show the image
plt.axis('off')
plt.imshow(sample_image)
plt.show()

Result:FrogTiny little frog! The model was right!

Conclusion

Alright, we are done with this tutorial, 81% isn't bad for this little CNN, I highly encourage you to twick the model or check ResNet50Xception or other state-of-the-art models to get higher performance!

If you're not sure how to use these models, I have a tutorial on this: How to Use Transfer Learning for Image Classification using Keras in Python.

You may wonder that this images are so simple, 32x32 grid isn't how the real world is, images aren't simple like that, they often contain many objects, complex patterns and so on. As a result, it is often a common practice to use image segmentation methods such as contour detection or K-Means clustering segmentation before passing to any classification techniques.

Finally, if you want to expand your skills on machine learning (or even if you're a beginner), I would suggest you take Andrew Ng' Machine Learning: Master the Fundamentals course. Good luck!

Happy Training ♥

View Full Code
Sharing is caring!



Read Also





Comment panel