Convolutional Neural Networks (CNNs) are a powerful type of deep learning architecture excelling at image recognition and classification tasks. In this blog post, we’ll build a CNN using Keras and TensorFlow to classify images as cats or dogs. We’ll delve into the code, understand the steps involved, and explore how to save and use the trained model for predictions on new images.
Prerequisites:
Basic understanding of Python programming
Familiarity with machine learning concepts
Libraries:
We’ll be using Keras and TensorFlow libraries for building and training the CNN model. Make sure you have them installed using:
pip install tensorflow
First, let’s import the necessary libraries and packages for building our CNN.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import image_dataset_from_directory
We import essential functions from Keras to create the CNN architecture.
- `Sequential`: This builds a linear stack of layers, where the output of one layer becomes the input of the next.
- `Conv2D`: This defines a convolutional layer that extracts features from the input image.
- `MaxPooling2D`: This performs downsampling to reduce the image size and capture spatial features.
- `Flatten`: This transforms the high-dimensional output from the convolutional layers into a 1D array for the fully connected layers.
- `Dense`: This defines a fully connected layer that learns complex relationships between features.
- `image_dataset_from_directory`: This creates a `tf.data.Dataset` object from image files organized into class-specific directories. It simplifies loading and preprocessing image data for training and validation.
We’ll start by initializing a Sequential model, which allows us to add layers sequentially.
classifier = Sequential()
We create a sequential model using the `Sequential` class from Keras. This establishes the linear stacking of layers in the CNN.
# First convolutional layer and pooling
classifier.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))

# Add another convolutional layer and pooling
classifier.add(Conv2D(32, (3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
Why we use them:
- Convolutional layers are the building blocks of CNNs. They are designed to automatically learn spatial features from images.
- Each convolutional layer has filters of a specific size (here, 3×3). These filters slide across the image, extracting features like edges, shapes, and textures (see the NumPy sketch after this list).
- The number of filters (32 in this case) determines how many different features the layer can learn.
- The ReLU (Rectified Linear Unit) activation function introduces non-linearity, allowing the network to learn more complex patterns.
- Max pooling layers downsample the image representation, reducing its dimensionality and computational cost. They also help capture spatial relationships between features.
- We use two convolutional layers to extract progressively more complex features from the image.
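To make the sliding-filter idea concrete, here is a minimal NumPy sketch (an illustration only, not part of the Keras model) of what a single 3×3 filter computes as it slides across an image:

import numpy as np

# A toy 5x5 "image" with a vertical edge between dark (0) and bright (10).
image = np.array([
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
], dtype=float)

# A classic vertical-edge filter: responds where brightness rises left to right.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

# Slide the kernel over every valid 3x3 patch (stride 1, no padding).
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out)  # large responses (30) where the patch overlaps the edge, 0 elsewhere

A `Conv2D` layer with 32 filters learns 32 such kernels from the data instead of having them hand-coded.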
Input Shape: `input_shape=(64, 64, 3)` specifies that the model expects input images to be 64×64 pixels with 3 color channels (RGB).
Before feeding the feature maps into fully connected layers, we need to flatten them.
classifier.add(Flatten())
Why we use it:
- Convolutional layers produce feature maps, which are typically 3D tensors (height, width, number of channels).
- Fully connected layers require a 1D input vector.
- The flattening layer transforms the high-dimensional output of the convolutional layers into a 1D array suitable for fully connected layers.
Fully connected layers learn to classify the extracted features into different classes.
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dense(units=256, activation='relu'))
classifier.add(Dense(units=1, activation='sigmoid'))
Why we use them:
- Fully connected layers take over from the convolutional layers.
- Each neuron in a fully connected layer is connected to all neurons in the previous layer.
- These layers learn complex relationships between the features extracted by the convolutional layers.
- The first two fully connected layers (128 and 256 units) with ReLU activation help the network learn more intricate patterns within the extracted features.
- The final output layer has one unit with sigmoid activation because we’re performing binary classification (cat or dog). The sigmoid function outputs a value between 0 and 1. Since `image_dataset_from_directory` assigns class indices in alphabetical order of the subdirectory names, with folders named 'cats' and 'dogs' a value closer to 0 indicates “cat” and closer to 1 indicates “dog”.
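To sanity-check the architecture, you can print the layer-by-layer output shapes. With the 64×64×3 input and Keras’s default 'valid' padding, the shapes work out as noted in the comments:

classifier.summary()
# Expected output shapes:
# Conv2D       -> (None, 62, 62, 32)
# MaxPooling2D -> (None, 31, 31, 32)
# Conv2D       -> (None, 29, 29, 32)
# MaxPooling2D -> (None, 14, 14, 32)
# Flatten      -> (None, 6272)   # 14 * 14 * 32
# Dense        -> (None, 128), then (None, 256), then (None, 1)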
Before training, we compile the model with an appropriate optimizer and loss function.
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Why we use it:
Compiling defines the training process:
- `optimizer='adam'`: This specifies the Adam optimizer, an efficient algorithm for updating the weights of the network during training.
- `loss='binary_crossentropy'`: This defines the binary crossentropy loss function, suitable for binary classification problems. It measures the difference between the predicted probabilities and the true labels.
- `metrics=['accuracy']`: This tracks classification accuracy during training and evaluation.
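For intuition, binary crossentropy for a single example is -(y·log(p) + (1 - y)·log(1 - p)), where y is the true label and p is the predicted probability. A quick NumPy sketch (the helper function here is just for illustration):

import numpy as np

# Binary crossentropy for one example: small when the predicted
# probability p agrees with the true label y, large when it doesn't.
def binary_crossentropy(y, p):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(binary_crossentropy(1.0, 0.9))  # ~0.105 (confident and correct)
print(binary_crossentropy(1.0, 0.1))  # ~2.303 (confident and wrong)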
With the model architecture defined, we can now train it on our dataset of dog and cat images.
train_dataset = image_dataset_from_directory(
    'dataset/training_set',
    labels='inferred',     # Labels are inferred from subdirectory names
    label_mode='binary',   # Binary classification (cat or dog)
    image_size=(64, 64),   # Ensure image size matches model input
    batch_size=32,         # Adjust batch size if needed
    shuffle=True           # Shuffle training data
)
Here, we use `image_dataset_from_directory` to create a dataset for the training set. Let’s break down the parameters:
- `'dataset/training_set'`: The directory path containing the training images organized into subdirectories representing each class.
- `labels='inferred'`: Specifies that the labels will be inferred from the directory structure.
- `label_mode='binary'`: Indicates that we are performing binary classification (in this case, distinguishing between cats and dogs).
- `image_size=(64, 64)`: Resizes the images to 64×64 pixels. This ensures uniformity in image dimensions for training.
- `batch_size=32`: Specifies the batch size, which is the number of samples per batch during training.
- `shuffle=True`: Shuffles the data to ensure that the model does not learn the order of the images.
What is overfitting and how are we overcoming it?
Overfitting is a common challenge in machine learning where a model learns to fit the training data too closely, capturing noise and random fluctuations that are specific to the training dataset. As a result, the model performs well on the training data but fails to generalize to unseen data, leading to poor performance on validation or test datasets. Overfitting can also occur when the model is too complex relative to the amount of training data, causing it to capture irrelevant patterns.
In the context of image classification, overfitting can manifest when a CNN learns to recognize specific features or patterns that are unique to the training images but do not generalize well to new, unseen images. For example, a model may memorize the exact appearance of certain objects or backgrounds in the training images without learning the underlying characteristics that distinguish between different classes (e.g., dogs vs. cats).
To overcome overfitting, we can use image augmentation. Augmentation techniques reduce overfitting by creating variations of the training images (see the sketch below). These transformations may include:
- Rotation: Rotating the image by a certain angle.
- Shearing: Distorting the image by shifting its pixels along a specified axis.
- Zooming: Enlarging or shrinking the image.
- Horizontal or Vertical Flipping: Mirroring the image horizontally or vertically.
- Brightness and Contrast Adjustment: Modifying the brightness and contrast of the image.
By augmenting the training data with these transformations, we effectively increase the diversity and variability of the dataset, exposing the model to a wider range of scenarios and reducing the risk of overfitting. Additionally, since the augmented images are variations of the original training data, they still contain relevant information for learning the underlying patterns without introducing new labels or ground truth.
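The pipeline above doesn’t yet apply these transformations. Here is a minimal sketch of how they could be added with Keras preprocessing layers (available as `tf.keras.layers.RandomFlip`, `RandomRotation`, and `RandomZoom` in recent TensorFlow releases):

from tensorflow.keras import layers

# Augmentation pipeline: each training batch gets a random flip,
# rotation, and zoom, so the model rarely sees an identical image twice.
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip('horizontal'),
    layers.RandomRotation(0.1),  # up to ±10% of a full rotation
    layers.RandomZoom(0.1),      # zoom in or out by up to 10%
])

# Apply augmentation to the training set only; evaluation data stays untouched.
train_dataset = train_dataset.map(
    lambda images, labels: (data_augmentation(images, training=True), labels)
)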
I used the `image_dataset_from_directory` function from TensorFlow to create training and test datasets from image files organized into directories. By setting `shuffle=True` for the training dataset, we ensure that the model sees the augmented versions of the images in random order during each epoch, preventing it from memorizing specific sequences of images.
test_dataset = image_dataset_from_directory(
    'dataset/test_set',
    labels='inferred',     # Labels are inferred from subdirectory names
    label_mode='binary',   # Binary classification (cat or dog)
    image_size=(64, 64),   # Ensure image size matches model input
    batch_size=32,         # Adjust batch size if needed
    shuffle=False          # Don't shuffle test data for evaluation
)
Similarly, this code snippet creates a dataset for the test set. The parameters are similar to those used for the training set. However, `shuffle` is set to `False` because shuffling is unnecessary for the test set, and keeping a fixed order ensures consistency when evaluating the model’s performance.
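One detail worth noting: `image_dataset_from_directory` yields raw pixel values in the range [0, 255], while the prediction code at the end of this post divides by 255. To keep training and inference consistent, the datasets can be rescaled the same way (a minimal sketch):

# Scale pixels from [0, 255] to [0, 1] so training matches the
# /255.0 normalization used at prediction time.
def rescale(images, labels):
    return images / 255.0, labels

train_dataset = train_dataset.map(rescale)
test_dataset = test_dataset.map(rescale)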
classifier.fit(
    train_dataset,
    validation_data=test_dataset,
    epochs=25
)
What is happening here:
This call trains the CNN model on the provided `train_dataset`.
- `epochs=25`: This specifies the number of times the model goes through the entire training dataset. You can experiment with different values to improve performance.
- `validation_data=test_dataset`: This supplies a separate dataset (the test set) for evaluating the model’s performance during training. The model’s accuracy on the training data may not reflect its ability to generalize to unseen data; the validation set helps assess how well the model learns without overfitting to the training data.
Training Process:
- During each epoch, the model iterates through the training dataset in batches of 32 images (as set by `batch_size`).
- For each batch:
  - The model calculates the predictions for the images in the batch.
  - The loss function (binary crossentropy in this case) measures the difference between the predicted and actual labels.
  - The optimizer (Adam) uses the calculated loss to update the weights of the CNN layers to minimize the loss in future iterations.
After each epoch, the model evaluates its performance on the validation set. This helps monitor for overfitting.
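To make that monitoring concrete: `fit()` returns a `History` object with per-epoch metrics, so the training call above can be written to capture it. A widening gap between training and validation accuracy is the classic sign of overfitting:

# Capture the History object returned by fit().
history = classifier.fit(
    train_dataset,
    validation_data=test_dataset,
    epochs=25
)

# With metrics=['accuracy'], history.history holds per-epoch values for
# 'loss', 'accuracy', 'val_loss', and 'val_accuracy'.
print('Final training accuracy:  ', history.history['accuracy'][-1])
print('Final validation accuracy:', history.history['val_accuracy'][-1])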
We save the trained model so that we can reuse it later for predictions without having to train it again.
# Save the trained model
classifier.save('trained_model.keras')
After training, we evaluate the model’s performance on the test set to assess its accuracy and generalization ability.
# Evaluate the model on the test set and print the accuracy
test_loss, test_acc = classifier.evaluate(test_dataset)
print('Test accuracy:', test_acc)
Finally, we can use the trained model to make predictions on new images.
# Load the saved model (same file name used in classifier.save above)
from tensorflow.keras.models import load_model
loaded_model = load_model('trained_model.keras')

# Make predictions on new data
# For example, if you have a single image 'new_image.jpg' for prediction
from tensorflow.keras.preprocessing import image
import numpy as np
# Load the image
new_image = image.load_img('dataset/single_prediction/cat_or_dog_1.jpg', target_size=(64, 64))
new_image = image.img_to_array(new_image)
new_image = np.expand_dims(new_image, axis=0)
# Normalize the image data
new_image = new_image / 255.0
# Make prediction
prediction = loaded_model.predict(new_image)
# Class indices follow the alphabetical order of the training subdirectory
# names: with folders named 'cats' and 'dogs', 0 = cat and 1 = dog.
if prediction[0][0] > 0.5:
    print("Dog")
else:
    print("Cat")
In conclusion, building a CNN for image classification tasks is a rewarding and exciting endeavor. By leveraging the power of deep learning frameworks like Keras and TensorFlow, we can develop models capable of accurately distinguishing between different classes of objects in images. In this article, we’ve demonstrated how to build and train a CNN for classifying images of dogs and cats, paving the way for further exploration and experimentation in the field of computer vision.
Now it’s your turn. If you enjoyed this, give me a clap and a follow for more tech blogs. Thank you for reading!
The success of deep learning tells us that the brain’s neural networks really can learn the patterns and structures in the data they are exposed to.
~ Geoffrey Hinton, a pioneer in deep learning