Hello everyone! Some of you may remember me from my first publication here, in which I shared my initial experience with machine learning: I discussed how I utilized a K-means model to classify mushrooms as either poisonous or edible.
Now, in this article, I present my second machine learning project. This time, I have ventured into the world of Convolutional Neural Networks (CNNs) to develop an image classification model. The goal is to distinguish between images of anime (Eastern animation, primarily Japanese) and cartoons (Western animation).
Before delving into the technical details of my work, I’d like to share the motivation behind choosing this particular theme. Following my initial venture into machine learning, I sought to challenge myself with a project in image detection. While popular datasets like CIFAR-10 and MNIST were tempting, I yearned for something less common. Thus, I opted for the Anime vs Cartoon dataset. The prospect of working with a subject as subjective as the distinctive visual styles of animation from two culturally distant backgrounds captivated me.
The neural network, or more precisely, the artificial neural network, is a machine learning model designed to teach computers to process data toward a specific end goal. To reach this goal, the neural network draws inspiration from the human brain's interconnected system of neurons arranged in layered structures. This design promotes an adaptive learning process, leveraging mistakes and successes to iteratively improve its capabilities.
The architecture of a neural network consists of:
- Input layer: This initial layer is responsible for importing information from the external world into the model. The number of neurons in this layer depends on the dimensionality of the input data.
- Hidden layers: These layers receive data from either the input layer or another hidden layer, process the information, and then transmit it forward. The number of layers and neurons in these layers is determined by the complexity of the problem.
- Output layer: In this final layer, the model produces its output. The number of neurons in this layer depends on the number of categories in your problem.
Other important aspects of the neural network, all of which show up in the short sketch after this list:
- Activation function: The activation function of neurons serves several critical roles, such as introducing non-linearity to the network’s output, capturing specific features through different activation functions, and contributing to regularization.
- Weights: Weights play a crucial role in determining both the strength and direction of influence of each input on the network’s output. These weights are readjusted during the training process to enable the network to learn and adapt.
- Bias: The bias term allows the network to learn to compensate for systematic differences between its predictions and the actual labels. This flexibility enhances the adaptability of the learning process.
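To make these pieces concrete, here is a minimal, illustrative network in Keras (the library used later in this project). The layer sizes are arbitrary choices for demonstration, not the model built later in this post:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Toy fully connected network: 8 input features, two hidden layers,
# and an output layer with one neuron per category.
model = Sequential([
    Dense(16, activation="relu", input_shape=(8,)),  # first hidden layer, fed by the input layer
    Dense(8, activation="relu"),                     # second hidden layer
    Dense(2, activation="softmax"),                  # output layer: 2 categories
])

# Each Dense layer stores a weight matrix and a bias vector; training
# readjusts both so the network learns from its mistakes and successes.
weights, biases = model.layers[0].get_weights()
print(weights.shape, biases.shape)  # (8, 16) and (16,)
```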
The convolutional world of neural networks
The main aspect that differentiates a normal neural network and a convolutional neural network lies in the presence of a crucial component — the convolutional layer. But what exactly is a convolutional layer?
These layers are tasked with executing convolution operations on the input data of our model. In simple terms, this mathematical operation combines the input with filters (also known as kernels), ultimately producing a feature map. This unique characteristic of convolution plays a pivotal role in extracting visual patterns and establishing a feature hierarchy, especially when dealing with grid-structured data such as images.
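To see the operation itself, here is a tiny hand-rolled convolution (strictly speaking, the cross-correlation that CNN libraries actually compute) of a 4×4 image with a 2×2 vertical-edge kernel; the numbers are invented purely for illustration:

```python
import numpy as np

# A 4x4 "image": dark on the left, bright on the right.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

# A 2x2 kernel that responds to vertical edges.
kernel = np.array([[1, -1],
                   [1, -1]], dtype=float)

# Slide the kernel over every valid position and record the weighted sum.
h = image.shape[0] - kernel.shape[0] + 1
w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((h, w))
for i in range(h):
    for j in range(w):
        window = image[i:i + 2, j:j + 2]          # region under the kernel
        feature_map[i, j] = np.sum(window * kernel)

print(feature_map)  # the large magnitudes mark exactly where the edge sits
```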
In this project, I employed a workflow similar to the one used in my previous article. The work unfolded through six distinct stages: fetch data, preprocessing, data validation, data segregation, model training, and model testing. These stages collectively formed a systematic approach to developing the machine learning model.
Development Tools
For the implementation, the Google Colab environment with Python3 served as the primary coding platform, providing collaborative capabilities and access to GPU resources for efficient model training. Additionally, the AI development platform WandB played a crucial role in recording and storing results, along with artifacts generated during the model creation.
Well, after all the explanations of the project, we can delve into the work itself. The data fetching stage occurred together with the choice of the project theme. To accomplish this, I researched datasets on the Kaggle website. During this exploration, I discovered something interesting for the project: the 'Anime x Cartoon' dataset. It comprises more than 8000 animation images divided between two folders, anime and cartoon, and inside each folder are subfolders for the respective animation titles.
Once the zip file housing these archives was uploaded to Google Drive, I performed the fetch data and preprocessing steps within the same code. While they are distinct steps, I opted for this integration to streamline the process.
Upon obtaining the archives from Google Drive, the subsequent step involved preprocessing these files: normalizing them and discerning between the data and their corresponding labels. For this purpose, code along the following lines was employed:
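Here is a minimal sketch of that preprocessing. The Drive path and variable names are assumptions; the RGBA conversion, the 128×128 target size, and the [0, 1] normalization follow the rest of this post:

```python
import os
import numpy as np
from PIL import Image

DATA_DIR = "/content/drive/MyDrive/anime_x_cartoon"  # assumed Drive path
CLASSES = ["anime", "cartoon"]                        # the two top-level folders

data, labels = [], []
for label, class_name in enumerate(CLASSES):
    class_dir = os.path.join(DATA_DIR, class_name)
    # each class folder contains one subfolder per animation title
    for root, _, files in os.walk(class_dir):
        for name in files:
            img = Image.open(os.path.join(root, name)).convert("RGBA")
            img = img.resize((128, 128))
            data.append(np.asarray(img, dtype=np.float32) / 255.0)  # normalize to [0, 1]
            labels.append(label)  # 0 = anime, 1 = cartoon

data = np.stack(data)
labels = np.array(labels)
np.savez("clean_data.npz", data=data, labels=labels)  # hand off to the next steps
```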
Following this step, it's necessary to store the results in Weights and Biases (wandb) for future use.
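A sketch of that logging step using the wandb artifact API; the project, artifact, and file names are my assumptions:

```python
import wandb

run = wandb.init(project="anime_vs_cartoon", job_type="preprocess")
artifact = wandb.Artifact("clean_data", type="dataset")
artifact.add_file("clean_data.npz")  # the arrays produced by preprocessing
run.log_artifact(artifact)           # versioned and stored for later stages
run.finish()
```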
After transforming our raw (unprocessed) data into clean data during the preprocessing step, it is essential to perform a verification step. The data check step ensures that the data obtained from preprocessing is well-suited for creating our model. Anticipating common errors that may be present in the data, I established eight checks to validate the appropriate format of both the images and their respective labels.
Identifying any issues at this stage mandates a return to the preprocessing phase for correction and subsequent re-verification. This iterative process ensures the integrity and quality of the data before proceeding with model development.
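As an illustration, here are four checks of the kind described, written pytest-style against the preprocessed arrays; the names and exact assertions are my assumptions, not the project's eight tests verbatim:

```python
import numpy as np

payload = np.load("clean_data.npz")
data, labels = payload["data"], payload["labels"]

def test_image_shape():
    assert data.shape[1:] == (128, 128, 4)   # every image is 128x128 RGBA

def test_pixel_range():
    assert data.min() >= 0.0 and data.max() <= 1.0  # normalization held

def test_labels_are_binary():
    assert set(np.unique(labels)) <= {0, 1}  # only anime (0) and cartoon (1)

def test_data_and_labels_align():
    assert len(data) == len(labels)          # one label per image
```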
Now that the data is appropriately prepared, we proceed to segregate it into training and testing sets. However, extra caution is necessary to prevent one label from being overrepresented in subsequent steps. To address this concern, we employ a pre-built method from scikit-learn, train_test_split, with the stratify argument set to the variable we wish to distribute evenly.
Finally, after completing all the preceding steps, our model begins its training in this phase of our process. However, before initiating the training process, we must once again split the training data, this time into a training set and a validation set. This step is essential because each epoch of training requires a distinct set of data to assess the performance of the model; the sketch below covers both splits.
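The test and validation fractions here are assumptions, while stratify keeps the anime/cartoon proportions equal in every set:

```python
from sklearn.model_selection import train_test_split

# First split: hold out a test set, stratified on the labels.
x_train, x_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.2, stratify=labels, random_state=42)

# Second split: carve a validation set out of the training data.
x_train, x_val, y_train, y_val = train_test_split(
    x_train, y_train, test_size=0.2, stratify=y_train, random_state=42)
```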
Model architecture
The architecture of the model was created by Kanak Mittal, the author of the dataset used in this project. Consequently, my role is to elucidate the functionality of each line of code and propose potential reasons for the author's decisions.
The architecture is divided into two parts; a sketch reconstructing both follows the two lists below.
Convolutional part:
- The Conv2D layer is where the convolution operations take place.
- The first Conv2D is particularly important because it is where we define the input shape of our data. In our case, the input consists of RGBA (4-channel) images with dimensions of 128×128 pixels.
- Our Conv2D layers use an increasing number of filters and kernel sizes. This choice of arguments is applied to each layer to capture a broader range of patterns from the data as it undergoes progressive simplification.
- The MaxPooling2D layer is employed to reduce the spatial resolution of the input, retaining only the maximum values within the pooling window.
- The idea behind Dropout is to randomly deactivate a fraction of neurons during training, thereby preventing overfitting.
Neural network part:
- To transition from the convolutional part to the neural network part, we need to use the Flatten layer to transform our multidimensional input into a single-dimensional format.
- Our hidden layers use the ReLU activation function f(x)=max(0,x) to introduce non-linearity. This function returns zero for negative input values and the input value itself for positive values.
- The number of neurons in each hidden layer descends to funnel the results, thereby reducing the computational load.
- The output layer utilizes 2 neurons (Anime and Cartoon) and the softmax activation function, which provides the probability distribution indicating the likelihood of the input belonging to each respective output.
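Here is the promised sketch reconstructing that architecture in Keras. Only the overall structure (RGBA 128×128 input, growing filters and kernels, pooling, dropout, funneling dense layers, two-way softmax) comes from the description above; the exact filter counts, kernel sizes, and dropout rate are assumptions:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    # Convolutional part: turn pixels into a hierarchy of feature maps.
    Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 4)),
    MaxPooling2D((2, 2)),                    # halve the spatial resolution
    Conv2D(64, (5, 5), activation="relu"),   # more filters, larger kernel
    MaxPooling2D((2, 2)),
    Dropout(0.25),                           # randomly silence neurons against overfitting

    # Neural network part: classify the flattened features.
    Flatten(),                               # multidimensional maps -> 1-D vector
    Dense(128, activation="relu"),
    Dense(64, activation="relu"),            # funnel the results down
    Dense(2, activation="softmax"),          # probabilities for anime vs cartoon
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # labels are integers 0/1
              metrics=["accuracy"])
```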
Training
To enhance the performance of our training, we use the EarlyStopping callback to halt the training when there are no further accuracy improvements, and the ModelCheckpoint callback to preserve the best model from our training. Together, these two Keras callbacks ensure we end up with the best possible model.
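A sketch of how the two callbacks plug into training; the patience value, filename, and epoch count are assumptions:

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    EarlyStopping(monitor="val_accuracy", patience=10),       # stop when accuracy stalls
    ModelCheckpoint("best_model.keras", monitor="val_accuracy",
                    save_best_only=True),                     # keep only the best epoch
]
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),  # fresh data for each epoch's check
                    epochs=100, callbacks=callbacks)
```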
Results of the training
The training achieved an accuracy of almost 86% on the validation set at the 65th epoch. The model from this epoch is the one the ModelCheckpoint callback saved before early stopping halted the run.
After completing the model training, it is primed to predict results. However, for the model to make predictions, we require a specific type of input: the same type used during the training phase, a numpy array created from an image. To streamline the process and avoid repetitive data processing each time we use the model, we leverage a convenient class from the scikit-learn library called Pipeline.
The concept behind the Pipeline is to chain the various steps involved in using our model into a single object. When invoked, this object already encapsulates all the functionality necessary for utilizing our model.
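Below is a minimal sketch of how such a pipeline could be assembled. The imageProcessor class is the one named in this post; the modelWrapper adapter and the step layout are my assumptions:

```python
import numpy as np
from PIL import Image
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

class imageProcessor(BaseEstimator, TransformerMixin):
    """Turns image files into the normalized arrays the CNN expects."""
    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        arrays = [np.asarray(Image.open(p).convert("RGBA").resize((128, 128)),
                             dtype=np.float32) / 255.0 for p in X]
        return np.stack(arrays)

class modelWrapper(BaseEstimator):
    """Thin adapter so the trained Keras model can close the pipeline."""
    def __init__(self, model):
        self.model = model  # arrives already trained

    def fit(self, X, y=None):
        return self

    def predict(self, X):
        return np.argmax(self.model.predict(X), axis=1)

pipeline = Pipeline([
    ("normalizer", imageProcessor()),  # raw files -> valid model input
    ("model", modelWrapper(model)),    # model input -> predicted class index
])
# pipeline.predict(["some_frame.png"])  -> 0 (anime) or 1 (cartoon), assumed order
```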
In the above code snippet, you'll notice the class called imageProcessor. This class is exclusively designed for data normalization and is integrated into the normalizer step of our pipeline. The normalizer ensures that every time we utilize our model, the data we aim to predict is transformed into a valid input for our model.
Following the creation of the pipeline, predicting images with our model becomes a straightforward process. Importing our pipeline and the normalization class is all that’s required.
The testing step achieved 87% accuracy across all metrics of the classification_report, very similar to the training results. This is a positive indication that our model did not overfit during training.
Well, this must have been one of my most fun projects to develop. I felt motivated to move forward with each stage, and I am very satisfied with the results obtained. Perhaps in the future I will use the results of this project in another one, but the knowledge gained alone was already worth the effort.
I owe thanks to my coworker Cláudio Henrique and my professor Mateus Arnaud Goldbarg for introducing me to the world of machine learning.
If you, dear reader, also speak Portuguese, read the article that my co-worker wrote:
Contacts
My linkedin account: linkedin.com/in/valmir-francisco-581222288
My github account: github.com/valfra0425
Special thanks to my co-worker: linkedin.com/in/claudio-henrique-8047a7266
His github account: github.com/ClauHenrique
Files
Github project: https://github.com/valfra0425/cnn_animation
Drive folder with the project: https://drive.google.com/drive/folders/1jz8VZFOrKhZXD5tLX-jBWF4KMcWTrdpf?usp=sharing
Dataset: https://www.kaggle.com/datasets/kanakmittal/anime-and-cartoon-image-classification