![](https://crypto4nerd.com/wp-content/uploads/2024/01/1J4OidZJm_eMr23kw3wRg4A-1024x791.jpeg)
This article describes the integration of Dense Associative Memory (DAM) with a Convolutional Neural Network (CNN), specifically the ResNet50 architecture, to improve pattern recognition. Associative memory can store and retrieve many more patterns than the network has neurons, which contributes to enhanced performance in tasks such as image classification. The proposed approach aims to address the difficulty traditional memory systems have with dense, complex patterns by using ResNet50 as a feature extractor. I present experimental results that are more reliable than those of standard CNNs. In this problem, the network is presented with an image and the task is to label it.
The problem becomes difficult when the number of patterns is greater than or equal to the number of neurons in the network, or equivalently the number of pixels in an image. It can be solved by altering the standard energy function of associative memory, which contains only quadratic interactions between neurons, to include higher-order interactions. We used such a higher-order energy function so that one can store and accurately retrieve more patterns than the number of neurons. Our work underscores the potential of incorporating dense associative memory into deep learning architectures, paving the way for improvements in pattern recognition. This article contributes to the ongoing discourse on innovative strategies for improving the efficiency of neural networks in real-world applications. We performed our experiment on the Coffee Beans dataset available on Kaggle.
Introduction:
Artificial Intelligence has shown significant growth in closing the gap between the capabilities of humans and machines. In recent years, deep learning (DL) has become the de facto standard in the machine learning (ML) community, and CNNs are commonly used to power computer vision applications. Residual neural networks are a type of artificial neural network (ANN) built by stacking residual blocks; ResNet is a specific type of convolutional neural network (CNN). ResNet50 is a deep learning model that takes an input image, assigns importance (learnable weights and biases) to various objects in the image, and can differentiate one from another. The pre-processing required for ResNet50 is much lower than for many other classification algorithms. ResNet-50 is a 50-layer convolutional neural network (48 convolutional layers, one max-pool layer, and one average-pool layer). The ResNet architecture follows two basic design rules. First, layers have the same number of filters for the same output feature map size. Second, if the feature map's size is halved, the number of filters is doubled to maintain the time complexity of each layer.
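To make these two rules concrete, here is a minimal Keras sketch of a residual block; the function name, filter counts, and layer choices are illustrative assumptions, not code from this project:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, downsample=False):
    """Illustrative residual block: halving the feature map
    (strides=2) goes together with doubling `filters`."""
    strides = 2 if downsample else 1
    shortcut = x

    y = layers.Conv2D(filters, 3, strides=strides, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)

    # Project the shortcut when the spatial size or channel depth changes.
    if downsample or shortcut.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=strides)(shortcut)

    return layers.ReLU()(y + shortcut)
```

Stacking such blocks, with `downsample=True` and doubled `filters` at each stage boundary, yields the characteristic ResNet shape.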
Dense Associative Memory
Dense Associative Memory is a generalization of the Hopfield Network (it is also called the modern Hopfield network). Hopfield networks are recurrent neural networks capable of storing and retrieving multiple memories. Hopfield's unique network architecture was based on a physics model that explains the emergent behavior of the magnetic fields produced by ferromagnetic materials.
It has two types:
1. Discrete Hopfield networks (Binary HN)
2. Continuous Hopfield networks (CHN)
- Discrete Hopfield Networks: a fully interconnected network in which each neuron feeds its output to every other neuron but not to itself. It behaves discretely, i.e. it produces a finite set of distinct outputs, and comes in two variants: binary (0/1) and bipolar (-1/+1). A minimal sketch of this variant follows the list.
- Continuous Hopfield Networks: here the time parameter is continuous, so neuron states can take values between 0 and 1; there is no need to restrict them to binary/bipolar values. It converges in one step (optimization) and can store exponentially more patterns than the binary variant. It has an energy function that either decreases or remains unchanged with every update.
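As promised above, a minimal NumPy sketch of a discrete (bipolar) Hopfield network with Hebbian storage and synchronous retrieval; the function names and toy patterns are my own illustration:

```python
import numpy as np

def store(patterns):
    """Hebbian learning: W is the sum of outer products of bipolar patterns."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0)          # no self-connections
    return W

def retrieve(W, x, steps=10):
    """Synchronous sign updates until the state stops changing."""
    for _ in range(steps):
        x_new = np.sign(W @ x)
        x_new[x_new == 0] = 1       # break ties
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x

patterns = np.array([[1, 1, 1, 1, -1, -1, -1, -1],
                     [1, -1, 1, -1, 1, -1, 1, -1]])
W = store(patterns)
noisy = np.array([1, 1, 1, 1, -1, -1, -1, 1])  # pattern 0, last bit flipped
print(retrieve(W, noisy))                      # recovers pattern 0
```

Each update moves the state toward one of the stored patterns, which sit at the minima of the network's energy function.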
Energy Minima Types:
1. Global fixed point allowing the network to average over all stored patterns.
2. Metastable states allow averaging over a subset of patterns, promoting flexibility and adaptability.
3. Fixed points storing a single pattern offer specificity in information storage.
Let us break down the components of the DAM equation:
E = -\sum_{\mu=1}^{K} \Big( \sum_{i=1}^{N} \xi_i^{\mu} x_i \Big)^n

- Outer summation (over μ): the energy contribution is computed for each stored class μ and summed over all K classes.
- Inner summation (over i): for each class μ, the sum runs over all N features; x_i is the ith element of the input vector, and ξ_i^μ is the corresponding scaling factor, the ith element of the memory vector for class μ.
- The inner sum, i.e. the overlap between the input and the memory ξ^μ, is raised to the power n.
- The final energy E is the negative sum of these raised overlaps, so the contributions are combined in a way that depends on the power n. For n = 2 this reduces to the classical quadratic Hopfield energy; larger n gives the higher-order interactions that let the network store more patterns than it has neurons.
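To make the formula concrete, here is a small NumPy sketch that evaluates this energy for a toy input (the variable names are assumptions, not from the article):

```python
import numpy as np

def dam_energy(xi, x, n=3):
    """E = -sum_mu ( sum_i xi[mu, i] * x[i] )**n
    xi: (K, N) stored patterns, x: (N,) input, n: interaction order."""
    overlaps = xi @ x            # inner sums: one dot product per pattern
    return -np.sum(overlaps ** n)

xi = np.array([[1., -1., 1.], [-1., 1., 1.]])  # K=2 patterns, N=3
x = np.array([1., -1., 1.])
print(dam_energy(xi, x, n=3))   # input matches pattern 0 -> low energy
```

An input that overlaps strongly with a stored pattern produces a large overlap term and hence a low (very negative) energy, which is exactly what retrieval dynamics descend toward.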
DAM Operation:
Normalization:
- inputs_normalized: normalize the input tensor along the last axis using L2 normalization, so that each input vector has unit norm.
- kernel_normalized: normalize the kernel matrix along the 0th axis (columns) using L2 normalization, so that each column of the kernel matrix has unit norm.
Matrix Multiplication:
Perform matrix multiplication between the normalized input (inputs_normalized) and the normalized kernel (kernel_normalized). This results in a matrix where each row represents the similarity of the corresponding input with each class.
Softmax Activation:
Apply the softmax activation function to the output of the matrix multiplication. This operation converts the raw similarities into probabilities, making the output a probability distribution over the classes.
In summary, the DAM layer takes an input tensor, normalizes both the input and the learned weight matrix, computes the similarities between input vectors and class vectors, and then applies softmax to obtain class probabilities. The goal is to learn a set of class vectors in the kernel that captures meaningful relationships between input vectors and classes.
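Putting the three steps together, here is a minimal Keras sketch of such a DAM layer; it mirrors the description above but may differ in detail from the implementation in the linked repository:

```python
import tensorflow as tf
from tensorflow.keras import layers

class DAMLayer(layers.Layer):
    """Dense-associative-memory head: cosine similarity between the input
    and one learned class vector per class, followed by softmax."""
    def __init__(self, num_classes, **kwargs):
        super().__init__(**kwargs)
        self.num_classes = num_classes

    def build(self, input_shape):
        # Learned class vectors are the columns of the kernel matrix.
        self.kernel = self.add_weight(
            name="kernel",
            shape=(input_shape[-1], self.num_classes),
            initializer="glorot_uniform",
            trainable=True)

    def call(self, inputs):
        inputs_normalized = tf.nn.l2_normalize(inputs, axis=-1)   # unit-norm inputs
        kernel_normalized = tf.nn.l2_normalize(self.kernel, axis=0)  # unit-norm columns
        similarities = tf.matmul(inputs_normalized, kernel_normalized)
        return tf.nn.softmax(similarities)   # probabilities over classes
```

Because both factors are unit-normalized, each entry of the matrix product is a cosine similarity, and the softmax turns the row of similarities into a class distribution.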
An example of an image classification problem is the case of Coffee Beans Image Classification:
This dataset consists of 2400 images, split into train, test, and validation sets across 4 classes. All images are 244x244x3. We will train a ResNet50 + DAM model to classify the 4 classes, loading images with an ImageDataGenerator and augmenting only the training set with horizontal and vertical flips. The accuracy of this model is 90%, and it can be tuned further in subsequent experiments.
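A sketch of the data pipeline just described, assuming the usual train/valid directory layout on disk; the directory paths, 224x224 target size, and batch size are assumptions:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augment only the training data with horizontal and vertical flips.
train_gen = ImageDataGenerator(rescale=1/255.,
                               horizontal_flip=True,
                               vertical_flip=True)
eval_gen = ImageDataGenerator(rescale=1/255.)

train_data = train_gen.flow_from_directory(
    "coffee-beans/train",            # placeholder path
    target_size=(224, 224),          # standard ResNet50 input size
    batch_size=32, class_mode="categorical")
valid_data = eval_gen.flow_from_directory(
    "coffee-beans/valid",            # placeholder path
    target_size=(224, 224),
    batch_size=32, class_mode="categorical")
```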
The code in the repository below implements the redesigned ResNet50-DAM architecture; please check it for the complete code and results.
https://github.com/jayakumarpujar/DAM-with-Resnet50.git
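For orientation, here is a minimal sketch of how the pieces could be assembled, reusing the DAMLayer and the data generators sketched earlier; this is my reconstruction, and the repository contains the author's exact code:

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import models

# ResNet50 backbone as a feature extractor, DAM layer as the classifier head.
base = ResNet50(include_top=False, weights="imagenet",
                input_shape=(224, 224, 3), pooling="avg")

model = models.Sequential([
    base,
    DAMLayer(num_classes=4),   # DAM head defined in the sketch above
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# train_data / valid_data come from the data-pipeline sketch above;
# the epoch count is illustrative.
model.fit(train_data, validation_data=valid_data, epochs=10)
```

Since the DAM layer already outputs a softmax distribution, categorical cross-entropy can be applied to it directly, making it a drop-in replacement for the usual Dense softmax head.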