![](https://crypto4nerd.com/wp-content/uploads/2023/07/1lWIirq7fGyrSbcXXJGCehQ.jpeg)
In my last article we explored the Inception architecture, and in a future article I will cover Xception. Before moving on to Xception, however, we need to understand Batch Normalization and why it matters in Deep Learning, especially for CNNs.
Batch Normalization
Batch Normalization is a technique used in Deep Learning to make neural networks train more efficiently. When we train a neural network, the weights of the model are adjusted based on the input data so that it makes accurate predictions. However, as the network trains and the parameters of earlier layers change, the distribution of the values in each layer’s output can shift, making the training process more difficult.
Batch Normalization addresses this problem by normalizing the output of each layer. Normalization means transforming the values so that they have a mean close to 0 and a standard deviation close to 1. The normalization formula is:

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}$$

where $\mu_B$ and $\sigma_B^2$ are the mean and variance computed over the current mini-batch and $\varepsilon$ is a small constant added for numerical stability. The normalized values are then scaled and shifted by two learnable parameters, $\gamma$ and $\beta$:

$$y_i = \gamma \hat{x}_i + \beta$$
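To make the formula concrete, here is a minimal NumPy sketch of the forward computation. The function name, the $\varepsilon$ value, and the example numbers are my own illustration, not from the article:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch x of shape (batch_size, features), feature by feature."""
    mu = x.mean(axis=0)                      # per-feature mean over the mini-batch
    var = x.var(axis=0)                      # per-feature variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized values: mean ~0, std ~1
    return gamma * x_hat + beta              # learnable scale and shift

# A mini-batch of 4 samples with 3 features, on very different scales
x = np.array([[1.0, 200.0, -3.0],
              [2.0, 180.0, -1.0],
              [3.0, 220.0,  0.0],
              [4.0, 210.0,  2.0]])

y = batch_norm(x)
print(y.mean(axis=0))  # close to 0 for every feature
print(y.std(axis=0))   # close to 1 for every feature
```

Note that the features start on very different scales (around 2 versus around 200), yet after normalization they all end up with a comparable range.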
Mini-Batches
The normalization process is done within small groups of data, which we call “mini-batches”. Mini-batches are used because it is not practical to process all the data at once, due to memory and computational constraints.
When using mini-batches in Batch Normalization, the normalization is based on statistics (mean and variance) computed from the data within each mini-batch. Because these statistics vary from one mini-batch to the next, the process introduces some noise into the normalization, which could potentially lead to information loss.

The choice of batch size influences how much noise is introduced. In practice, larger batches reduce this noise, but they also increase memory usage and training time. The use of mini-batches is a compromise that keeps training computationally feasible; without them, the entire dataset would need to be processed at once.
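As a rough illustration of that trade-off, the sketch below (synthetic data and arbitrary batch sizes of my choosing, not from the article) shows how the per-batch mean fluctuates less as the batch size grows:

```python
import numpy as np

rng = np.random.default_rng(0)
# Pretend this is the full dataset for a single feature
data = rng.normal(loc=5.0, scale=2.0, size=10_000)

for batch_size in (8, 64, 512):
    # Split the data into mini-batches and compute each batch's mean
    n_batches = len(data) // batch_size
    batches = data[: n_batches * batch_size].reshape(n_batches, batch_size)
    batch_means = batches.mean(axis=1)
    # The spread of the per-batch means is the "noise" in the batch statistics
    print(f"batch_size={batch_size:4d}  std of per-batch means = {batch_means.std():.3f}")
```

With a batch size of 8 the per-batch means swing noticeably around the true mean of 5.0, while with 512 they barely move, which is exactly the noise-versus-cost trade-off described above.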
Let’s see an example!