Before we dive into L1 and L2, let’s grasp the concept of regularization itself. In the world of machine learning, when you train a model, you aim to minimize its error or loss function. However, models sometimes overfit the training data, becoming overly complex and fitting noise instead of the underlying patterns.
Regularization comes to the rescue by adding a penalty term to the loss function, discouraging the model from becoming too complex.
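To make that concrete, here is a minimal sketch of a penalized loss. The function name, the `lam` strength parameter, and the toy numbers are illustrative choices, not from any particular library:

```python
import numpy as np

def penalized_loss(y_true, y_pred, weights, lam=0.1, norm="l2"):
    """Mean squared error plus a regularization penalty.

    `lam` controls how strongly complexity is punished; `norm`
    picks the penalty style (illustrative helper, not a real API).
    """
    mse = np.mean((y_true - y_pred) ** 2)
    if norm == "l1":
        penalty = lam * np.sum(np.abs(weights))  # L1: sum of |w|
    else:
        penalty = lam * np.sum(weights ** 2)     # L2: sum of w squared
    return mse + penalty

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])
w = np.array([0.5, -2.0])
print(penalized_loss(y_true, y_pred, w, norm="l1"))  # 0.02 + 0.25
print(penalized_loss(y_true, y_pred, w, norm="l2"))  # 0.02 + 0.425
```

The larger `lam` is, the more the optimizer is pushed toward small coefficients rather than a perfect fit to the training data.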
Meet L1 Regularization (Lasso): The Feature Selector
L1 regularization, also known as Lasso (Least Absolute Shrinkage and Selection Operator), is like a skilled detective in a crime novel. Its main objective is to select the most relevant features while ignoring the noise.
How does it work?
Well, L1 adds a penalty to the loss function based on the absolute values of the model’s coefficients (proportional to the sum of |wᵢ|). Here’s the cool part: it can drive some of these coefficients to exactly zero! In other words, it automatically selects the most important features and discards the less relevant ones. This is incredibly useful when you have a large number of features and want to keep your model lean and mean.
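You can watch this happen with scikit-learn’s `Lasso`. Below is a quick sketch on synthetic data where (by construction) only two of ten features actually matter; the `alpha` value and the data itself are arbitrary choices for the demo:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # 10 candidate features
# Only features 0 and 3 truly influence y; the rest are noise.
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.1).fit(X, y)
print(np.round(model.coef_, 3))
# The coefficients on the eight irrelevant features collapse to zero,
# while features 0 and 3 keep large (slightly shrunk) weights.
```

Tuning `alpha` trades off sparsity against fit: crank it up and more coefficients hit zero.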
L2 Regularization (Ridge): The Smoother
On the other hand, L2 regularization, also called Ridge, is like a soothing balm for a model’s complexity. Instead of being a feature selector, L2 focuses on keeping all the features in check. It does this by adding a penalty based on the square of the model’s coefficients (proportional to the sum of wᵢ²). While it doesn’t drive coefficients to zero like L1, it prevents them from becoming excessively large. This helps reduce the model’s sensitivity to small changes in the input data, making it more robust and less likely to overfit.
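Ridge shines when features are highly correlated, which is exactly where plain least squares gets twitchy. A small sketch with scikit-learn’s `Ridge` (the correlated-feature setup and `alpha` are assumptions made for the demo):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
# Make feature 1 a near-duplicate of feature 0 so OLS becomes unstable.
X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=50)
y = X[:, 0] + rng.normal(scale=0.1, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print(np.round(ols.coef_, 2))    # OLS can assign huge offsetting weights
print(np.round(ridge.coef_, 2))  # Ridge keeps every weight small but nonzero
```

Notice that Ridge doesn’t eliminate either of the twin features; it spreads a modest weight across both, which is the “smoothing” behavior described above.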
Choosing Between L1 and L2
Now, you might wonder, “When do I use L1, and when do I use L2?” Well, that’s the beauty of machine learning — it’s an art as much as it is science.
- L1 (Lasso): Use this when you suspect that only a subset of your features is truly influential, and you want to automatically select them. For example, in financial modeling, L1 can help identify the most crucial economic indicators affecting stock prices.
- L2 (Ridge): Employ L2 when you believe all your features are important, but you want to prevent your model from being overly sensitive to any one of them. In scenarios like image recognition, where every pixel might play a role, L2 regularization can be your best friend.
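To see the two choices side by side, here is a small comparison on the same synthetic data, where only the first two of eight features carry signal (the data and `alpha` values are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 8))
# Only features 0 and 1 drive the target.
y = 4 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=300)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)
print("Lasso:", np.round(lasso.coef_, 2))  # sparse: most entries exactly 0
print("Ridge:", np.round(ridge.coef_, 2))  # dense: all entries small but nonzero
```

If the sparse Lasso solution keeps only the features you’d expect, that’s a good sign L1 fits your problem; if it zeroes out features you know matter, Ridge (or a mix of the two, elastic net) is the safer bet.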