![](https://crypto4nerd.com/wp-content/uploads/2023/02/1_26y9Y64GU-2oo1hcADfsA.png)
Maximum likelihood is an important principle in machine learning and statistics.
Many of my students have a hard time understanding this concept, so I decided to write an article that explains it in simple terms and provides a few examples.
Assume that we have a set of n data points, and let's denote them by X = {x₁, x₂, … , xₙ}. We assume that these points are independent and identically distributed (i.i.d.) samples from some probability distribution P(X; θ) with unknown parameters θ. For example, if the data points come from a normal distribution, θ could represent the mean and standard deviation of the distribution.
Our goal is to find the set of parameters that best describes our data set; in other words, the parameters that maximize the likelihood of obtaining these data points if we were to sample them from the distribution P.
The likelihood of our model (represented by the parameters θ) is defined as the probability of obtaining the data X given our model, or in other words:
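$$\mathcal{L}(\theta) = P(X ; \theta) = P(x_1, x_2, \ldots, x_n ; \theta)$$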
Since the points in X are identically and independently distributed, we can write the probability of X as a product of the probabilities of the individual data points in X:
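$$P(X ; \theta) = \prod_{i=1}^{n} P(x_i ; \theta)$$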
In practice, the log likelihood is more commonly used, since a product of many small probabilities quickly underflows numerical precision, while a sum of logarithms remains well behaved (and is easier to differentiate):
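$$\log \mathcal{L}(\theta) = \sum_{i=1}^{n} \log P(x_i ; \theta)$$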
Maximizing the log likelihood is the same as maximizing the likelihood, since the logarithm function is monotonically increasing.
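To see why the log matters numerically, here is a small NumPy sketch (the standard normal and the sample size of 2,000 are arbitrary illustrative choices): the raw product of the densities underflows to zero, while the sum of their logs stays finite.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=2000)

# Per-point densities under a standard normal distribution.
dens = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

print(np.prod(dens))       # underflows to 0.0 for large n
print(np.log(dens).sum())  # the log likelihood stays finite
```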
## Maximum Likelihood Example
Assume that we have n points generated from a one-dimensional Gaussian distribution with mean μ and standard deviation σ, and we would like to find the parameters of this distribution.
The likelihood of the parameters μ and σ given the data is:
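$$\mathcal{L}(\mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)$$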
Thus, the log likelihood is:
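$$\log \mathcal{L}(\mu, \sigma) = -n \log \sigma - \frac{n}{2} \log (2\pi) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$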
To find the parameters μ and σ that yield the maximum likelihood, we can take the derivatives of the log likelihood with respect to them and set them to 0:
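$$\frac{\partial \log \mathcal{L}}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0 \quad \Rightarrow \quad \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

$$\frac{\partial \log \mathcal{L}}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} (x_i - \mu)^2 = 0 \quad \Rightarrow \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$$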
Conclusion: the maximum likelihood estimate of the mean is the sample mean of the given points, and the maximum likelihood estimate of the standard deviation is the sample standard deviation (note that this is the version that divides by n rather than n − 1).
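As a sanity check, here is a minimal Python sketch (the synthetic data, the seed, and the optimizer's starting point are illustrative assumptions, not part of the derivation above) that compares the closed-form estimates with a direct numerical maximization of the log likelihood using SciPy:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative synthetic data: 1,000 points drawn from a Gaussian
# with known mean 2.0 and standard deviation 1.5.
rng = np.random.default_rng(42)
x = rng.normal(2.0, 1.5, size=1000)
n = len(x)

# Closed-form maximum likelihood estimates derived above:
# the sample mean and the (1/n) sample standard deviation.
mu_mle = x.mean()
sigma_mle = np.sqrt(((x - mu_mle) ** 2).mean())  # same as x.std(ddof=0)

# Negative log likelihood of a 1D Gaussian; minimizing it is
# equivalent to maximizing the log likelihood.
def neg_log_likelihood(params):
    mu, sigma = params
    return (n * np.log(sigma)
            + 0.5 * n * np.log(2 * np.pi)
            + ((x - mu) ** 2).sum() / (2 * sigma ** 2))

# Numerical optimization, with sigma constrained to be positive.
result = minimize(neg_log_likelihood, x0=[0.0, 1.0],
                  bounds=[(None, None), (1e-6, None)])

print("closed-form:", mu_mle, sigma_mle)
print("numerical:  ", result.x)  # should agree to several decimals
```

The two sets of estimates should match closely, which confirms that the formulas we derived analytically are indeed the maximizers of the likelihood.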