![](https://crypto4nerd.com/wp-content/uploads/2023/06/0C7HRUKJcvbwx6iNV-1024x681.jpeg)
The binomial distribution is a widely used statistical distribution that Data Scientists should be familiar with, as it appears in numerous contexts. One notable example is binary classification in supervised learning, where the cross-entropy loss is the negative log-likelihood of the binomial (Bernoulli) distribution. In this post, we will explore the intuition, theory, and examples associated with this distribution.
The binomial distribution is a discrete distribution that measures the likelihood of achieving a specific number of successes in a given number of trials. For instance, it can answer questions such as “What is the probability of obtaining 2 heads from 5 coin flips?” In this context, each trial results in either a success (the coin landing on heads) or a failure (the coin landing on tails). Each individual trial is known as a Bernoulli trial, and each essentially poses a binary (yes-no) question.
Since each trial is binary, if the probability of success is p, then the probability of failure is 1 − p. Consequently, the probability mass function (PMF) of a single Bernoulli trial takes the following form:
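$$P(X = k) = p^{k}(1 - p)^{1 - k}, \qquad k \in \{0, 1\}$$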
where X is a random variable following the Bernoulli distribution and k is the outcome of the trial. Notice that if k = 1, the probability is just p, and if k = 0 it is 1 − p.
The binomial distribution combines Bernoulli trials to give the probability of a specific number of successes, k, across a number of trials, n. To derive the binomial PMF, we incorporate both the binomial coefficient and the number of trials, n, into the Bernoulli PMF:
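$$P(X = k) = \binom{n}{k} p^{k}(1 - p)^{n - k}$$

where the binomial coefficient $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ counts the number of ways k successes can be arranged among n trials, and X now counts the total number of successes.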
In general, the binomial distribution applies when the following conditions hold:
- Number of trials, n, is fixed
- Each trial is independent
- Each trial has two outcomes
- The probability of a success, p, is the same for every trial
Let’s go back to the question we posed before: “What is the probability of obtaining 2 heads from 5 coin flips?”
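Assuming a fair coin (p = 0.5, an assumption not stated explicitly in the question), plugging n = 5 and k = 2 into the binomial PMF gives:

$$P(X = 2) = \binom{5}{2}(0.5)^{2}(0.5)^{3} = 10 \times 0.03125 = 0.3125$$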
Notice that the probability of obtaining 2 heads is reasonably small. It is important to remember that this is the probability for exactly 2 heads. Therefore, there are additional possibilities where three, four, or even five heads occur.
To gain a deeper understanding, let’s visualize the distribution of probabilities by plotting it as a function of the number of successes. Essentially, we will be displaying the probability mass function (PMF).
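Below is a minimal sketch of how such a plot could be produced with `scipy` and `matplotlib`; it is an illustration rather than the exact code from the GitHub repository linked at the end, and the parameters (n = 5 flips of a fair coin) are assumed from the question above.

```python
# Sketch: plot the binomial PMF as a function of the number of successes.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

n = 5      # number of trials (coin flips) - assumed from the example above
p = 0.5    # probability of success (heads) on each trial - assumed fair coin

k = np.arange(0, n + 1)       # possible numbers of successes: 0, 1, ..., n
pmf = binom.pmf(k, n, p)      # P(X = k) for each k

plt.bar(k, pmf)
plt.xlabel("Number of successes (heads)")
plt.ylabel("Probability")
plt.title(f"Binomial PMF: n={n}, p={p}")
plt.show()
```

Rerunning the same sketch with a larger `n` (e.g. 50) produces the wider, increasingly bell-shaped distribution discussed below.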
We observe that the most probable outcome sits at the expected value, with heads on half of the flips (a success proportion of 0.5), which makes sense. However, do you notice any other characteristics regarding the shape of the distribution? What if we plot it for 50 trials?
Notice how it increasingly resembles a normal distribution. This phenomenon is explained by the central limit theorem, which states that as the number of independent trials grows, the distribution of their sum approaches a normal distribution.
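To make the resemblance concrete, here is a short sketch (again an illustration, not the original post's code) that overlays the normal density with matching mean np and variance np(1 − p) on the 50-trial binomial PMF:

```python
# Sketch: compare the binomial PMF for n = 50 with its normal approximation.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom, norm

n, p = 50, 0.5
k = np.arange(0, n + 1)

mean = n * p                     # mean of the binomial distribution
std = np.sqrt(n * p * (1 - p))   # standard deviation of the binomial distribution

plt.bar(k, binom.pmf(k, n, p), label="Binomial PMF")
plt.plot(k, norm.pdf(k, mean, std), color="red", label="Normal approximation")
plt.xlabel("Number of successes (heads)")
plt.ylabel("Probability")
plt.legend()
plt.show()
```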
Link here for an article that explains the central limit theorem in more depth
In this blog post, we have explored the binomial distribution. This discrete distribution calculates the probability of achieving a specific number of successes within a given number of trials. The binomial distribution finds application in diverse industries, including commodity trading, insurance, and supply chain operations. Therefore, it is a valuable concept for Data Scientists to be aware of.
The full code is available on my GitHub here:
(All emojis designed by OpenMoji — the open-source emoji and icon project. License: CC BY-SA 4.0)