![](https://crypto4nerd.com/wp-content/uploads/2023/11/1zluie6nvX3xb3SK1Az_fNg-1024x286.png)
These three carts can be seen as three different data distributions. Don't assume up front that every cart contains two classes (apples and bananas); that assumption would make the interpretations that follow incorrect. Instead, treat each cart as its own distribution: the first cart is a distribution where all data points belong to a single class, while the second and third carts are distributions with two classes.
Looking at the example above, it is easy to identify the carts with the most pure or impure data distributions (class distributions, to be precise). But an algorithm needs a mathematical quantification of purity in a dataset to make decisions, and this is where entropy and the Gini index come to the rescue.
Both of these measures look at the probability of occurrence (or presence) of each class in a dataset. In our example, we have a total of 8 data points (fruits) in each case, so we can compute our class probabilities for each of the carts as follows:
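As a minimal sketch of that computation in Python (note: only C1, the single-class cart, is fixed by the example; the exact apple/banana counts assumed below for C2 and C3 are illustrative placeholders, and `class_probabilities` is a helper introduced here for illustration):

```python
from collections import Counter

def class_probabilities(labels):
    """Map each class to its probability of occurrence among the labels."""
    total = len(labels)
    return {cls: count / total for cls, count in Counter(labels).items()}

# Assumed cart contents: C1 is the single-class cart from the example;
# the splits for C2 and C3 are illustrative placeholders.
c1 = ["Apple"] * 8
c2 = ["Apple"] * 4 + ["Banana"] * 4
c3 = ["Apple"] * 6 + ["Banana"] * 2

for name, cart in [("C1", c1), ("C2", c2), ("C3", c3)]:
    print(name, class_probabilities(cart))
# C1 {'Apple': 1.0}
# C2 {'Apple': 0.5, 'Banana': 0.5}
# C3 {'Apple': 0.75, 'Banana': 0.25}
```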
Now we are equipped with everything we need to dive into formal definitions of Entropy and Gini Index!
As already discussed, both entropy and the Gini index are measures of the degree of uncertainty or randomness in data. While they aim to quantify the same fundamental concept, each has its own mathematical formulation and interpretation.
Entropy
Given a labeled dataset where each label comes from a set of n classes, we can compute entropy as follows, where the sum runs over all n classes:

Entropy = −∑ pᵢ · log₂(pᵢ)

Here, pᵢ is the probability of randomly picking an element from class i.
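As a minimal sketch, this formula translates to Python as follows (probabilities of zero are skipped, since the pᵢ · log₂(pᵢ) term is taken to be 0):

```python
import math

def entropy(probabilities):
    """Entropy = -sum(p_i * log2(p_i)); terms with p_i == 0 are treated as 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 -> maximum impurity for two classes
print(entropy([1.0, 0.0]))  # -0.0, i.e. zero -> perfectly pure data
```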
To determine the best split in a decision tree, entropy is used to compute information gain, and the feature contributing to the maximum information gain is selected at a node.
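Building on that, here is a rough sketch of how information gain could be computed for a candidate split (the parent data and branches below are hypothetical, chosen only to illustrate the idea):

```python
import math
from collections import Counter

def entropy_of(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, branches):
    """Parent entropy minus the size-weighted entropy of each branch."""
    n = len(parent)
    weighted = sum(len(b) / n * entropy_of(b) for b in branches)
    return entropy_of(parent) - weighted

# A hypothetical split that separates the two classes perfectly:
parent = ["Apple"] * 4 + ["Banana"] * 4
branches = [["Apple"] * 4, ["Banana"] * 4]
print(information_gain(parent, branches))  # 1.0 -> the best possible split here
```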
Gini Index
Gini Index attempts to quantify randomness in a dataset by answering this question: what is the probability of incorrectly labeling an element picked at random from the given data, if we label it at random according to the class distribution?
Given a labeled dataset where each label comes from a set of n classes, the formula to calculate the Gini index is given below. Here, pᵢ is the probability of randomly picking an element from class i, and the sum runs over all n classes:

Gini Index = 1 − ∑ pᵢ²

This formula is often reframed as follows as well:

Gini Index = ∑ pᵢ · (1 − pᵢ)

(Note: The two forms are equivalent because the class probabilities sum to 1, so ∑ pᵢ(1 − pᵢ) = ∑ pᵢ − ∑ pᵢ² = 1 − ∑ pᵢ².)
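As a minimal sketch, both forms can be written (and checked against each other) in Python:

```python
def gini(probabilities):
    """Gini index = 1 - sum(p_i^2)."""
    return 1 - sum(p * p for p in probabilities)

def gini_reframed(probabilities):
    """Equivalent form: sum(p_i * (1 - p_i)), valid because the p_i sum to 1."""
    return sum(p * (1 - p) for p in probabilities)

print(gini([0.5, 0.5]), gini_reframed([0.5, 0.5]))  # 0.5 0.5 -> both forms agree
print(gini([1.0, 0.0]), gini_reframed([1.0, 0.0]))  # 0.0 0.0 -> pure data
```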
Gini index is an alternative to information gain that can be used in a decision tree to determine the quality of a split. At a given node, it computes the difference between the Gini index of the data before the split and the weighted sum of the Gini indices of the branches after the split, and chooses the split with the highest difference (or Gini gain). If this is unclear, don't worry about it for now since it needs more context, and the goal of this article is just to build a basic intuition behind the meaning of these metrics.
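As a rough sketch of that idea (again, the parent data and the candidate split below are hypothetical), Gini gain could be computed like this:

```python
from collections import Counter

def gini_of(labels):
    """Gini index of a list of class labels: 1 - sum(p_i^2)."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(parent, branches):
    """Gini index before the split minus the size-weighted Gini of the branches."""
    n = len(parent)
    weighted = sum(len(b) / n * gini_of(b) for b in branches)
    return gini_of(parent) - weighted

# A hypothetical, imperfect split of a 4-apple / 4-banana cart:
parent = ["Apple"] * 4 + ["Banana"] * 4
branches = [["Apple"] * 3 + ["Banana"], ["Apple"] + ["Banana"] * 3]
print(gini_gain(parent, branches))  # 0.125 -> a modest improvement in purity
```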
Going back to our example
To make things easier to understand, let's go back to our shopping cart example. We have three datasets, C1, C2, and C3, each of which has 8 records with labels coming from two classes: [Apple, Banana]. Using the probabilities calculated in the table above, let's unroll both of these computations for Alice's cart:
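Assuming Alice's cart is the first, single-class cart C1, the class probabilities are P(Apple) = 1 and P(Banana) = 0, and the two computations unroll as follows (taking 0 · log₂(0) to be 0):

Entropy(C1) = −(1 · log₂(1) + 0 · log₂(0)) = −(0 + 0) = 0

Gini(C1) = 1 − (1² + 0²) = 1 − 1 = 0

Both metrics come out to zero, matching the intuition that a cart containing only one kind of fruit is perfectly pure.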