![](https://crypto4nerd.com/wp-content/uploads/2023/08/0enmSv8qZdqnM5Y0r-1024x683.jpeg)
The Gini index is a popular tool in data science, where decision trees use it to choose their splits. However, many practitioners are unaware that the Gini index originally came from economics as a measure of wealth inequality. In this post, I want to dive into the details of the Gini index and its origins.
Introduced by Corrado Gini in 1912, the Gini index (or coefficient) measures how evenly income is distributed within a country or state. A Gini index of 0 indicates perfect equality, where everyone has the same income; this is a uniform distribution. An index of 1 means maximal inequality, where essentially one person has all the wealth.
According to World Bank data, Slovenia has one of the lowest Gini indexes at 0.24, making it one of the most equal countries, while South Africa has one of the highest at 0.64, making it one of the most unequal. Of course, this is just one statistic to describe the economic state of a country. Like everything in statistics, it needs context and further data to explain the whole picture.
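To make the two extremes concrete, here is a minimal sketch that computes the Gini index from a list of incomes using the standard mean-absolute-difference form, G = Σ|xᵢ − xⱼ| / (2n²·mean). The function name `gini_mean_difference` and the income figures are made up for illustration; they are not from any real dataset.

```python
def gini_mean_difference(incomes):
    """Gini index as half the relative mean absolute difference."""
    n = len(incomes)
    total = sum(incomes)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    # Sum of |x_i - x_j| over all ordered pairs (i, j).
    abs_diff_sum = sum(abs(a - b) for a in incomes for b in incomes)
    # 2 * n^2 * mean simplifies to 2 * n * total.
    return abs_diff_sum / (2 * n * total)


# Hypothetical incomes, chosen only to show the two extremes.
everyone_equal = [50_000] * 5
one_has_all = [0, 0, 0, 0, 100_000]

print(gini_mean_difference(everyone_equal))  # 0.0
print(gini_mean_difference(one_has_all))     # 0.8
```

Note that with a finite population of n people the maximum is (n − 1) / n rather than exactly 1, which is why the second case gives 0.8 for five people. The value only approaches 1 as the population grows.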
A visual way to see the Gini index in action is through the Lorenz curve:
The diagonal dashed orange line represents perfect equality, as the change in cumulative wealth is perfectly linear with the change in cumulative population.
The Lorenz curve plots the actual cumulative share of wealth against the cumulative share of the population, ordered from poorest to richest. The further the Lorenz curve is from the diagonal line, the greater the income inequality.
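As a rough illustration of how such a curve is built, the sketch below sorts incomes from poorest to richest and accumulates their shares. The helper name `lorenz_curve` and the sample incomes are hypothetical.

```python
def lorenz_curve(incomes):
    """Return (population_share, income_share) points, starting at (0, 0)."""
    ordered = sorted(incomes)  # poorest first
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    points = [(0.0, 0.0)]
    running = 0
    for i, income in enumerate(ordered, start=1):
        running += income
        points.append((i / n, running / total))
    return points


# Made-up incomes for four people.
for population_share, income_share in lorenz_curve([40, 10, 30, 20]):
    print(f"{population_share:.2f} of people hold {income_share:.2f} of income")
```

Here the poorest half of the population holds 30% of the income, so the curve sits below the diagonal, where the poorest half would hold exactly 50%.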
The area between the Lorenz curve and the diagonal line is directly proportional to the Gini index.
Mathematically, this means:

$$G = \frac{A}{A + B}$$
Where A and B are the corresponding areas shown in the plot above, one can see that as the Lorenz line…
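The area ratio can be checked numerically. The sketch below assumes the usual convention, in which A is the area between the diagonal and the Lorenz curve and B is the area under the Lorenz curve, so that A + B is the triangle of area 0.5 under the diagonal. The function name `gini_from_areas` and the sample incomes are made up for illustration.

```python
def gini_from_areas(incomes):
    """Gini index as A / (A + B), with areas taken from the Lorenz curve."""
    ordered = sorted(incomes)  # poorest first
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    # Cumulative income shares at population shares 0, 1/n, 2/n, ..., 1.
    shares = [0.0]
    running = 0
    for income in ordered:
        running += income
        shares.append(running / total)
    # B: area under the Lorenz curve via the trapezoid rule (strip width 1/n).
    area_b = sum((shares[i] + shares[i + 1]) / 2 for i in range(n)) / n
    # A: what remains of the 0.5 triangle under the diagonal (assumed convention).
    area_a = 0.5 - area_b
    return area_a / (area_a + area_b)


print(round(gini_from_areas([10, 20, 30, 40]), 4))  # hypothetical incomes
print(round(gini_from_areas([25, 25, 25, 25]), 4))  # everyone equal
```

Because A + B is always 0.5 under this convention, the ratio reduces to 2A, which is why the area between the curve and the diagonal is directly proportional to the Gini index.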