Introduction
The decision tree algorithm is one of the most commonly used algorithms in machine learning. It falls under supervised learning and can be used for both classification and regression tasks, though it is mostly preferred for classification problems [1]. The purpose of this article is to explore the decision tree algorithm in depth: how it works, what kinds of problems it can solve, and its advantages and disadvantages. The decision tree is a white box type of machine learning algorithm, which means it gives insight into the algorithm's inner workings: how it behaves, processes data, and generates predictions, and which variables it gives weight to [5].
In-Depth Analysis
When it comes to the decision tree algorithm, it is important to understand its nodes and terminology.
- Root Node — The starting point of the tree, where the algorithm begins.
- Leaf / Terminal Node — The final output of the tree; it cannot be split any further.
- Splitting — The process of dividing a node into sub-nodes according to a given criterion.
- Internal / Decision Node — A node that represents a test (decision) on an input feature.
- Parent / Child Node — Any node that is split is the parent of the resulting sub-nodes, which are its children [1,2,6].
Since this model uses the supervised learning technique, the decision tree algorithm takes a labeled dataset as input. Depending on the target variable, the task is either classification (a categorical target) or regression (a continuous target). The algorithm works as follows.
How it Works
- Begin with the root node and find the best attribute in the dataset using an Attribute Selection Measure (ASM).
- Divide the root node into subsets that contain the possible values of the best attribute.
- Generate a decision tree node that contains the best attribute.
- Recursively build new subtrees using the subsets created in step 2, and continue this process until the nodes cannot be classified any further; those final nodes become the leaf nodes [1]. (A minimal, runnable sketch of this workflow follows below.)
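To make these steps concrete, here is a minimal sketch using scikit-learn, the sklearn library mentioned later in this article. The iris dataset and the chosen hyperparameters are illustrative assumptions, not part of the original discussion.

```python
# Minimal sketch of the workflow above: load a labeled dataset, let the
# algorithm pick splits via an attribute selection measure, then predict.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# criterion="entropy" makes the tree select splits by information gain,
# one of the two ASMs described just below
model = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```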
Note that there are two popular techniques for the above-mentioned Attribute Selection Measure (ASM):
- Information Gain — The measurement of the change in entropy after the dataset is segmented on an attribute. The algorithm always tries to maximize information gain, splitting first on the attribute with the highest information gain [1].
Entropy — A metric that measures the randomness (impurity) of the data.
- Gini Index — A measure of impurity used when creating a decision tree in the CART algorithm. An attribute with a low Gini index should be preferred over one with a high Gini index [1].
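Both measures are easy to compute directly. Below is a small sketch assuming a simple array of class labels; the function names are illustrative, not from any library.

```python
import numpy as np

def entropy(labels):
    # H = -sum(p_i * log2(p_i)) over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    # Entropy of the parent node minus the weighted entropy of its children
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

def gini(labels):
    # Gini index = 1 - sum(p_i^2); 0 means a perfectly pure node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# Splitting a mixed node into two pure halves yields the maximum gain of 1.0
print(information_gain(list("AABB"), [list("AA"), list("BB")]))  # 1.0
print(gini(list("AABB")), gini(list("AA")))                      # 0.5 0.0
```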
Since decision trees are easy to understand and interpret, can handle high-dimensional data effectively, and are fast to train and to make predictions with, they are often used in data science projects. Decision trees can be applied to problems such as churn prediction, spam filtering, product recommendation, and predicting stock and housing prices [3].
There are different types of Decision Tree algorithms as well.
- ID3 (Iterative Dichotomiser 3) — Used in classification applications. It greedily chooses the feature with the highest information gain and splits accordingly. Entropy, information gain, and recursive partitioning are its three key principles. The decision tree may be pruned after it is constructed in order to enhance generalization and lessen overfitting; to do this, nodes that do not considerably improve the accuracy of the tree are removed [6].
- C4.5 — Handles both discrete and continuous attributes. Its use of the gain ratio and reduced-error pruning increases the model's accuracy and prevents overfitting. The gain ratio is computed by dividing the information gain by the intrinsic information, a measurement of the quantity of data required to characterize an attribute's values [6].
- CART (Classification and Regression Trees) — The dependent variable and the input variables can be categorical or continuous. It splits each node into exactly two branches (binary splits) and can split on linear combinations of input variables [4].
- CHAID (Chi-Square Automatic Interaction Detection) — Uses the chi-square measure for input variable selection. The dependent variable should be categorical, while the input variables can be either categorical or continuous [4].
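Of these four, CART is the variant that scikit-learn's tree module implements (in an optimized form; ID3, C4.5, and CHAID are not included there). A short sketch of switching the impurity measure between the Gini index (CART's default) and entropy, with the iris feature names used purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
iris_features = ["sepal length", "sepal width", "petal length", "petal width"]

# CART builds binary splits; `criterion` selects the ASM used to score them
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=2, random_state=0)
    tree.fit(X, y)
    print(f"--- {criterion} ---")
    print(export_text(tree, feature_names=iris_features))
```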
Advantages of the Decision Tree Model
- Simple to comprehend.
- Quickly translated into a set of rules for production use.
- Can classify both categorical and numerical outcomes, but the generated attribute must be categorical.
- No a priori hypotheses are made regarding the goodness of the results [4].
Disadvantages of the Decision Tree Model
- The optimal decision-making mechanism can be hindered, and incorrect decisions can follow.
- A decision tree can contain many layers, which makes it complex.
- With more training samples, the decision tree's computational complexity may increase [4].
One research problem where the decision tree can be applied is detecting fake news articles on social media. To do this, a large dataset of news articles labeled as "True News" or "False News" (based on fact checking against reliable resources) can be chosen. Relevant features should then be extracted from the data (e.g., reliability level, source credibility, social media engagement). Next, the dataset can be divided into training and testing sets with a suitable ratio (70:30 or 80:20). Then, using relevant libraries (e.g., sklearn), the model can be trained and tested. One big advantage of this approach is that fake news can be detected early, so the necessary actions can be taken quickly (e.g., preventing the fake news from spreading). A hedged sketch of such a pipeline is shown below.
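In the sketch, the file name news_articles.csv and the column names "text" and "label" are hypothetical, and TF-IDF text features stand in for the hand-crafted features (source credibility, engagement, etc.) described above.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Hypothetical labeled dataset with "True News" / "False News" labels
df = pd.read_csv("news_articles.csv")

# Turn the article text into numeric features (a stand-in for the
# hand-crafted features mentioned above)
X = TfidfVectorizer(max_features=5000, stop_words="english").fit_transform(df["text"])
y = df["label"]

# 70:30 train/test split, as suggested in the text
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Cap the depth to curb the overfitting noted in the disadvantages above
clf = DecisionTreeClassifier(max_depth=10, random_state=42)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```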
Conclusion
This article explored the decision tree algorithm, a widely used algorithm in supervised learning, and discussed its core concepts, including its nodes, how it works, the types of decision tree algorithms, and its advantages and disadvantages. It also discussed the model's value for projects such as churn prediction, spam filtering, and stock price forecasting, since it handles high-dimensional data effectively and is relatively fast at training and making predictions. In addition, it identified a research problem where this model can be applied, detecting fake news articles on social media, where the early detection capability could be crucial in controlling the spread of misinformation. However, limitations such as susceptibility to overfitting and the potential for generating overly complex models were also acknowledged. In conclusion, decision trees offer a powerful and interpretable approach to machine learning, and understanding their strengths, limitations, and applications equips users with a valuable tool for tackling various problems in data science.
References
- javaTpoint (2021). Machine Learning Decision Tree Classification Algorithm — Javatpoint. [online] www.javatpoint.com. Available at: https://www.javatpoint.com/machine-learning-decision-tree-classification-algorithm.
- GeeksForGeeks (2017). Decision Tree — GeeksforGeeks. [online] GeeksforGeeks. Available at: https://www.geeksforgeeks.org/decision-tree/.
- TowardsAnalytic (2023). Decision Tree Algorithm: Understanding and Implementing. [online] TowardsAnalytics. Available at: https://www.towardsanalytic.com/decision-tree-algorithm/ [Accessed 31 Mar. 2024].
- Charbuty, B. and Abdulazeez, A. (2021) “Classification Based on Decision Tree Algorithm for Machine Learning”, Journal of Applied Science and Technology Trends, 2(01), pp. 20–28. doi: 10.38094/jastt20165.
- Tsang, D. (2023). White Box vs. Black Box Algorithms in Machine Learning. [online] ActiveState. Available at: https://www.activestate.com/blog/white-box-vs-black-box-algorithms-in-machine-learning/.
- GeeksforGeeks. (2023). Decision Tree Algorithms. [online] Available at: https://www.geeksforgeeks.org/decision-tree-algorithms/.