![](https://crypto4nerd.com/wp-content/uploads/2023/06/1ErRZZhByi9hdVNVkDQ6F2Q.jpeg)
In this article, we will cover the basics of the regression decision tree model.
A decision tree is a simple model for supervised learning. It breaks down the sample data using if-then rules into multiple possible decision paths, each with an associated outcome. The decision tree is widely used in inductive inference and is known for its simplicity, making it easily understandable and interpretable by humans.
The decision tree can be applied to both regression and classification problems. In this article, we will only focus on the regression decision tree.
The prediction begins at the root node and follows a path through the decision tree until reaching a “leaf” node, where the final output is determined. Figures 1.1 (a) and (b) depict an example of a regression decision tree, where t represents a split threshold.
The decision tree consists of decision nodes (if-then rules), and each leaf node represents a decision region R_j in the feature space.
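To make the traversal concrete, here is a minimal sketch of prediction in a regression tree. The Node class, the hand-built example tree, and all of its values are illustrative assumptions, not taken from the figures.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: float = 0.0              # prediction stored at a leaf
    feature: Optional[int] = None   # feature index tested at a decision node
    threshold: float = 0.0          # split threshold t
    left: Optional["Node"] = None   # subtree for x[feature] <= t
    right: Optional["Node"] = None  # subtree for x[feature] > t

def predict(node: Node, x: list) -> float:
    """Follow the if-then rules from the root down to a leaf."""
    while node.left is not None:    # decision nodes always have two children here
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value

# Example: a depth-1 tree that splits on feature 0 at t = 2.5
tree = Node(feature=0, threshold=2.5,
            left=Node(value=10.0), right=Node(value=20.0))
print(predict(tree, [1.0]))  # -> 10.0
print(predict(tree, [3.0]))  # -> 20.0
```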
Building Process
The regression decision tree adopts a top-down, greedy approach to building the tree: at each step it makes the locally optimal decision, choosing the “best” available split without exploring alternative paths or guaranteeing a globally optimal tree.
But how is the best split found? At each node, the algorithm considers every feature and every candidate threshold t, splits the node’s samples into the two regions on either side of t, predicts the mean response within each region, and keeps the split that minimizes the resulting residual sum of squares (RSS).
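In symbols, the split search over a feature index j and threshold t solves the standard two-region RSS criterion:

```latex
\min_{j,\,t} \left[ \sum_{i:\, x_i \in R_1(j,t)} \left( y_i - \hat{y}_{R_1} \right)^2 + \sum_{i:\, x_i \in R_2(j,t)} \left( y_i - \hat{y}_{R_2} \right)^2 \right]
```

where R_1(j, t) and R_2(j, t) are the regions on either side of the threshold, and \hat{y}_{R_1} and \hat{y}_{R_2} are the mean responses of the training samples falling in each region.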
The process is repeated on each resulting region until a stopping condition is met (a recursive sketch follows the list below). Common conditions for not splitting a node further include:
- The reduction in RSS from the best split is very small.
- The tree has reached the maximum desired depth.
- The number of samples in the left or right region would be too small.
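Below is a minimal NumPy sketch of this greedy, top-down procedure with the stopping rules above. Every name here (best_split, build_tree, min_rss_gain) and every default parameter value is an assumption made for illustration, not something prescribed by the article.

```python
import numpy as np

def best_split(X, y):
    """Exhaustively search for the (feature j, threshold t) pair minimizing RSS."""
    best_j, best_t, best_rss = None, None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # each observed value is a candidate threshold
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best_rss:
                best_j, best_t, best_rss = j, t, rss
    return best_j, best_t, best_rss

def build_tree(X, y, depth=0, max_depth=3, min_samples=5, min_rss_gain=1e-3):
    """Greedy top-down construction with the stopping rules listed above."""
    node = {"value": y.mean()}  # the node's prediction if it ends up a leaf
    if depth >= max_depth or len(y) < 2 * min_samples:
        return node
    j, t, rss = best_split(X, y)
    if j is None:  # no valid split exists (e.g., constant features)
        return node
    gain = ((y - y.mean()) ** 2).sum() - rss  # reduction in RSS
    mask = X[:, j] <= t
    if gain < min_rss_gain or mask.sum() < min_samples or (~mask).sum() < min_samples:
        return node
    node["feature"], node["threshold"] = j, t
    node["left"] = build_tree(X[mask], y[mask], depth + 1, max_depth, min_samples, min_rss_gain)
    node["right"] = build_tree(X[~mask], y[~mask], depth + 1, max_depth, min_samples, min_rss_gain)
    return node

# Example usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] + rng.normal(scale=0.5, size=200)
tree = build_tree(X, y)
```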
Tree Pruning
This greedy approach can generate a complex decision tree T_0 that describes the training data well but performs poorly on test data, i.e., it overfits. A better strategy is to prune the complex tree T_0 back to a subtree T that is less complex and generalizes better. However, given the vast number of potential subtrees, how do we determine the most effective way to prune the tree?
To address this issue, we need a way to select a small set of candidate subtrees for evaluation. Cost complexity pruning introduces a sequence of subtrees indexed by a non-negative tuning parameter α: for every value of α, there is a corresponding subtree T.
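For each value of α, the chosen subtree T ⊆ T_0 is the one minimizing the penalized training error; this is the standard cost complexity criterion:

```latex
\sum_{m=1}^{|T|} \; \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 \;+\; \alpha\, |T|
```

Here |T| is the number of leaf nodes of T, R_m is the region of the m-th leaf, and \hat{y}_{R_m} is its mean training response. Setting α = 0 recovers the full tree T_0, while larger values of α penalize complexity and yield smaller subtrees.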
The tuning parameter α plays a crucial role in balancing the complexity of the subtree against its fit to the training data. α can be selected with K-fold cross-validation; once the optimal value is determined, it is applied to the tree grown on the complete dataset to obtain the corresponding subtree.
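As an illustration, scikit-learn exposes cost complexity pruning through the ccp_alpha parameter of DecisionTreeRegressor. The sketch below uses synthetic data, and the dataset sizes and cv=5 are arbitrary choices for the example, not values from the article:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# Grow the full tree T_0 and enumerate the candidate alpha values,
# one per subtree in the pruning sequence.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

# Choose alpha with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"ccp_alpha": path.ccp_alphas},
    cv=5,
)
search.fit(X, y)
print("best alpha:", search.best_params_["ccp_alpha"])
print("leaves in pruned tree:", search.best_estimator_.get_n_leaves())
```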
In a nutshell, the process of building the regression tree can be summarized with the following algorithm:
1. Use recursive binary splitting to grow a large tree T_0 on the training data, splitting greedily until a stopping condition is met.
2. Apply cost complexity pruning to T_0 to obtain a sequence of best subtrees as a function of α.
3. Choose α via K-fold cross-validation.
4. Return the subtree from step 2 that corresponds to the chosen value of α.
And that’s it. If you found this article helpful, please follow me 🙂 and don’t forget to clap 😀