![](https://crypto4nerd.com/wp-content/uploads/2023/02/0tmGcBCDCSROHxVaH.jpeg)
In an era where data science terms such as “machine learning”, “artificial intelligence” and “deep learning” are no longer buzzwords to many, I believe most of us have applied state-of-the-art algorithms to several data science problems. And I totally get it. They are “fancy” and, most of the time, provide better predictive capability than the “traditional” ones. In fact, in my four-plus years in the data science consulting space, more often than not, guilty as I am, I tend to reach for these complex models first because I know they will perform better than the simple ones. This saves me time, even though I know I need some benchmark models to begin with. Plus, with the advancement of automated machine learning, where most models can be fitted with a single line of code or a mouse click, we often forget how powerful simple models are. This realisation hit me recently when a heavily Excel-dependent client asked me to solve a regression problem with a very limited amount of data.
Simple models are a class of machine learning models that are easy to implement and understand due to their relatively small number of parameters. Because they are usually based on simple mathematical equations or algorithms, the predictions they make are easier to explain. Examples include linear models such as linear regression (for numerical prediction) and logistic regression (for categorical prediction); Naïve Bayes (a probabilistic model based on Bayes’ theorem that makes a strong independence assumption between input variables); decision trees (tree-structured models whose internal nodes, branches, and leaf nodes represent features, decisions based on those features, and class labels, respectively); and k-Nearest Neighbours (k-NN; a model that classifies an input based on the class labels of the k closest training examples).
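As a minimal sketch of just how little code these models need (assuming scikit-learn, with a built-in toy dataset standing in for real client data), all four can be fitted and evaluated in a handful of lines:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# A small, well-known dataset stands in for the client data here.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(max_depth=3),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```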
Because of their simplicity, and based on my experience, simple models often under-perform compared to complex models (e.g. support vector machines, gradient boosting and neural networks). This is particularly true when the relationships in the data are nonlinear and there is enough data to train the complex models properly.
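A quick sketch of this point on synthetic nonlinear data (again assuming scikit-learn): on the classic two-moons problem, a kernel SVM typically separates the classes far better than a linear model can.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two interleaving half-moons: a classic nonlinear decision boundary.
X, y = make_moons(n_samples=500, noise=0.2, random_state=42)

for model in (LogisticRegression(), SVC(kernel="rbf")):
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{type(model).__name__}: mean CV accuracy = {scores.mean():.3f}")
```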
Despite this under-performance, there are several reasons why I still prefer simple models. First is their high level of interpretability. Because they have only a small number of parameters, they are easy to understand and explain. This makes them ideal for applications where transparency and explainability are important, such as in finance. In contrast, with complex models like neural networks, it can be difficult to understand the reasoning behind their predictions. Simple models are also known for their robustness to noise and outliers, making them less prone to overfitting. Overfitting occurs when a model is too complex and has too many parameters, which allows it to fit the noise in the training data. Unlike complex models, which often require several hyperparameters to be fine-tuned, simple models have fewer parameters to fit, so they are less sensitive to small changes in the data and less likely to fit the noise in the training data, which allows them to generalise better to new data.
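To make the interpretability point concrete, here is a small sketch (scikit-learn assumed; the features and their effects are hypothetical) showing how a linear model’s coefficients can be read off directly as the effect of each input on the prediction:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical example: floor area and number of rooms predicting price.
X = rng.uniform([50, 1], [200, 6], size=(100, 2))
y = 3.0 * X[:, 0] + 10.0 * X[:, 1] + rng.normal(0, 5, size=100)

model = LinearRegression().fit(X, y)

# Each coefficient is the expected change in price per unit change
# in that feature, holding the other fixed -- easy to explain to a client.
for name, coef in zip(["floor_area", "n_rooms"], model.coef_):
    print(f"{name}: {coef:.2f}")
```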
Simple models are also computationally efficient. They require fewer computational resources and are faster to train and test. This is the main reason why these models are my go-to benchmark models: I find them quick to train, and they can provide insights depending on what the clients want and/or need. Lastly, when only a little training data is available, these models can be a lifesaver. They perform comparatively well even with little data to train on, which is a major advantage when clients can only provide a small dataset, for example because data gathering and labelling are costly or difficult.
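As a rough sketch of the efficiency argument (timings will vary by machine, and gradient boosting stands in here for a more complex model), the difference is easy to measure on a deliberately small synthetic dataset:

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# A deliberately small dataset, mimicking the limited-data scenario.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

for model in (LogisticRegression(max_iter=1000), GradientBoostingClassifier()):
    start = time.perf_counter()
    model.fit(X, y)
    elapsed = time.perf_counter() - start
    print(f"{type(model).__name__}: fitted in {elapsed:.3f}s")
```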
In my own experience, simple models are often my “hero”, especially when there is only limited data available for training. I can still deliver acceptable results despite this challenge.
In summary, simple models are easy to understand and less prone to overfitting because they have a small number of parameters, which makes them less likely to fit the noise in the training data. This allows them to generalise better to new data and produce more interpretable results. They are also computationally efficient and can provide quicker insights from the data, which saves time and resources for both data scientists and clients.