![](https://crypto4nerd.com/wp-content/uploads/2023/07/1hsNDEZO6i-8n_cWt6UpHzA-1024x376.png)
NO! THIS ARTICLE IS NOT GENERATED BY CHATGPT 🙂
Machine learning model development has transformed tremendously over the years. Go back a decade or so and building a model involved writing many lines of code, whether for exploratory data analysis, modelling, or hyperparameter tuning. Today the workflow is far more optimised, streamlined and productive, with far fewer lines of code. AutoML has consequently taken off and become a very popular tool amongst development teams across the world. One such framework is AutoGluon.
Installing AutoGluon
First things first: install AutoGluon. As usual, this is done with pip. One thing to be aware of is that the package has dependencies which it automatically uninstalls and installs, so I would advise creating a virtual environment before proceeding.
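A minimal setup sketch along those lines, assuming the standard `venv` module and the `autogluon` package name on PyPI (the environment name here is just an example):

```shell
# Create and activate an isolated environment so AutoGluon's
# dependency pinning cannot disturb your global packages
python -m venv autogluon-env
source autogluon-env/bin/activate   # on Windows: autogluon-env\Scripts\activate

# Install AutoGluon from PyPI
pip install autogluon
```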
Loading the data and importing the packages
import pandas as pd
import os
from autogluon.tabular import TabularDataset, TabularPredictor

##Setting Workspace and Directory
print(os.getcwd())
os.chdir(__your_location__)
print(os.getcwd())

##Reading Train Data
train_data = pd.read_csv('train_data.csv')

##Reading Test Data
test_data = pd.read_csv('test_data.csv')
The next steps would be feature scaling, normalisation and treatment, which are out of scope for this article. It is generally better to normalise the features yourself, guided by the feature distributions and plots.
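As a quick illustration of the kind of normalisation meant above, here is a minimal z-score scaling sketch in plain pandas; the frame and column names are made up for the example, standing in for `train_data`:

```python
import pandas as pd

# Toy frame standing in for train_data; column names are hypothetical
df = pd.DataFrame({"age": [22, 35, 58, 41],
                   "income": [28_000, 52_000, 91_000, 60_000]})

# Z-score normalisation: subtract the mean, divide by the standard deviation.
# Compute the statistics on the training split only, then reuse them on test data.
means = df.mean()
stds = df.std()
df_scaled = (df - means) / stds

print(df_scaled.round(3))
```

In practice you would inspect each feature's distribution first, since skewed features may call for a log transform rather than plain standardisation.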
Building the model
The predictor requires the evaluation metric, the dependent variable (label) and the directory in which results are stored. In the example below, I am using “f1” as the evaluation metric, the dependent variable is “Category”, and the models are saved in the “output_models_version3” folder.
##Building the Binary Classifier

#Earmarking basic metrics
evaluation_metric = "f1"
data_label = "Category"
save_path = "output_models_version3"

#Creating the predictor
predictor = TabularPredictor(label=data_label, path=save_path, eval_metric=evaluation_metric)
predictor = predictor.fit(train_data)
predictor.leaderboard(silent=True)
The leaderboard gives you visibility into all the models that have been tried and the scores each achieved. As you will see, these are not just your average Random Forest or GBM models but also ensembles built as part of training, such as “WeightedEnsemble_L2”.
##Getting Feature Importance
X = train_data
predictor.feature_importance(X)
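Under the hood, AutoGluon computes permutation importance: it permutes one column at a time and measures how much the evaluation score drops. The toy sketch below shows the idea in plain Python; the dataset, the stand-in "model" and the use of a deterministic rotation instead of a random shuffle are all simplifications for illustration:

```python
# Toy dataset: the target y equals x1, while x2 is pure noise
rows = [(1, 9, 1), (2, 4, 2), (3, 7, 3), (4, 1, 4), (5, 6, 5)]

def model(x1, x2):
    # Stand-in "trained model" that has learned y = x1 and ignores x2
    return x1

def accuracy(data):
    return sum(model(x1, x2) == y for x1, x2, y in data) / len(data)

def permutation_importance(data, col):
    # Real implementations shuffle the column randomly; a one-step
    # rotation is used here so the example is deterministic
    col_values = [row[col] for row in data]
    rotated = col_values[1:] + col_values[:1]
    permuted = [
        (v, row[1], row[2]) if col == 0 else (row[0], v, row[2])
        for row, v in zip(data, rotated)
    ]
    # Importance = baseline score minus score with the column permuted
    return accuracy(data) - accuracy(permuted)

print(permutation_importance(rows, 0))  # 1.0 — x1 carries all the signal
print(permutation_importance(rows, 1))  # 0.0 — x2 contributes nothing
```

A feature whose permutation barely moves the score is one the model does not rely on, which is exactly what a low value in `predictor.feature_importance` indicates.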
All the models that have been built are stored in the output folder “output_models_version3”
If you go into the “models” folder, you will find all the model objects built as part of this training exercise. This can be helpful when you are doing model versioning as part of your MLOps practice.
Scoring on testing data
You will need to load the model objects from the folders where the model is stored.
binarypred = TabularPredictor.load("output_models_version3/")
Scoring on test_data uses the best model by default, but you can always specify which model you want to use.
y_pred_binary = binarypred.predict(test_data, model='WeightedEnsemble_L2')
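Since the predictor was trained with “f1” as the evaluation metric, it is worth recalling how that score is computed from predictions. A minimal sketch with made-up labels (in practice AutoGluon can report this for you, or you can use scikit-learn's `f1_score`):

```python
# Toy binary labels and predictions; the values are illustrative only
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# Count true positives, false positives and false negatives
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

# F1 is the harmonic mean of precision and recall
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # 0.75
```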
Conclusion
You can experiment with many more features of this awesome framework, such as feature generation, time series, text and so on. Do try it out yourself!