![](https://crypto4nerd.com/wp-content/uploads/2023/07/1hsNDEZO6i-8n_cWt6UpHzA-1024x376.png)
NO! THIS ARTICLE IS NOT GENERATED BY CHATGPT 🙂
Machine learning model development has transformed tremendously over the years. Go back a decade or so and building a model involved writing many lines of code, whether for exploratory data analysis, modelling, or hyperparameter tuning. Today the workflow is far more optimised, streamlined and productive, with far fewer lines of code. AutoML has consequently taken off and become a very popular tool amongst development teams across the world. One such framework is AutoGluon.
Installing AutoGluon
First things first: install AutoGluon. As usual, this is done with pip. One thing to be aware of is that the package has dependencies which it automatically uninstalls and installs, so I would advise creating a virtual environment before proceeding.
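A minimal setup sketch along those lines, assuming the standard `venv` module and the `autogluon` package name on PyPI (the environment name here is just an example):

```shell
# Create and activate an isolated environment so AutoGluon's
# dependency pinning cannot disturb your global packages
python -m venv autogluon-env
source autogluon-env/bin/activate   # on Windows: autogluon-env\Scripts\activate

# Install AutoGluon from PyPI
pip install autogluon
```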
Loading the data and importing the packages
import pandas as pd
import os
from autogluon.tabular import TabularDataset, TabularPredictor

##Setting Workspace and Directory
print(os.getcwd())
os.chdir(__your_location__)
print(os.getcwd())

##Reading Train Data
train_data = pd.read_csv('train_data.csv')

##Reading Test Data
test_data = pd.read_csv('test_data.csv')
The next steps would be feature scaling, normalisation and treatment, which are out of scope for this article. It is generally better to normalise the features yourself, guided by the feature distributions and plots.
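As a quick illustration of the kind of normalisation meant above, here is a minimal z-score scaling sketch in plain pandas; the frame and column names are made up for the example, standing in for `train_data`:

```python
import pandas as pd

# Toy frame standing in for train_data; column names are hypothetical
df = pd.DataFrame({"age": [22, 35, 58, 41],
                   "income": [28_000, 52_000, 91_000, 60_000]})

# Z-score normalisation: subtract the mean, divide by the standard deviation.
# Compute the statistics on the training split only, then reuse them on test data.
means = df.mean()
stds = df.std()
df_scaled = (df - means) / stds

print(df_scaled.round(3))
```

In practice you would inspect each feature's distribution first, since skewed features may call for a log transform rather than plain standardisation.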
Building the model
The predictor requires the evaluation metric, the dependent variable (label) and the directory in which results are stored. In the example below, I am using “f1” as the evaluation metric, the dependent variable is “Category”, and the models are saved in the “output_models_version3” folder.
##Building the Binary Classifier

#Earmarking basic metrics
evaluation_metric = "f1"
data_label = "Category"
save_path = "output_models_version3"

#Creating the predictor
predictor = TabularPredictor(label=data_label, path=save_path, eval_metric=evaluation_metric)
predictor = predictor.fit(train_data)
predictor.leaderboard(silent=True)
The leaderboard gives you visibility into all the models that have been tried and the scores each achieved. As you will see, these are not just your average Random Forest or GBM models but also ensembles built as part of training, such as “WeightedEnsemble_L2”.
##Getting Feature Importance
X = train_data
predictor.feature_importance(X)
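Under the hood, AutoGluon computes permutation importance: it permutes one column at a time and measures how much the evaluation score drops. The toy sketch below shows the idea in plain Python; the dataset, the stand-in "model" and the use of a deterministic rotation instead of a random shuffle are all simplifications for illustration:

```python
# Toy dataset: the target y equals x1, while x2 is pure noise
rows = [(1, 9, 1), (2, 4, 2), (3, 7, 3), (4, 1, 4), (5, 6, 5)]

def model(x1, x2):
    # Stand-in "trained model" that has learned y = x1 and ignores x2
    return x1

def accuracy(data):
    return sum(model(x1, x2) == y for x1, x2, y in data) / len(data)

def permutation_importance(data, col):
    # Real implementations shuffle the column randomly; a one-step
    # rotation is used here so the example is deterministic
    col_values = [row[col] for row in data]
    rotated = col_values[1:] + col_values[:1]
    permuted = [
        (v, row[1], row[2]) if col == 0 else (row[0], v, row[2])
        for row, v in zip(data, rotated)
    ]
    # Importance = baseline score minus score with the column permuted
    return accuracy(data) - accuracy(permuted)

print(permutation_importance(rows, 0))  # 1.0 — x1 carries all the signal
print(permutation_importance(rows, 1))  # 0.0 — x2 contributes nothing
```

A feature whose permutation barely moves the score is one the model does not rely on, which is exactly what a low value in `predictor.feature_importance` indicates.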
All the models that have been built are stored in the output folder “output_models_version3”
If you go into the “models” folder, you will find all the model objects built as part of this training exercise. This can be helpful when you are doing model versioning as part of your MLOps practice.
Scoring on testing data
You will need to load the model objects from the folders where the model is stored.
binarypred = TabularPredictor.load("output_models_version3/")
Scoring on test_data uses the best model by default, but you can always specify which model you want to use.
y_pred_binary = binarypred.predict(test_data, model='WeightedEnsemble_L2')
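Since the predictor was trained with “f1” as the evaluation metric, it is worth recalling how that score is computed from predictions. A minimal sketch with made-up labels (in practice AutoGluon can report this for you, or you can use scikit-learn's `f1_score`):

```python
# Toy binary labels and predictions; the values are illustrative only
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# Count true positives, false positives and false negatives
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

# F1 is the harmonic mean of precision and recall
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # 0.75
```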
Conclusion
You can experiment with many more features of this awesome framework, such as feature generation, time series, text and so on. Do try it out yourself!