Machine Learning Practice Problem 1: Credit Score Classification — Part 1 | by Rahul Pandey

For the past few days, I have been working on some non-coding stuff. So, to brush up my coding skills, I decided to start a series of blogs where I share my journey of practicing a problem statement with my readers.

Today we are going to start with Credit Score Classification. I'm diving into an article that unravels the mystery behind how banks and credit card companies determine who's creditworthy. It's all about credit scores, which are like financial report cards: they tell banks who is likely to pay back loans without a hitch.

In the modern world, technology, specifically Machine Learning (ML), plays a pivotal role in this. Banks and credit card companies are now using ML algorithms to sift through customer data, analyzing credit histories to classify customers into different credit score categories such as ‘Good’, ‘Standard’, and ‘Poor’.

The article is a treasure trove for anyone keen on learning the ropes of using ML for credit score classification, guiding you through the process with Python. It even offers access to a dataset that’s perfectly suited for this task, labeled based on the credit history of credit card customers. So, if you’re up for the challenge, this article is your gateway to mastering credit score classification using ML and Python! 🚀

Let's dive in…

If you have ever studied or heard about machine learning implementations, you know the logical first step is to read the data, here using pandas. So my first step is to read the data and print the first five rows of the dataset.

# Library imports
import pandas as pd

# Reading Data
data = pd.read_csv("./data/train.csv")
# Printing Data
print(data.head())
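If you don't have `train.csv` on hand yet, the same read-and-preview step can be tried on a tiny in-memory CSV. This is a minimal sketch: the rows below are made-up placeholder values, not real records. Only the column names `Occupation` and `Credit_Score` come from the code in this post; `Customer_ID` and `Annual_Income` are assumed for illustration.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for ./data/train.csv (values are made up)
csv_text = """Customer_ID,Occupation,Annual_Income,Credit_Score
C1,Engineer,52000,Good
C2,Teacher,31000,Standard
C3,Lawyer,,Poor
"""

# read_csv accepts any file-like object, so StringIO behaves like a file path here
data = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default (all three rows here)
print(data.head())

# shape and dtypes are a quick sanity check that the columns parsed as expected;
# the empty Annual_Income cell is read as NaN, so that column becomes float64
print(data.shape)
print(data.dtypes)
```

Checking `shape` and `dtypes` right after loading is a cheap way to catch a wrong delimiter or a numeric column that was read as text.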

Next, we check whether any column contains null (missing) values. This matters because many machine learning models cannot handle missing values at all, and gaps left untreated can also skew summary statistics and plots, so they need to be found and handled before training.

# Library imports
import pandas as pd

# Reading Data
data = pd.read_csv("./data/train.csv")
# Check whether any column of the dataset contains null values
# isnull() returns a DataFrame of True/False values, one per cell, marking missing entries
# sum() then counts the True values in each column
print(data.isnull().sum())
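Finding missing values is only half the job; once `isnull().sum()` points at a column, the gaps have to be handled before training. Below is a minimal sketch on a hypothetical toy frame (made-up values; `Monthly_Balance` is an assumed column name) showing the count plus two common options: dropping incomplete rows, or filling numeric gaps with the column median. This is not necessarily what the original article does, and the right choice depends on how many rows are affected.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few gaps (values are made up)
data = pd.DataFrame({
    "Occupation": ["Engineer", "Teacher", None, "Lawyer"],
    "Monthly_Balance": [310.5, np.nan, 120.0, 95.25],
    "Credit_Score": ["Good", "Standard", "Poor", "Standard"],
})

# Per-column count of missing values (same call as in the step above)
null_counts = data.isnull().sum()
print(null_counts)

# Option 1: drop every row that has at least one missing value
dropped = data.dropna()

# Option 2: keep all rows and fill numeric gaps with the column median
filled = data.copy()
median_balance = filled["Monthly_Balance"].median()
filled["Monthly_Balance"] = filled["Monthly_Balance"].fillna(median_balance)

print(len(dropped), len(filled))
```

Dropping rows is simplest but throws data away; median filling keeps every row and is less sensitive to outliers than filling with the mean.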

Next, let's plot a person's occupation against their credit score. This can give us some insight into whether, and how, credit scores vary across occupations.

# Library imports
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Reading Data
data = pd.read_csv("./data/train.csv")
# Plotting a bar plot of the average credit score for each occupation
# in order to understand how a person's occupation affects their credit score
# Mapping Credit_Score to numerical values so an average score can be calculated
score_mapping = {'Poor': 1, 'Standard': 2, 'Good': 3}
data['Numeric_Score'] = data['Credit_Score'].map(score_mapping)
# Create the bar plot: one bar per occupation, bar height = mean Numeric_Score.
# hue='Credit_Score' is deliberately not used: each class maps to one constant
# value, so per-class bars would always show exactly 1, 2 or 3.
plt.figure(figsize=(10, 6))
sns.barplot(x='Occupation', y='Numeric_Score', data=data, errorbar=None, color='steelblue')
# Customize the plot
plt.title('Average Credit Scores Based on Occupation')
plt.xlabel('Occupation')
plt.ylabel('Average Credit Score')
plt.xticks(rotation=45)
plt.yticks(ticks=[1, 2, 3], labels=['Poor', 'Standard', 'Good'])  # Label the y-axis with the credit score categories
plt.tight_layout()
# Show the plot
plt.show()
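The bar plot is easier to interpret next to the numbers behind it. A minimal sketch on hypothetical toy rows (made-up values) that reuses the same `score_mapping`: `groupby` gives the average numeric score per occupation, and `pd.crosstab(..., normalize="index")` gives the share of Poor/Standard/Good within each occupation, which is often more informative than averaging an ordinal label.

```python
import pandas as pd

# Hypothetical sample rows; only the column names match the real dataset
data = pd.DataFrame({
    "Occupation": ["Engineer", "Engineer", "Teacher", "Teacher", "Teacher", "Lawyer"],
    "Credit_Score": ["Good", "Standard", "Poor", "Standard", "Standard", "Good"],
})

# Same ordinal encoding as the plot above
score_mapping = {"Poor": 1, "Standard": 2, "Good": 3}
data["Numeric_Score"] = data["Credit_Score"].map(score_mapping)

# Average numeric score per occupation, highest first (the bar heights)
avg_score = data.groupby("Occupation")["Numeric_Score"].mean().sort_values(ascending=False)
print(avg_score)

# Share of each credit score class within every occupation (each row sums to 1)
shares = pd.crosstab(data["Occupation"], data["Credit_Score"], normalize="index")
print(shares.round(2))
```

On this toy data, an average of 2.0 could mean "everyone is Standard" or "half Poor, half Good"; the row-normalized crosstab separates those two cases, while the average cannot.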