![](https://crypto4nerd.com/wp-content/uploads/2023/08/0CWVIIJrWxRdY8Pwr-1024x683.jpeg)
Why do we need the ROC CURVE?
"THRESHOLD SELECTION" — this is the main reason we need the ROC CURVE.
It is closely tied to classification, specifically "BINARY CLASSIFICATION".
Threshold Selection:
- Ex: Suppose we have college data with the features "IQ", "CGPA" and "Placement", where the result is binary: a student either gets placed ("YES") or does not ("NO").
- We divide the data into two parts, a "training set" and a "test set", train the model on the training data, and test it on the test data.
- On the test data the model does not directly give results as "0" and "1". The model gives results as a "probability", a number that tells how likely the student is to be placed. Ex: one student's prediction is 0.45, i.e. 45%, and we get such probability-based results for all students.
- We then have to convert this "probability result" into the class labels "0" and "1" by deciding on a "THRESHOLD".
- We have to choose the threshold value so that it splits the results into the two class labels; when we decide the threshold manually by intuition, it may or may not agree well with the actual outcomes.
- Ex: We decide Threshold = 0.5. Based on this, a student whose probability is above the threshold is predicted as placed, and below the threshold as not placed.
BUT A THRESHOLD OF 0.5 WILL NOT ALWAYS WORK.
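As a quick sketch, the conversion from probabilities to class labels is a single comparison (the probabilities below are made up for illustration):

```python
import numpy as np

# Hypothetical predicted placement probabilities for five students
probs = np.array([0.45, 0.80, 0.30, 0.65, 0.50])

# Applying a threshold of 0.5 turns each probability into a class label
threshold = 0.5
labels = (probs >= threshold).astype(int)
print(labels)  # [0 1 0 1 1]
```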
- Ex: EMAIL CLASSIFICATION (a model trained on a lot of emails to identify "Spam" vs "Not Spam").
There are two mistakes the model can make:
EMAIL NOT SPAM — PREDICTED SPAM
EMAIL SPAM — PREDICTED NOT SPAM
The two mistakes do not always have the same importance; it varies from case to case.
- Suppose I get an email about an interview and my model puts it in spam, even though it is not actually spam. The model has made a blunder, and since I rely on the model, I will raise the threshold to "0.75" so it is more conservative about flagging spam, which helps reduce this mistake.
- This is the power of the "threshold": based on the model's behaviour we can increase or decrease the threshold value.
How much should the "threshold" be? We can decide that with the "ROC CURVE".
It is like a "report card" for "Binary Classification": in one glance we can understand how our model performs.
True Positive (TP): correctly predicting a label (we predicted "yes", and it's "yes").
True Negative (TN): correctly predicting the other label (we predicted "no", and it's "no").
False Positive (FP): falsely predicting a label (we predicted "yes", but it's "no").
False Negative (FN): missing an incoming label (we predicted "no", but it's "yes").
True Positive Rate —
TPR = TP / (TP + FN)
It gives an intuition of benefit: how much benefit we can get from the system.
- Ex: Creating a Netflix churn-rate prediction model to find user patterns.
- "1" — will leave the platform, "0" — will not leave the platform.
- Suppose 100 customers actually want to leave Netflix and my model detects only 80 of them, so my "TPR" is 80%. We always want to maximize "TPR", as a higher TPR means the model solves the problem better. WHEN FALSE NEGATIVES ARE ZERO, THE TRUE POSITIVE RATE IS 100%.
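The arithmetic from the Netflix example, as a sketch:

```python
# Netflix churn sketch: 100 customers actually leave, the model catches 80
tp = 80            # churners the model correctly flagged
fn = 100 - tp      # churners the model missed

tpr = tp / (tp + fn)
print(tpr)  # 0.8, i.e. an 80% true positive rate
```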
False Positive Rate —
FPR = FP / (FP + TN)
"TREAT IT AS COST": how expensive will the model be? We build a model to get a solution, and when it is not 100% accurate, the errors are a cost we have to bear. In the churn example, suppose the model says certain people will leave the platform, so we give them benefits to hold on to them, but in reality they never intended to leave; we then incur the retention cost for nothing.
- Ex: Email spam — out of all the emails that are not spam, how many does our model say are spam?
- Ex: Netflix churn — out of all the people who are not leaving the platform, how many does the model say will leave?
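A matching cost sketch with hypothetical counts (the 30 and 170 below are made up):

```python
# Hypothetical counts: 200 customers stay, but the model wrongly flags 30
fp = 30     # loyal customers predicted to churn (needless retention spend)
tn = 170    # loyal customers correctly left alone

fpr = fp / (fp + tn)
print(fpr)  # 0.15, i.e. 15% of loyal customers incur unnecessary cost
```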
ROC CURVE —
(RECEIVER OPERATING CHARACTERISTIC)
- (BENEFIT & COST MODEL) FPR goes on the x-axis and TPR on the y-axis; the line drawn through all the (FPR, TPR) points is called the "ROC CURVE".
- The graph always lies between 0 and 1, as the TPR and FPR values can each only be between 0 and 1.
EXPLANATION:
- Suppose we have student data with a few features and need to predict student placement.
- Since we need a binary prediction, we decide to use a "Logistic Regression" model. We divide the data into a "training set" and a "test set", train the model on the training set, and check the result on the test data, trying out different thresholds (0.3, 0.5, 0.6, 0.8).
- For every threshold value we get a "confusion matrix", and from every confusion matrix we can calculate "TPR & FPR". We plot the "TPR, FPR" values for all thresholds on a graph, which creates a "CURVE" called the "ROC CURVE", and by inspecting it we can decide which threshold is best to use: whichever threshold's point lies nearest the top-left corner (TPR near "1", FPR near 0) is the best threshold value.
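The steps above can be sketched directly: for each candidate threshold, build the predictions, count the confusion-matrix cells, and compute one (FPR, TPR) point. All data below is made up for illustration:

```python
import numpy as np

# Made-up placement probabilities and true outcomes for ten students
y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_score = np.array([0.2, 0.4, 0.35, 0.8, 0.45, 0.9, 0.3, 0.6, 0.7, 0.55])

# One (FPR, TPR) point per candidate threshold; plotting them gives the ROC curve
for thr in [0.3, 0.5, 0.6, 0.8]:
    y_pred = (y_score >= thr).astype(int)
    tp = ((y_pred == 1) & (y_true == 1)).sum()
    fn = ((y_pred == 0) & (y_true == 1)).sum()
    fp = ((y_pred == 1) & (y_true == 0)).sum()
    tn = ((y_pred == 0) & (y_true == 0)).sum()
    print(f"thr={thr:.1f}  TPR={tp / (tp + fn):.2f}  FPR={fp / (fp + tn):.2f}")
```

Lower thresholds push both rates up; higher thresholds push both down, which is exactly the trade-off the ROC curve visualizes.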
WHEN WE DECREASE THE THRESHOLD VERY LOW (0.1, CLOSE TO 0), FPR AND TPR BOTH INCREASE AND MODEL PERFORMANCE IS VERY BAD. NOW IF WE INCREASE THE THRESHOLD TO AROUND 0.99 (CLOSE TO 1), TPR and FPR both DECREASE, or we can say the point moves toward (0, 0) on the graph. THIS MEANS WE ARE MAKING VERY FEW POSITIVE PREDICTIONS.
- Ex: in email spam, with a threshold of 0.99 we predict very few emails as spam; only a probability above that value counts as spam.
- In this case, TP decreases, as we fail to predict actual spam emails as spam.
- And FN increases, as we do not call actual spam spam. So eventually TPR DECREASES.
Now if I reduce my threshold from 0.99 to 0.85, the model will predict a little more spam. True Positives (TP) will increase, so TPR will increase. But at this threshold FPR will not increase at the rate TPR increases; we can say FP (False Positives, mail which is "NOT SPAM" but the model predicts as "SPAM") will barely increase, since only emails with a very high spam probability are flagged. So the graph rises more in the "y direction" than in the "x direction", and the plot starts taking its "CURVY SHAPE".
import pandas as pd

data = pd.read_csv('https://raw.githubusercontent.com/npradaschnor/Pima-Indians-Diabetes-Dataset/master/diabetes.csv')
data.head()

X = data.drop('Outcome', axis=1)
y = data['Outcome']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)

from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_scores = model.predict_proba(X_test)[:,1]
y_scores
# For all test data, we generated the probability of each patient being diabetic.
# As we can see, the first patient's probability of being diabetic is 0.44, and so on.

from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
# roc_curve generates the thresholds and the corresponding FPR and TPR values.

thresholds
# The candidate threshold values (note: roc_curve prepends one sentinel value above 1).
import plotly.graph_objects as go
import numpy as np

# Generate a trace for ROC curve
trace0 = go.Scatter(
    x=fpr,
    y=tpr,
    mode='lines',
    name='ROC curve'
)
# Only label every nth point to avoid cluttering
n = 10
indices = np.arange(len(thresholds)) % n == 0 # Choose indices where index mod n is 0
trace1 = go.Scatter(
    x=fpr[indices],
    y=tpr[indices],
    mode='markers+text',
    name='Threshold points',
    text=[f"Thr={thr:.2f}" for thr in thresholds[indices]],
    textposition='top center'
)
# Diagonal line
trace2 = go.Scatter(
    x=[0, 1],
    y=[0, 1],
    mode='lines',
    name='Random (Area = 0.5)',
    line=dict(dash='dash')
)
data = [trace0, trace1, trace2]
# Define layout with square aspect ratio
layout = go.Layout(
    title='Receiver Operating Characteristic',
    xaxis=dict(title='False Positive Rate'),
    yaxis=dict(title='True Positive Rate'),
    autosize=False,
    width=800,
    height=800,
    showlegend=False
)
# Define figure and add data
fig = go.Figure(data=data, layout=layout)
# Show figure
fig.show()
# Assume that fpr, tpr, thresholds have already been calculated
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold = thresholds[optimal_idx]
print("Optimal threshold is:", optimal_threshold)
# Calculating the best threshold: the ROC point nearest the top-left corner,
# found by maximizing TPR - FPR.
Optimal threshold is: 0.5503810234218872
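The `np.argmax(tpr - fpr)` line picks the threshold that maximizes Youden's J statistic (TPR minus FPR), i.e. the ROC point farthest above the random-guess diagonal. A toy sketch with made-up ROC points:

```python
import numpy as np

# Toy (FPR, TPR, threshold) triples, mimicking the output of roc_curve
fpr = np.array([0.0, 0.1, 0.4, 1.0])
tpr = np.array([0.0, 0.7, 0.9, 1.0])
thresholds = np.array([0.9, 0.7, 0.5, 0.1])

# Youden's J statistic (TPR - FPR) is largest at the ROC point farthest
# above the random-guess diagonal
j = tpr - fpr
best = thresholds[np.argmax(j)]
print(best)  # 0.7 (J = 0.7 - 0.1 = 0.6)
```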
AUC-ROC
(AREA UNDER THE CURVE)
"By using AUC-ROC we can compare two models to find which one is the better classifier."
The AUC-ROC measures the entire two-dimensional area underneath the ROC curve from (0,0) to (1,1). AUC provides an aggregate measure of performance across all possible classification thresholds.
- An AUC of 1 indicates that the model has perfect discrimination: it correctly classifies all positive and negative instances.
- An AUC of 0.5 suggests the model has no discrimination ability: it is as good as random guessing.
- An AUC of 0 indicates that the model is perfectly wrong: it classifies all positive instances as negative and all negative instances as positive.
In practice, AUC values usually fall between 0.5 (random) and 1 (perfect), with higher values indicating better classification performance.
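As a quick sketch of these interpretations, `roc_auc_score` on toy labels (the scores are chosen by hand):

```python
from sklearn.metrics import roc_auc_score

# Toy labels: the better the scores rank positives above negatives,
# the closer the AUC is to 1
y_true = [0, 0, 1, 1]
print(roc_auc_score(y_true, [0.1, 0.4, 0.35, 0.8]))  # 0.75 (one pair mis-ranked)
print(roc_auc_score(y_true, [0.1, 0.2, 0.7, 0.8]))   # 1.0 (perfect ranking)
```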
import numpy as np
import plotly.graph_objects as go
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.preprocessing import StandardScaler

# Assuming that X_train, X_test, y_train, y_test are already defined
# SVM requires feature scaling for better performance
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Logistic Regression model
lr_model = LogisticRegression(max_iter=1000)
lr_model.fit(X_train, y_train)
lr_scores = lr_model.predict_proba(X_test)[:,1]
# SVM model
svm_model = SVC(probability=True)
svm_model.fit(X_train_scaled, y_train)
svm_scores = svm_model.predict_proba(X_test_scaled)[:,1]
# Generate ROC curve data for logistic regression model
lr_fpr, lr_tpr, lr_thresholds = roc_curve(y_test, lr_scores)
lr_auc = roc_auc_score(y_test, lr_scores)
# Generate ROC curve data for SVM model
svm_fpr, svm_tpr, svm_thresholds = roc_curve(y_test, svm_scores)
svm_auc = roc_auc_score(y_test, svm_scores)
# Generate a trace for the Logistic Regression ROC curve
trace0 = go.Scatter(
    x=lr_fpr,
    y=lr_tpr,
    mode='lines',
    name=f'Logistic Regression (Area = {lr_auc:.2f})'
)
# Generate a trace for the SVM ROC curve
trace1 = go.Scatter(
    x=svm_fpr,
    y=svm_tpr,
    mode='lines',
    name=f'SVM (Area = {svm_auc:.2f})'
)
# Diagonal line
trace2 = go.Scatter(
    x=[0, 1],
    y=[0, 1],
    mode='lines',
    name='Random (Area = 0.5)',
    line=dict(dash='dash')
)
data = [trace0, trace1, trace2]
# Define layout with square aspect ratio
layout = go.Layout(
    title='Receiver Operating Characteristic',
    xaxis=dict(title='False Positive Rate'),
    yaxis=dict(title='True Positive Rate'),
    autosize=False,
    width=800,
    height=800,
    showlegend=True
)
# Define figure and add data
fig = go.Figure(data=data, layout=layout)
# Show figure
fig.show()