![](https://crypto4nerd.com/wp-content/uploads/2023/02/1mwqJGRrMQkesNigtkyAOzw.png)
The Receiver Operating Characteristic (ROC) curve is a graphical plot used to describe the diagnostic ability of a binary classifier. It is extensively used in many fields, from research to industrial applications, as well as in the military, from which it derives. The ROC curve is also a standard tool in Statistics and Machine Learning for evaluating model performance. It quantifies the ability of a binary classifier by varying the threshold the model uses to discriminate between classes. Each point of the ROC curve corresponds to a threshold that determines which records go into class 1 and which into class 0, and reports the resulting true positive rate against the false positive rate. For historical and more formal technical details, please check the Wikipedia page Receiver operating characteristic — Wikipedia.
Despite its extensive usage, the ROC curve can often be a cause of confusion. In this article, I try to clarify the key points behind the construction of a ROC curve, providing a straightforward and simple description of each step.
Let’s start by considering a realistic classification problem for which we could use the ROC curve.
Task: we own a grocery store and we want to catalog all our take-away meals. We have a meal list containing all the meals in the store. For each of them, we want to assess the presence of an ingredient, for example gluten.
We assume that our classifier, whatever it is, is already trained. This means it is able to make predictions according to what it has learned before. The ROC curve is going to tell us how good the classifier is at making predictions. The classification consists of labeling each meal with a prediction about the presence or absence of gluten, indicated with the labels 1 and 0, respectively.
In the following schema, I break the ROC curve construction process down into 8 steps, illustrated by the panel and analyzed in detail below.
Step 1: scores assignment
The model evaluates all the meals and assigns a score to each one, i.e. it quantifies with a number the likelihood of having/not having the target ingredient.
KEY POINT: unlike the class assignment phase, which is repeated for each threshold (as we will see in Step 4), the score of each meal is assigned once and only once in the whole process, during Step 1.
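As a minimal sketch of this step (assuming NumPy and a made-up set of 12 meals, since the article’s actual classifier and data are not shown), the one-off score assignment could look like this:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical setup: 12 meals, each with a true label
# (1 = contains gluten, 0 = gluten-free).
labels = rng.integers(0, 2, size=12)

# Simulate the trained classifier's scores: meals that truly
# contain gluten tend to receive higher scores. These scores
# are assigned once and never recomputed in later steps.
scores = rng.normal(loc=labels * 2.0 - 1.0, scale=1.0)

print(scores.round(2))
```

Any real classifier exposing a continuous output (a probability, a decision function value, etc.) would play the same role as the simulated `scores` array here.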
Step 2: scores ordering
Records are listed by score in descending order, from higher to lower. In the panel example, the scores span from -4.2 (min) to +3.8 (max).
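The ordering step can be sketched in one line with NumPy (the score values here are made up for illustration):

```python
import numpy as np

# Hypothetical scores assigned in Step 1.
scores = np.array([0.4, -4.2, 3.8, 1.2, -0.5])

# Step 2: list the records by score in descending order.
order = np.argsort(scores)[::-1]
print(scores[order])  # [ 3.8  1.2  0.4 -0.5 -4.2]
```

Using `argsort` (rather than sorting the scores directly) keeps the permutation available, so the meals’ labels can be reordered consistently with their scores.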
Step 3: threshold definition
Let’s assume we want 30 points for our ROC curve. We collect 30 equally spaced points from our score range (-4.2 to +3.8) and list them in descending order: these points are going to become the threshold values to test. One threshold for each ROC point… but let’s see how!
KEY POINT: thresholds are picked from the range of score values. Despite this relation, scores and thresholds have different roles and must be kept as separate concepts.
Scores are assigned by the classifier to each meal to quantify the likelihood of belonging to class 1 or class 0.
Thresholds are used in the ROC curve construction as score delimiters: they determine how the score decides which meal goes into class 1 and which into class 0.
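A minimal sketch of the threshold definition, using the score range from the panel example and `numpy.linspace` for the 30 equally spaced points:

```python
import numpy as np

# Score range from the panel example: -4.2 (min) to +3.8 (max).
score_min, score_max = -4.2, 3.8

# Step 3: 30 equally spaced thresholds, listed in descending order.
thresholds = np.linspace(score_min, score_max, num=30)[::-1]

print(thresholds[0], thresholds[-1])  # highest first, lowest last
```

Note that `linspace` includes both endpoints, so the highest and lowest thresholds coincide with the maximum and minimum scores.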
Step 4: cycling over threshold
For each threshold, we perform steps 5 to 7, generating a confusion matrix per threshold. Steps 5 to 7 replicate the classification procedure (assigning 1 or 0 according to the score), each time using a different threshold as the discrimination value between the two classes.
As we will see in the next steps, a class assignment is performed for each threshold. Meals with scores above the threshold are classified as ones, while those below as zeros. Then, according to the actual labels of the meals, the true and false positive rates are computed for the specific threshold under consideration. The process is repeated for each threshold. Let’s see the details!
Step 5: classification
During the iterative phase (steps 5 to 7), all thresholds are used one at a time, one for each ROC point, and for each point all the meals are classified. The process uses thresholds in descending order, starting from the highest until reaching the lowest.
Given a threshold, the meal list is split in 2 parts and all the meals are classified:
- scores above threshold → 1
- scores below threshold → 0
After classification, the prediction of each meal is compared with its real label, and the algorithm keeps track of the results by counting the occurrences of the four possible combinations:
Correct predictions
- true positive (TP), prediction = label and both 1
- true negative (TN), prediction = label and both 0
Wrong predictions
- false positive (FP), prediction = 1, label = 0
- false negative (FN), prediction = 0, label = 1
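For a single threshold, the classification and the four counts can be sketched as follows (the scores, labels, and threshold value are made up for illustration):

```python
import numpy as np

# Hypothetical scores from Step 1 and the meals' true labels.
scores = np.array([3.1, 2.0, 1.2, 0.4, -0.5, -1.8, -3.0])
labels = np.array([1,   1,   0,   1,   0,    0,    1])

threshold = 0.0

# Step 5: scores above the threshold -> class 1, below -> class 0.
predictions = (scores > threshold).astype(int)

# Count the four possible prediction/label combinations.
tp = np.sum((predictions == 1) & (labels == 1))  # true positives
tn = np.sum((predictions == 0) & (labels == 0))  # true negatives
fp = np.sum((predictions == 1) & (labels == 0))  # false positives
fn = np.sum((predictions == 0) & (labels == 1))  # false negatives

print(tp, tn, fp, fn)  # 3 2 1 1
```

The four counts always sum to the total number of meals, which is a handy sanity check when implementing this step.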
Step 6: confusion matrix
Once all the meal predictions have been checked, the cumulative counts of the four combinations can be arranged into the confusion matrix, which is stored in memory.
KEY POINT: each confusion matrix is built considering a specific threshold. Evaluating the model with a different threshold will affect the classifier performance, and therefore the confusion matrix values! The confusion matrix is not an absolute metric: it depends on the threshold used for the classification.
Step 7: rates computation
At this point, it is possible to use the quantities just obtained for the current threshold to compute the quantities actually plotted on the ROC curve:
- true positive rate (tpr)= TP / ( TP + FN )
- false positive rate (fpr) = FP / ( FP + TN )
The true positive rate describes how good the model is at “hitting” real positives (i.e. meals that actually contain gluten) and correctly predicting the presence of gluten. Note that it does not only depend on how many hits are made (TP, the numerator): TP is normalized by the number of real positives examined, which includes both the correct hits and the misses (TP + FN, the denominator).
Analogously, the false positive rate describes how frequently the classifier mistakenly predicts the presence of gluten. Also in this case, the absolute number of mistakes (FP, the numerator) is normalized by the total number of real negatives (FP + TN, the denominator, i.e. meals that do not contain gluten).
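The two rates follow directly from the four counts; here is a minimal sketch with hypothetical confusion matrix values:

```python
# Hypothetical counts from the confusion matrix of one threshold.
tp, tn, fp, fn = 3, 2, 1, 1

# Step 7: the two quantities plotted on the ROC curve.
tpr = tp / (tp + fn)  # true positive rate (also called sensitivity/recall)
fpr = fp / (fp + tn)  # false positive rate

print(tpr, fpr)
```

One practical caveat: if a dataset contained no real positives (TP + FN = 0) or no real negatives (FP + TN = 0), these ratios would be undefined, so an implementation should guard against division by zero.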
KEY POINT: at this point, the process restarts from Step 5 with the next threshold, until the last one is reached. Then we move forward to Step 8.
Step 8: plotting the ROC curve
Once both the true and false positive rates are computed for all the thresholds, it only remains to plot them on the graph… And here we are, with our brand new ROC curve!
KEY POINT: the threshold is not explicitly visualized in the ROC curve. However, it is directly embedded in it, since each point of the curve is computed using a specific threshold value that determines the boundary separating the scores into the two classes, above and below the threshold.
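Putting the whole process together, the 8 steps can be sketched in a single short loop (the scores and labels are made-up toy data; a real use case would take them from Step 1):

```python
import numpy as np

# Toy data: scores assigned once (Step 1) and the meals' true labels.
scores = np.array([3.1, 2.0, 1.2, 0.4, -0.5, -1.8, -3.0])
labels = np.array([1, 1, 0, 1, 0, 0, 1])

# Step 3: 30 thresholds spanning the score range, in descending order.
thresholds = np.linspace(scores.min(), scores.max(), num=30)[::-1]

tprs, fprs = [], []
for t in thresholds:                              # Step 4: cycle over thresholds
    preds = (scores > t).astype(int)              # Step 5: classification
    tp = np.sum((preds == 1) & (labels == 1))     # Step 6: confusion counts
    tn = np.sum((preds == 0) & (labels == 0))
    fp = np.sum((preds == 1) & (labels == 0))
    fn = np.sum((preds == 0) & (labels == 1))
    tprs.append(tp / (tp + fn))                   # Step 7: rates
    fprs.append(fp / (fp + tn))

# Step 8: plot fprs (x-axis) vs tprs (y-axis), e.g. with matplotlib:
# import matplotlib.pyplot as plt
# plt.plot(fprs, tprs, marker="o")
# plt.xlabel("false positive rate"); plt.ylabel("true positive rate")
```

Since the thresholds are processed in descending order, both rates grow monotonically along the curve: lowering the threshold can only move more meals into class 1, never fewer.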
In this article I provided a detailed step-by-step guide to the construction of the ROC curve. After a general introduction, the construction process was broken down into 8 steps, each explained by following along a simplified example.
The procedure to evaluate the prediction performance remains valid for any kind of binary classifier (e.g. human beings, Machine Learning algorithms, coin tosses, etc.). No matter whether the task is accomplished by a ML algorithm scanning the ingredient list, a person tasting the meals blindfolded, or a coin toss guessing at random: in all cases, the very same construction process remains valid.
If you wish to see how the ROC curve is used to interpret binary classifier performance, I wrote another article in which I show how to compute the ROC curve “by hand” with a step-by-step guide and Python code implementing all the 8 steps described in this article. You can read the article here.