![](https://crypto4nerd.com/wp-content/uploads/2023/11/0uSJ4sY3PnvGY19ac.jpg)
Credit card fraud is a persistent problem in the world of finance. Detecting fraud is essential to protect cardholders and financial institutions. In this project, we are going to explore various methods and strategies for fraud detection.
This article follows our journey, from exploring data and trying logistic regression to using advanced techniques like SMOTE and reweighting.
Our main goal is to find the right balance between precision and recall, two vital metrics for fraud detection, while keeping false alarms to a minimum.
Data exploration
The dataset contains credit card transactions that occurred over two days, with 492 frauds out of 284,807 transactions.
Due to confidentiality concerns, the original features are not provided; they were transformed using PCA, resulting in 28 anonymized features. The only features not transformed with PCA are ‘Time’ and ‘Amount.’
- Time: Represents the seconds elapsed between each transaction and the first transaction in the dataset.
- Amount: Denotes the transaction amount.
- Class: This is the target variable, taking a value of 1 in the case of fraud and 0 otherwise.
The dataset is highly imbalanced: the positive class (frauds) accounts for only 0.172% of all transactions.
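To make the imbalance concrete, here is a minimal sketch using a stand-in DataFrame built from the published class counts (the actual CSV load, e.g. `pd.read_csv("creditcard.csv")`, is assumed and not shown in the article):

```python
import pandas as pd

# Stand-in frame with the published class counts
# (284,315 legitimate vs. 492 fraudulent transactions);
# with the real data this would come from pd.read_csv("creditcard.csv")
df = pd.DataFrame({"Class": [0] * 284_315 + [1] * 492})

counts = df["Class"].value_counts()
fraud_rate = counts[1] / counts.sum()
print(f"Fraud rate: {fraud_rate:.3%}")
```

`value_counts()` is also the natural input for the class-distribution bar chart described below.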
Fraud and spam datasets often exhibit class imbalance due to the rarity of these occurrences in real-world scenarios, the higher priority of accurately detecting them, and the potential consequences of misclassification.
We created a bar chart to visualize the distribution of values in the “Class” column of the DataFrame. In the bar chart (Figure 1), we can see that fraudulent transactions (“1”) occur far less frequently than legitimate transactions (“0”).
The feature ‘Amount’ is also highly skewed to the right (Figure 2). This skew can hurt the performance of machine learning models, especially in sensitive tasks like fraud detection, so we apply a logarithmic transformation to the feature.
The following histogram shows that the transformed “Amount” feature is significantly better behaved now.
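The transformation itself is a one-liner. A sketch on toy amounts (using `np.log1p`, an assumption on my part, since it keeps zero-valued transactions finite where a plain log would not):

```python
import numpy as np
import pandas as pd

# Toy right-skewed amounts; on the real data this would be df["Amount"]
amounts = pd.Series([0.0, 1.0, 9.99, 120.0, 2500.0])

# log1p(x) = log(1 + x): zero-valued transactions map to 0 instead of -inf
log_amount = np.log1p(amounts)
print(log_amount.round(3).tolist())
```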
Metrics in fraud detection
Precision and recall are two important evaluation metrics in binary classification tasks like fraud detection, but we are going to prioritize recall over precision in this case.
In the context of fraud detection, missing actual fraudulent transactions (false negatives) can be much more costly and harmful than falsely flagging legitimate transactions as fraud (false positives). A high recall ensures that the model can identify as many actual fraudulent cases as possible, reducing the chances of overlooking fraudulent activities.
Precision is the ratio of true positive predictions to the total predicted positive instances. It represents the accuracy of positive predictions made by the model. A high precision means that when the model predicts an instance as positive (fraudulent), it is very likely to be correct.
Recall is the ratio of true positive predictions to the total actual positive instances in the dataset. It measures the model’s ability to identify all positive instances correctly. High recall means the model can capture a significant portion of actual fraudulent cases.
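Both definitions can be checked on a tiny hand-made example with scikit-learn (the labels here are invented for illustration, with 1 = fraud):

```python
from sklearn.metrics import precision_score, recall_score

# Tiny hand-made labels: 1 = fraud, 0 = legitimate
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]  # one false positive, one false negative

# precision = TP / (TP + FP); recall = TP / (TP + FN)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
print(precision, recall)  # 0.75 0.75 (TP=3, FP=1, FN=1)
```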
Logistic regression
In this phase of the analysis, we built and evaluated a simple logistic regression model using Python. First, a logistic regression model is instantiated and then trained on the dataset.
After training, the model is used to make predictions on a test dataset, and the accuracy of the model is computed and displayed, which is approximately 99.92%.
However, due to the imbalance in the dataset, it is noted that accuracy alone may not provide a comprehensive picture of the model’s performance.
To gain deeper insights, additional evaluation metrics are calculated. These metrics are displayed using a custom function called displayMetrics.
The metrics calculated include precision, recall, the F-score, and the area under the ROC curve (AUC). The precision is approximately 0.89, recall is around 0.64, the F-score is roughly 0.65, and the AUC is about 0.82. These metrics provide a more nuanced view of the logistic regression model’s performance, particularly when dealing with imbalanced datasets.
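The article's displayMetrics helper is not shown, so here is a plausible sketch of this whole step, run on a synthetic imbalanced dataset standing in for the real one (the helper body and the dataset are my assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, fbeta_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

def displayMetrics(y_true, y_pred):
    """Report the metrics the article relies on instead of raw accuracy."""
    print(f"accuracy:  {accuracy_score(y_true, y_pred):.4f}")
    print(f"precision: {precision_score(y_true, y_pred):.4f}")
    print(f"recall:    {recall_score(y_true, y_pred):.4f}")
    print(f"F-score:   {fbeta_score(y_true, y_pred, beta=1):.4f}")
    print(f"AUC:       {roc_auc_score(y_true, y_pred):.4f}")

# Synthetic imbalanced stand-in for the credit-card data (~1% positives)
X, y = make_classification(n_samples=5000, weights=[0.99, 0.01],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
displayMetrics(y_test, model.predict(X_test))
```

The exact numbers will differ from the article's, since the data here is synthetic; the point is the shape of the evaluation.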
The precision is good; however, in the context of this specific problem, as previously mentioned, recall holds significant importance. In the upcoming model, the SMOTE sampler will be incorporated to address the issue of data imbalance and assess whether it can lead to an improvement in recall.
Logistic regression with SMOTE sampler
At this phase of the analysis, attention turns to the imbalanced nature of the dataset. The Synthetic Minority Over-sampling Technique (SMOTE) is employed to create a balanced training set.
A SMOTE sampler is instantiated with a specific random state to ensure result reproducibility.
The SMOTE sampler is applied to the training data, generating new synthetic minority-class points to balance the class distribution. This is accomplished by calling the fit_resample method on the training data (X_train and y_train).
To visualize the effect of SMOTE resampling, we generated a bar chart illustrating the class distribution after resampling. The classes are distinguished by color, with green representing fraudulent transactions and red legitimate ones.
In contrast to the initial bar chart illustrating the non-resampled data, it is evident that this time, a more balanced distribution is observed, with a satisfactory representation of both classes.
Following the data resampling process, the logistic regression model is retrained using the rebalanced dataset.
After the training, the model’s performance is assessed using the same custom function, displayMetrics. The results reveal notable changes in performance metrics. Accuracy decreases to approximately 0.98, while precision exhibits a significant reduction to around 0.09. However, the most significant improvement is observed in recall, which increases to approximately 0.90. The F-score and AUC also demonstrate promising values, with the F-score at approximately 0.66 and the AUC at 0.94.
Achieving a recall of 0.9 indicates the model’s effectiveness in identifying critical cases. However, a precision of 0.09 reveals a need for improvement in accuracy. While our primary concern is capturing important cases, it is essential to strike a balance between these metrics. Maintaining a very low precision can lead to the misclassification of numerous legitimate transactions as fraudulent, which is a situation we aim to avoid.
In our next step, Logistic Regression with reweighting will be explored as a means to address this imbalance between metrics.
Logistic regression with reweighting
At this point in the analysis, we walk through the steps involved in optimizing the logistic regression model with the goal of improving the balance between performance metrics.
The logistic regression model was initialized with a random state and an extended maximum iteration limit of 2,000 to ensure convergence.
A parameter grid is set up to explore different combinations of hyperparameters and class weights. These hyperparameters include ‘class_weight’ and the regularization parameter ‘C’, where ‘class_weight’ is varied between different configurations: ‘balanced’, {0: 0.172, 1: 0.828}, {0: 0.3, 1: 0.7}, and {0: 0.2, 1: 0.8}. The ‘C’ values are selected from [0.01, 0.1, 1, 10, 100].
After running the grid search, the best parameters and the corresponding best model are accessed. The best model reflects the hyperparameter configuration that maximizes the F1 score.
The best model is further trained using the training data to refine its performance.
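Assuming scikit-learn's GridSearchCV was used (the article does not show the call), the grid described above could be wired up like this, again on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real training split
X_train, y_train = make_classification(n_samples=2000, weights=[0.95, 0.05],
                                       random_state=0)

model = LogisticRegression(random_state=42, max_iter=2000)

# Class weights and regularization strengths described in the article
param_grid = {
    "class_weight": ["balanced", {0: 0.172, 1: 0.828},
                     {0: 0.3, 1: 0.7}, {0: 0.2, 1: 0.8}],
    "C": [0.01, 0.1, 1, 10, 100],
}

# F1 as the selection criterion balances precision and recall;
# refit=True (the default) retrains the best model on the full training set
grid = GridSearchCV(model, param_grid, scoring="f1", cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)
best_model = grid.best_estimator_
```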
The model’s performance is evaluated using the displayMetrics function. The results indicate that accuracy is approximately 1.00, precision is around 0.86, recall is roughly 0.80, the F1 score is approximately 0.80, and the AUC is about 0.90.
These results show a significant improvement in the balance between precision and recall, indicating that the optimization process has been successful. The model now maintains a high level of accuracy while also achieving a balanced trade-off between correctly identifying positive cases and minimizing false positives.
Conclusion
In summary, this project explored credit card fraud detection and the challenge of handling data imbalances. We embarked on a journey involving data exploration, logistic regression, and the application of techniques like SMOTE and reweighting.
Our findings revealed the importance of achieving a balance between precision and recall. SMOTE significantly improved recall, while reweighting struck a better balance.
Notably, the third model, “logistic regression with reweighting,” emerged as the most successful in addressing this specific problem, delivering the best results.
But with a recall of 80%, we acknowledge that we must continue to explore more models; that rate still falls short of the desired level of effectiveness, and our aim is to push recall higher.
One of the key takeaways from this analysis is the recognition that accuracy is not always the paramount metric. Instead, for specific problems like fraud detection, recall takes precedence. This project emphasizes the importance of considering recall as a critical performance metric in situations where the cost of missing fraudulent transactions is substantial.