Machine learning is a valuable approach for addressing fraud problems due to its ability to handle complex classification tasks. However, apart from labeling and data infrastructure challenges, there are other important considerations to keep in mind.
Alternative methods to combat fraud, such as human review, exist but have their own drawbacks. Setting up a human review queue, assigning tasks, providing appropriate tools, and measuring reviewer performance can be costly. Human reviewers are also prone to errors caused by bias, missed steps, or undefined processes. Moreover, the volume of information to review can overwhelm reviewers working under tight time limits. This includes identifier velocity (how often an identifier such as an email address or card number is used within a time window), previous history, and connections to known fraudulent accounts. Despite these challenges, human review has real advantages: its decisions are easy to explain, and reviewers can draw on real-world knowledge and experience. For instance, a human readily recognizes a name like “John Doe” as a common placeholder and therefore a potential red flag.
Heuristic rules are another alternative for fraud detection and can be effective against urgent fraud attacks: assuming the necessary infrastructure is in place, they can be written and deployed quickly. However, relying solely on heuristic rules is fragile, because fraud patterns often change shortly after an attack and rules written for the old pattern stop matching. Managing a large number of rules with complex logic can also introduce errors.
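To make the idea of heuristic rules concrete, here is a minimal sketch of a rule set in which each rule is a named predicate over an event plus an action. The event fields (`account_age_days`, `amount`, `card_uses_last_hour`, `name`), the rule names, and every threshold are hypothetical and chosen only for illustration; a production rules engine would load rules from configuration rather than hard-coding them.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence

# An event (e.g. a signup or payment) as a dict of field values; hypothetical schema.
Event = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    name: str
    predicate: Callable[[Event], bool]
    action: str  # "review" or "block"


# Hypothetical rules and thresholds, for illustration only.
RULES: List[Rule] = [
    Rule("new_account_high_amount",
         lambda e: e["account_age_days"] < 1 and e["amount"] > 500, "block"),
    Rule("card_velocity",
         lambda e: e["card_uses_last_hour"] > 5, "review"),
    Rule("placeholder_name",
         lambda e: str(e["name"]).strip().lower() == "john doe", "review"),
]

SEVERITY = {"allow": 0, "review": 1, "block": 2}


def evaluate(event: Event, rules: Sequence[Rule] = RULES) -> List[str]:
    """Return the names of all rules that fire for this event."""
    return [r.name for r in rules if r.predicate(event)]


def decide(event: Event, rules: Sequence[Rule] = RULES) -> str:
    """Return the most severe action among fired rules, or 'allow' if none fire."""
    actions = [r.action for r in rules if r.predicate(event)]
    return max(actions, key=SEVERITY.__getitem__, default="allow")
```

For example, an event from a brand-new account spending 900 under the name "John Doe" fires both `new_account_high_amount` and `placeholder_name`, and `decide` returns `"block"`. Giving every rule a name makes it easy to see which rule triggered a decision, which becomes important once the rule set grows large.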
In machine learning, fraud detection is typically framed as a classification problem. The goal is to learn the boundary between fraud and non-fraud cases in a multi-dimensional space in which each feature, such as account age, is one dimension. Features are often normalized (rescaled to a common scale), categorized (bucketed or one-hot encoded), or embedded (mapped to learned dense vectors). Decision tree-based models are commonly used because they handle imbalanced datasets well, which matters because labeled fraud cases are scarce. As a rule of thumb, around 1,000 or more positive (fraud) training examples is typically sufficient. If the model overfits, adding more training data can help; how much is needed depends on the specific problem.
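The feature transformations mentioned above can be sketched in a few lines of standard-library Python: z-score normalization for a numeric feature, bucketing account age into categories, and one-hot encoding a categorical value. The account-age bucket edges are hypothetical. The sketch also computes "balanced" class weights using the common n_samples / (n_classes × class_count) heuristic, one way to keep scarce fraud labels from being swamped during training. Embeddings are learned during model training and are not shown.

```python
import bisect
import math
from collections import Counter
from typing import Dict, List, Sequence


def zscore(values: Sequence[float]) -> List[float]:
    """Normalize a numeric feature to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    std = std or 1.0  # a constant feature would otherwise divide by zero
    return [(v - mean) / std for v in values]


# Hypothetical bucket edges for account age in days: <1, 1-6, 7-29, 30-364, 365+.
AGE_EDGES = [1, 7, 30, 365]


def bucketize(value: float, edges: Sequence[float] = AGE_EDGES) -> int:
    """Categorize a numeric value into a bucket index using ascending edges."""
    return bisect.bisect_right(edges, value)


def one_hot(value: str, vocabulary: Sequence[str]) -> List[int]:
    """Encode a categorical value as a one-hot vector over a fixed vocabulary."""
    return [1 if value == v else 0 for v in vocabulary]


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count), so the rare
    fraud class contributes the same total weight as the common class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```

With 98 legitimate and 2 fraud labels, each fraud example gets weight 25.0 and each legitimate example about 0.51, so both classes carry a total weight of 50.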
Neural network models are increasingly popular but require richer, more densely populated features, whereas decision tree models are less affected by empty or sparse features. If a tree-based model underfits, or its performance plateaus as the dataset grows, a neural network model can be a suitable alternative.
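Because neural networks expect a dense numeric input, sparse or empty features usually need preparation first, while some tree-based libraries can route missing values natively. One common preparation, a technique swapped in here rather than taken from the text above, is to fill each missing value with the training median and append a binary "was missing" flag so the network can still learn from the absence itself. The sketch below assumes rows are dicts mapping feature names to floats, with `None` for missing values.

```python
import statistics
from typing import Dict, List, Optional, Sequence

# A row maps feature names to values; None marks a missing value.
Row = Dict[str, Optional[float]]


def fit_medians(rows: Sequence[Row], names: Sequence[str]) -> Dict[str, float]:
    """Learn a fill value per feature from training rows (0.0 if never observed)."""
    medians: Dict[str, float] = {}
    for name in names:
        present = [r[name] for r in rows if r.get(name) is not None]
        medians[name] = statistics.median(present) if present else 0.0
    return medians


def to_dense(row: Row, names: Sequence[str], medians: Dict[str, float]) -> List[float]:
    """Emit [value_or_median, missing_flag] for each feature, in a fixed order."""
    out: List[float] = []
    for name in names:
        value = row.get(name)
        missing = value is None
        out.append(medians[name] if missing else float(value))
        out.append(1.0 if missing else 0.0)
    return out
```

The medians should be fitted on training data only and then reused unchanged at serving time, so that training and production inputs are transformed identically.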
Choosing the right metrics to evaluate the model is crucial and depends on the business goal, such as minimizing fraud loss or balancing fraud loss against user experience (for example, the friction of blocking legitimate users). Error analysis of misclassified cases shows where the model can improve: errors caused by mislabeled data call for better human labeling, while errors caused by missing signal call for updating existing features or introducing new ones.
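To illustrate how the business goal shapes evaluation, the sketch below computes precision and recall at a score threshold, then chooses the threshold that minimizes an expected cost in which each missed fraud case (false negative) and each wrongly flagged legitimate case (false positive) has a fixed cost. The fixed per-error costs are a simplifying assumption; in practice a missed fraud case might cost its transaction amount. The `misclassified` helper returns the indices of errors to inspect during error analysis.

```python
from typing import Dict, List, Sequence, Tuple


def confusion(scores: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn) when cases with score >= threshold are flagged."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    tp, fp, fn, _ = confusion(scores, labels, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def expected_cost(scores: Sequence[float], labels: Sequence[int], threshold: float,
                  fn_cost: float, fp_cost: float) -> float:
    """Total cost: missed fraud (fn) vs. friction for good users (fp)."""
    _, fp, fn, _ = confusion(scores, labels, threshold)
    return fn * fn_cost + fp * fp_cost


def best_threshold(scores: Sequence[float], labels: Sequence[int],
                   fn_cost: float, fp_cost: float) -> float:
    """Try each observed score (plus 'flag nothing') and keep the cheapest."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates,
               key=lambda t: expected_cost(scores, labels, t, fn_cost, fp_cost))


def misclassified(scores: Sequence[float], labels: Sequence[int],
                  threshold: float) -> Dict[str, List[int]]:
    """Indices of errors to inspect during error analysis."""
    pairs = list(enumerate(zip(scores, labels)))
    return {
        "false_positives": [i for i, (s, y) in pairs if s >= threshold and y == 0],
        "false_negatives": [i for i, (s, y) in pairs if s < threshold and y == 1],
    }
```

On the same scores, a goal of minimizing fraud loss (a high `fn_cost`) pushes the chosen threshold down so more cases are flagged, while prioritizing user experience (a high `fp_cost`) pushes it up.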
ML models also have disadvantages. Their decisions can be hard to explain, particularly for neural network models, although techniques such as feature removal, which compare the model's output with and without a given feature, can shed light on what drives a decision. Development cycles for ML models can also be lengthy and require both expertise and infrastructure support. Despite these challenges, machine learning remains a powerful tool for combating fraud when applied judiciously.
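One simple form of feature removal can be sketched as follows: re-score a case with one feature at a time replaced by a baseline value, and record how much the score drops. Another variant is to retrain the model without the feature and compare, which is more expensive. The toy logistic model, its weights, and the baseline values are all hypothetical; because the function only calls the model as a black box, the same approach applies to a neural network.

```python
import math
from typing import Callable, Dict

Features = Dict[str, float]


def feature_removal_attribution(model: Callable[[Features], float],
                                x: Features, baseline: Features) -> Dict[str, float]:
    """For each feature, return score(x) minus the score with that feature
    replaced by its baseline value. Larger values mean the feature pushed
    this case's score up more."""
    full = model(x)
    attributions: Dict[str, float] = {}
    for name in x:
        ablated = dict(x)
        ablated[name] = baseline[name]
        attributions[name] = full - model(ablated)
    return attributions


# Hypothetical toy model: a logistic score over three features.
WEIGHTS = {"account_age_days": -0.05, "card_uses_last_hour": 0.8, "amount_zscore": 0.5}
BIAS = -2.0


def toy_model(x: Features) -> float:
    z = BIAS + sum(WEIGHTS[k] * v for k, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))
```

For a two-day-old account with six card uses in the last hour and an amount 1.5 standard deviations above the mean, compared against a hypothetical baseline of 30 days, one card use, and an average amount, the card-velocity feature accounts for by far the largest score drop. Removing one feature at a time ignores interactions between features, and the results depend on the chosen baseline.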