12/19/2022
Using Graph Learning to Combat Fraud and Abuse
Author(s): Blaz Stojanovic, Ivaylo Bahtchevanov
Having a fully automated detection system at scale is critical for organizations to ensure trust with their customers, but it is incredibly difficult to do effectively in practice. In this blog post, we’ll dive into the mechanics of these systems and talk about some of the traditional approaches. We’ll also introduce Graph Neural Networks (GNNs) and explain why they work particularly well for these problems and how they overcome the major challenges that more traditional approaches face. Finally, we’ll cover some of the most effective tools, both open source and commercial, for solving this problem based on your technical or organizational requirements.
The World of Fraud and Abuse
The world of fraud and abuse is extremely lucrative for malicious actors, and is becoming more lucrative each year. Per FTC data, American consumers lost over $5.8 billion in 2021, an increase of over 70% compared to 2020. Institutions aren’t safe either: Juniper Research estimates that the e-commerce industry alone lost $41.4 billion to fraud in 2022. It is becoming crucial for institutions to protect their users, both to avoid direct financial costs and to avoid reputational damage.
The quickly evolving landscape of fintech, online marketplaces, and the digitalization of goods and services has resulted in rapidly evolving fraud and abuse patterns. The sheer volume of data generated by these systems makes manual detection infeasible.
Fraud and abuse scenarios
There are countless examples of fraud and abuse in the real world. Here are some of the most common scenarios.
Banking Fraud
With the move towards digital banking, most payments today are entirely online. Banks compete with one another to provide the best digital experience for customers. The digital interfaces provide a greater surface area for attack while the reduction in steps in authentication and verification of identity makes it easier to circumvent safeguards.
Most banking and transactional fraud begins with account or identity theft – malicious actors obtain credentials and use them to access digital assets, make purchases, withdraw funds, apply for credit cards etc. Alternatively, these bad actors can create fake online accounts and use them for loan application fraud and credit card fraud.
Ecommerce Fraud and Abuse
Ecommerce fraud also typically starts with identity theft or account takeovers. Malicious actors use the compromised accounts to obtain goods from retailers or to withdraw funds, or they create fake accounts to exploit promotions or referral incentives. These actors can also perform prohibited or illicit transactions using a legitimate merchant’s credentials. In the case of marketplaces, merchants may use fake accounts to post additional reviews as a means of attracting customers and improving visibility.
Abusive behavior on Social Media Platforms
Social media platforms are particularly vulnerable to abusive behavior because of the network effects and broader reach within the platform. Fake accounts are often created as a means of conducting malicious activity for economic, political, and personal gain. Abuse here can include a broad range of activity – automated policy-violating content, spamming, or spreading misinformation. These behaviors can create massive usability issues and are difficult to track.
Approaches for building a large scale detection system
These scenarios are typically best suited for an automated system that captures usage activity of the entire network to identify malicious activity and its corresponding bad actors.
In practice a successful detection system will include an automated classification system that works in conjunction with a fraud or abuse prevention team. The automated system will categorize behavior and flag anomalous activity while the team will manually review the anomalies and either classify new fraud patterns or identify false positives, both of which will improve the overall accuracy of detection. The interaction between an intelligent system and the human-in-the-loop should be seamless and streamlined in order to minimize the number of false positives while providing actionable insights via explainability.
Typically, there are two use cases that the system should address:
- Use Case 1: We know what the fraud/abuse behavior looks like – we just need to identify and prevent it
- Use Case 2: We don’t know all the behaviors we want to stop – we need to proactively uncover new malicious patterns in the data, and then apply use case 1.
The system should also take into consideration the following design requirements:
- Ability to scale to extremely large data domains
- Minimize false positives (they can severely disrupt the user experience)
- Ability to operate efficiently while only a very small fraction of the data is manually reviewed
- Ability to perform well with highly imbalanced datasets (given the extremely low ratio of fraud to legitimate events)
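The last two requirements are easier to appreciate with a rough worked example. The sketch below uses made-up counts (one million events, 0.1% of them fraudulent, and a hypothetical detector) to show why plain accuracy is misleading on imbalanced data and why false positives dominate the review queue. None of the numbers come from a real system.

```python
# Hypothetical counts, for illustration only: 1,000,000 events, 0.1% fraudulent.
total_events = 1_000_000
fraud_events = 1_000
legit_events = total_events - fraud_events


def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for the fraud class."""
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# A "model" that never flags anything is right on every legitimate event...
never_flag_accuracy = legit_events / total_events
# ...but it catches no fraud at all.
never_flag_precision, never_flag_recall = precision_recall(0, 0, fraud_events)

# An assumed detector that catches 800 of the 1,000 fraud events while
# wrongly flagging 0.5% of legitimate events (integer arithmetic on purpose).
true_pos = 800
false_pos = legit_events * 5 // 1000
precision, recall = precision_recall(true_pos, false_pos, fraud_events - true_pos)

print(f"never-flag accuracy: {never_flag_accuracy:.1%}, recall: {never_flag_recall:.1%}")
print(f"detector false alarms: {false_pos}, precision: {precision:.1%}, recall: {recall:.1%}")
```

Even with a false positive rate of only 0.5%, most flagged events in this toy setup are legitimate, which is why precision and recall on the fraud class are more informative here than overall accuracy.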
Next, we’ll review the two common approaches – rule-based systems and classical ML – and a newer approach – graph machine learning.
Approach One: Rule-based systems
Rule-based systems are usually the first (and sometimes only) line of defense in a detection system because they are simple to understand and explain, computationally inexpensive, and scalable. They follow a direct mapping of “if this then that” to capture clear signs of malicious behavior.
Examples of rules could be:
- If an account has a flow of more than $20,000 in cash in less than 14 days, then raise an alert.
- If a user has made more than 50 reviews in a day, then raise an alert.
- If a user authenticates from different locations over the course of a day, then make them re-authenticate.
In practice, you need hundreds or thousands of these rules to capture the different scenarios. When a new transaction or user activity comes in, that action is checked against all of the rules.
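As a minimal sketch of what "checked against all of the rules" can look like, the snippet below encodes the three example rules as predicates over a per-account activity summary. The field names (`cash_flow_14d`, `reviews_today`, `login_locations_today`) and the `Rule` class are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    action: str                      # e.g. "alert" or "reauthenticate"
    matches: Callable[[dict], bool]  # predicate over one activity summary


# Thresholds mirror the three example rules; field names are assumptions.
RULES = [
    Rule("large_cash_flow", "alert",
         lambda a: a.get("cash_flow_14d", 0) > 20_000),
    Rule("review_burst", "alert",
         lambda a: a.get("reviews_today", 0) > 50),
    Rule("multiple_locations", "reauthenticate",
         lambda a: len(a.get("login_locations_today", ())) > 1),
]


def evaluate(activity: dict, rules=RULES) -> list:
    """Check one activity summary against every rule; return (rule, action) hits."""
    return [(rule.name, rule.action) for rule in rules if rule.matches(activity)]


activity = {
    "account_id": "acct-123",
    "cash_flow_14d": 25_000,
    "reviews_today": 3,
    "login_locations_today": {"Berlin", "Lagos"},
}
hits = evaluate(activity)
print(hits)
```

Every new scenario means another hand-written entry in `RULES`, which is where the maintenance cost discussed next comes from.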
This works well if you have fully characterized every form of fraud or abuse and only need to address use case 1. However, to address use case 2, an organization would need to employ entire teams of data and security analysts who fully understand the domain and the data and manually capture all possible scenarios as rules. This is incredibly time consuming and labor intensive. Even if you can achieve some reasonable success with this approach, over time bad actors will learn how to circumvent the rules and adapt to perform malicious activity undetected (or at least remain undetected until it is too late). Systems that rely on these rules are too rigid to adapt quickly to new behaviors, so malicious activity is often detected only after it has occurred.
See the common behavior loop:
Approach Two: Traditional Machine Learning
Machine learning can tackle a higher level of complexity behind fraud and abuse patterns than typical rules-based systems. It is relatively fast and efficient, scalable, and not as labor intensive. That said, this approach requires both a dedicated data science team to consistently implement and improve models as well as a team of domain experts that work together with the data team to ensure that the data is properly understood, features are designed correctly, and the models are accurately interpreted. We can split use cases 1 and 2 into two broad categories of machine learning: supervised learning and anomaly detection (or unsupervised learning) respectively.
Supervised learning
This approach lets us classify a type of behavior or a specific actor as fraudulent or not by creating a clear mapping function between the inputs (the data representing behavior) and the output (the classification of that behavior). Rather than dedicating significant resources to devise the decision rules ourselves, we allow the model to infer and learn all of the rules from the data provided.
In addition to the expensive labor and time of the dedicated ML and subject matter expert teams, this approach has additional requirements on the data. In order for a mapping to be learned, the system requires a fully labeled dataset – where each behavior is properly tagged as good or bad (often even more granular categorization is needed for bad activity). It is important to note here that an incorrectly labeled dataset will render the models useless. Ensuring a high quality labeled dataset is very difficult – to ensure correctness, labels are often manually collected, which is both labor intensive and expensive.
Ideally, we want our supervised learning models to learn as efficiently as possible, leverage as many signals from the input as they can, and learn robustly so they can generalize to unseen or unlabeled data. This is impossible to accomplish if the existing data doesn’t capture a broad enough range of both good and bad behavior. In particular, models should generalize despite the fact that there are very few examples of real malicious activity for each category of harmful behavior.
Anomaly Detection
A common approach for detecting malicious behavior is to learn more generic patterns in behavior and then identify anything that deviates from the ordinary and flag the outliers. This is useful for capturing new patterns of fraud as described in use case 2 – the predictions from this approach can then be used as specific negative labels for a supervised approach.
For example, in the case of a stolen account, the malicious actor will perform actions that deviate from the typical usage, which will appear as anomalies in the data. These outliers can flag fraud, and the data teams can retroactively update supervised models to detect these anomalous behaviors.
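The stolen-account example can be sketched with a very simple outlier score: how many standard deviations a new transaction lies from the account's own history. This is only one basic technique among many, and the amounts and the threshold of 3.0 below are illustrative assumptions.

```python
from statistics import mean, stdev


def anomaly_score(history, value):
    """Number of standard deviations `value` lies from the account's own history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


# Hypothetical past transaction amounts for a single account.
history = [42.0, 38.5, 51.0, 45.2, 40.1, 47.9, 39.4, 44.0]
THRESHOLD = 3.0  # assumed cut-off; in practice this would be tuned

for amount in (46.0, 950.0):
    score = anomaly_score(history, amount)
    label = "ANOMALY" if score > THRESHOLD else "ok"
    print(f"{amount:>7.2f}  score={score:6.1f}  {label}")
```

Events flagged this way can then be reviewed and, once confirmed, used as labels for a supervised model.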
Sufficient high-quality data is still essential for the model to be useful. Low-quality data will lead to bad predictions, but even good data will yield poor results with an improper modeling approach.
Limitations of Traditional Machine Learning: A Data Representation Problem
Classical ML approaches are subject to what we refer to as the data representation problem: the models are unable to take in the data directly and properly represent all of the behavior and interactions within the platform. Traditional approaches require the data to be preprocessed, distilled, and stored in static tabular form before it can be used for training and prediction. This transformation discards most of the structural, temporal, and contextual information, and the model then treats each entity as completely separate and independent. However, behavior in a network doesn’t occur in isolation but in a broader context of other interactions. Leveraging the entire context is immensely valuable in a predictive model that benefits from additional signals. Traditional feature engineering attempts to encode the structural information in new features, but it is extremely difficult to condense the rich interactions of entities into static tabular features without losing most of the signal.
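A toy illustration of this loss of signal, using made-up accounts and devices: once events are aggregated into one row per account, two accounts can look identical even though one of them shares a device with a known fraudulent account. All names and values here are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw events: (account, device, amount).
events = [
    ("alice", "device-1", 30.0),
    ("alice", "device-1", 50.0),
    ("bob", "device-2", 30.0),
    ("bob", "device-2", 50.0),
    ("mallory", "device-2", 900.0),
]
known_fraud = {"mallory"}

# Tabular view: one row of aggregates per account; device links are dropped.
rows = defaultdict(lambda: {"n_tx": 0, "total": 0.0})
for account, _device, amount in events:
    rows[account]["n_tx"] += 1
    rows[account]["total"] += amount

# Graph view: keep the account-device links instead of discarding them.
accounts_on_device = defaultdict(set)
for account, device, _amount in events:
    accounts_on_device[device].add(account)


def shares_device_with_fraud(account):
    """True if `account` uses a device also used by a known fraudulent account."""
    return any(
        account in members and bool((members - {account}) & known_fraud)
        for members in accounts_on_device.values()
    )


print(rows["alice"] == rows["bob"])  # identical tabular rows
print(shares_device_with_fraud("alice"), shares_device_with_fraud("bob"))
```

One could add a "shares a device with fraud" column by hand, but every such relationship needs its own engineered feature, which is the difficulty described above.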
This brings us to our third and final approach – GNNs – which will address the pain points of traditional methods mentioned above and fully leverage the graph properties in making predictions.
Approach Three: Graph Neural Networks
Graph machine learning (or more concretely GNNs) is the overall best approach for tackling large scale fraud and abuse problems.
Why Graphs
Whether you are looking at a payments or transactions platform or a social network, you can model the world as a set of entities and their respective relationships with the other entities on the network. These scenarios are best captured and described as a heterogeneous graph, where there are different types of entities and different types of relations or links between them.
Here are some examples:
- A financial transaction network consists of entities that include customers, credit cards, and banks. The relationships between entities include transactions of credit cards, loans, movement of money between individuals or institutions, or other financial interactions between entities
- An e-commerce website can be composed of entities such as users, stores, products, and ads. Links between entities include users buying items, stores listing items, and users reviewing stores. Each of these edges may then carry additional information, e.g. date and time of purchase or the review rating and accompanying text.
- In social networks, entities represent users and content that users post (images, videos, comments, posts, etc.) and links represent the interactions (comments or reactions to content posted, direct messaging, viewing, etc.)
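To make the idea of a heterogeneous graph concrete, here is a minimal plain-Python sketch of the e-commerce example: nodes are grouped by type, and edges are grouped by a (source type, relation, destination type) triple and can carry attributes. The `HeteroGraph` class and all identifiers are hypothetical; graph libraries provide their own, far more efficient containers.

```python
from collections import defaultdict


class HeteroGraph:
    """Minimal typed graph: nodes grouped by type, edges by (src, relation, dst) type."""

    def __init__(self):
        self.nodes = defaultdict(dict)  # node_type -> {node_id: features}
        self.edges = defaultdict(list)  # (src_type, rel, dst_type) -> [(src, dst, attrs)]

    def add_node(self, node_type, node_id, **features):
        self.nodes[node_type][node_id] = features

    def add_edge(self, src_type, src, rel, dst_type, dst, **attrs):
        self.edges[(src_type, rel, dst_type)].append((src, dst, attrs))

    def neighbors(self, node_type, node_id):
        """Yield (relation, neighbor_type, neighbor_id), following edges both ways."""
        for (s_type, rel, d_type), edge_list in self.edges.items():
            for src, dst, _attrs in edge_list:
                if s_type == node_type and src == node_id:
                    yield rel, d_type, dst
                if d_type == node_type and dst == node_id:
                    yield f"rev_{rel}", s_type, src


g = HeteroGraph()
g.add_node("user", "u1", account_age_days=400)
g.add_node("user", "u2", account_age_days=2)
g.add_node("store", "s1", country="DE")
g.add_node("product", "p1", price=19.99)
g.add_edge("store", "s1", "lists", "product", "p1")
g.add_edge("user", "u1", "buys", "product", "p1", timestamp="2022-12-01T10:00:00")
g.add_edge("user", "u2", "reviews", "store", "s1", rating=5, text="Great!")

store_neighbors = sorted(g.neighbors("store", "s1"))
print(store_neighbors)
```

Keeping node and edge types explicit is what lets a model treat "user reviews store" differently from "user buys product".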
Graphs capture the relational properties that exist naturally in the data and represent a complex network of entities and links, where each entity exists in the context of the entities it is connected to. By capturing this structural information, graph models can make predictions with very little data on a given entity, which is very useful for minimizing false positives and tackling the label imbalance problem. Graph models require significantly less feature engineering than traditional ML because they are designed to operate on the raw data and learn the structural properties and interactions of the network automatically. The value of this structure is visible in Twitter’s spam detection models, where the data teams designed 24 new graph-based features to significantly improve their classification. These features attempted to capture the network structure of the data (such as clustering coefficient and betweenness centrality).
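For a sense of what such a structural feature measures, the sketch below computes the local clustering coefficient, one of the features named above, on a toy undirected graph. It is a plain-Python illustration on made-up data, not the feature pipeline of any particular company.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of `node`'s neighbors that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)


# Undirected toy graph stored as adjacency sets.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "hub": {"x", "y", "z"},
    "x": {"hub"},
    "y": {"hub"},
    "z": {"hub"},
}

for node in ("a", "b", "hub"):
    print(node, round(local_clustering(adj, node), 3))
```

In this toy graph, the "hub" node contacts several accounts that have no links among themselves and scores 0.0, while nodes embedded in a tightly knit group score higher.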
Graph Neural Networks
GNNs are designed to operate directly on graph data and capture signals from the entire network with each prediction. This translates to better entity embeddings to encode the structure and characteristics of the local node neighborhood. These improved embeddings have considerable impact on model performance (you can read Facebook’s approach in leveraging graph ML to reduce abusive content by 27% by explicitly encoding their entire social graph as deep entities).
GNNs build robust embeddings by passing and encoding messages between entities, and these embeddings can then fuel downstream tasks. Recent advances allow GNNs to scale to massive graphs, which makes them more useful in fraud and abuse workflows. The engineering team at Airbnb took this approach to understand host behavior and ensure trustworthiness of listings across all users on the platform. GNNs provide a particular advantage when there is little to no historical data on an entity – such as when a new host signs up for the Airbnb platform. GNNs then focus on the new entity’s connections to construct a more detailed understanding of the user.
GNNs have also been shown to scale in production, in terms of both throughput and latency. In e-commerce, eBay developed an explainable fraud transaction detection system that scaled to over 1.1 billion nodes and 3.7 billion edges based on a GNN model design. This approach outperformed their baselines by more than 2% while simultaneously achieving low latency in execution. Similarly, Amazon employed GNNs to fight abuse and fraud on their complex, dynamic, and large-scale e-commerce networks. Their hybrid RNN-GNN architecture successfully learned from very few labeled examples and elegantly extended GNNs to solve very challenging temporal problems. It outperformed other large-scale dynamic graph baselines in abuse classification tasks by up to 14% AUROC and provided a 10x memory improvement.
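The message-passing idea can be sketched in a few lines. The version below is deliberately simplified: each node averages its own vector with the mean of its neighbors' vectors, with no learned weights or nonlinearities, whereas real GNN layers learn those transformations. The node names and two-dimensional features are hypothetical.

```python
def message_passing_round(features, adj):
    """One round: each node averages its own vector with the mean of its neighbors'."""
    updated = {}
    for node, own in features.items():
        nbrs = adj.get(node, ())
        if not nbrs:
            updated[node] = list(own)
            continue
        dim = len(own)
        nbr_mean = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
        updated[node] = [0.5 * o + 0.5 * m for o, m in zip(own, nbr_mean)]
    return updated


# Hypothetical 2-d features: [risk_signal, activity_level].
features = {
    "new_host": [0.0, 0.0],  # no history yet
    "device_1": [0.9, 0.4],
    "card_1": [0.7, 0.8],
    "old_host": [0.1, 0.6],
}
adj = {
    "new_host": ["device_1", "card_1"],
    "device_1": ["new_host"],
    "card_1": ["new_host", "old_host"],
    "old_host": ["card_1"],
}

h1 = message_passing_round(features, adj)  # information from direct neighbors
h2 = message_passing_round(h1, adj)        # now also from two hops away
print(h1["new_host"], h2["new_host"])
```

After one round the new host, which started with an all-zero vector, already carries information from the device and card it is connected to; a second round brings in signal from two hops away.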
GNN-powered embeddings also greatly benefit us in unsupervised settings, such as in use case 2. The embeddings significantly improve anomaly detection by identifying more granular patterns – anomalies can be learned at the entity level and at the subgraph level. By tracking how node embeddings change over time, the system becomes significantly better at understanding how the node interacts in the context of the broader network, which is incredibly useful in discovering changes in user behavior. A sudden account takeover, for example, shows up as an abrupt shift in the account’s embedding and can be flagged quickly.
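One simple way to track how a node embedding changes over time is to compare consecutive snapshots with cosine distance and flag large jumps. The sketch below assumes the embeddings have already been produced by some model; the vectors and the 0.5 threshold are made up for illustration.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def drift_alerts(snapshots, threshold=0.5):
    """Return indices t where the embedding moved sharply between t-1 and t."""
    return [
        t
        for t in range(1, len(snapshots))
        if cosine_distance(snapshots[t - 1], snapshots[t]) > threshold
    ]


# Hypothetical daily embeddings for one account.
snapshots = [
    [0.90, 0.10, 0.05],
    [0.88, 0.12, 0.06],
    [0.91, 0.09, 0.04],
    [0.05, 0.20, 0.95],  # behavior changes abruptly
    [0.04, 0.22, 0.96],
]
alerts = drift_alerts(snapshots)
print(alerts)
```

Small day-to-day movement stays under the threshold, while the abrupt change on the fourth snapshot is flagged for review.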
Recent advances in the field of GNN explainability make these models particularly good at providing meaningful explanations, making the predictions more actionable. Methods like GNNExplainer and integrated gradients determine which components of our data contribute most to the predictions, allowing us to better explain fraudulent behavior and understand possible model mistakes. These explanations can be given at the node, subgraph, and feature level, improving over explanation methods in the classical ML domain.
GNNs significantly improve performance on detection across all of our previously outlined design dimensions. They have been demonstrated to work at scale while also effectively learning from very few examples and generalizing to unseen events due to their implicit understanding of data structure. Graph learning reduces the number of false positives, even on very imbalanced datasets. Finally, the graph structure captured by GNNs can be further leveraged for more granular explainability of predictions and anomaly detection, both of which can be done at the subgraph, entity, relation, or feature level.
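To give a feel for how integrated gradients assigns credit to input features, here is a small self-contained sketch. It applies the method to a toy logistic model rather than a GNN, with made-up weights and feature names, and approximates the path integral with a midpoint Riemann sum. A useful sanity check is that the attributions sum to approximately the difference between the model output at the input and at the baseline.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, weights, bias):
    """Toy fraud scorer: logistic function of a weighted sum."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients for `model`."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        p = model(point, weights, bias)
        # d sigmoid(w.x + bias) / d x_i = p * (1 - p) * w_i
        for i, w in enumerate(weights):
            avg_grad[i] += p * (1.0 - p) * w / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


# Hypothetical standardized features and weights, for illustration only.
feature_names = ["tx_amount_z", "new_device", "account_age_z"]
weights, bias = [1.2, 2.0, -0.8], -1.0
x = [2.5, 1.0, -1.0]
baseline = [0.0, 0.0, 0.0]

attr = integrated_gradients(x, baseline, weights, bias)
for name, a in zip(feature_names, attr):
    print(f"{name:>14}: {a:+.3f}")
print("sum of attributions:", round(sum(attr), 3))
print("output difference:  ", round(model(x, weights, bias) - model(baseline, weights, bias), 3))
```

For graph models the same idea is applied to node features or edges instead of a flat feature vector.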
Implementing Graphs in Practice
While GNNs are extremely effective general tools, there are significant challenges in implementing them in production. Going from raw data to a fully constructed graph is not a trivial process, and you need to find a way to store the graph as well as the features efficiently. Graphs can be arbitrary in size with a complex topological structure. Unlike textual and image data, graph data does not have any underlying fixed ordering, which means graph models need to be agnostic to the order in which nodes are presented. The nodes can have multimodal features, and each data modality has to be encoded appropriately. To capture real-world dynamics, the graph structure needs to be able to change over time.
As a result of these challenges, setting up the graph and models requires specialized knowledge and tools.
PyG
If you’re looking to build a platform from scratch, a great tool to accelerate time-to-value is PyG (PyTorch Geometric), one of the most popular and commonly-used graph learning libraries across industry and research. PyG makes it easy to build and manage GNNs by providing unified APIs and abstractions that follow the design principles of PyTorch. You can stack different lego blocks together from over a hundred contributed GNN architectures and validate against hundreds of benchmark datasets to design your own custom state-of-the-art model.
Leading enterprises have built core predictive analytics and pipelines using PyG – Spotify’s homepage recommendations, Airbus’s anomaly detection, and AstraZeneca’s drug discovery are all powered by PyG.
Kumo
Regardless of the approach, building an enterprise-scale fraud detection service takes considerable time and resources in the form of a large team of data scientists, data platform engineers, and security domain experts.
Kumo makes it easy to be up-and-running in minimal time while getting the most value out of your existing data. Kumo effectively automates the process of going from raw data to explainable and actionable predictions without large, dedicated teams. All a user needs to do is connect their data warehouse or raw data storage, and Kumo will assemble the graph under the hood. Once the graph is created, it becomes quick and easy to make any number of predictions on the data using a simple SQL-like interface – without any additional effort in data processing or retraining for subsequent predictions.
The traditional steps of target label engineering, feature engineering, architecture and hyper-parameter search, and ML Ops are fully abstracted away, making it easy for an analyst to perform the same predictions that would typically involve many engineers and many cycles of data engineering, training, and model tuning.
Finally, Kumo takes on the burden of building and enabling the governance and compliance that a security-focused platform requires, so the end user doesn’t have to. The control plane comes with enterprise-grade tooling for authentication and permissions, security, monitoring/logging, and compute management.
If you are interested in learning more, please reach out!