Ad Fraud Detection
“Is this impression from a bot?”
Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.
A real-world example
Is this impression from a bot?
Ad fraud costs the industry $84B annually. Rule-based filters catch known patterns but miss sophisticated bot networks that mimic human behavior. These bots share IP ranges, rotate device fingerprints, and generate realistic click patterns that pass individual-level checks. For an ad network processing $2B in spend, a 10% fraud rate means $200M lost to bots.
Quick answer
Graph neural networks detect ad fraud by identifying coordinated bot networks that appear legitimate in isolation but form conspicuous clusters in the device-IP-publisher graph. While rule-based systems catch known patterns, GNNs detect structural anomalies -- shared IP subnets, correlated click timing, device fingerprint cycling -- reducing fraud losses by 60-80%.
Approaches compared
4 ways to solve this problem
1. Rule-based filters (IP blocklists, velocity checks)
Flag impressions that exceed click velocity thresholds, come from known datacenter IPs, or match known bot signatures. The industry baseline.
Best for
Catching known fraud patterns quickly. Low latency, easy to deploy, and fully transparent.
Watch out for
Sophisticated bots rotate IPs, throttle click rates, and mimic human behavior specifically to evade these rules. Rule-based filters catch 30-40% of fraud at best, and false positive rates climb as you tighten thresholds.
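To make the baseline concrete, here is a minimal sketch of a rule-based filter. The blocklist entries and the velocity threshold are illustrative values, not tuned production settings; real systems refresh blocklists continuously and tune thresholds per channel.

```python
from dataclasses import dataclass

# Illustrative blocklist and threshold -- hypothetical values, not tuned.
DATACENTER_BLOCKLIST = {"192.168.1.50", "192.168.1.51"}
MAX_CLICKS_PER_HOUR = 100

@dataclass
class Impression:
    ip_address: str
    clicks_last_hour: int

def rule_based_flag(imp: Impression) -> bool:
    """Flag an impression if it trips any single rule."""
    if imp.ip_address in DATACENTER_BLOCKLIST:
        return True
    if imp.clicks_last_hour > MAX_CLICKS_PER_HOUR:
        return True
    return False
```

Note that each impression is judged alone: a bot that stays under the velocity threshold on a fresh residential IP passes every check.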
2. Anomaly detection (isolation forests, autoencoders)
Train unsupervised models on normal traffic patterns and flag deviations. Catches unusual behavior without needing labeled fraud examples.
Best for
Detecting new fraud patterns that rules haven't been written for yet. Good complement to rule-based systems.
Watch out for
High false positive rates. Treats each impression independently, so coordinated fraud that mimics normal per-device behavior slips through. Cannot detect the network structure of bot farms.
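A minimal isolation-forest sketch using scikit-learn, fitted on synthetic per-impression features. The feature choices and the `contamination` value (the assumed fraud share) are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic per-impression features for illustration:
# [clicks_last_hour, avg_seconds_between_clicks]
rng = np.random.default_rng(0)
normal = np.column_stack([rng.poisson(5, 500), rng.normal(45, 10, 500)])
bots = np.column_stack([rng.poisson(140, 10), rng.normal(0.5, 0.1, 10)])
X = np.vstack([normal, bots])

# Fit on all traffic; contamination is the assumed fraud share.
model = IsolationForest(contamination=0.02, random_state=0).fit(X)
labels = model.predict(X)  # -1 = anomaly, 1 = normal
```

The model flags extreme individual behavior, but a bot farm whose devices each click at human-like rates produces no per-row anomaly for it to find.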
3. Supervised classification (XGBoost on device features)
Train a classifier on labeled fraud/legitimate data using device, IP, and behavioral features. More accurate than rules for known fraud types.
Best for
Situations with high-quality labeled training data and fraud patterns that are stable over time.
Watch out for
Requires expensive manual labeling. Degrades quickly as fraudsters adapt. Treats each device independently, missing the coordinated nature of bot networks.
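The supervised approach can be sketched as follows. This uses scikit-learn's gradient boosting as a stand-in for XGBoost, with synthetic labeled data; the features and class balance are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic labeled rows: [clicks_last_hour, is_datacenter_ip, unique_ads]
legit = np.column_stack([rng.poisson(5, 400), np.zeros(400), rng.integers(1, 10, 400)])
fraud = np.column_stack([rng.poisson(130, 100), np.ones(100), rng.integers(1, 4, 100)])
X = np.vstack([legit, fraud])
y = np.array([0] * 400 + [1] * 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

Each row is scored independently on its own features, which is exactly why coordinated-but-individually-normal bot traffic evades this classifier.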
4. KumoRFM (relational graph ML)
Connect impressions, devices, IPs, publishers, and click patterns into a single graph. The GNN detects bot network structure: shared subnets, correlated timing, fingerprint cycling, and publisher concentration anomalies.
Best for
Detecting sophisticated coordinated fraud. The graph structure reveals bot networks that look legitimate at the individual device level but form obvious clusters when you see the connections.
Watch out for
Requires device-level impression data with IP and publisher connections. Most effective when you have enough traffic volume to see network-level patterns (1M+ daily impressions).
Key metric: on the SAP SALT benchmark, graph-aware fraud models achieve 91% accuracy vs 75% for feature-engineered and 63% for rule-based approaches.
Why relational data changes the answer
Ad fraud is a network problem. A single bot impression can look perfectly normal: reasonable click timing, a real-looking device fingerprint, a residential IP address. But zoom out and you see 47 devices on the same /24 subnet, all clicking the same 3 ads on the same publisher within the same 10-minute window. That coordination is invisible to any model that evaluates impressions independently. Rule-based systems check each impression against thresholds. Supervised classifiers score each device on its own features. Neither can see the forest for the trees.
Relational models read the impression-device-IP-publisher graph and learn what coordinated fraud looks like structurally. They detect that a cluster of devices sharing an IP range, exhibiting correlated click timing, and concentrating on a single publisher forms a pattern that legitimate traffic never produces. This is why graph-based fraud detection catches 60-80% of sophisticated bot traffic while rule-based systems cap out at 30-40%. The SAP SALT benchmark shows similar relational advantages: 91% accuracy for graph-aware models vs 75% for feature-engineered approaches vs 63% for rule-based systems.
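The subnet-coordination signal described above can be illustrated with a small sketch: collapse each impression's IP to its /24 network and count distinct devices per subnet. The impression log and the cluster threshold are toy values for illustration.

```python
import ipaddress
from collections import defaultdict

# Toy impression log: (device_id, ip_address). Threshold is illustrative.
impressions = [
    ("DEV001", "192.168.1.50"),
    ("DEV002", "192.168.1.51"),
    ("DEV003", "10.0.0.88"),
]

def subnet24(ip: str) -> str:
    """Collapse an IPv4 address to its /24 network."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

devices_per_subnet = defaultdict(set)
for device_id, ip in impressions:
    devices_per_subnet[subnet24(ip)].add(device_id)

# A subnet hosting many distinct devices is a coordination signal that
# no per-impression check can see.
CLUSTER_THRESHOLD = 2
suspicious = {s for s, d in devices_per_subnet.items() if len(d) >= CLUSTER_THRESHOLD}
```

A GNN learns this kind of aggregation (and timing correlation, publisher concentration, and fingerprint cycling) directly from the graph rather than from a hand-written group-by.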
Detecting ad fraud one impression at a time is like trying to spot a pickpocket ring by watching each person individually on security cameras. Each pickpocket looks like a normal shopper. But if you overlay their movements on a single map, you see them working in coordinated patterns -- one distracts, another bumps, a third lifts the wallet. Graph-based fraud detection is that overhead map. It reveals the coordination that individual-level analysis cannot see.
How KumoRFM solves this
Graph-powered intelligence for advertising
Kumo builds a graph connecting impressions, devices, IPs, publishers, and click patterns. Bot networks that appear legitimate in isolation form conspicuous clusters in the graph: shared IP subnets, correlated click timing, device fingerprint cycling, and abnormal publisher concentration. The GNN detects these structural anomalies without hand-crafted rules, adapting as fraud tactics evolve.
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
IMPRESSIONS
| impression_id | device_id | ip_address | publisher_id | timestamp |
|---|---|---|---|---|
| IMP801 | DEV001 | 192.168.1.50 | PUB01 | 2025-03-01 02:14 |
| IMP802 | DEV002 | 192.168.1.51 | PUB01 | 2025-03-01 02:14 |
| IMP803 | DEV003 | 10.0.0.88 | PUB02 | 2025-03-01 09:30 |
DEVICES
| device_id | device_type | os | fingerprint_hash |
|---|---|---|---|
| DEV001 | Mobile | Android | FP-AA1 |
| DEV002 | Mobile | Android | FP-AA2 |
| DEV003 | Desktop | Windows | FP-BB1 |
IPS
| ip_address | asn | geo | datacenter |
|---|---|---|---|
| 192.168.1.50 | AS12345 | US-East | True |
| 192.168.1.51 | AS12345 | US-East | True |
| 10.0.0.88 | AS67890 | US-West | False |
PUBLISHERS
| publisher_id | name | category | fraud_history_rate |
|---|---|---|---|
| PUB01 | QuickClicks | News | 12.4% |
| PUB02 | TechReview | Technology | 0.8% |
CLICK_PATTERNS
| device_id | clicks_last_hour | avg_time_between_clicks | unique_ads |
|---|---|---|---|
| DEV001 | 147 | 0.4s | 3 |
| DEV002 | 132 | 0.5s | 3 |
| DEV003 | 4 | 45s | 4 |
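To show how these tables become a single graph, here is a minimal sketch using networkx, with each impression linked to its device, IP, and publisher. The rows are the sample rows from the tables above; the graph library choice is an illustrative stand-in for Kumo's internal representation.

```python
import networkx as nx

# Sample rows from the IMPRESSIONS table above.
impressions = [
    ("IMP801", "DEV001", "192.168.1.50", "PUB01"),
    ("IMP802", "DEV002", "192.168.1.51", "PUB01"),
    ("IMP803", "DEV003", "10.0.0.88", "PUB02"),
]

# Heterogeneous graph: each impression links its device, IP, and publisher.
G = nx.Graph()
for imp, dev, ip, pub in impressions:
    G.add_node(imp, kind="impression")
    G.add_node(dev, kind="device")
    G.add_node(ip, kind="ip")
    G.add_node(pub, kind="publisher")
    G.add_edges_from([(imp, dev), (imp, ip), (imp, pub)])

# DEV001 and DEV002 are four hops apart through PUB01 -- the kind of
# structural proximity a GNN aggregates over during message passing.
dist = nx.shortest_path_length(G, "DEV001", "DEV002")
```

Once the tables are connected this way, the two suspect devices are no longer independent rows: their shared publisher and adjacent IPs put them in the same neighborhood.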
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT BOOL(IMPRESSIONS.is_fraud, 0, 1, hours) FOR EACH IMPRESSIONS.impression_id
Prediction output
Every entity gets a score, updated continuously
| IMPRESSION_ID | DEVICE_ID | FRAUD_PROB | VERDICT |
|---|---|---|---|
| IMP801 | DEV001 | 0.96 | Fraud |
| IMP802 | DEV002 | 0.94 | Fraud |
| IMP803 | DEV003 | 0.03 | Legitimate |
Understand why
Every prediction includes feature attributions — no black boxes
Impression IMP801 -- Device DEV001
Predicted: 96% fraud probability
Top contributing features
| Feature | Value | Attribution |
|---|---|---|
| IP subnet cluster size | 47 devices on /24 | 31% |
| Click velocity (last hour) | 147 clicks | 26% |
| Datacenter IP flag | True | 20% |
| Publisher historical fraud rate | 12.4% | 14% |
| Device fingerprint rotation frequency | 3 per hour | 9% |
Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Frequently asked questions
Common questions about ad fraud detection
How do you detect sophisticated ad fraud bots?
Graph-based detection is the most effective approach for sophisticated bots. These bots mimic human behavior at the individual level, evading rule-based filters. But they cannot hide their network structure: shared IP subnets, correlated click timing, and device fingerprint rotation patterns form obvious clusters in the device-IP-publisher graph.
What percentage of ad traffic is fraudulent?
Industry estimates range from 5-15% of programmatic ad traffic, with some publishers and channels seeing rates above 20%. For an ad network processing $2B in spend, even a 10% fraud rate means $200M lost to bots. Graph-based detection recovers 60-80% of this by catching coordinated fraud that rule-based systems miss.
How do you reduce false positives in ad fraud detection?
Graph-based models reduce false positives because they require convergence of multiple network-level signals (IP clustering, timing correlation, publisher concentration) rather than triggering on single thresholds. A high click rate from a legitimate power user won't trigger a fraud flag because the network context is completely different from a bot farm.
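The convergence idea can be sketched as a simple signal-counting rule. The thresholds and the two-signal requirement are illustrative assumptions, not the model's actual decision logic, which learns these interactions from the graph.

```python
def fraud_signals(subnet_cluster_size: int, clicks_last_hour: int,
                  publisher_fraud_rate: float) -> int:
    """Count how many independent network-level signals fire.
    Thresholds are illustrative, not tuned values."""
    signals = 0
    signals += subnet_cluster_size >= 20      # many devices on one /24
    signals += clicks_last_hour >= 100        # extreme click velocity
    signals += publisher_fraud_rate >= 0.10   # tainted publisher history
    return signals

def is_fraud(**kw) -> bool:
    # Require at least two converging signals before flagging, so a lone
    # power user with high click velocity alone is not flagged.
    return fraud_signals(**kw) >= 2
```

A single-threshold rule would flag the power user; requiring convergence keeps the false-positive rate down while still catching the bot farm, which trips several signals at once.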
What data do you need for ad fraud detection?
Impression logs with device IDs, IP addresses, publisher IDs, and timestamps. Click events with timing data. Device fingerprint attributes and IP metadata (ASN, datacenter flag, geolocation). The key is having the relational connections between these entities, not just aggregated features per device.
How fast can graph-based fraud detection adapt to new tactics?
Graph models retrain on new data continuously, detecting novel fraud patterns within days of their emergence. Because the model learns structural anomalies rather than specific rule signatures, new bot tactics that change individual device behavior but keep a coordinated network structure are caught without anyone writing new rules.
Bottom line: An ad network processing $2B in annual spend recovers $120-160M by catching sophisticated bot networks that rule-based systems miss. Kumo's graph reveals coordinated fraud clusters across devices, IPs, and publishers that appear legitimate in isolation.
Related use cases
Explore more ad tech use cases
Topics covered
One Platform. One Model. Infinite Predictions.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.