Binary Classification · Fraud Detection

Ad Fraud Detection

Is this impression from a bot?



A real-world example

Is this impression from a bot?

Ad fraud costs the industry $84B annually. Rule-based filters catch known patterns but miss sophisticated bot networks that mimic human behavior. These bots share IP ranges, rotate device fingerprints, and generate realistic click patterns that pass individual-level checks. For an ad network processing $2B in spend, a 10% fraud rate means $200M lost to bots.

Quick answer

Graph neural networks detect ad fraud by identifying coordinated bot networks that appear legitimate in isolation but form conspicuous clusters in the device-IP-publisher graph. While rule-based systems catch known patterns, GNNs detect structural anomalies -- shared IP subnets, correlated click timing, device fingerprint cycling -- reducing fraud losses by 60-80%.

Approaches compared

4 ways to solve this problem

1. Rule-based filters (IP blocklists, velocity checks)

Flag impressions that exceed click velocity thresholds, come from known datacenter IPs, or match known bot signatures. The industry baseline.

Best for

Catching known fraud patterns quickly. Low latency, easy to deploy, and fully transparent.

Watch out for

Sophisticated bots rotate IPs, throttle click rates, and mimic human behavior specifically to evade these rules. Rule-based filters catch 30-40% of fraud at best, and false positive rates climb as you tighten thresholds.
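The velocity and blocklist checks described above fit in a few lines of Python. This is an illustrative sketch only; the blocklist contents and the 100-clicks-per-hour threshold are made-up values, not recommended settings.

```python
# Minimal rule-based impression filter: an IP blocklist plus a
# click-velocity threshold. Any single tripped rule flags the impression.

DATACENTER_IPS = {"192.168.1.50", "192.168.1.51"}  # hypothetical blocklist
MAX_CLICKS_PER_HOUR = 100                          # hypothetical threshold

def is_suspicious(impression: dict) -> bool:
    """Flag an impression if it trips any single rule."""
    if impression["ip_address"] in DATACENTER_IPS:
        return True
    if impression["clicks_last_hour"] > MAX_CLICKS_PER_HOUR:
        return True
    return False

flags = [is_suspicious(i) for i in [
    {"ip_address": "10.0.0.88", "clicks_last_hour": 4},       # legitimate
    {"ip_address": "192.168.1.50", "clicks_last_hour": 147},  # bot
]]
```

Note the structural limitation: each impression is checked in isolation, so a bot that rotates off the blocklist and throttles below the threshold passes cleanly.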

2. Anomaly detection (isolation forests, autoencoders)

Train unsupervised models on normal traffic patterns and flag deviations. Catches unusual behavior without needing labeled fraud examples.

Best for

Detecting new fraud patterns that rules haven't been written for yet. Good complement to rule-based systems.

Watch out for

High false positive rates. Treats each impression independently, so coordinated fraud that mimics normal per-device behavior slips through. Cannot detect the network structure of bot farms.
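As a toy stand-in for an isolation forest or autoencoder (an illustrative simplification, not any vendor's actual model), the sketch below scores each impression by its distance from the population mean. It also demonstrates the weakness noted above: a throttled bot with human-like click rates scores as normal.

```python
import statistics

def anomaly_scores(click_rates):
    """Score each impression by how far its click rate sits from the
    population mean, in standard deviations -- a toy per-impression
    anomaly detector."""
    mean = statistics.mean(click_rates)
    stdev = statistics.stdev(click_rates)
    return [abs(r - mean) / stdev for r in click_rates]

# A throttled bot clicking at human-like rates scores as normal, even
# if 50 such bots act in lockstep: per-impression models cannot see
# the coordination between them.
rates = [4, 6, 5, 7, 5, 5]  # last value is a throttled bot
scores = anomaly_scores(rates)
```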

3. Supervised classification (XGBoost on device features)

Train a classifier on labeled fraud/legitimate data using device, IP, and behavioral features. More accurate than rules for known fraud types.

Best for

Situations where you have high-quality labeled training data and fraud patterns that are stable over time.

Watch out for

Requires expensive manual labeling. Degrades quickly as fraudsters adapt. Treats each device independently, missing the coordinated nature of bot networks.
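A sketch of the per-device feature row such a classifier would consume; the field names are illustrative. Nothing in the row encodes which other devices share the same IP -- exactly the coordination signal that is lost at feature-engineering time.

```python
def device_features(ip_meta: dict, clicks: dict) -> list:
    """Flatten one device's attributes into an independent feature row
    for a supervised classifier. The device is scored on its own
    features; its relationships to other devices are not represented."""
    return [
        1.0 if ip_meta["datacenter"] else 0.0,
        clicks["clicks_last_hour"],
        clicks["avg_time_between_clicks"],
        clicks["unique_ads"],
    ]

row = device_features(
    {"datacenter": True},
    {"clicks_last_hour": 147, "avg_time_between_clicks": 0.4, "unique_ads": 3},
)
```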

4. KumoRFM (relational graph ML)

Connect impressions, devices, IPs, publishers, and click patterns into a single graph. The GNN detects bot network structure: shared subnets, correlated timing, fingerprint cycling, and publisher concentration anomalies.

Best for

Detecting sophisticated coordinated fraud. The graph structure reveals bot networks that look legitimate at the individual device level but form obvious clusters when you see the connections.

Watch out for

Requires device-level impression data with IP and publisher connections. Most effective when you have enough traffic volume to see network-level patterns (1M+ daily impressions).

Key metric: on the SAP SALT benchmark, graph-aware fraud models achieve 91% accuracy, vs 75% for feature-engineered and 63% for rule-based approaches.

Why relational data changes the answer

Ad fraud is a network problem. A single bot impression can look perfectly normal: reasonable click timing, a real-looking device fingerprint, a residential IP address. But zoom out and you see 47 devices on the same /24 subnet, all clicking the same 3 ads on the same publisher within the same 10-minute window. That coordination is invisible to any model that evaluates impressions independently. Rule-based systems check each impression against thresholds. Supervised classifiers score each device on its own features. Neither can see the forest for the trees.
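The zoomed-out view can be made concrete with a simple grouping: bucket impressions by /24 subnet, publisher, and 10-minute window, then count distinct devices per bucket. This is an illustrative approximation of the structural signal, not how KumoRFM's GNN actually works -- the GNN learns such patterns from the graph rather than from a hand-written rule.

```python
from collections import defaultdict

def subnet_24(ip: str) -> str:
    """Collapse an IPv4 address to its /24 prefix: 192.168.1.50 -> 192.168.1"""
    return ".".join(ip.split(".")[:3])

def coordinated_clusters(impressions, min_devices=10):
    """Group impressions by (/24 subnet, publisher, 10-minute bucket)
    and return groups with suspiciously many distinct devices."""
    groups = defaultdict(set)
    for imp in impressions:
        key = (subnet_24(imp["ip"]), imp["publisher"], imp["minute"] // 10)
        groups[key].add(imp["device"])
    return {k: v for k, v in groups.items() if len(v) >= min_devices}

# 47 devices on the same /24, hitting the same publisher in the same
# 10-minute window -- invisible one impression at a time.
bots = [{"ip": f"203.0.113.{i}", "publisher": "PUB01",
         "minute": 130 + i % 10, "device": f"DEV{i:03d}"} for i in range(47)]
human = [{"ip": "10.0.0.88", "publisher": "PUB02",
          "minute": 570, "device": "DEV900"}]
clusters = coordinated_clusters(bots + human)
```

Each bot impression passes individual checks; only the grouping exposes the 47-device cluster, while the lone human impression is never flagged.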

Relational models read the impression-device-IP-publisher graph and learn what coordinated fraud looks like structurally. They detect that a cluster of devices sharing an IP range, exhibiting correlated click timing, and concentrating on a single publisher forms a pattern that legitimate traffic never produces. This is why graph-based fraud detection catches 60-80% of sophisticated bot traffic while rule-based systems cap out at 30-40%. The SAP SALT benchmark shows similar relational advantages: 91% accuracy for graph-aware models vs 75% for feature-engineered approaches vs 63% for rule-based systems.

Detecting ad fraud one impression at a time is like trying to spot a pickpocket ring by watching each person individually on security cameras. Each pickpocket looks like a normal shopper. But if you overlay their movements on a single map, you see them working in coordinated patterns -- one distracts, another bumps, a third lifts the wallet. Graph-based fraud detection is that overhead map. It reveals the coordination that individual-level analysis cannot see.

How KumoRFM solves this

Graph-powered intelligence for advertising

Kumo builds a graph connecting impressions, devices, IPs, publishers, and click patterns. Bot networks that appear legitimate in isolation form conspicuous clusters in the graph: shared IP subnets, correlated click timing, device fingerprint cycling, and abnormal publisher concentration. The GNN detects these structural anomalies without hand-crafted rules, adapting as fraud tactics evolve.
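A minimal sketch of the edge lists such a graph could be built from, using the table and column names from the example tables on this page. This is an assumed simplification for illustration, not Kumo's internal graph representation.

```python
def build_edges(impressions):
    """Derive two edge sets of the impression graph: each impression
    links its device to an IP and to a publisher. Devices sharing an
    IP or publisher become near neighbors in the resulting graph."""
    device_ip, device_pub = set(), set()
    for imp in impressions:
        device_ip.add((imp["device_id"], imp["ip_address"]))
        device_pub.add((imp["device_id"], imp["publisher_id"]))
    return device_ip, device_pub

imps = [
    {"device_id": "DEV001", "ip_address": "192.168.1.50", "publisher_id": "PUB01"},
    {"device_id": "DEV002", "ip_address": "192.168.1.51", "publisher_id": "PUB01"},
]
device_ip, device_pub = build_edges(imps)
```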

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Step 1: Your data

The relational tables Kumo learns from

IMPRESSIONS

impression_id | device_id | ip_address   | publisher_id | timestamp
IMP801        | DEV001    | 192.168.1.50 | PUB01        | 2025-03-01 02:14
IMP802        | DEV002    | 192.168.1.51 | PUB01        | 2025-03-01 02:14
IMP803        | DEV003    | 10.0.0.88    | PUB02        | 2025-03-01 09:30

DEVICES

device_id | device_type | os      | fingerprint_hash
DEV001    | Mobile      | Android | FP-AA1
DEV002    | Mobile      | Android | FP-AA2
DEV003    | Desktop     | Windows | FP-BB1

IPS

ip_address   | asn     | geo     | datacenter
192.168.1.50 | AS12345 | US-East | True
192.168.1.51 | AS12345 | US-East | True
10.0.0.88    | AS67890 | US-West | False

PUBLISHERS

publisher_id | name        | category   | fraud_history_rate
PUB01        | QuickClicks | News       | 12.4%
PUB02        | TechReview  | Technology | 0.8%

CLICK_PATTERNS

device_id | clicks_last_hour | avg_time_between_clicks | unique_ads
DEV001    | 147              | 0.4s                    | 3
DEV002    | 132              | 0.5s                    | 3
DEV003    | 4                | 45s                     | 4
Step 2: Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL
PREDICT BOOL(IMPRESSIONS.is_fraud, 0, 1, hours)
FOR EACH IMPRESSIONS.impression_id
Step 3: Prediction output

Every entity gets a score, updated continuously

impression_id | device_id | fraud_prob | verdict
IMP801        | DEV001    | 0.96       | Fraud
IMP802        | DEV002    | 0.94       | Fraud
IMP803        | DEV003    | 0.03       | Legitimate
Step 4: Understand why

Every prediction includes feature attributions — no black boxes

Impression IMP801 -- Device DEV001

Predicted: 96% fraud probability

Top contributing features:

Feature                               | Value             | Attribution
IP subnet cluster size                | 47 devices on /24 | 31%
Click velocity (last hour)            | 147 clicks        | 26%
Datacenter IP flag                    | True              | 20%
Publisher historical fraud rate       | 12.4%             | 14%
Device fingerprint rotation frequency | 3 per hour        | 9%

Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability

Frequently asked questions

Common questions about ad fraud detection

How do you detect sophisticated ad fraud bots?

Graph-based detection is the most effective approach for sophisticated bots. These bots mimic human behavior at the individual level, evading rule-based filters. But they cannot hide their network structure: shared IP subnets, correlated click timing, and device fingerprint rotation patterns form obvious clusters in the device-IP-publisher graph.

What percentage of ad traffic is fraudulent?

Industry estimates range from 5-15% of programmatic ad traffic, with some publishers and channels seeing rates above 20%. For an ad network processing $2B in spend, even a 10% fraud rate means $200M lost to bots. Graph-based detection recovers 60-80% of this by catching coordinated fraud that rule-based systems miss.

How do you reduce false positives in ad fraud detection?

Graph-based models reduce false positives because they require convergence of multiple network-level signals (IP clustering, timing correlation, publisher concentration) rather than triggering on single thresholds. A high click rate from a legitimate power user won't trigger a fraud flag because the network context is completely different from a bot farm.
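The convergence idea can be sketched as a simple vote over network-level signals; the signal names, thresholds, and two-of-three rule below are illustrative assumptions, not Kumo's scoring logic.

```python
def fraud_vote(signals: dict, required: int = 2) -> bool:
    """Flag only when at least `required` independent network-level
    signals fire, so a lone outlier (e.g. a power user's high click
    rate) cannot trigger a false positive on its own."""
    fired = sum([
        signals["subnet_cluster_size"] >= 10,
        signals["timing_correlation"] >= 0.9,
        signals["publisher_concentration"] >= 0.8,
    ])
    return fired >= required

# High publisher concentration alone: a legitimate power user.
power_user = {"subnet_cluster_size": 1, "timing_correlation": 0.2,
              "publisher_concentration": 0.95}
# All three signals converge: a bot-farm device.
bot = {"subnet_cluster_size": 47, "timing_correlation": 0.97,
       "publisher_concentration": 0.92}
```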

What data do you need for ad fraud detection?

Impression logs with device IDs, IP addresses, publisher IDs, and timestamps. Click events with timing data. Device fingerprint attributes and IP metadata (ASN, datacenter flag, geolocation). The key is having the relational connections between these entities, not just aggregated features per device.

How fast can graph-based fraud detection adapt to new tactics?

Graph models retrain on new data continuously, detecting novel fraud patterns within days of their emergence. Because the model learns structural anomalies rather than specific rule signatures, new bot tactics that change individual device behavior but maintain coordinated network structure are caught immediately without writing new rules.

Bottom line: An ad network processing $2B in annual spend recovers $120-160M by catching sophisticated bot networks that rule-based systems miss. Kumo's graph reveals coordinated fraud clusters across devices, IPs, and publishers that appear legitimate in isolation.

Topics covered

ad fraud detection AI · bot traffic detection · invalid traffic ML · click fraud prediction · impression fraud model · KumoRFM fraud · programmatic fraud detection · ad verification AI

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.