#4 · Binary Classification · SIM Fraud

SIM Fraud Detection

Q: What data is needed for SIM fraud detection?

Kumo connects directly to your existing relational tables: SUBSCRIBERS, SIMS, CALLS, DATA_SESSIONS, DEVICE_CHANGES. No ETL or feature engineering required. Write a PQL query and get explainable predictions in minutes.

“Which SIM cards are being used for fraud?”

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.

A real-world example

Which SIM cards are being used for fraud?

Telecom fraud costs carriers $39B globally per year. SIM-based fraud schemes (SIM swaps, SIM boxes, IRSF) are increasingly sophisticated and operate through coordinated rings. A mid-size carrier loses $35M annually to fraud, and traditional rule-based systems catch only 40% of cases, with 30% false-positive rates that overwhelm fraud teams. The fraud signal is in the network: burner SIMs activated in batches, calling patterns to premium numbers, and device-change sequences that match known fraud playbooks.

Quick answer

SIM fraud detection requires graph-based models that connect SIMs through shared IMEIs, co-activation locations, and coordinated calling patterns to expose fraud rings. Per-SIM rule engines catch only 40% of fraud, with 30% false-positive rates. Graph ML catches 85% of fraud rings at a 5% false-positive rate by analyzing network structure, where a single suspicious SIM can connect to 50 others through shared activation patterns.

Approaches compared

4 ways to solve this problem

1. Rules-based velocity checks

Flag SIMs that exceed thresholds: too many activations per device, calls to known premium numbers, or SIM swaps within 48 hours.

Best for

Fast to deploy and catches the most blatant fraud patterns. Good for known fraud playbooks with clear signatures.

Watch out for

Rules are static and fraud evolves. Catches only 40% of fraud with 30% false-positive rates. Cannot detect coordinated ring behavior where each individual SIM's activity looks normal.
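As a rough illustration of what such velocity rules look like in practice, here is a minimal Python sketch. The thresholds, rule names, and the `velocity_flags` helper are assumptions made for this example, not part of any Kumo product or a specific carrier's rule set; the rows are a subset of the sample tables shown on this page. The "swap" rule is read here as "more than one device change within 48 hours", which is one possible interpretation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen for illustration only.
MAX_SIMS_PER_IMEI = 1
PREMIUM_PREFIXES = ("+882",)
SWAP_WINDOW = timedelta(hours=48)
MAX_SWAPS_IN_WINDOW = 1

# Subset of columns from the sample tables on this page.
SIMS = [
    {"sim_id": "SIM001", "imei": "IMEI_A001"},
    {"sim_id": "SIM002", "imei": "IMEI_A001"},
    {"sim_id": "SIM003", "imei": "IMEI_B445"},
]
CALLS = [
    {"sim_id": "SIM001", "destination": "+882-1234567"},
    {"sim_id": "SIM002", "destination": "+882-1234567"},
    {"sim_id": "SIM003", "destination": "+1-555-0199"},
]
DEVICE_CHANGES = [
    {"sim_id": "SIM001", "timestamp": "2025-03-02"},
    {"sim_id": "SIM002", "timestamp": "2025-03-02"},
]


def velocity_flags(sims, calls, device_changes):
    """Return {sim_id: set of rule names that fired}."""
    flags = defaultdict(set)

    # Rule 1: more SIMs on one device than the threshold allows.
    sims_by_imei = defaultdict(list)
    for sim in sims:
        sims_by_imei[sim["imei"]].append(sim["sim_id"])
    for sim_ids in sims_by_imei.values():
        if len(sim_ids) > MAX_SIMS_PER_IMEI:
            for sim_id in sim_ids:
                flags[sim_id].add("shared_imei")

    # Rule 2: any call to a known premium prefix.
    for call in calls:
        if call["destination"].startswith(PREMIUM_PREFIXES):
            flags[call["sim_id"]].add("premium_call")

    # Rule 3: more than MAX_SWAPS_IN_WINDOW device changes within 48 hours.
    changes_by_sim = defaultdict(list)
    for change in device_changes:
        changes_by_sim[change["sim_id"]].append(
            datetime.fromisoformat(change["timestamp"])
        )
    for sim_id, times in changes_by_sim.items():
        times.sort()
        for i, start in enumerate(times):
            in_window = [t for t in times[i:] if t - start <= SWAP_WINDOW]
            if len(in_window) > MAX_SWAPS_IN_WINDOW:
                flags[sim_id].add("rapid_swap")
                break

    return dict(flags)


flags = velocity_flags(SIMS, CALLS, DEVICE_CHANGES)
```

Each rule looks at one threshold in isolation, so a ring that rotates devices and destinations to stay just under every limit would pass all three checks.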

2. Anomaly detection on per-SIM features

Build behavioral profiles per SIM and flag statistical outliers using isolation forests or autoencoders on usage patterns.

Best for

Catches novel fraud patterns that do not match known rules. Good complement to rule-based systems.

Watch out for

Evaluates each SIM independently. A fraud ring distributes activity across 50 SIMs so that no single SIM looks anomalous. The fraud is in the coordination, not the individual behavior.
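The blind spot described above can be shown with a toy example. The sketch below uses a plain z-score on a single feature in place of the isolation forests or autoencoders mentioned earlier, purely to keep it dependency-free; the SIM identifiers and calls-per-day figures are made up for illustration.

```python
from statistics import mean, pstdev

# Made-up calls-per-day figures, for illustration only.
CALLS_PER_DAY = {
    "NORM01": 4, "NORM02": 6, "NORM03": 5, "NORM04": 7,
    "NORM05": 3, "NORM06": 5, "NORM07": 6, "NORM08": 4,
    # Five ring SIMs that each keep their volume unremarkable.
    "RING01": 5, "RING02": 6, "RING03": 5, "RING04": 6, "RING05": 5,
    # One lone abuser with an obviously extreme volume.
    "LOUD01": 60,
}


def zscore_outliers(values, threshold=3.0):
    """Return keys whose value is more than `threshold` std devs from the mean."""
    data = list(values.values())
    mu = mean(data)
    sigma = pstdev(data)
    if sigma == 0:
        return []
    return sorted(k for k, v in values.items() if abs(v - mu) / sigma > threshold)


outliers = zscore_outliers(CALLS_PER_DAY)
```

Only the lone high-volume SIM is flagged. The five ring SIMs sit inside the normal range on this feature, so any detector that scores SIMs one at a time has nothing to separate them from legitimate subscribers.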

3. Network analysis (community detection)

Build a graph of SIM relationships (shared IMEIs, co-activation, common call destinations) and identify suspicious communities.

Best for

Reveals ring structure that per-SIM approaches miss entirely. Good for investigation and visualization of fraud networks.

Watch out for

Static graph analysis produces snapshot results. Misses the temporal dimension: the sequence of activations, device swaps, and calls matters as much as the connections themselves.
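A minimal sketch of the graph-building step, assuming two SIMs are linked when they share an IMEI or a call destination. It uses connected components (via union-find) as a simple stand-in for a real community detection algorithm, and the `find_communities` helper is hypothetical. The rows are a subset of the sample tables shown on this page.

```python
from collections import defaultdict

SIMS = [
    {"sim_id": "SIM001", "imei": "IMEI_A001"},
    {"sim_id": "SIM002", "imei": "IMEI_A001"},
    {"sim_id": "SIM003", "imei": "IMEI_B445"},
]
CALLS = [
    {"sim_id": "SIM001", "destination": "+882-1234567"},
    {"sim_id": "SIM002", "destination": "+882-1234567"},
    {"sim_id": "SIM003", "destination": "+1-555-0199"},
]


def find_communities(sims, calls, min_size=2):
    """Group SIMs linked by a shared IMEI or a shared call destination."""
    parent = {s["sim_id"]: s["sim_id"] for s in sims}

    def find(x):
        # Walk to the root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect SIMs under each shared attribute value.
    groups = defaultdict(list)
    for s in sims:
        groups[("imei", s["imei"])].append(s["sim_id"])
    for c in calls:
        if c["sim_id"] in parent:
            groups[("dest", c["destination"])].append(c["sim_id"])

    # Every SIM in a group is merged with the group's first member.
    for members in groups.values():
        for other in members[1:]:
            union(members[0], other)

    communities = defaultdict(set)
    for sim_id in list(parent):
        communities[find(sim_id)].add(sim_id)
    return sorted(sorted(m) for m in communities.values() if len(m) >= min_size)


communities = find_communities(SIMS, CALLS)
```

On the sample rows this yields one community containing SIM001 and SIM002. Note that the result ignores when each link was formed, which is the snapshot limitation described above.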

4. KumoRFM (relational graph ML)

Connect SIMs, subscribers, calls, data sessions, and device changes into a temporal fraud graph. The GNN learns coordinated activation patterns, shared IMEI networks, and calling behavior signatures across the full ring structure.

Best for

Highest detection rate with lowest false positives. Catches 85% of fraud rings at 5% false-positive rates by learning temporal, multi-hop ring patterns.

Watch out for

Requires device-level data (IMEI tracking) and call detail records. If only billing-level data is available, detection will be less precise.

Key metric: on the SAP SALT benchmark, relational graph ML achieves 91% accuracy vs 63% for rules-based approaches and 75% for XGBoost on flat tables in fraud detection tasks.

Why relational data changes the answer

SIM fraud is a network crime, not an individual crime. A SIM box operation involves 50+ SIMs activated in batches, sharing IMEI devices in rotation, and making coordinated calls to the same premium international numbers. Each individual SIM's behavior might look borderline normal: a few calls, minimal data usage, one device swap. The fraud only becomes obvious when you connect the SIMs through their shared devices, activation timestamps, and calling destinations.

Relational models build this fraud network automatically from raw CDR and activation data. They learn patterns like 'five SIMs activated at the same store within 48 hours, sharing two IMEI devices, all calling the same +882 number prefix.' A per-SIM rule engine evaluates each of those five SIMs independently and may flag none of them. The graph model sees the connected structure and flags all five with 92%+ confidence. The SAP SALT benchmark shows relational graph ML achieves 91% accuracy vs 63% for rules-based approaches on fraud detection tasks.
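To make the co-activation part of that pattern concrete, here is a hand-written sketch that finds batches of SIMs activated at the same store within a 48-hour window. A relational model would learn this kind of signal from data rather than have it coded by hand; the `coactivation_batches` helper, the field names, and the activation rows are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Made-up activation records, for illustration only.
ACTIVATIONS = [
    {"sim_id": "S1", "store": "Store_22", "activated_at": "2025-02-28T09:00"},
    {"sim_id": "S2", "store": "Store_22", "activated_at": "2025-02-28T11:00"},
    {"sim_id": "S3", "store": "Store_22", "activated_at": "2025-03-01T08:00"},
    {"sim_id": "S4", "store": "Store_22", "activated_at": "2025-03-01T15:00"},
    {"sim_id": "S5", "store": "Store_22", "activated_at": "2025-03-02T08:30"},
    {"sim_id": "S6", "store": "Store_22", "activated_at": "2025-03-10T10:00"},
    {"sim_id": "S7", "store": "Store_08", "activated_at": "2025-02-28T09:30"},
    {"sim_id": "S8", "store": "Store_08", "activated_at": "2025-02-28T10:00"},
]


def coactivation_batches(activations, window=timedelta(hours=48), min_batch=5):
    """Return (store, [sim_ids]) for each run of >= min_batch activations in `window`."""
    by_store = defaultdict(list)
    for a in activations:
        by_store[a["store"]].append(
            (datetime.fromisoformat(a["activated_at"]), a["sim_id"])
        )

    batches = []
    for store, rows in by_store.items():
        rows.sort()
        start = 0
        while start < len(rows):
            # Extend the window as far as it reaches from the start activation.
            end = start
            while end + 1 < len(rows) and rows[end + 1][0] - rows[start][0] <= window:
                end += 1
            if end - start + 1 >= min_batch:
                batches.append((store, [sim for _, sim in rows[start:end + 1]]))
                start = end + 1  # skip past the batch just recorded
            else:
                start += 1
    return batches


batches = coactivation_batches(ACTIVATIONS)
```

The first five Store_22 activations fall inside one 48-hour window and form a batch; the later Store_22 activation and the two Store_08 activations do not.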

Detecting SIM fraud one SIM at a time is like investigating organized crime by analyzing each suspect's bank statement separately. Each individual account looks clean. The crime only becomes visible when you trace the money flowing between accounts, the shared shell companies, and the coordinated timing of transfers. Fraud ring detection requires the network view.

How KumoRFM solves this

Graph-learned network intelligence across your entire subscriber base

Kumo builds a fraud network graph connecting SIMs, subscribers, call/data sessions, and device changes. It learns that SIMs activated within 48 hours of each other, sharing IMEI devices, and generating calls to the same set of international premium numbers form fraud rings. The graph structure reveals that a single suspicious SIM is connected to 50 others through shared activation locations and calling patterns. Traditional models evaluate each SIM independently and miss these ring-level signals entirely.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Your data

The relational tables Kumo learns from

SUBSCRIBERS

subscriber_id	activation_date	channel	id_verified
SUB301	2025-02-28	Online	Y
SUB302	2025-02-28	Online	N
SUB303	2025-01-15	Retail	Y

SIMS

sim_id	subscriber_id	imei	activation_store	status
SIM001	SUB301	IMEI_A001	Store_22	Active
SIM002	SUB302	IMEI_A001	Store_22	Active
SIM003	SUB303	IMEI_B445	Store_08	Active

CALLS

call_id	sim_id	destination	duration_sec	timestamp
CL01	SIM001	+882-1234567	180	2025-03-01
CL02	SIM002	+882-1234567	175	2025-03-01
CL03	SIM003	+1-555-0199	320	2025-03-01

DATA_SESSIONS

session_id	sim_id	data_mb	timestamp	tower_id
DS01	SIM001	2.1	2025-03-01	TWR_445
DS02	SIM002	1.8	2025-03-01	TWR_445
DS03	SIM003	850	2025-03-01	TWR_102

DEVICE_CHANGES

change_id	sim_id	old_imei	new_imei	timestamp
DC01	SIM001	IMEI_A001	IMEI_A002	2025-03-02
DC02	SIM002	IMEI_A001	IMEI_A003	2025-03-02
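The tables above link to each other through `subscriber_id` and `sim_id`. The sketch below follows those keys by hand to build a per-SIM view, only to show how the rows relate. It is not a description of how Kumo ingests data, and the `read_tsv` and `sim_view` helpers are hypothetical.

```python
import csv
import io

# Sample rows copied from the tables above, as tab-separated text.
SUBSCRIBERS_TSV = (
    "subscriber_id\tactivation_date\tchannel\tid_verified\n"
    "SUB301\t2025-02-28\tOnline\tY\n"
    "SUB302\t2025-02-28\tOnline\tN\n"
    "SUB303\t2025-01-15\tRetail\tY\n"
)
SIMS_TSV = (
    "sim_id\tsubscriber_id\timei\tactivation_store\tstatus\n"
    "SIM001\tSUB301\tIMEI_A001\tStore_22\tActive\n"
    "SIM002\tSUB302\tIMEI_A001\tStore_22\tActive\n"
    "SIM003\tSUB303\tIMEI_B445\tStore_08\tActive\n"
)
CALLS_TSV = (
    "call_id\tsim_id\tdestination\tduration_sec\ttimestamp\n"
    "CL01\tSIM001\t+882-1234567\t180\t2025-03-01\n"
    "CL02\tSIM002\t+882-1234567\t175\t2025-03-01\n"
    "CL03\tSIM003\t+1-555-0199\t320\t2025-03-01\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def sim_view(sims, subscribers, calls):
    """Join each SIM to its subscriber (subscriber_id) and its calls (sim_id)."""
    calls_by_sim = {}
    for call in calls:
        calls_by_sim.setdefault(call["sim_id"], []).append(call["destination"])

    view = {}
    for sim in sims:
        owner = subscribers[sim["subscriber_id"]]
        view[sim["sim_id"]] = {
            "imei": sim["imei"],
            "channel": owner["channel"],
            "id_verified": owner["id_verified"],
            "destinations": calls_by_sim.get(sim["sim_id"], []),
        }
    return view


subscribers = {row["subscriber_id"]: row for row in read_tsv(SUBSCRIBERS_TSV)}
view = sim_view(read_tsv(SIMS_TSV), subscribers, read_tsv(CALLS_TSV))
```

In the joined view, SIM001 and SIM002 share an IMEI and a destination, and SIM002 belongs to a subscriber whose identity was not verified.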

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL

PREDICT BOOL(SIMS.FRAUD_FLAG, 0, 7, days)
FOR EACH SIMS.SIM_ID
WHERE SIMS.STATUS = 'Active'

Prediction output

Every entity gets a score, updated continuously

SIM_ID	SUBSCRIBER_ID	ACTIVATION_AGE	FRAUD_PROB
SIM001	SUB301	3 days	0.92
SIM002	SUB302	3 days	0.94
SIM003	SUB303	46 days	0.03

Understand why

Every prediction includes feature attributions — no black boxes

SIM SIM001: 3-day activation, shared IMEI

Predicted: 92% fraud probability

Top contributing features

Shared IMEI with other SIMs

2 SIMs on same device

31% attribution

Calls to premium international numbers

12 calls to +882

26% attribution

Co-activation pattern

Batch of 5 SIMs

19% attribution

Data usage anomaly

< 5MB/day (SIM box pattern)

14% attribution

Device change velocity

2 swaps in 48h

10% attribution

Feature attributions are computed automatically for every prediction. No separate tooling required.

PQL Documentation

Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.

Python SDK

Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.

Explainability Docs

Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.

Frequently asked questions

Common questions about SIM fraud detection

How do you detect SIM swap fraud?

Connect SIM activations, device IMEI changes, and call patterns in a graph model. SIM swap fraud shows specific graph signatures: rapid IMEI changes, calls to account service numbers followed by high-value transactions, and device sharing with other suspicious SIMs. Graph ML detects these coordinated patterns in real time rather than after the victim reports unauthorized activity.

What is a SIM box and how do you detect it?

A SIM box routes international calls through local SIMs to avoid interconnect fees. Detection signals include: multiple SIMs sharing IMEI devices, minimal data usage (voice-only traffic), calls concentrated to international premium numbers, and batch activation patterns. Graph ML connects these SIMs through their shared infrastructure and flags the entire operation.
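Two of those signals, minimal data usage and calls concentrated on premium numbers, can be sketched as a simple per-SIM check. The thresholds and the `sim_box_candidates` helper are assumptions made for this example; the 5 MB/day figure echoes the attribution example on this page, and the rows are a subset of the sample tables.

```python
from collections import defaultdict

# Hypothetical thresholds, chosen for illustration only.
MAX_MB_PER_DAY = 5.0
MIN_PREMIUM_SHARE = 0.8
PREMIUM_PREFIXES = ("+882",)

DATA_SESSIONS = [
    {"sim_id": "SIM001", "data_mb": 2.1, "timestamp": "2025-03-01"},
    {"sim_id": "SIM002", "data_mb": 1.8, "timestamp": "2025-03-01"},
    {"sim_id": "SIM003", "data_mb": 850.0, "timestamp": "2025-03-01"},
]
CALLS = [
    {"sim_id": "SIM001", "destination": "+882-1234567"},
    {"sim_id": "SIM002", "destination": "+882-1234567"},
    {"sim_id": "SIM003", "destination": "+1-555-0199"},
]


def sim_box_candidates(data_sessions, calls):
    """Return SIMs with low daily data usage and mostly premium-number calls."""
    total_mb = defaultdict(float)
    active_days = defaultdict(set)
    for session in data_sessions:
        total_mb[session["sim_id"]] += session["data_mb"]
        active_days[session["sim_id"]].add(session["timestamp"][:10])

    total_calls = defaultdict(int)
    premium_calls = defaultdict(int)
    for call in calls:
        total_calls[call["sim_id"]] += 1
        if call["destination"].startswith(PREMIUM_PREFIXES):
            premium_calls[call["sim_id"]] += 1

    candidates = []
    for sim_id, n_calls in total_calls.items():
        # SIMs with no data sessions at all count as zero usage.
        n_days = len(active_days.get(sim_id, ()))
        mb_per_day = total_mb.get(sim_id, 0.0) / n_days if n_days else 0.0
        premium_share = premium_calls.get(sim_id, 0) / n_calls
        if mb_per_day < MAX_MB_PER_DAY and premium_share >= MIN_PREMIUM_SHARE:
            candidates.append(sim_id)
    return sorted(candidates)


candidates = sim_box_candidates(DATA_SESSIONS, CALLS)
```

On the sample rows, SIM001 and SIM002 match both signals while SIM003 (850 MB of data, no premium calls) does not. These checks still score each SIM separately; linking the candidates through shared devices and activation batches is what exposes the whole operation.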

How much does SIM fraud cost telecom carriers?

Telecom fraud costs carriers $39B globally per year. A mid-size carrier loses $35M annually to SIM-based fraud including SIM swaps, SIM boxes, and IRSF (International Revenue Share Fraud). Traditional rule-based systems catch only 40% of cases with 30% false-positive rates that overwhelm fraud investigation teams.

Why do rules-based systems fail for telecom fraud?

Fraud rings distribute activity across dozens of SIMs so that no individual SIM triggers rules. Each SIM makes a few calls, uses one device, and looks normal in isolation. The fraud is in the coordination: shared devices, batch activations, and synchronized calling. Rules engines evaluate SIMs independently and miss the ring structure entirely.

What data is needed for SIM fraud detection?

Critical data: SIM activation records, IMEI/device tracking, call detail records (CDRs), and subscriber identity verification status. High-value additions: tower-level location data, data session records, and device-change logs. The IMEI tracking and activation records are the most important for revealing ring structure through shared devices and coordinated timing.

Bottom line: A carrier losing $35M annually to SIM fraud that deploys Kumo's graph-based detection catches 85% of fraud rings with a 5% false-positive rate, recovering $25M+ per year. Kumo reveals the ring structure through shared IMEIs, co-activation patterns, and coordinated calling behavior that per-SIM rule engines cannot see.

Topics covered

SIM fraud detection, telecom fraud AI, SIM swap detection, subscription fraud ML, IRSF detection, graph neural network fraud, KumoRFM telecom fraud, fraud ring detection, SIM box detection AI

From a leadership team with proven experience

Vanja Josifovski

CEO and Co-Founder, ex-CTO Airbnb, ex-CTO Pinterest

Jure Leskovec

Co-Founder & Chief Scientist, Stanford Professor

Hema Raghavan

Co-Founder & Head of Engineering, ex-AI Lead, LinkedIn

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
