
DGraphFin: The Largest Public Fraud Detection Benchmark

DGraphFin is a 3.7-million-node fintech social graph designed for fraud detection. At roughly 18x the size of the Elliptic Bitcoin dataset, it is the closest public benchmark to the scale of real-world financial fraud systems -- and one of the hardest GNN benchmarks in any domain.


TL;DR

  • DGraphFin has 3,700,550 user nodes, 4,300,999 directed edges, 17 features per node, and binary fraud labels. It is the largest public financial fraud graph.
  • At 3.7M nodes, full-batch training is infeasible and neighbor sampling is mandatory, so the benchmark tests production-relevant GNN engineering.
  • GCN reaches only ~72% AUROC with 17 sparse features; the challenge is detecting fraud from social network structure with limited per-node information.
  • DGraphFin demonstrates that fraud detection at fintech scale requires both graph reasoning and scalable infrastructure.

Nodes: 3.7M · Edges: 4.3M · Features: 17 · Task: Fraud (binary)

What DGraphFin contains

DGraphFin is a directed social network from a real fintech platform. The 3,700,550 nodes represent users. The 4,300,999 directed edges represent social connections (following, messaging, referral relationships). Each node has 17 features (anonymized user attributes). The binary task is to identify fraudulent accounts.

Unlike Elliptic Bitcoin (where edges are transaction flows), DGraphFin's edges are social connections. Fraud detection here relies on social network patterns: fraudulent accounts often form clusters (fraud rings), share referral chains, or exhibit unusual connection patterns. The 17 features per node are intentionally sparse, forcing models to rely more on graph structure than on feature richness.

Why DGraphFin matters

DGraphFin is the only public benchmark that approaches production fraud detection scale. Elliptic Bitcoin has 203K nodes; real banks have 100M+ accounts. DGraphFin's 3.7M nodes sit between these extremes, large enough to require production-grade training strategies (neighbor sampling, distributed computation) while still being feasible on academic hardware.

The low feature dimensionality (17) is also realistic. In privacy-constrained financial applications, per-user features may be limited by regulation. Graph structure becomes the primary signal. DGraphFin tests whether GNNs can detect fraud primarily from social network topology, a scenario common in fintech where user identity information is limited.
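When per-node features are this sparse, even simple structural statistics carry signal. A minimal pure-Python sketch of deriving in/out-degree features from a directed edge list (the edge list here is an invented toy, not the real dataset):

```python
from collections import Counter

# Toy directed edge list (src, dst); stands in for DGraphFin's 4.3M edges.
edges = [(0, 1), (0, 2), (1, 2), (3, 0), (3, 1), (3, 2), (2, 0)]

out_deg = Counter(src for src, _ in edges)
in_deg = Counter(dst for _, dst in edges)

# Per-node structural features: [out-degree, in-degree]
nodes = sorted({n for e in edges for n in e})
features = {n: [out_deg.get(n, 0), in_deg.get(n, 0)] for n in nodes}

print(features[3])  # [3, 0]: node 3 follows three accounts, nobody follows it
```

In practice such hand-built structural features are what the XGBoost baseline in the results table consumes, while GNNs learn them implicitly from message passing.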

Loading DGraphFin

load_dgraph_fin.py
# DGraphFin ships as a PyTorch Geometric dataset. The raw archive
# (DGraphFin.zip) must be downloaded manually from https://dgraph.xinye.com
# and placed under <root>/raw/ before the first load.
from torch_geometric.datasets import DGraphFin
from torch_geometric.loader import NeighborLoader

dataset = DGraphFin(root='/tmp/DGraphFin')
data = dataset[0]

print(f"Nodes: {data.num_nodes}")        # 3700550
print(f"Edges: {data.num_edges}")        # 4300999
print(f"Features: {data.num_features}")  # 17

# Neighbor sampling is required at this scale
loader = NeighborLoader(
    data,
    num_neighbors=[10, 5],
    batch_size=2048,
    input_nodes=data.train_mask,
)

DGraphFin is bundled with PyTorch Geometric as torch_geometric.datasets.DGraphFin, but the raw archive must be fetched manually from the project site. Neighbor sampling is essential at 3.7M nodes.

Common tasks and benchmarks

Binary node classification (fraud vs legitimate) with AUROC as the primary metric. GCN: ~72%, GraphSAGE: ~74%, GAT: ~73%. These numbers are notably lower than Elliptic Bitcoin (~95% AUROC) due to the sparser features and social (rather than transaction) graph structure. Methods that combine structural features with neighborhood sampling perform best. The computational challenge of training on 3.7M nodes is as much a benchmark as the accuracy.
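AUROC, the metric quoted throughout, is the probability that a randomly chosen fraud node is scored above a randomly chosen legitimate node (ties counted half). A minimal pure-Python sketch with invented toy scores, not model output:

```python
def auroc(labels, scores):
    """Probability a positive outranks a negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.1]
print(auroc(labels, scores))  # 5 of 6 pos/neg pairs ranked correctly
```

This rank-based definition is why AUROC is the metric of choice under extreme class imbalance: it is insensitive to the decision threshold and to the base fraud rate.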

Example: fraud ring detection in fintech

Fintech platforms face organized fraud: criminal groups create networks of fake accounts that refer each other, transact with each other, and collaborate to exploit promotions or lending products. These fraud rings are invisible at the individual account level (each account looks normal) but obvious in the social graph (the cluster of mutually connected new accounts with identical behavior). DGraphFin benchmarks exactly this graph-based ring detection at realistic scale.
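The ring pattern described above can be sketched as a toy graph search: restrict the graph to recently created accounts, then flag connected clusters among them. A pure-Python illustration with invented toy data (not the DGraphFin schema):

```python
from collections import defaultdict, deque

# Toy data: account -> signup day, plus undirected referral edges.
signup_day = {1: 0, 2: 5, 3: 100, 4: 101, 5: 101, 6: 102, 7: 300}
edges = [(1, 3), (3, 4), (4, 5), (5, 6), (3, 6), (2, 7)]

# Keep only edges between "new" accounts (signed up after a cutoff day).
CUTOFF = 90
adj = defaultdict(set)
for a, b in edges:
    if signup_day[a] > CUTOFF and signup_day[b] > CUTOFF:
        adj[a].add(b)
        adj[b].add(a)

def components(adj):
    """Connected components via BFS."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            n = queue.popleft()
            if n in comp:
                continue
            comp.add(n)
            queue.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Clusters of mutually connected new accounts are candidate fraud rings.
rings = [c for c in components(adj) if len(c) >= 3]
print(rings)  # [{3, 4, 5, 6}]
```

Each account in the flagged cluster looks unremarkable in isolation; only the joint signup timing plus the connection structure exposes the ring, which is exactly the signal DGraphFin asks models to learn.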

Published benchmark results

Fraud detection on DGraphFin. Metric is AUROC. Higher is better.

Method           AUROC (%)   Year   Paper
MLP              ~71.4       2022   Huang et al.
GCN              ~72.0       2022   Huang et al.
GraphSAGE        ~74.0       2022   Huang et al.
GAT              ~73.0       2022   Huang et al.
XGBoost + feat   ~71.8       2022   Huang et al.

Note: The modest AUROC values reflect the genuine difficulty of social-graph fraud detection with sparse features. GNN gains over MLP are small but consistent.

Original Paper

DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection

Xuanwen Huang, Yang Yang, Yang Wang, Chunping Wang, Zhisheng Zhang, Jiarong Xu, Lei Chen, Michalis Vazirgiannis (2022). NeurIPS Datasets and Benchmarks Track

Read paper →

Original data source

DGraphFin is available from the DGraph project site hosted by XinYe (Finvolution Group). Download the raw archive from dgraph.xinye.com and load it with PyTorch Geometric's DGraphFin dataset class.

cite_dgraphfin.bib
@inproceedings{huang2022dgraph,
  title={DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection},
  author={Huang, Xuanwen and Yang, Yang and Wang, Yang and Wang, Chunping and Zhang, Zhisheng and Xu, Jiarong and Chen, Lei and Vazirgiannis, Michalis},
  booktitle={NeurIPS Datasets and Benchmarks Track},
  year={2022}
}

BibTeX citation for the DGraphFin dataset.

Which dataset should I use?

DGraphFin vs Elliptic Bitcoin: DGraphFin is 18x larger (3.7M vs 203K nodes) but has fewer features (17 vs 165). DGraphFin is a social graph; Elliptic is a transaction graph. Use DGraphFin for scale testing; Elliptic for feature-rich fraud analysis.

DGraphFin vs OGB-Products: Similar scale (~3.7M vs ~2.4M nodes) but different domains. DGraphFin is fintech fraud; OGB-Products is product classification. Use the one matching your domain.

DGraphFin vs synthetic fraud datasets: DGraphFin uses real anonymized data from a fintech platform, making it more realistic than synthetically generated fraud graphs.

From benchmark to production

Production fintech fraud systems add transaction data (payment amounts, timing, merchants), device fingerprints, geolocation, and behavioral biometrics to the social graph. The graph becomes heterogeneous: users, devices, merchants, and transactions are different node types. Real-time scoring requirements (decisions within 100ms of a transaction) add engineering constraints. And the graph evolves continuously as new users join and new connections form.
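One common way to organize such a heterogeneous graph is to key edge lists by a (source type, relation, destination type) triple, the same convention PyTorch Geometric uses for HeteroData. A pure-Python sketch with invented toy entities:

```python
# Edge lists keyed by (src_type, relation, dst_type), mirroring the
# convention PyG's HeteroData uses for heterogeneous graphs.
hetero_edges = {
    ('user', 'follows', 'user'):      [(0, 1), (1, 2)],
    ('user', 'uses', 'device'):       [(0, 0), (1, 0), (2, 1)],
    ('user', 'pays', 'merchant'):     [(0, 0), (2, 0)],
}

# A device shared between otherwise unlinked users is a classic fraud signal
# that only becomes visible once devices are first-class nodes.
device_users = {}
for user, device in hetero_edges[('user', 'uses', 'device')]:
    device_users.setdefault(device, []).append(user)

shared = {d: us for d, us in device_users.items() if len(us) > 1}
print(shared)  # device 0 is shared by users 0 and 1
```

The same triple-keyed layout carries over directly to heterogeneous GNNs, where each relation type gets its own message-passing weights.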

Frequently asked questions

What is the DGraphFin dataset?

DGraphFin is a large-scale directed social network from a fintech platform with 3,700,550 user nodes, 4,300,999 edges, and 17 features per node. The binary classification task identifies fraudulent accounts. It is part of the DGraph benchmark suite.

How does DGraphFin compare to Elliptic Bitcoin?

DGraphFin is 18x larger by nodes (3.7M vs 203K) and represents a fintech social network (user-to-user connections) rather than a transaction graph. It has fewer features per node (17 vs 165) but more realistic scale for production fraud systems.

How do I load DGraphFin?

Download the raw archive (DGraphFin.zip) from dgraph.xinye.com, place it under `<root>/raw/`, and load it with `from torch_geometric.datasets import DGraphFin; data = DGraphFin(root='/tmp/DGraphFin')[0]`. The dataset is a single large graph with 3.7M nodes. Neighbor sampling with NeighborLoader is required for GPU training.

What makes DGraphFin challenging?

Three factors: massive scale (3.7M nodes requires sampling-based training), extreme class imbalance (fraud is rare), and the social graph structure (fraud detection via user-to-user relationships rather than transaction patterns). The 17 features are intentionally sparse to test structural learning.
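The class imbalance mentioned above is typically handled by weighting the loss. A minimal sketch of inverse-frequency class weights, using invented toy label counts rather than the real fraud rate:

```python
from collections import Counter

# Toy labels: fraud (1) is rare, as in DGraphFin.
labels = [0] * 97 + [1] * 3

counts = Counter(labels)
n = len(labels)

# Inverse-frequency weights: rare classes get proportionally larger weight,
# so misclassified fraud nodes contribute more to the loss.
weights = {cls: n / (len(counts) * cnt) for cls, cnt in counts.items()}
print(weights)
```

The resulting weight dict can be passed (as a tensor) to a weighted cross-entropy loss; alternatives include oversampling fraud nodes in the NeighborLoader seed set.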

What results are expected on DGraphFin?

GCN achieves ~72% AUROC, GraphSAGE ~74%, GAT ~73%. The relatively modest AUROC (compared to Elliptic's 95%+) reflects the task difficulty: detecting fraud from social connections with only 17 features is harder than from 165-feature transaction data.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.