| Nodes | Edges | Features | Task |
|---|---|---|---|
| 3.7M | 4.3M | 17 | Fraud (binary) |
What DGraphFin contains
DGraphFin is a directed social network built from a real fintech platform (Finvolution Group). Its 3,700,550 nodes represent users, and its 4,300,999 directed edges represent social ties between them: in the original dataset, an edge from one user to another means the first user listed the second as an emergency contact. Each node has 17 anonymized features describing the user. The task is binary node classification: identify fraudulent accounts.
Unlike Elliptic Bitcoin (where edges are transaction flows), DGraphFin's edges are social connections. Fraud detection here relies on social network patterns: fraudulent accounts often form clusters (fraud rings), share chains of contacts, or exhibit unusual connection patterns. With only 17 anonymized features per node, models must rely more on graph structure than on feature richness.
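As a rough illustration of what relying on graph structure can mean, the sketch below derives a few simple structural features (in-degree, out-degree, and number of reciprocated edges) from a directed edge list. The function name `structural_features` and the toy edge list are made up for illustration; they are not part of DGraphFin or any library.

```python
def structural_features(edges, num_nodes):
    """Return per-node [in_degree, out_degree, reciprocal_count] for a directed edge list."""
    edge_set = set(edges)  # deduplicate and allow O(1) reverse-edge lookups
    feats = [[0, 0, 0] for _ in range(num_nodes)]
    for src, dst in edge_set:
        feats[src][1] += 1  # out-degree of the source
        feats[dst][0] += 1  # in-degree of the destination
        if src != dst and (dst, src) in edge_set:
            feats[src][2] += 1  # src -> dst is reciprocated by dst -> src
    return feats


# Hypothetical toy graph: nodes 0, 1, 2 all point at each other; node 3 points at node 0.
toy_edges = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 2), (2, 0), (3, 0)]
features = structural_features(toy_edges, num_nodes=4)
for node, (in_deg, out_deg, recip) in enumerate(features):
    print(node, in_deg, out_deg, recip)
```

Features like these can be concatenated with the 17 node attributes before training a tabular model or a GNN; which structural signals actually help on DGraphFin is an empirical question.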
Why DGraphFin matters
DGraphFin is one of the few public benchmarks that approaches production fraud detection scale. Elliptic Bitcoin has 203K nodes; large banks can have 100M+ accounts. DGraphFin's 3.7M nodes sit between these extremes: large enough to require production-grade training strategies (neighbor sampling, distributed computation) while still being feasible on academic hardware.
The low feature dimensionality (17) is also realistic. In privacy-constrained financial applications, per-user features may be limited by regulation. Graph structure becomes the primary signal. DGraphFin tests whether GNNs can detect fraud primarily from social network topology, a scenario common in fintech where user identity information is limited.
Loading DGraphFin
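# Requires PyTorch Geometric (pip install torch_geometric).
# Download DGraphFin.zip manually from dgraph.xinye.com and place it in /tmp/DGraphFin/raw.
from torch_geometric.datasets import DGraphFin
from torch_geometric.loader import NeighborLoader

dataset = DGraphFin(root='/tmp/DGraphFin')
data = dataset[0]  # a single graph with x, edge_index, y and train/val/test masks
print(f"Nodes: {data.num_nodes}")  # 3700550
print(f"Edges: {data.num_edges}")  # 4300999
print(f"Features: {data.num_features}")  # 17

# Neighbor sampling is required at this scale
loader = NeighborLoader(
    data, num_neighbors=[10, 5],
    batch_size=2048, input_nodes=data.train_mask,
)
This example loads DGraphFin through PyTorch Geometric's dataset class, which expects the raw archive to be downloaded manually from the DGraph site. Neighbor sampling is essential at 3.7M nodes.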
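To make the fan-out idea concrete, here is a minimal pure-Python sketch of two-hop neighbor sampling with fan-outs of 10 and 5. It is not PyTorch Geometric's implementation. The function `sample_subgraph` and the toy adjacency are hypothetical, and the sketch samples incoming neighbors on the assumption that messages flow from source nodes into target nodes.

```python
import random


def sample_subgraph(in_neighbors, seed_nodes, fanouts, rng):
    """Sample up to fanouts[k] in-neighbors per frontier node at hop k.

    in_neighbors maps a node to the list of nodes that have an edge pointing to it.
    Returns the set of visited nodes and the list of sampled (src, dst) edges.
    """
    visited = set(seed_nodes)
    frontier = list(seed_nodes)
    sampled_edges = []
    for fanout in fanouts:
        next_frontier = []
        for dst in frontier:
            candidates = in_neighbors.get(dst, [])
            chosen = rng.sample(candidates, min(fanout, len(candidates)))
            for src in chosen:
                sampled_edges.append((src, dst))
                if src not in visited:
                    visited.add(src)
                    next_frontier.append(src)
        frontier = next_frontier
    return visited, sampled_edges


# Hypothetical toy graph: node 0 has 50 in-neighbors, each with 20 in-neighbors of its own.
in_neighbors = {0: list(range(1, 51))}
for hub in range(1, 51):
    in_neighbors[hub] = [1000 + hub * 20 + i for i in range(20)]

nodes, edges = sample_subgraph(
    in_neighbors, seed_nodes=[0], fanouts=[10, 5], rng=random.Random(0)
)
print(len(nodes), len(edges))
```

With these fan-outs the sampled computation graph is capped at 1 + 10 + 10 × 5 = 61 nodes per seed, no matter how large the full graph is. That bound is what keeps mini-batch training tractable at DGraphFin's scale.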
Common tasks and benchmarks
The standard task is binary node classification (fraud vs legitimate), with AUROC as the primary metric. Approximate reported AUROC values are GCN: ~72%, GraphSAGE: ~74%, GAT: ~73%. These numbers are notably lower than on Elliptic Bitcoin (~95% AUROC) due to the sparser features and the social (rather than transaction) graph structure. Methods that combine structural features with neighborhood sampling perform best. The computational challenge of training on 3.7M nodes is as much a part of the benchmark as the accuracy.
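Because AUROC is the headline metric, a small worked example may help. The sketch below computes AUROC directly from its definition: the probability that a randomly chosen fraud node receives a higher score than a randomly chosen legitimate node, with ties counted as one half. The labels and scores are invented for illustration. The pairwise loop is only suitable for toy inputs; in practice a library routine such as scikit-learn's `roc_auc_score` would be used.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # fraud node ranked above legitimate node
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Invented toy data: 2 fraud nodes (label 1) among 8 accounts.
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.35, 0.05]
value = auroc(toy_labels, toy_scores)
print(round(value, 4))
```

Here 11 of the 12 fraud/legitimate pairs are ordered correctly, so the AUROC is 11/12. Because the metric depends only on ranking, it needs no decision threshold, which is convenient when fraud cases are rare.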
Example: fraud ring detection in fintech
Fintech platforms face organized fraud: criminal groups create networks of fake accounts that refer each other, transact with each other, and collaborate to exploit promotions or lending products. These fraud rings are invisible at the individual account level (each account looks normal) but obvious in the social graph (the cluster of mutually connected new accounts with identical behavior). DGraphFin benchmarks exactly this graph-based ring detection at realistic scale.
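One very simple way to surface candidate rings is to keep only reciprocated edges, group accounts into connected components with union-find, and flag components above a size threshold. The sketch below does that. It is an illustrative heuristic, not the method used in the DGraph paper; `candidate_rings`, the `min_size` threshold, and the toy edges are all hypothetical.

```python
def candidate_rings(edges, min_size=3):
    """Group nodes linked by reciprocated edges; return components with >= min_size nodes."""
    edge_set = set(edges)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for src, dst in edge_set:
        # Only mutual links (src -> dst and dst -> src) merge two accounts.
        if src != dst and (dst, src) in edge_set:
            root_a, root_b = find(src), find(dst)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(group) for group in groups.values() if len(group) >= min_size)


toy_edges = [
    (1, 2), (2, 1), (2, 3), (3, 2), (3, 1), (1, 3),  # mutually connected trio
    (4, 1),                                          # one-way link into the trio
    (5, 6), (6, 5),                                  # reciprocal pair, below the size threshold
]
rings = candidate_rings(toy_edges)
print(rings)  # [[1, 2, 3]]
```

A learned GNN can pick up far subtler patterns than this rule, but the example shows why ring structure that is invisible per account becomes visible once edges are considered.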
Published benchmark results
Fraud detection on DGraphFin. Metric is AUROC. Higher is better.
| Method | AUROC (%) | Year | Paper |
|---|---|---|---|
| MLP | ~71.4 | 2022 | Huang et al. |
| GCN | ~72.0 | 2022 | Huang et al. |
| GraphSAGE | ~74.0 | 2022 | Huang et al. |
| GAT | ~73.0 | 2022 | Huang et al. |
| XGBoost + feat | ~71.8 | 2022 | Huang et al. |
Note: The modest AUROC values reflect the genuine difficulty of social-graph fraud detection with sparse features. GNN gains over MLP are small but consistent.
Original Paper
DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection
Xuanwen Huang, Yang Yang, Yang Wang, Chunping Wang, Zhisheng Zhang, Jiarong Xu, Lei Chen, Michalis Vazirgiannis (2022). NeurIPS Datasets and Benchmarks Track
Original data source
DGraphFin is available from the DGraph project site (dgraph.xinye.com), hosted by XinYe (Finvolution Group). PyTorch Geometric also provides a DGraphFin dataset class that reads the manually downloaded archive.
@inproceedings{huang2022dgraph,
title={DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection},
author={Huang, Xuanwen and Yang, Yang and Wang, Yang and Wang, Chunping and Zhang, Zhisheng and Xu, Jiarong and Chen, Lei and Vazirgiannis, Michalis},
booktitle={NeurIPS Datasets and Benchmarks Track},
year={2022}
}
BibTeX citation for the DGraphFin dataset.
Which dataset should I use?
DGraphFin vs Elliptic Bitcoin: DGraphFin is 18x larger (3.7M vs 203K nodes) but has fewer features (17 vs 165). DGraphFin is a social graph; Elliptic is a transaction graph. Use DGraphFin for scale testing; Elliptic for feature-rich fraud analysis.
DGraphFin vs OGB-Products: Similar scale (~3.7M vs ~2.4M nodes) but different domains. DGraphFin is fintech fraud; OGB-Products is product classification. Use the one matching your domain.
DGraphFin vs synthetic fraud datasets: DGraphFin uses real anonymized data from a fintech platform, making it more realistic than synthetically generated fraud graphs.
From benchmark to production
Production fintech fraud systems add transaction data (payment amounts, timing, merchants), device fingerprints, geolocation, and behavioral biometrics to the social graph. The graph becomes heterogeneous: users, devices, merchants, and transactions are different node types. Real-time scoring requirements (for example, decisions within 100ms of a transaction) add engineering constraints, and the graph evolves continuously as new users join and new connections form.
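To illustrate what heterogeneous means in data-structure terms, the sketch below stores edges keyed by (source type, relation, destination type) triples, a layout similar in spirit to how heterogeneous graph libraries organize typed edges. The `HeteroGraph` class and all node types, relation names, and IDs are hypothetical examples; none of them come from DGraphFin.

```python
from collections import defaultdict


class HeteroGraph:
    """Minimal typed-edge store: (src_type, relation, dst_type) -> list of (src_id, dst_id)."""

    def __init__(self):
        self.edges = defaultdict(list)

    def add_edge(self, src_type, src_id, relation, dst_type, dst_id):
        self.edges[(src_type, relation, dst_type)].append((src_id, dst_id))

    def neighbors(self, node_type, node_id):
        """Yield (relation, dst_type, dst_id) for every outgoing edge of one node."""
        for (src_type, relation, dst_type), pairs in self.edges.items():
            if src_type != node_type:
                continue
            for src_id, dst_id in pairs:
                if src_id == node_id:
                    yield relation, dst_type, dst_id


graph = HeteroGraph()
graph.add_edge("user", "u1", "logs_in_from", "device", "d1")
graph.add_edge("user", "u2", "logs_in_from", "device", "d1")
graph.add_edge("user", "u1", "pays", "merchant", "m7")
graph.add_edge("user", "u1", "contacts", "user", "u2")

# Users sharing device d1: a typical cross-type signal in production fraud systems.
shared = sorted(
    src for src, dst in graph.edges[("user", "logs_in_from", "device")] if dst == "d1"
)
u1_neighbors = sorted(graph.neighbors("user", "u1"))
print(shared)
print(u1_neighbors)
```

A production system would back this with an indexed store and add timestamps to edges so the graph can be queried as of the moment a transaction is scored.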