203,769 nodes · 234,355 edges · 165 features · Task: fraud (binary)
What Elliptic Bitcoin contains
Elliptic Bitcoin is a real Bitcoin transaction graph provided by Elliptic, a blockchain analytics company. The 203,769 nodes represent Bitcoin transactions. The 234,355 directed edges represent payment flows, where the output of one transaction becomes the input to another. Each transaction has 165 features: 93 local features (for example amounts, fees, and input/output counts; the original paper counts the time step as a 94th) and 72 aggregated features computed from 1-hop neighbors.
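To make the idea of 1-hop aggregated features concrete, here is a minimal sketch. The edge list, the `amount` feature, and the choice of max/min/mean are hypothetical toy inputs picked for illustration; the dataset's real features are anonymized, so this shows the general idea rather than how Elliptic computed its columns.

```python
from collections import defaultdict
from statistics import mean


def one_hop_aggregates(edges, amount):
    """Return per-node (max, min, mean) of a local feature over 1-hop neighbors.

    edges  : list of (src, dst) pairs, meaning src pays into dst (toy data)
    amount : dict mapping node -> local feature value (hypothetical feature)
    Neighbors are collected in both directions (one hop backward and forward).
    """
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    aggregates = {}
    for node in amount:
        values = [amount[n] for n in sorted(neighbors[node])]
        if values:
            aggregates[node] = (max(values), min(values), mean(values))
        else:
            aggregates[node] = (0.0, 0.0, 0.0)  # isolated node: nothing to aggregate
    return aggregates


edges = [("a", "b"), ("b", "c"), ("b", "d")]
amount = {"a": 1.0, "b": 4.0, "c": 2.0, "d": 6.0, "e": 3.0}
agg = one_hop_aggregates(edges, amount)
print(agg["b"])  # aggregates over neighbors a, c, d
```

Because such aggregates are already baked into the feature vector, even a non-graph model sees some neighborhood information.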
The graph spans 49 timesteps. Only ~46,000 transactions are labeled: ~4,500 as illicit (associated with ransomware, darknet markets, etc.) and ~42,000 as licit. The remaining ~157,000 are unlabeled. Illicit transactions make up only ~2% of all nodes (roughly 10% of the labeled ones), and this class imbalance, combined with label scarcity, mirrors real fraud detection challenges.
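The arithmetic behind these proportions can be checked in a few lines. This sketch uses the rounded counts quoted above, not exact dataset statistics, and the `pos_weight` heuristic (negative-to-positive ratio for a weighted loss) is an assumption added for illustration, not a setting from the original paper.

```python
# Rounded counts quoted in the text above (approximate, not exact statistics).
total_nodes = 203_769
illicit = 4_500
licit = 42_000

labeled = illicit + licit
unlabeled = total_nodes - labeled

illicit_share_of_all = illicit / total_nodes
illicit_share_of_labeled = illicit / labeled

# Common heuristic for weighted binary cross-entropy: weight the positive
# class by the negative-to-positive ratio (an assumption, not from the paper).
pos_weight = licit / illicit

print(f"Labeled: {labeled}, unlabeled: {unlabeled}")
print(f"Illicit share of all nodes: {illicit_share_of_all:.1%}")
print(f"Illicit share of labeled nodes: {illicit_share_of_labeled:.1%}")
print(f"pos_weight: {pos_weight:.2f}")
```

Which denominator you use matters: the illicit share is about five times larger when measured against labeled nodes only.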
Why Elliptic Bitcoin matters
Before Elliptic, fraud detection research relied largely on synthetic or tabular datasets. Elliptic released what its authors described as the largest labeled transaction dataset publicly available in any cryptocurrency, enabling researchers to study how fraud patterns manifest in transaction network topology. The key observation: illicit transactions tend to cluster in the payment graph. Fraudsters send funds through chains of transactions to launder money, creating distinctive subgraph patterns that models looking at one transaction at a time can miss.
The temporal dimension adds realism. Fraud patterns evolve: as enforcement targets one laundering method, criminals switch to another. Models trained on early timesteps must generalize to new fraud patterns in later timesteps. This temporal shift is the hardest challenge in production fraud detection.
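A temporal split of the kind described above can be sketched in plain Python. The function name, the toy inputs, and the cutoff value are hypothetical; the PyG loader ships its own `train_mask` and `test_mask`, which you would normally use instead.

```python
def temporal_split(timesteps, labels, cutoff, unlabeled=None):
    """Split node indices into train/test sets by timestep.

    timesteps : list of int, one per node
    labels    : list, one per node; entries equal to `unlabeled` are skipped
    cutoff    : nodes with timestep < cutoff go to train, the rest to test
    """
    train, test = [], []
    for idx, (t, y) in enumerate(zip(timesteps, labels)):
        if y == unlabeled:
            continue  # unlabeled nodes cannot be used for supervised evaluation
        (train if t < cutoff else test).append(idx)
    return train, test


# Toy data: six nodes over four timesteps, two of them unlabeled.
timesteps = [1, 1, 2, 3, 3, 4]
labels = [0, 1, None, 0, 1, None]
train_idx, test_idx = temporal_split(timesteps, labels, cutoff=3)
print(train_idx, test_idx)
```

Splitting by time rather than at random keeps later fraud patterns out of training, which is what makes the evaluation reflect deployment conditions.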
Loading Elliptic Bitcoin in PyG
from torch_geometric.datasets import EllipticBitcoinDataset
# Downloads the raw files on first use and caches them under root
dataset = EllipticBitcoinDataset(root='/tmp/Elliptic')
data = dataset[0]  # the dataset holds a single graph
print(f"Nodes: {data.num_nodes}") # 203769
print(f"Edges: {data.num_edges}") # 234355
print(f"Features: {data.num_features}") # 165
# Note: only ~46K nodes are labeled. PyG encodes unknown labels as 2 (not -1)
# and leaves unlabeled nodes out of both train_mask and test_mask.
labeled = int((data.train_mask | data.test_mask).sum())
print(f"Labeled nodes: {labeled}")
Check your PyG version for EllipticBitcoinDataset availability. Manual loading from CSVs is an alternative.
Common tasks and benchmarks
The standard task is binary node classification (illicit vs. licit) with temporal splits, evaluated with AUROC and F1. Reported results are roughly 95% AUROC for GCN, 96% for GAT, and 97%+ for temporal GNNs such as EvolveGCN. The practical benchmark is precision-recall: how many illicit transactions can you catch while keeping false positives below 5%? This mirrors the real-world constraint where investigators can only review a limited number of flagged transactions.
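One way to turn that question into a number is recall at a fixed precision. The sketch below reads "false positives below 5%" as "at least 95% of flagged transactions are truly illicit", which is an assumption; a cap on the false positive rate would be a different metric. The function name and toy scores are hypothetical.

```python
def recall_at_precision(labels, scores, min_precision):
    """Best recall reachable by any score threshold with precision >= min_precision.

    labels : list of 0/1, where 1 means illicit
    scores : list of float, higher means more suspicious
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_recall = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Consume every item tied at this score so the threshold is well defined.
        score = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        if precision >= min_precision:
            best_recall = max(best_recall, tp / total_pos)
    return best_recall


# Toy example: 4 illicit and 4 licit transactions with model scores.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.1]
print(recall_at_precision(labels, scores, min_precision=0.75))
```

Raising `min_precision` shrinks the review queue at the cost of missed illicit transactions, which is the trade-off investigators actually face.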
Example: anti-money laundering
Banks spend $25+ billion annually on AML compliance. Each suspicious activity report requires manual investigation costing $50-500. Current rule-based systems generate 95%+ false positives, wasting investigator time. GNN-based detection on transaction graphs reduces false positives by identifying genuine fraud patterns (chain-like fund flows through shell accounts) while filtering out routine transactions that happen to trigger rules. Elliptic demonstrates this approach on real Bitcoin data.
Published benchmark results
Illicit transaction detection on Elliptic Bitcoin. Metric is AUROC (area under the ROC curve). Higher is better.
| Method | AUROC (%) | Year | Paper |
|---|---|---|---|
| Random Forest | ~97.7 | 2019 | Weber et al. |
| Logistic Regression | ~93.2 | 2019 | Weber et al. |
| GCN | ~95.0 | 2019 | Weber et al. |
| GAT | ~96.0 | 2020 | Pareja et al. |
| EvolveGCN | ~97.2 | 2020 | Pareja et al. |
| Skip-GCN | ~96.5 | 2019 | Weber et al. |
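Note: Random Forest on the 165 features (which include 1-hop aggregates) is a strong baseline. GNN advantages are more pronounced in precision-recall at high-precision operating points. Weber et al. report illicit-class precision, recall, and F1 plus micro-averaged F1, so check the figures above against the cited papers before quoting them as AUROC.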
Original Paper
Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics
Mark Weber, Giacomo Domeniconi, Jie Chen, Daniel Karl I. Weidele, Claudio Bellei, Tom Robinson, Charles E. Leiserson (2019). KDD Workshop on Anomaly Detection in Finance
Original data source
The Elliptic Bitcoin dataset is provided by Elliptic and is available on Kaggle.
@inproceedings{weber2019anti,
title={Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics},
author={Weber, Mark and Domeniconi, Giacomo and Chen, Jie and Weidele, Daniel Karl I and Bellei, Claudio and Robinson, Tom and Leiserson, Charles E},
booktitle={KDD Workshop on Anomaly Detection in Finance},
year={2019}
}
BibTeX citation for the Elliptic Bitcoin dataset.
Which dataset should I use?
Elliptic vs DGraphFin: Elliptic is a Bitcoin transaction graph (203K nodes, 165 features). DGraphFin is a fintech social graph (3.7M nodes, 17 features). Use Elliptic for transaction-level fraud detection; use DGraphFin for social-network fraud at scale.
Elliptic vs IEEE-CIS Fraud: IEEE-CIS is tabular (no graph structure). Elliptic provides the transaction graph. Use Elliptic to study graph-based fraud detection specifically.
Elliptic vs OGB-Products: Both are large single graphs. OGB-Products is product co-purchase (no fraud). Use Elliptic for fraud domain; OGB-Products for scalability benchmarks.
From benchmark to production
Elliptic has about 200K nodes. A major bank processes 100M+ transactions daily. Production fraud detection requires real-time scoring (millisecond latency), heterogeneous graphs (accounts, merchants, devices, locations), and continuous model updates as fraud patterns evolve. The temporal and class imbalance challenges in Elliptic are real, but the scale gap is roughly 500x for a single day of traffic, and it widens as transaction history accumulates.