
CiteSeer: The Sparser Citation Benchmark That Separates Good GNNs from Great Ones

CiteSeer is the harder sibling in the Planetoid benchmark family. With fewer edges per node than Cora, it tests whether your GNN can extract signal from sparse neighborhoods -- a challenge that mirrors real-world data where connections are incomplete.

PyTorch Geometric

TL;DR

  • CiteSeer contains 3,327 papers (nodes) with 9,104 citations (edges) and 3,703-dimensional bag-of-words features. Papers belong to one of 6 categories.
  • It is sparser than Cora: fewer edges despite more nodes. This makes neighborhood aggregation less effective and lowers benchmark scores across all GNN architectures.
  • GCN achieves ~70.3% on CiteSeer vs ~81.5% on Cora. GAT gets ~72.5%. The gap between methods widens on sparser graphs.
  • CiteSeer reveals GNN weaknesses hidden by Cora's density. If your model matches GCN on Cora but underperforms on CiteSeer, it likely struggles with limited neighborhood information.

3,327 nodes · 9,104 edges · 3,703 features · 6 classes

What CiteSeer contains

CiteSeer is a citation network of scientific papers from the CiteSeer digital library. Each of the 3,327 nodes is a paper. Each of the 9,104 edges is a citation link. Node features are 3,703-dimensional bag-of-words vectors derived from paper text. The task is to classify papers into 6 categories: Agents, AI, DB, IR, ML, and HCI.

Compared to Cora, CiteSeer has higher-dimensional features but a sparser graph. The average node degree is about 2.7 (vs 3.9 in Cora), meaning each paper has fewer citation connections to learn from. This sparsity is the defining challenge.
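The degree figures quoted here follow directly from the node and edge counts. A quick sanity check in plain Python (PubMed's edge count of 88,648 is the Planetoid version's directed count, not stated above, so treat it as an assumption):

```python
# Average degree = edges / nodes. PyG's num_edges counts each
# undirected citation in both directions, so this ratio matches
# data.num_edges / data.num_nodes as printed by the loader below.
datasets = {
    "Cora":     {"nodes": 2_708,  "edges": 10_556},
    "CiteSeer": {"nodes": 3_327,  "edges": 9_104},
    "PubMed":   {"nodes": 19_717, "edges": 88_648},  # assumption: Planetoid counts
}

for name, d in datasets.items():
    avg_deg = d["edges"] / d["nodes"]
    print(f"{name:8s} avg degree: {avg_deg:.1f}")
# Cora     avg degree: 3.9
# CiteSeer avg degree: 2.7
# PubMed   avg degree: 4.5
```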

Why CiteSeer matters

CiteSeer fills a specific gap in the benchmark ecosystem. Cora is dense enough that even simple aggregation works well. CiteSeer's sparsity forces models to extract more from less. This tests two capabilities: how well the model uses node features when graph structure is limited, and whether attention mechanisms (GATConv, TransformerConv) can identify the few high-value connections in a sparse neighborhood.

Real-world graphs are often sparse. A fraud detection graph has millions of legitimate transactions for every suspicious one. A recommendation graph has far more items than any user has interacted with. CiteSeer's sparsity, while mild, points toward these real challenges.

Loading CiteSeer in PyG

load_citeseer.py
from torch_geometric.datasets import Planetoid

dataset = Planetoid(root='/tmp/CiteSeer', name='CiteSeer')
data = dataset[0]

print(f"Nodes: {data.num_nodes}")        # 3327
print(f"Edges: {data.num_edges}")        # 9104
print(f"Features: {data.num_features}")  # 3703
print(f"Classes: {dataset.num_classes}") # 6
print(f"Avg degree: {data.num_edges / data.num_nodes:.1f}")  # ~2.7

Same Planetoid API as Cora. The standard split uses 120 training nodes (20 per class).

Original Paper

CiteSeer: An Automatic Citation Indexing System

C. Lee Giles, Kurt D. Bollacker, Steve Lawrence (1998). ACM DL '98


Benchmark comparison (standard Planetoid split)

Method         | Accuracy | Year | Paper
MLP (no graph) | ~58.0%   | --   | Baseline
GCN            | 70.3%    | 2017 | Kipf & Welling
GAT            | 72.5%    | 2018 | Velickovic et al.
APPNP          | 71.8%    | 2019 | Klicpera et al.
GCNII          | 73.4%    | 2020 | Chen et al.

Which Planetoid dataset should I use?

Cora (2,708 nodes, avg degree 3.9) is the easiest -- use it to confirm your code works. CiteSeer (3,327 nodes, avg degree 2.7) is sparser, dropping GCN from 81% to 70%; use it to test robustness to sparse neighborhoods. PubMed (19,717 nodes, avg degree 4.5) is the largest and has only 3 classes; use it to test scalability. Run all three together: if your method improves on Cora but degrades on CiteSeer, it may depend too heavily on graph density.

Common tasks and benchmarks

Like Cora, the standard task is transductive semi-supervised node classification with 120 labeled training nodes (20 per class), 500 validation nodes, and 1,000 test nodes. Benchmark results: GCN ~70.3%, GAT ~72.5%, APPNP ~71.8%, GCNII ~73.4%. The 10-point gap between CiteSeer and Cora scores across all methods confirms that sparsity is the primary difficulty, not model architecture.
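In this transductive setup, evaluation is just accuracy restricted to the fixed masked nodes. The metric is plain tensor arithmetic; the tensors below are toy stand-ins, not CiteSeer data:

```python
import torch

def masked_accuracy(logits, labels, mask):
    """Accuracy over only the nodes selected by a boolean mask."""
    pred = logits.argmax(dim=-1)
    return (pred[mask] == labels[mask]).float().mean().item()

# Toy stand-ins: 5 nodes, 3 classes; evaluate on the last 3 nodes.
logits = torch.tensor([[2.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0],
                       [0.0, 0.0, 2.0],
                       [2.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0]])
labels = torch.tensor([0, 1, 2, 0, 0])
mask = torch.tensor([False, False, True, True, True])
print(masked_accuracy(logits, labels, mask))  # 2 of 3 masked nodes correct
```

On the real dataset you would pass `data.test_mask` (1,000 nodes) as the mask.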

Data source

The original CiteSeer dataset is available from the LINQS group at UC Santa Cruz. The Planetoid version used by PyG is downloaded automatically.

BibTeX citation

citeseer.bib
@inproceedings{giles1998citeseer,
  title={CiteSeer: An Automatic Citation Indexing System},
  author={Giles, C. Lee and Bollacker, Kurt D. and Lawrence, Steve},
  booktitle={Proceedings of the Third ACM Conference on Digital Libraries},
  pages={89--98},
  year={1998}
}

@inproceedings{yang2016revisiting,
  title={Revisiting Semi-Supervised Learning with Graph Embeddings},
  author={Yang, Zhilin and Cohen, William and Salakhutdinov, Ruslan},
  booktitle={ICML},
  year={2016}
}

Cite Giles et al. for the dataset, Yang et al. for the Planetoid split.

Example: sparse graphs in enterprise

Most enterprise graphs resemble CiteSeer more than Cora. Consider a B2B SaaS platform where companies are nodes and business relationships are edges. Most companies have only a handful of connections. A recommendation system must predict which other companies a given customer might want to work with, using sparse relationship data plus rich company features -- the same tradeoff CiteSeer presents.

From benchmark to production

CiteSeer teaches that graph sparsity degrades GNN performance. In production, this problem is amplified: new users have zero connections (cold start), new products have no purchase history, and new accounts have no transaction graph. Handling sparsity requires architectures that balance feature-based and structure-based learning.

Frequently asked questions

What is the CiteSeer dataset?

CiteSeer is a citation network of 3,327 scientific papers classified into 6 categories. Each paper has a 3,703-dimensional bag-of-words feature vector. Edges represent citations. It is part of the Planetoid benchmark suite alongside Cora and PubMed.

How does CiteSeer compare to Cora?

CiteSeer has more nodes (3,327 vs 2,708) but fewer edges (9,104 vs 10,556), making it sparser. It has higher-dimensional features (3,703 vs 1,433) and fewer classes (6 vs 7). The sparser graph makes it harder for GNNs: GCN scores ~70% on CiteSeer vs ~81% on Cora.

How do I load CiteSeer in PyTorch Geometric?

Use `from torch_geometric.datasets import Planetoid; dataset = Planetoid(root='/tmp/CiteSeer', name='CiteSeer')`. The API is identical to Cora -- just change the name parameter.

What is a good accuracy on CiteSeer?

GCN achieves ~70.3%, GAT reaches ~72.5%, and state-of-the-art methods can exceed 75%. CiteSeer is harder than Cora due to its sparser graph structure, so expect lower numbers across all methods.

Should I benchmark my GNN on CiteSeer?

Yes, if you are also benchmarking on Cora and PubMed. The three Planetoid datasets together show how your method handles different graph densities and feature dimensions. CiteSeer's sparsity often reveals weaknesses that Cora's denser structure hides.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.