A citation network is a graph where academic papers are nodes and citations are directed edges. Paper A citing paper B creates an edge from A to B. Each paper node carries features derived from its content (bag-of-words or language model embeddings) and a label indicating its research topic. The task is node classification: predict each paper's topic from its content and citation context.
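Concretely, such a dataset reduces to three arrays. A minimal sketch (the sizes, edges, and labels below are toy illustrations, not a real dataset):

```python
import numpy as np

# Toy citation graph: 4 papers. Each row of edge_index is (citing, cited),
# i.e. a directed edge from A to B means paper A cites paper B.
num_papers = 4
edge_index = np.array([
    [0, 1],  # paper 0 cites paper 1
    [2, 1],  # paper 2 cites paper 1
    [3, 2],  # paper 3 cites paper 2
])
# Content-derived features (e.g. bag-of-words counts) and topic labels.
features = np.zeros((num_papers, 8))
labels = np.array([0, 0, 1, 1])

# The node-classification task: predict labels[i] from features and edge_index.
```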
Why citation networks demonstrate GNN value
The key property of citation networks is homophily: papers that cite each other tend to be in the same field. In Cora, 81% of edges connect papers with the same label. This means a GNN that aggregates neighbor information naturally receives confirming evidence about the node's class.
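Edge homophily is straightforward to measure: the fraction of edges whose two endpoints carry the same label. A sketch on a toy graph (the function name and data are ours, not from any library):

```python
import numpy as np

def edge_homophily(edge_index, labels):
    """Fraction of edges whose endpoints share a label."""
    src, dst = edge_index[:, 0], edge_index[:, 1]
    return float(np.mean(labels[src] == labels[dst]))

# Toy graph: 5 papers, two topics, 4 citation edges.
labels = np.array([0, 0, 0, 1, 1])
edges = np.array([[0, 1], [1, 2], [3, 4], [0, 3]])
print(edge_homophily(edges, labels))  # 0.75: 3 of 4 edges are same-class
```

On Cora this statistic comes out near 0.81, matching the figure above.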
A logistic regression on paper features alone achieves about 59% accuracy on Cora; a 2-layer GCN that also uses citation structure reaches 81.5%. That 22.5-percentage-point gain comes entirely from graph structure: knowing which papers a paper cites and which papers cite it.
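The 2-layer GCN can be sketched in a few lines of NumPy using the standard symmetric normalization from Kipf and Welling, logits = Â·ReLU(Â·X·W0)·W1 with Â = D^(-1/2)(A+I)D^(-1/2). The graph and weights below are random toys, not the trained Cora model:

```python
import numpy as np

def gcn_norm(A):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def two_layer_gcn(A, X, W0, W1):
    A_norm = gcn_norm(A)
    H = np.maximum(A_norm @ X @ W0, 0.0)  # layer 1 + ReLU
    return A_norm @ H @ W1                # layer 2: class logits

# Toy graph: 5 papers; citations treated as undirected for aggregation.
A = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[u, v] = A[v, u] = 1.0
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))    # 8-dim toy features
W0 = rng.standard_normal((8, 16))  # hidden width 16 (illustrative)
W1 = rng.standard_normal((16, 7))  # 7 output classes, as in Cora
logits = two_layer_gcn(A, X, W0, W1)
print(logits.shape)  # (5, 7): one score per class per paper
```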
Standard datasets
- Cora: 2,708 papers, 5,429 edges, 7 classes (Case-Based, Genetic Algorithms, Neural Networks, Probabilistic Methods, Reinforcement Learning, Rule Learning, Theory). Features: 1,433-dim binary word vectors.
- CiteSeer: 3,327 papers, 4,732 edges, 6 classes. Similar structure to Cora but slightly harder (lower homophily).
- PubMed: 19,717 papers, 44,338 edges, 3 classes (Diabetes Mellitus Experimental, Type 1, Type 2). Features: 500-dim TF-IDF vectors.
- ogbn-arxiv: 169,343 papers, 1.2M edges, 40 classes. From OGB, realistic scale with temporal split.
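One structural trait all four datasets share is sparsity. Treating each graph as undirected, the average degree 2E/N stays low (edge counts as listed above; the ogbn-arxiv figure is the exact count behind the rounded 1.2M):

```python
# Average degree = 2 * edges / nodes, taking the undirected view.
datasets = {
    "Cora":       {"nodes": 2_708,   "edges": 5_429},
    "CiteSeer":   {"nodes": 3_327,   "edges": 4_732},
    "PubMed":     {"nodes": 19_717,  "edges": 44_338},
    "ogbn-arxiv": {"nodes": 169_343, "edges": 1_166_243},
}
for name, d in datasets.items():
    print(f"{name}: average degree {2 * d['edges'] / d['nodes']:.1f}")
```

Every graph averages fewer than ~14 neighbors per node, which is part of why shallow 2-layer models with small receptive fields perform well here.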
What GNNs learn on citation networks
After 2 layers of message passing on a citation network:
- Layer 1: Each paper absorbs the topic signals from papers it cites and papers that cite it. A neural networks paper cited by 5 other neural networks papers gets a strong topic signal.
- Layer 2: Each paper absorbs 2-hop context: the topics of papers cited by its citations. This captures broader field relationships.
The result: even papers with ambiguous content (a paper about “learning” that could be RL or neural networks) get classified correctly based on their citation neighborhood.
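The growing receptive field can be observed directly with mean aggregation on a toy path graph (a sketch, not a trained model): give each node a one-hot "identity" signal and watch which entries become nonzero after each layer.

```python
import numpy as np

def mean_aggregate(A, H):
    """One round of message passing: average each node with its neighbors."""
    A_hat = A + np.eye(A.shape[0])
    return (A_hat / A_hat.sum(axis=1, keepdims=True)) @ H

# Path graph 0-1-2-3; edges treated as undirected.
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

H0 = np.eye(4)                    # node i starts knowing only itself
H1 = mean_aggregate(A, H0)        # layer 1: 1-hop information
H2 = mean_aggregate(A, H1)        # layer 2: 2-hop information
print(H1[0] > 0)  # node 0 sees nodes 0 and 1
print(H2[0] > 0)  # node 0 now also sees node 2, two hops away
```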
Limitations as benchmarks
Citation networks have important limitations for evaluating GNNs:
- Too small: Cora has 2,708 nodes. Accuracy varies by 1-2 percentage points across random seeds, making it hard to distinguish methods whose true gap is of the same order.
- High homophily: 81% same-class edges means even simple label propagation works well. Does not test performance on heterophilous graphs.
- Transductive evaluation: All nodes are visible during training (only labels are masked). This does not reflect production settings where new nodes arrive constantly.
- Not temporal: Standard splits ignore publication time, but citation is inherently temporal (a paper can only cite older papers). Temporal splits give different, more realistic results.
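A temporal split is easy to sketch: train on papers published before a cutoff year and evaluate on later ones (the years below are synthetic, purely for illustration):

```python
import numpy as np

# Synthetic publication years for six papers (illustrative only).
years = np.array([2015, 2016, 2017, 2018, 2019, 2020])
cutoff = 2018

train_mask = years < cutoff    # older papers: labels visible during training
test_mask = years >= cutoff    # newer papers: topics to be predicted
print(train_mask.sum(), test_mask.sum())  # 3 3
```

This mirrors what ogbn-arxiv does: its official split trains on papers up to a cutoff year and tests on those published after it.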
For serious GNN evaluation, use OGB (ogbn-arxiv, ogbn-products) or RelBench, which address all four limitations.