- Nodes: 65,755
- Edges: 251,550
- Features: 61,278
- Classes: 186
What NELL contains
NELL (Never-Ending Language Learning) is a knowledge graph created by a CMU project that continuously extracts structured facts from web text. The dataset has 65,755 entity nodes connected by 251,550 relational edges of multiple types. Each entity has a 61,278-dimensional sparse feature vector derived from its textual descriptions. The 186-class task assigns each entity to its correct category (person, location, organization, concept, etc.).
The features are extremely sparse: each entity has only a handful of non-zero values across 61,278 dimensions. This sparsity reflects the nature of web-extracted knowledge: each entity appears in only a few contexts, creating a very high-dimensional but very sparse representation. Efficient sparse matrix operations are essential for processing NELL without running out of memory.
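A back-of-envelope calculation shows why sparse storage is non-negotiable at this scale. The sketch below uses the dataset's node and feature counts; the ~5 non-zeros per entity is an illustrative assumption, not an official statistic:

```python
# Dense vs. COO-style sparse storage for NELL's feature matrix.
# Assumes float32 values (4 bytes) and int64 indices (8 bytes each);
# nnz_per_node = 5 is an illustrative guess at "a handful of non-zeros".

nodes, dims = 65_755, 61_278
nnz_per_node = 5

dense_bytes = nodes * dims * 4                     # every entry stored
sparse_bytes = nodes * nnz_per_node * (8 + 8 + 4)  # (row, col) indices + value

print(f"dense:  {dense_bytes / 1e9:.1f} GB")   # ~16.1 GB
print(f"sparse: {sparse_bytes / 1e6:.1f} MB")  # ~6.6 MB
```

The dense matrix alone would exceed the memory of most single GPUs, while the sparse version fits in a few megabytes.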
Why NELL matters
NELL is the primary GNN benchmark for knowledge graph reasoning. Knowledge graphs are used throughout industry: Google's Knowledge Graph powers search results, Amazon's product knowledge graph drives recommendations, and enterprise knowledge graphs organize corporate data. Entity typing (classifying entities in the graph) is a foundational task that enables these applications.
The 186-class, sparse-feature setting also tests GNN robustness to challenging input conditions. On citation networks (7 classes, 1,433 features), GCN achieves 81%. On NELL (186 classes, 61K sparse features), it drops to 66%. This gap reveals how much benchmark difficulty varies and why methods should be evaluated across diverse datasets.
Loading NELL in PyG
```python
from torch_geometric.datasets import NELL

dataset = NELL(root='/tmp/NELL')
data = dataset[0]

print(f"Nodes: {data.num_nodes}")        # 65755
print(f"Edges: {data.num_edges}")        # 251550
print(f"Features: {data.num_features}")  # 61278
print(f"Classes: {dataset.num_classes}") # 186

# Features are sparse -- handle accordingly
print(f"Feature type: {data.x.layout}")  # sparse_coo or sparse_csr
```

Features are stored as sparse tensors. Ensure your model handles sparse input (most PyG layers do).
Common tasks and benchmarks
Node classification (entity typing) with a semi-supervised split. GCN: ~66%, GAT: ~68%, APPNP: ~67%. The 186-class task with sparse features is significantly harder than standard citation benchmarks. Methods that project sparse features to dense embeddings before GNN processing tend to perform better than those that operate directly on sparse inputs.
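The project-to-dense idea amounts to a linear layer whose input only touches the non-zero entries. A pure-Python sketch, assuming a dict-based sparse vector and a per-feature weight table (all names here are illustrative, not PyG API):

```python
# Sketch: projecting a sparse feature vector to a dense embedding.
# A 61,278-dim NELL vector has only a handful of non-zeros, so the
# projection only needs the weight rows for the active features.

def project_sparse(sparse_vec, weight, out_dim):
    """sparse_vec: {feature_index: value}; weight: {feature_index: [out_dim floats]}.
    Returns a dense embedding of length out_dim."""
    dense = [0.0] * out_dim
    for idx, val in sparse_vec.items():
        row = weight[idx]  # only rows for active features are touched
        for j in range(out_dim):
            dense[j] += val * row[j]
    return dense

# Toy example: 2 active features out of a huge feature space, 4-dim embedding.
vec = {7: 1.0, 42: 0.5}
W = {7: [1.0, 0.0, 0.0, 0.0], 42: [0.0, 2.0, 0.0, 0.0]}
print(project_sparse(vec, W, 4))  # [1.0, 1.0, 0.0, 0.0]
```

In a real model this is a `torch.nn.Linear` (or embedding lookup) applied to the sparse input before the first GNN layer; the cost scales with the number of non-zeros, not with the 61K feature dimension.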
Example: enterprise knowledge management
A large enterprise has millions of documents, products, people, and processes connected by various relationships. Automatically typing entities in this knowledge graph (classifying a new document as a contract, a technical spec, or a marketing asset) enables intelligent search, automated routing, and compliance monitoring. NELL's entity typing task is exactly this classification at knowledge graph scale.
Published benchmark results
Node classification accuracy on NELL with the standard semi-supervised split. Higher is better.
| Method | Accuracy (%) | Year | Paper |
|---|---|---|---|
| GCN | ~66.0 | 2017 | Kipf & Welling |
| GAT | ~68.0 | 2018 | Velickovic et al. |
| APPNP | ~67.0 | 2019 | Klicpera et al. |
| SGC | ~66.5 | 2019 | Wu et al. |
| GraphSAGE | ~66.2 | 2017 | Hamilton et al. |
The 186-class task with extremely sparse features makes NELL significantly harder than citation network benchmarks.
Original Paper
Never-Ending Learning
Tom Mitchell, William Cohen, Estevam Hruschka, Partha Talukdar, Bishan Yang, Justin Betteridge, Andrew Carlson, Bhavana Dalvi, Matt Gardner, Bryan Kisiel, Jayant Krishnamurthy, Ni Lao, Kathryn Mazaitis, Thahir Mohamed, Ndapa Nakashole, Emmanouil Platanios, Alan Ritter, Mehdi Samadi, Burr Settles, Richard Wang, Derry Wijaya, Abhinav Gupta, Xinlei Chen, Abulhair Saparov, Malcolm Greaves, Joel Welling (2018). Communications of the ACM, 61(5), 103-115
Original data source
The NELL knowledge graph is from the CMU Never-Ending Language Learning project. The GNN benchmark version (used in the GCN paper) is available from the NELL project page. The processed version used by PyG follows the split from Kipf & Welling (2017).
BibTeX citation for the original NELL project:

```bibtex
@article{carlson2010toward,
  title={Toward an Architecture for Never-Ending Language Learning},
  author={Carlson, Andrew and Betteridge, Justin and Kisiel, Bryan and Settles, Burr and Hruschka, Estevam and Mitchell, Tom},
  journal={AAAI},
  volume={5},
  pages={3},
  year={2010}
}
```
Which dataset should I use?
NELL vs Cora/CiteSeer: NELL is a knowledge graph with 186 classes and 61K sparse features. Cora is a citation network with 7 classes and 1,433 features. Use NELL to test models on harder, more realistic knowledge graph tasks.
NELL vs FB15k-237: NELL is for entity typing (node classification). FB15k-237 is for link prediction (predicting missing triples). Different tasks on knowledge graphs.
NELL vs Wikidata5M: Wikidata5M is much larger (5M entities) and supports both link prediction and entity typing. Use NELL for a manageable single-GPU benchmark; Wikidata5M for scale testing.
From benchmark to production
Production knowledge graphs have billions of entities, hundreds of relation types, and continuous updates as new facts are extracted. They also require link prediction (what new facts are likely true?) and knowledge completion (what facts are missing?) alongside entity typing. Multi-relational GNNs (RGCN, HGT) are essential for handling the diverse relationship types.
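The core idea of a multi-relational layer such as R-GCN (a relation-specific transform per edge type, summed across relations) can be sketched in a few lines. This is a pure-Python toy with illustrative names and a tiny hand-built graph, not PyG's `RGCNConv` API, and it omits R-GCN's self-loop weight for brevity:

```python
# Toy R-GCN-style layer: for each relation type, average messages from
# in-neighbors, transform with a relation-specific weight matrix, and
# accumulate into the destination node's output.

def matvec(W, x):
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def rgcn_layer(h, edges, weights):
    """h: list of node feature vectors; edges: {relation: [(src, dst), ...]};
    weights: {relation: weight matrix}. Returns updated node features."""
    out = [[0.0] * len(row) for row in h]
    for rel, rel_edges in edges.items():
        # count in-neighbors per destination for mean normalization
        deg = {}
        for _, dst in rel_edges:
            deg[dst] = deg.get(dst, 0) + 1
        for src, dst in rel_edges:
            msg = matvec(weights[rel], h[src])
            for j, m in enumerate(msg):
                out[dst][j] += m / deg[dst]
    return out

# Two nodes, one relation: node 1 receives node 0's features, doubled.
h = [[1.0, 0.0], [0.0, 1.0]]
edges = {"rel_a": [(0, 1)]}
weights = {"rel_a": [[2.0, 0.0], [0.0, 2.0]]}
print(rgcn_layer(h, edges, weights))  # [[0.0, 0.0], [2.0, 0.0]]
```

The relation-specific weights are what distinguish this from a plain GCN layer; with hundreds of relation types, production systems typically add basis decomposition to keep the parameter count manageable.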