- Nodes: 65,755
- Edges: 251,550
- Features: 61,278
- Classes: 186
What NELL contains
NELL (Never-Ending Language Learning) is a knowledge graph created by a CMU project that continuously extracts structured facts from web text. The dataset has 65,755 entity nodes connected by 251,550 relational edges of multiple types. Each entity has a 61,278-dimensional sparse feature vector derived from its textual descriptions. The 186-class task assigns each entity to its correct category (person, location, organization, concept, etc.).
The features are extremely sparse: each entity has only a handful of non-zero values across 61,278 dimensions. This sparsity reflects the nature of web-extracted knowledge: each entity appears in only a few contexts, creating a very high-dimensional but very sparse representation. Efficient sparse matrix operations are essential for processing NELL without running out of memory.
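A back-of-envelope calculation shows why sparse storage is non-negotiable at this scale. The sketch below uses the dataset's node and feature counts; the ~5 non-zeros per entity is an illustrative assumption, not an official statistic:

```python
# Dense vs. COO-style sparse storage for NELL's feature matrix.
# Assumes float32 values (4 bytes) and int64 indices (8 bytes each);
# nnz_per_node = 5 is an illustrative guess at "a handful of non-zeros".

nodes, dims = 65_755, 61_278
nnz_per_node = 5

dense_bytes = nodes * dims * 4                     # every entry stored
sparse_bytes = nodes * nnz_per_node * (8 + 8 + 4)  # (row, col) indices + value

print(f"dense:  {dense_bytes / 1e9:.1f} GB")   # ~16.1 GB
print(f"sparse: {sparse_bytes / 1e6:.1f} MB")  # ~6.6 MB
```

The dense matrix alone would exceed the memory of most single GPUs, while the sparse version fits in a few megabytes.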
Why NELL matters
NELL is the primary GNN benchmark for knowledge graph reasoning. Knowledge graphs are used throughout industry: Google's Knowledge Graph powers search results, Amazon's product knowledge graph drives recommendations, and enterprise knowledge graphs organize corporate data. Entity typing (classifying entities in the graph) is a foundational task that enables these applications.
The 186-class, sparse-feature setting also tests GNN robustness to challenging input conditions. On citation networks (7 classes, 1,433 features), GCN achieves 81%. On NELL (186 classes, 61K sparse features), it drops to 66%. This gap reveals how much benchmark difficulty varies and why methods should be evaluated across diverse datasets.
Loading NELL in PyG
```python
from torch_geometric.datasets import NELL

dataset = NELL(root='/tmp/NELL')
data = dataset[0]

print(f"Nodes: {data.num_nodes}")        # 65755
print(f"Edges: {data.num_edges}")        # 251550
print(f"Features: {data.num_features}")  # 61278
print(f"Classes: {dataset.num_classes}") # 186

# Features are sparse -- handle accordingly
print(f"Feature type: {data.x.layout}")  # sparse_coo or sparse_csr
```

Features are stored as sparse tensors. Ensure your model handles sparse input (most PyG layers do).
Common tasks and benchmarks
Node classification (entity typing) with a semi-supervised split. GCN: ~66%, GAT: ~68%, APPNP: ~67%. The 186-class task with sparse features is significantly harder than standard citation benchmarks. Methods that project sparse features to dense embeddings before GNN processing tend to perform better than those that operate directly on sparse inputs.
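The project-to-dense idea amounts to a linear layer whose input only touches the non-zero entries. A pure-Python sketch, assuming a dict-based sparse vector and a per-feature weight table (all names here are illustrative, not PyG API):

```python
# Sketch: projecting a sparse feature vector to a dense embedding.
# A 61,278-dim NELL vector has only a handful of non-zeros, so the
# projection only needs the weight rows for the active features.

def project_sparse(sparse_vec, weight, out_dim):
    """sparse_vec: {feature_index: value}; weight: {feature_index: [out_dim floats]}.
    Returns a dense embedding of length out_dim."""
    dense = [0.0] * out_dim
    for idx, val in sparse_vec.items():
        row = weight[idx]  # only rows for active features are touched
        for j in range(out_dim):
            dense[j] += val * row[j]
    return dense

# Toy example: 2 active features out of a huge feature space, 4-dim embedding.
vec = {7: 1.0, 42: 0.5}
W = {7: [1.0, 0.0, 0.0, 0.0], 42: [0.0, 2.0, 0.0, 0.0]}
print(project_sparse(vec, W, 4))  # [1.0, 1.0, 0.0, 0.0]
```

In a real model this is a `torch.nn.Linear` (or embedding lookup) applied to the sparse input before the first GNN layer; the cost scales with the number of non-zeros, not with the 61K feature dimension.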
Example: enterprise knowledge management
A large enterprise has millions of documents, products, people, and processes connected by various relationships. Automatically typing entities in this knowledge graph (classifying a new document as a contract, a technical spec, or a marketing asset) enables intelligent search, automated routing, and compliance monitoring. NELL's entity typing task is exactly this classification at knowledge graph scale.
Published benchmark results
Node classification accuracy on NELL with the standard semi-supervised split. Higher is better.
| Method | Accuracy (%) | Year | Paper |
|---|---|---|---|
| GCN | ~66.0 | 2017 | Kipf & Welling |
| GAT | ~68.0 | 2018 | Velickovic et al. |
| APPNP | ~67.0 | 2019 | Klicpera et al. |
| SGC | ~66.5 | 2019 | Wu et al. |
| GraphSAGE | ~66.2 | 2017 | Hamilton et al. |
The 186-class task with extremely sparse features makes NELL significantly harder than citation network benchmarks.
Original Paper
Never-Ending Learning
Tom Mitchell, William Cohen, Estevam Hruschka, Partha Talukdar, Bishan Yang, Justin Betteridge, Andrew Carlson, Bhavana Dalvi, Matt Gardner, Bryan Kisiel, Jayant Krishnamurthy, Ni Lao, Kathryn Mazaitis, Thahir Mohamed, Ndapa Nakashole, Emmanouil Platanios, Alan Ritter, Mehdi Samadi, Burr Settles, Richard Wang, Derry Wijaya, Abhinav Gupta, Xinlei Chen, Abulhair Saparov, Malcolm Greaves, Joel Welling (2018). Communications of the ACM, 61(5), 103-115
Original data source
The NELL knowledge graph is from the CMU Never-Ending Language Learning project. The GNN benchmark version (used in the GCN paper) is available from the NELL project page. The processed version used by PyG follows the split from Kipf & Welling (2017).
BibTeX citation for the original NELL project:

```bibtex
@article{carlson2010toward,
  title={Toward an Architecture for Never-Ending Language Learning},
  author={Carlson, Andrew and Betteridge, Justin and Kisiel, Bryan and Settles, Burr and Hruschka, Estevam and Mitchell, Tom},
  journal={AAAI},
  volume={5},
  pages={3},
  year={2010}
}
```
Which dataset should I use?
NELL vs Cora/CiteSeer: NELL is a knowledge graph with 186 classes and 61K sparse features. Cora is a citation network with 7 classes and 1,433 features. Use NELL to test models on harder, more realistic knowledge graph tasks.
NELL vs FB15k-237: NELL is for entity typing (node classification). FB15k-237 is for link prediction (predicting missing triples). Different tasks on knowledge graphs.
NELL vs Wikidata5M: Wikidata5M is much larger (5M entities) and supports both link prediction and entity typing. Use NELL for a manageable single-GPU benchmark; Wikidata5M for scale testing.
From benchmark to production
Production knowledge graphs have billions of entities, hundreds of relation types, and continuous updates as new facts are extracted. They also require link prediction (what new facts are likely true?) and knowledge completion (what facts are missing?) alongside entity typing. Multi-relational GNNs (RGCN, HGT) are essential for handling the diverse relationship types.
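The core idea of a multi-relational layer such as R-GCN (a relation-specific transform per edge type, summed across relations) can be sketched in a few lines. This is a pure-Python toy with illustrative names and a tiny hand-built graph, not PyG's `RGCNConv` API, and it omits R-GCN's self-loop weight for brevity:

```python
# Toy R-GCN-style layer: for each relation type, average messages from
# in-neighbors, transform with a relation-specific weight matrix, and
# accumulate into the destination node's output.

def matvec(W, x):
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def rgcn_layer(h, edges, weights):
    """h: list of node feature vectors; edges: {relation: [(src, dst), ...]};
    weights: {relation: weight matrix}. Returns updated node features."""
    out = [[0.0] * len(row) for row in h]
    for rel, rel_edges in edges.items():
        # count in-neighbors per destination for mean normalization
        deg = {}
        for _, dst in rel_edges:
            deg[dst] = deg.get(dst, 0) + 1
        for src, dst in rel_edges:
            msg = matvec(weights[rel], h[src])
            for j, m in enumerate(msg):
                out[dst][j] += m / deg[dst]
    return out

# Two nodes, one relation: node 1 receives node 0's features, doubled.
h = [[1.0, 0.0], [0.0, 1.0]]
edges = {"rel_a": [(0, 1)]}
weights = {"rel_a": [[2.0, 0.0], [0.0, 2.0]]}
print(rgcn_layer(h, edges, weights))  # [[0.0, 0.0], [2.0, 0.0]]
```

The relation-specific weights are what distinguish this from a plain GCN layer; with hundreds of relation types, production systems typically add basis decomposition to keep the parameter count manageable.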