At a glance: 12,311 shapes · 1,024 typical points per shape · 3 features per point (x, y, z) · 40 classes.
What ModelNet40 contains
ModelNet40 is a dataset of 12,311 3D CAD models from 40 common object categories. Each model is provided as a mesh and typically subsampled to 1,024 or 2,048 points from the object's surface. Points have 3 features (x, y, z coordinates). The 40 categories include everyday objects: airplane, bathtub, bed, bench, bookshelf, bottle, bowl, car, chair, cone, cup, curtain, desk, door, dresser, flower pot, glass box, guitar, keyboard, lamp, laptop, mantel, monitor, night stand, person, piano, plant, radio, range hood, sink, sofa, stairs, stool, table, tent, toilet, TV stand, vase, wardrobe, and xbox.
Why ModelNet40 matters
ModelNet40 tests global shape understanding: given a complete 3D object, what is it? This capability is fundamental to robotics (a robot must identify objects before manipulating them), autonomous driving (the perception system must classify surrounding objects), and augmented reality (virtual objects must interact appropriately with real ones).
Compared to ShapeNet (which tests part-level understanding), ModelNet40 focuses on the graph-level readout problem: how do you aggregate point-level GNN embeddings into a single representation that captures the shape's global identity? Global max pooling, global mean pooling, and Set2Set attention are all tested on ModelNet40.
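The readout step described above can be sketched in plain PyTorch. This is a minimal illustration with made-up embeddings, not a full model; PyG's `global_max_pool` and `global_mean_pool` implement the same operation given a batch-assignment vector.

```python
import torch

# Hypothetical per-point GNN embeddings for a batch of two shapes:
# points 0-4 belong to shape 0, points 5-7 to shape 1.
x = torch.randn(8, 16)                          # [num_points, embed_dim]
batch = torch.tensor([0, 0, 0, 0, 0, 1, 1, 1])  # point-to-shape assignment

num_shapes = int(batch.max()) + 1
# Global max pooling: elementwise max over each shape's point embeddings
max_readout = torch.stack(
    [x[batch == b].max(dim=0).values for b in range(num_shapes)])
# Global mean pooling: average over each shape's point embeddings
mean_readout = torch.stack(
    [x[batch == b].mean(dim=0) for b in range(num_shapes)])

print(max_readout.shape, mean_readout.shape)  # both [2, 16]
```

Either readout collapses a variable number of points into one fixed-size vector per shape, which a classifier head can then map to the 40 class logits.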
Loading ModelNet40 in PyG
```python
from torch_geometric.datasets import ModelNet
from torch_geometric.transforms import SamplePoints

# Sample 1,024 points per shape for efficiency
dataset = ModelNet(root='/tmp/ModelNet', name='40',
                   pre_transform=SamplePoints(1024))

print(f"Shapes: {len(dataset)}")         # 9843 (training split; train=True is the default)
print(f"Classes: {dataset.num_classes}") # 40

shape = dataset[0]
print(f"Points: {shape.pos.shape}")      # [1024, 3]
print(f"Label: {shape.y.item()}")
```

SamplePoints converts each mesh into a fixed-size point cloud, keeping memory and compute predictable. Standard splits: 9,843 training shapes, 2,468 test shapes (pass `train=False` for the latter).
Common tasks and benchmarks
40-class shape classification. Standard evaluation reports overall accuracy on the test split. PointNet: 89.2%. PointNet++: 91.9%. DGCNN: 92.9%. Point Transformer: 93.7%. PointNeXt: 94.0%. Results with a 1,024-point subsample (common for efficiency) are typically slightly lower than with denser inputs, so the standard practice is to compare methods at the same point count.
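Overall accuracy (OA), the metric in the benchmark table below, is simply the fraction of test shapes classified correctly. A minimal sketch with made-up logits:

```python
import torch

def overall_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Overall accuracy: fraction of shapes whose argmax class matches the label."""
    preds = logits.argmax(dim=1)
    return (preds == labels).float().mean().item()

# Hypothetical logits for 4 shapes over 40 classes
logits = torch.zeros(4, 40)
logits[0, 5] = 1.0   # predicted class 5
logits[1, 5] = 1.0   # predicted class 5
logits[2, 7] = 1.0   # predicted class 7
logits[3, 0] = 1.0   # predicted class 0
labels = torch.tensor([5, 5, 7, 1])  # last shape is misclassified

print(overall_accuracy(logits, labels))  # 0.75
```

Note that OA weights every shape equally, so it can mask poor performance on rare classes; some papers additionally report mean per-class accuracy.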
Example: warehouse robotics
An e-commerce warehouse uses robotic arms to pick and sort packages. Depth cameras produce point clouds of objects on conveyor belts. The robot must classify each object (box, bag, bottle, envelope) to determine the correct grasping strategy and sorting destination. ModelNet40 benchmarks exactly this 3D classification from point cloud data. Production systems add noise robustness, partial object handling, and real-time speed constraints.
Published benchmark results
Overall accuracy on ModelNet40 test split with 1,024 input points (unless noted). Higher is better.
| Method | OA (%) | Year | Paper |
|---|---|---|---|
| PointNet | 89.2 | 2017 | Qi et al. |
| PointNet++ | 91.9 | 2017 | Qi et al. |
| DGCNN | 92.9 | 2019 | Wang et al. |
| Point Transformer | 93.7 | 2021 | Zhao et al. |
| PointNeXt | 94.0 | 2022 | Qian et al. |
| PointMLP | 94.1 | 2022 | Ma et al. |
Original Paper
3D ShapeNets: A Deep Representation for Volumetric Shapes
Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, Jianxiong Xiao (2015). CVPR
Original data source
ModelNet40 is maintained by Princeton and available from modelnet.cs.princeton.edu. The aligned version commonly used for point cloud classification is available from the PointNet++ PyTorch repository.
@inproceedings{wu20153d,
title={3D ShapeNets: A Deep Representation for Volumetric Shapes},
author={Wu, Zhirong and Song, Shuran and Khosla, Aditya and Yu, Fisher and Zhang, Linguang and Tang, Xiaoou and Xiao, Jianxiong},
booktitle={CVPR},
pages={1912--1920},
year={2015}
}
BibTeX citation for the ModelNet dataset.
Which dataset should I use?
ModelNet40 vs ShapeNet: ModelNet40 is for whole-shape classification (40 classes). ShapeNet is for per-point part segmentation (50 part categories). Different tasks.
ModelNet40 vs ScanObjectNN: ModelNet40 has clean synthetic CAD models. ScanObjectNN has real scanned objects with noise, occlusion, and background clutter. ScanObjectNN is harder and more representative of real-world conditions.
ModelNet40 vs ModelNet10: ModelNet10 has 10 classes and fewer shapes. Use ModelNet40 for comprehensive benchmarking; ModelNet10 only for quick prototyping.
From benchmark to production
Production 3D classification handles sensor noise (LiDAR and depth cameras produce noisy point clouds), partial observations (objects are often occluded), scene-level complexity (multiple overlapping objects), and real-time constraints. ModelNet40's clean, complete shapes are a controlled starting point. Domain-specific fine-tuning on noisy real-world data is essential for production deployment.
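One common way to narrow the gap between ModelNet40's clean shapes and noisy sensor data is training-time augmentation. The sketch below applies two widely used ModelNet40 augmentations, random rotation about the vertical axis and clipped Gaussian jitter; the function name and default parameters are illustrative, not from any specific codebase.

```python
import math
import torch

def augment(points: torch.Tensor, sigma: float = 0.01,
            clip: float = 0.05) -> torch.Tensor:
    """Random z-axis rotation + clipped Gaussian jitter for a [N, 3] cloud."""
    theta = torch.rand(1).item() * 2 * math.pi
    c, s = math.cos(theta), math.sin(theta)
    # Rotation about the z (up) axis; ModelNet shapes are aligned upright
    rot = torch.tensor([[c, -s, 0.0],
                        [s,  c, 0.0],
                        [0.0, 0.0, 1.0]])
    # Small per-point noise, clipped so outliers stay bounded
    noise = torch.clamp(sigma * torch.randn_like(points), -clip, clip)
    return points @ rot.T + noise

cloud = torch.randn(1024, 3)
print(augment(cloud).shape)  # torch.Size([1024, 3])
```

Such augmentations improve robustness to pose variation and sensor noise, but they do not substitute for fine-tuning on real scans with occlusion and clutter.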