At a glance: 12,311 shapes · 1,024 typical points per shape · 3 features per point (x, y, z) · 40 classes.
What ModelNet40 contains
ModelNet40 is a dataset of 12,311 3D CAD models from 40 common object categories. Each model is provided as a mesh and typically subsampled to 1,024 or 2,048 points from the object's surface. Points have 3 features (x, y, z coordinates). The 40 categories include everyday objects: airplane, bathtub, bed, bench, bookshelf, bottle, bowl, car, chair, cone, cup, curtain, desk, door, dresser, flower pot, glass box, guitar, keyboard, lamp, laptop, mantel, monitor, night stand, person, piano, plant, radio, range hood, sink, sofa, stairs, stool, table, tent, toilet, TV stand, vase, wardrobe, and xbox.
Why ModelNet40 matters
ModelNet40 tests global shape understanding: given a complete 3D object, what is it? This capability is fundamental to robotics (a robot must identify objects before manipulating them), autonomous driving (the perception system must classify surrounding objects), and augmented reality (virtual objects must interact appropriately with real ones).
Compared to ShapeNet (which tests part-level understanding), ModelNet40 focuses on the graph-level readout problem: how do you aggregate point-level GNN embeddings into a single representation that captures the shape's global identity? Global max pooling, global mean pooling, and Set2Set attention are all tested on ModelNet40.
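The readout step described above can be sketched in plain PyTorch. This is a minimal illustration with made-up embeddings, not a full model; PyG's `global_max_pool` and `global_mean_pool` implement the same operation given a batch-assignment vector.

```python
import torch

# Hypothetical per-point GNN embeddings for a batch of two shapes:
# points 0-4 belong to shape 0, points 5-7 to shape 1.
x = torch.randn(8, 16)                          # [num_points, embed_dim]
batch = torch.tensor([0, 0, 0, 0, 0, 1, 1, 1])  # point-to-shape assignment

num_shapes = int(batch.max()) + 1
# Global max pooling: elementwise max over each shape's point embeddings
max_readout = torch.stack(
    [x[batch == b].max(dim=0).values for b in range(num_shapes)])
# Global mean pooling: average over each shape's point embeddings
mean_readout = torch.stack(
    [x[batch == b].mean(dim=0) for b in range(num_shapes)])

print(max_readout.shape, mean_readout.shape)  # both [2, 16]
```

Either readout collapses a variable number of points into one fixed-size vector per shape, which a classifier head can then map to the 40 class logits.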
Loading ModelNet40 in PyG
```python
from torch_geometric.datasets import ModelNet
from torch_geometric.transforms import SamplePoints

# Sample 1,024 points per shape for efficiency
dataset = ModelNet(root='/tmp/ModelNet', name='40',
                   pre_transform=SamplePoints(1024))

print(f"Shapes: {len(dataset)}")         # 9843 (training split; train=True is the default)
print(f"Classes: {dataset.num_classes}") # 40

shape = dataset[0]
print(f"Points: {shape.pos.shape}")      # [1024, 3]
print(f"Label: {shape.y.item()}")
```

SamplePoints converts each mesh into a fixed-size point cloud, keeping memory and compute predictable. Standard splits: 9,843 training shapes, 2,468 test shapes (pass `train=False` for the latter).
Common tasks and benchmarks
40-class shape classification. Standard evaluation reports overall accuracy on the test split. PointNet: 89.2%. PointNet++: 91.9%. DGCNN: 92.9%. Point Transformer: 93.7%. PointNeXt: 94.0%. Results with a 1,024-point subsample (common for efficiency) are typically slightly lower than with denser inputs, so the standard practice is to compare methods at the same point count.
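Overall accuracy (OA), the metric in the benchmark table below, is simply the fraction of test shapes classified correctly. A minimal sketch with made-up logits:

```python
import torch

def overall_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Overall accuracy: fraction of shapes whose argmax class matches the label."""
    preds = logits.argmax(dim=1)
    return (preds == labels).float().mean().item()

# Hypothetical logits for 4 shapes over 40 classes
logits = torch.zeros(4, 40)
logits[0, 5] = 1.0   # predicted class 5
logits[1, 5] = 1.0   # predicted class 5
logits[2, 7] = 1.0   # predicted class 7
logits[3, 0] = 1.0   # predicted class 0
labels = torch.tensor([5, 5, 7, 1])  # last shape is misclassified

print(overall_accuracy(logits, labels))  # 0.75
```

Note that OA weights every shape equally, so it can mask poor performance on rare classes; some papers additionally report mean per-class accuracy.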
Example: warehouse robotics
An e-commerce warehouse uses robotic arms to pick and sort packages. Depth cameras produce point clouds of objects on conveyor belts. The robot must classify each object (box, bag, bottle, envelope) to determine the correct grasping strategy and sorting destination. ModelNet40 benchmarks exactly this 3D classification from point cloud data. Production systems add noise robustness, partial object handling, and real-time speed constraints.
Published benchmark results
Overall accuracy on ModelNet40 test split with 1,024 input points (unless noted). Higher is better.
| Method | OA (%) | Year | Paper |
|---|---|---|---|
| PointNet | 89.2 | 2017 | Qi et al. |
| PointNet++ | 91.9 | 2017 | Qi et al. |
| DGCNN | 92.9 | 2019 | Wang et al. |
| Point Transformer | 93.7 | 2021 | Zhao et al. |
| PointNeXt | 94.0 | 2022 | Qian et al. |
| PointMLP | 94.1 | 2022 | Ma et al. |
Original Paper
3D ShapeNets: A Deep Representation for Volumetric Shapes
Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, Jianxiong Xiao (2015). CVPR
Original data source
ModelNet40 is maintained by Princeton and available from modelnet.cs.princeton.edu. The aligned version commonly used for point cloud classification is available from the PointNet++ PyTorch repository.
@inproceedings{wu20153d,
title={3D ShapeNets: A Deep Representation for Volumetric Shapes},
author={Wu, Zhirong and Song, Shuran and Khosla, Aditya and Yu, Fisher and Zhang, Linguang and Tang, Xiaoou and Xiao, Jianxiong},
booktitle={CVPR},
pages={1912--1920},
year={2015}
}
BibTeX citation for the ModelNet dataset.
Which dataset should I use?
ModelNet40 vs ShapeNet: ModelNet40 is for whole-shape classification (40 classes). ShapeNet is for per-point part segmentation (50 part categories). Different tasks.
ModelNet40 vs ScanObjectNN: ModelNet40 has clean synthetic CAD models. ScanObjectNN has real scanned objects with noise, occlusion, and background clutter. ScanObjectNN is harder and more representative of real-world conditions.
ModelNet40 vs ModelNet10: ModelNet10 has 10 classes and fewer shapes. Use ModelNet40 for comprehensive benchmarking; ModelNet10 only for quick prototyping.
From benchmark to production
Production 3D classification handles sensor noise (LiDAR and depth cameras produce noisy point clouds), partial observations (objects are often occluded), scene-level complexity (multiple overlapping objects), and real-time constraints. ModelNet40's clean, complete shapes are a controlled starting point. Domain-specific fine-tuning on noisy real-world data is essential for production deployment.
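One common way to narrow the gap between ModelNet40's clean shapes and noisy sensor data is training-time augmentation. The sketch below applies two widely used ModelNet40 augmentations, random rotation about the vertical axis and clipped Gaussian jitter; the function name and default parameters are illustrative, not from any specific codebase.

```python
import math
import torch

def augment(points: torch.Tensor, sigma: float = 0.01,
            clip: float = 0.05) -> torch.Tensor:
    """Random z-axis rotation + clipped Gaussian jitter for a [N, 3] cloud."""
    theta = torch.rand(1).item() * 2 * math.pi
    c, s = math.cos(theta), math.sin(theta)
    # Rotation about the z (up) axis; ModelNet shapes are aligned upright
    rot = torch.tensor([[c, -s, 0.0],
                        [s,  c, 0.0],
                        [0.0, 0.0, 1.0]])
    # Small per-point noise, clipped so outliers stay bounded
    noise = torch.clamp(sigma * torch.randn_like(points), -clip, clip)
    return points @ rot.T + noise

cloud = torch.randn(1024, 3)
print(augment(cloud).shape)  # torch.Size([1024, 3])
```

Such augmentations improve robustness to pose variation and sensor noise, but they do not substitute for fine-tuning on real scans with occlusion and clutter.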