
ShapeNet: 3D Shape Understanding Through Point Cloud Graphs

ShapeNet is a dataset of 16,881 3D shapes represented as point clouds. By constructing KNN graphs from spatial coordinates, GNNs process 3D geometry for part segmentation -- labeling each point as wing, wheel, body, or tail. It is where graph ML meets computer vision and robotics.


TL;DR

  • ShapeNet has 16,881 3D shapes from 16 categories, averaging 2,616 points per shape. Each point has 3D coordinates (x,y,z). The task is part segmentation across 50 part categories.
  • Point clouds become graphs via KNN: each point connects to its K nearest spatial neighbors. GNNs then process these spatial graphs for per-point classification.
  • Dynamic graph layers (EdgeConv/DGCNN) outperform static KNN graphs by recomputing neighbors in learned feature space at each layer.
  • ShapeNet drives autonomous vehicles, robotics, and manufacturing, where understanding 3D object parts from sensor data is critical.

At a glance: 16,881 shapes · 2,616 avg points per shape · 3 features (xyz) · 50 part classes

What ShapeNet contains

ShapeNet is a large-scale dataset of 3D shapes from 16 object categories: airplane, bag, cap, car, chair, earphone, guitar, knife, lamp, laptop, motorbike, mug, pistol, rocket, skateboard, and table. Each shape is represented as a point cloud -- a set of points in 3D space sampled from the object's surface. Each point has 3 features: its x, y, z coordinates. The average shape has 2,616 points.

The task is part segmentation: label each point with its part identity. An airplane has wings, body, tail, and engines. A chair has legs, seat, back, and arms. The 50 part categories span all 16 object categories. This requires the model to understand both the global shape (is this an airplane or a chair?) and local geometry (is this point on a wing or the body?).

Why ShapeNet matters

ShapeNet is the standard benchmark for 3D shape understanding, a field with direct applications in autonomous driving (understanding surrounding objects from LiDAR point clouds), robotics (grasping objects by understanding their parts), and manufacturing (quality inspection via 3D scanning). GNNs are particularly well-suited for point clouds because the spatial neighbor graph captures the local geometry that determines part identity.

The dataset also drove the development of key GNN layers. DGCNN (Dynamic Graph CNN) introduced EdgeConv, which recomputes the neighbor graph in learned feature space at each layer. This dynamic connectivity outperforms static KNN graphs because the most relevant neighbors for classification are not always the spatially closest points.

Loading ShapeNet in PyG

load_shapenet.py
from torch_geometric.datasets import ShapeNet

dataset = ShapeNet(root='/tmp/ShapeNet', categories=['Airplane'])
print(f"Shapes: {len(dataset)}")

shape = dataset[0]
print(f"Points: {shape.num_nodes}")     # ~2600
print(f"Coords: {shape.pos.shape}")     # [N, 3]
print(f"Part labels: {shape.y.shape}")  # [N] per-point labels

Load specific categories or omit the `categories` argument to load all 16. `shape.pos` holds the 3D coordinates.

Common tasks and benchmarks

Per-point part segmentation evaluated by mean IoU (intersection over union) across part categories. PointNet: ~83.7% mIoU. PointNet++: ~85.1%. DGCNN (EdgeConv): ~85.2%. Point Transformer: ~86.6%. PointNeXt: ~87.0%. The steady improvements reflect advances in local geometry encoding and global context aggregation.
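The mIoU metric itself is easy to sketch. This simplified version averages IoU over parts present in either prediction or ground truth; published benchmark protocols differ in details such as per-shape vs. per-category averaging and the handling of absent parts:

```python
import torch

def part_iou(pred, target, num_parts):
    """Simplified mean IoU over parts that appear in pred or target."""
    ious = []
    for p in range(num_parts):
        inter = ((pred == p) & (target == p)).sum().item()
        union = ((pred == p) | (target == p)).sum().item()
        if union > 0:                 # skip parts absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred   = torch.tensor([0, 0, 1, 1, 2, 2])  # toy per-point predictions
target = torch.tensor([0, 0, 1, 2, 2, 2])  # toy ground-truth part labels
print(part_iou(pred, target, num_parts=3))
```

Here part 0 scores 1.0, part 1 scores 0.5, and part 2 scores 2/3, so the mean is about 0.72.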

Example: autonomous vehicle perception

Self-driving cars use LiDAR sensors that produce point clouds of the surrounding environment. Segmenting these point clouds (this cluster of points is a car, that cluster is a pedestrian, those points are the road surface) is a safety-critical task. ShapeNet trains the per-point classification capability; production autonomous driving systems apply it to real-time LiDAR data at 10-20 frames per second with millions of points.

Published benchmark results

Part segmentation on ShapeNet measured by mean IoU (intersection over union) across part categories. Higher is better.

Method              mIoU (%)   Year   Paper
PointNet            83.7       2017   Qi et al.
PointNet++          85.1       2017   Qi et al.
DGCNN               85.2       2019   Wang et al.
Point Transformer   86.6       2021   Zhao et al.
PointNeXt           87.0       2022   Qian et al.
PointMLP            86.1       2022   Ma et al.

Original Paper

ShapeNet: An Information-Rich 3D Model Repository

Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, Fisher Yu (2015). arXiv preprint


Original data source

The ShapeNet Core dataset is available from shapenet.org. The part segmentation annotations used in PyG were provided by Yi et al. (2016) and are available from the Stanford ShapeNet Part page.

cite_shapenet.bib
@techreport{chang2015shapenet,
  title={ShapeNet: An Information-Rich 3D Model Repository},
  author={Chang, Angel X and Funkhouser, Thomas and Guibas, Leonidas and Hanrahan, Pat and Huang, Qixing and Li, Zimo and Savarese, Silvio and Savva, Manolis and Song, Shuran and Su, Hao and Xiao, Jianxiong and Yi, Li and Yu, Fisher},
  year={2015},
  institution={Stanford University --- Princeton University --- Toyota Technological Institute at Chicago},
  note={arXiv:1512.03012}
}

BibTeX citation for the ShapeNet dataset.

Which dataset should I use?

ShapeNet vs ModelNet40: ShapeNet is for part segmentation (per-point labels). ModelNet40 is for whole-shape classification (one label per object). Choose based on your task: do you need to label parts of an object, or classify the whole thing?

ShapeNet vs S3DIS: ShapeNet has clean CAD shapes. S3DIS is real indoor 3D scans with noise and occlusion. Use ShapeNet for controlled benchmarks; S3DIS for real-world robustness.

ShapeNet vs ScanObjectNN: ScanObjectNN has real scanned objects with background noise. ShapeNet has perfect synthetic shapes. ScanObjectNN is harder and more realistic.

From benchmark to production

Production point cloud processing handles much larger scenes (100K+ points per frame vs. 2.6K per shape), real-time constraints (10Hz processing for autonomous driving), and noisy sensor data (LiDAR has distance-dependent resolution and occlusion artifacts). The clean, complete shapes in ShapeNet are a starting point; production robustness requires training on noisy, partial, and dynamic data.

Frequently asked questions

What is the ShapeNet dataset?

ShapeNet is a dataset of 16,881 3D shapes from 16 object categories (airplane, chair, table, etc.). Each shape is represented as a point cloud averaging 2,616 points. Points have 3D coordinates as features. The task is part segmentation: labeling each point with one of 50 part categories (e.g., wing, tail, body for airplanes).

How do I load ShapeNet in PyTorch Geometric?

Use `from torch_geometric.datasets import ShapeNet; dataset = ShapeNet(root='/tmp/ShapeNet', categories=['Airplane'])`. You can load specific categories or all 16. Each element is a point cloud graph.

How are point clouds converted to graphs in PyG?

PyG constructs k-nearest-neighbor (KNN) graphs from the 3D coordinates. Each point becomes a node, and edges connect each point to its K nearest spatial neighbors. This converts unstructured point clouds into graphs that standard GNN layers can process.

What is part segmentation?

Part segmentation is a per-point classification task: assign each point in a 3D shape to the correct part (wing, body, tail, engine for airplanes). It is the 3D equivalent of semantic segmentation in 2D images. The model must understand both global shape context and local geometry.

What methods work best on ShapeNet?

PointNet++ and DGCNN (Dynamic Graph CNN using EdgeConv) are strong baselines. Point Transformer and PointNeXt represent the current state of the art. GNN layers like EdgeConv that dynamically update graph connectivity per layer outperform static KNN graph approaches.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.