Point cloud processing with GNNs treats unstructured 3D point sets as graphs by connecting nearby points with edges. LiDAR sensors, depth cameras, and photogrammetry produce millions of 3D points with no inherent connectivity. By building a graph from spatial proximity, GNNs can learn local geometric patterns (surfaces, edges, corners) through message passing, enabling object classification, scene segmentation, and shape detection.
Graph construction from point clouds
Two methods for connecting points into a graph:
```python
import torch
from torch_geometric.nn import knn_graph, radius_graph

# pos: [N, 3] point coordinates; batch: [N] vector assigning points to graphs
pos = torch.rand(1000, 3)
batch = torch.zeros(1000, dtype=torch.long)

# Method 1: k-nearest neighbors
# Each point connects to its k closest points.
edge_index = knn_graph(pos, k=16, batch=batch)
# Pros: uniform connectivity; every point has exactly k neighbors.
# Cons: edge lengths vary (short in dense areas, long in sparse ones).

# Method 2: radius-based connectivity
# Each point connects to all points within radius r.
edge_index = radius_graph(pos, r=0.1, batch=batch)
# Pros: captures local density; physically meaningful threshold.
# Cons: variable degree (dense areas have many neighbors).

# Edge features: relative position encodes local geometry.
row, col = edge_index
edge_attr = pos[col] - pos[row]  # relative 3D vector per edge
```

k-NN is simpler and more common. Radius-based connectivity is better when point density varies significantly (e.g., LiDAR scans that are dense near the sensor and sparse far away).
Why local geometry matters
The power of GNNs over point-independent methods (such as the original PointNet, which processes each point separately before a global pooling) comes from local geometric context:
- Flat surface: All neighbors are coplanar. Normal vectors are parallel. Low curvature.
- Edge: Neighbors split into two planes meeting at the point. High curvature in one direction.
- Corner: Neighbors spread in three or more directions. High curvature in all directions.
- Cylindrical surface: Neighbors curve in one direction but are straight in another.
A GNN that aggregates neighbor positions learns to distinguish these patterns, enabling recognition of geometric primitives (planes, cylinders, spheres) and complex shapes (vehicles, pedestrians, buildings).
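These patterns can be quantified before any learning happens: eigen-decomposing the covariance of a point's neighbors gives a classic "surface variation" measure, λ_min / (λ_1 + λ_2 + λ_3), which is near zero for flat patches and grows for edges and corners. A minimal sketch (the function name and test data are illustrative, not from the text):

```python
import numpy as np

def surface_variation(neighbors):
    """Surface variation of a local neighborhood: smallest eigenvalue of the
    3x3 covariance matrix divided by the eigenvalue sum. Near 0 for a flat
    patch; larger for edges and corners."""
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    eigvals = np.linalg.eigvalsh(cov)  # ascending order
    return eigvals[0] / eigvals.sum()

rng = np.random.default_rng(0)
# Flat patch: points on the z = 0 plane
plane = np.c_[rng.uniform(-1, 1, (50, 2)), np.zeros(50)]
# Corner-like patch: points spread in all three directions
blob = rng.uniform(-1, 1, (50, 3))

flat = surface_variation(plane)  # near zero for a planar patch
full = surface_variation(blob)   # larger; approaches 1/3 for an isotropic blob
```

A GNN aggregating relative neighbor positions can learn this kind of statistic (and far richer ones) directly from data.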
Hierarchical processing
Real point clouds have millions of points. Processing all of them at full resolution is computationally expensive. The solution: hierarchical coarsening and upsampling, analogous to U-Net in image processing:
- Downsample: Use farthest point sampling to select a representative subset (e.g., 1M → 64K → 4K → 256 points).
- Encode: At each level, build a k-NN graph and apply message passing. Pool (aggregate) point features from fine to coarse level.
- Decode: Upsample features from coarse to fine using interpolation and skip connections. Each point receives predictions at the original resolution.
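The downsampling step above relies on farthest point sampling (FPS), which greedily picks points so that the subset covers the cloud evenly. A minimal NumPy sketch of the greedy algorithm (function name is illustrative):

```python
import numpy as np

def farthest_point_sampling(pos, m, seed=0):
    """Greedily select m points, each new pick being the point farthest from
    all points chosen so far. pos: (N, 3). Returns indices of the m picks."""
    n = len(pos)
    selected = np.empty(m, dtype=int)
    selected[0] = np.random.default_rng(seed).integers(n)  # arbitrary start
    # dist[i] = distance from point i to the nearest selected point so far
    dist = np.linalg.norm(pos - pos[selected[0]], axis=1)
    for j in range(1, m):
        selected[j] = dist.argmax()  # farthest remaining point
        dist = np.minimum(dist, np.linalg.norm(pos - pos[selected[j]], axis=1))
    return selected

pos = np.random.rand(4096, 3)
idx = farthest_point_sampling(pos, 256)  # one coarsening level: 4096 -> 256
coarse = pos[idx]
```

Each selected point's distance drops to zero, so no point is picked twice; the result is an even spatial cover, unlike random sampling, which oversamples dense regions.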
Key architectures
- PointNet++: Hierarchical point processing with local set abstraction (essentially a GNN with max aggregation over local neighborhoods and multi-scale grouping).
- DGCNN: Dynamic Graph CNN that rebuilds the k-NN graph in feature space after each layer, so neighborhoods reflect learned semantics rather than fixed spatial proximity. Strong results on shape-classification benchmarks such as ModelNet40.
- Point Transformer: Applies self-attention (a graph transformer) within local neighborhoods. Among the most accurate architectures on large-scale segmentation benchmarks such as S3DIS.
- KPConv: Kernel point convolution, a continuous convolution operator on point clouds. Efficient and accurate for outdoor scenes.
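DGCNN's core operator, EdgeConv, builds per-edge inputs [x_i, x_j − x_i] over a k-NN graph recomputed in feature space each layer. The graph-rebuilding and edge-feature step can be sketched with NumPy (the MLP and max-pool that follow are omitted; `dynamic_edge_features` is a hypothetical helper name):

```python
import numpy as np

def dynamic_edge_features(x, k):
    """One DGCNN-style step: rebuild the k-NN graph in *feature* space, then
    form EdgeConv inputs [x_i, x_j - x_i] for each of the k edges per point.
    x: (N, F) node features. Returns (N, k, 2F) edge features."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]       # (N, k) feature-space neighbors
    x_i = np.repeat(x[:, None, :], k, axis=1)  # (N, k, F) center features
    x_j = x[nbrs]                              # (N, k, F) neighbor features
    return np.concatenate([x_i, x_j - x_i], axis=-1)

x = np.random.rand(100, 64)
e = dynamic_edge_features(x, k=20)  # next: shared MLP, then max over the k axis
```

Because the graph is rebuilt from features rather than coordinates, semantically similar but spatially distant points (e.g., two wingtips of an airplane) can become neighbors in deeper layers.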
Applications
- Autonomous driving: 3D object detection and tracking from LiDAR. Detecting cars, pedestrians, and cyclists in real time.
- Robotics: Scene understanding for robotic manipulation. Identifying graspable surfaces and obstacles.
- Architecture/Construction: Converting laser scans to building information models (scan-to-BIM).
- Manufacturing: Quality inspection by comparing scanned parts to CAD models. Detecting sub-millimeter defects.