Point cloud processing with GNNs treats unstructured 3D point sets as graphs by connecting nearby points with edges. LiDAR sensors, depth cameras, and photogrammetry produce millions of 3D points with no inherent connectivity. By building a graph from spatial proximity, GNNs can learn local geometric patterns (surfaces, edges, corners) through message passing, enabling object classification, scene segmentation, and shape detection.
Graph construction from point clouds
Two methods for connecting points into a graph:
```python
import torch
from torch_geometric.nn import knn_graph, radius_graph

# pos: [N, 3] point coordinates; batch: [N] vector assigning points to graphs
pos = torch.rand(1000, 3)
batch = torch.zeros(1000, dtype=torch.long)

# Method 1: k-nearest neighbors
# Each point connects to its k closest points.
edge_index = knn_graph(pos, k=16, batch=batch)
# Pros: uniform connectivity; every point has exactly k neighbors.
# Cons: edge lengths vary (short in dense areas, long in sparse ones).

# Method 2: radius-based connectivity
# Each point connects to all points within radius r.
edge_index = radius_graph(pos, r=0.1, batch=batch)
# Pros: captures local density; physically meaningful threshold.
# Cons: variable degree (dense areas have many neighbors).

# Edge features: relative position encodes local geometry.
row, col = edge_index
edge_attr = pos[col] - pos[row]  # relative 3D vector per edge
```

k-NN is simpler and more common. Radius-based connectivity is better when point density varies significantly (e.g., LiDAR scans that are dense near the sensor and sparse far away).
Why local geometry matters
The power of GNNs over point-independent methods (such as the original PointNet, which processes each point separately before a global pooling) comes from local geometric context:
- Flat surface: All neighbors are coplanar. Normal vectors are parallel. Low curvature.
- Edge: Neighbors split into two planes meeting at the point. High curvature in one direction.
- Corner: Neighbors spread in three or more directions. High curvature in all directions.
- Cylindrical surface: Neighbors curve in one direction but are straight in another.
A GNN that aggregates neighbor positions learns to distinguish these patterns, enabling recognition of geometric primitives (planes, cylinders, spheres) and complex shapes (vehicles, pedestrians, buildings).
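These patterns can be quantified before any learning happens: eigen-decomposing the covariance of a point's neighbors gives a classic "surface variation" measure, λ_min / (λ_1 + λ_2 + λ_3), which is near zero for flat patches and grows for edges and corners. A minimal sketch (the function name and test data are illustrative, not from the text):

```python
import numpy as np

def surface_variation(neighbors):
    """Surface variation of a local neighborhood: smallest eigenvalue of the
    3x3 covariance matrix divided by the eigenvalue sum. Near 0 for a flat
    patch; larger for edges and corners."""
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    eigvals = np.linalg.eigvalsh(cov)  # ascending order
    return eigvals[0] / eigvals.sum()

rng = np.random.default_rng(0)
# Flat patch: points on the z = 0 plane
plane = np.c_[rng.uniform(-1, 1, (50, 2)), np.zeros(50)]
# Corner-like patch: points spread in all three directions
blob = rng.uniform(-1, 1, (50, 3))

flat = surface_variation(plane)  # near zero for a planar patch
full = surface_variation(blob)   # larger; approaches 1/3 for an isotropic blob
```

A GNN aggregating relative neighbor positions can learn this kind of statistic (and far richer ones) directly from data.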
Hierarchical processing
Real point clouds have millions of points. Processing all of them at full resolution is computationally expensive. The solution: hierarchical coarsening and upsampling, analogous to U-Net in image processing:
- Downsample: Use farthest point sampling to select a representative subset (e.g., 1M → 64K → 4K → 256 points).
- Encode: At each level, build a k-NN graph and apply message passing. Pool (aggregate) point features from fine to coarse level.
- Decode: Upsample features from coarse to fine using interpolation and skip connections. Each point receives predictions at the original resolution.
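The downsampling step above relies on farthest point sampling (FPS), which greedily picks points so that the subset covers the cloud evenly. A minimal NumPy sketch of the greedy algorithm (function name is illustrative):

```python
import numpy as np

def farthest_point_sampling(pos, m, seed=0):
    """Greedily select m points, each new pick being the point farthest from
    all points chosen so far. pos: (N, 3). Returns indices of the m picks."""
    n = len(pos)
    selected = np.empty(m, dtype=int)
    selected[0] = np.random.default_rng(seed).integers(n)  # arbitrary start
    # dist[i] = distance from point i to the nearest selected point so far
    dist = np.linalg.norm(pos - pos[selected[0]], axis=1)
    for j in range(1, m):
        selected[j] = dist.argmax()  # farthest remaining point
        dist = np.minimum(dist, np.linalg.norm(pos - pos[selected[j]], axis=1))
    return selected

pos = np.random.rand(4096, 3)
idx = farthest_point_sampling(pos, 256)  # one coarsening level: 4096 -> 256
coarse = pos[idx]
```

Each selected point's distance drops to zero, so no point is picked twice; the result is an even spatial cover, unlike random sampling, which oversamples dense regions.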
Key architectures
- PointNet++: Hierarchical point processing with local set abstraction (essentially a GNN with max aggregation over local neighborhoods and multi-scale grouping).
- DGCNN: Dynamic Graph CNN that rebuilds the k-NN graph in feature space after each layer, so neighborhoods reflect learned semantics rather than fixed spatial proximity. Strong results on shape-classification benchmarks such as ModelNet40.
- Point Transformer: Applies self-attention (a graph transformer) within local neighborhoods. Among the most accurate architectures on large-scale segmentation benchmarks such as S3DIS.
- KPConv: Kernel point convolution, a continuous convolution operator on point clouds. Efficient and accurate for outdoor scenes.
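DGCNN's core operator, EdgeConv, builds per-edge inputs [x_i, x_j − x_i] over a k-NN graph recomputed in feature space each layer. The graph-rebuilding and edge-feature step can be sketched with NumPy (the MLP and max-pool that follow are omitted; `dynamic_edge_features` is a hypothetical helper name):

```python
import numpy as np

def dynamic_edge_features(x, k):
    """One DGCNN-style step: rebuild the k-NN graph in *feature* space, then
    form EdgeConv inputs [x_i, x_j - x_i] for each of the k edges per point.
    x: (N, F) node features. Returns (N, k, 2F) edge features."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]       # (N, k) feature-space neighbors
    x_i = np.repeat(x[:, None, :], k, axis=1)  # (N, k, F) center features
    x_j = x[nbrs]                              # (N, k, F) neighbor features
    return np.concatenate([x_i, x_j - x_i], axis=-1)

x = np.random.rand(100, 64)
e = dynamic_edge_features(x, k=20)  # next: shared MLP, then max over the k axis
```

Because the graph is rebuilt from features rather than coordinates, semantically similar but spatially distant points (e.g., two wingtips of an airplane) can become neighbors in deeper layers.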
Applications
- Autonomous driving: 3D object detection and tracking from LiDAR. Detecting cars, pedestrians, and cyclists in real time.
- Robotics: Scene understanding for robotic manipulation. Identifying graspable surfaces and obstacles.
- Architecture/Construction: Converting laser scans to building information models (scan-to-BIM).
- Manufacturing: Quality inspection by comparing scanned parts to CAD models. Detecting sub-millimeter defects.