The business problem
Unplanned downtime costs manufacturers an estimated $50 billion annually. A single hour of downtime on an automotive production line can cost $1-2 million. Predictive maintenance aims to detect equipment degradation before failure, enabling planned repairs during scheduled maintenance windows. The challenge: anomalies often manifest as subtle multi-sensor patterns long before any individual sensor exceeds its threshold.
Why flat ML fails
- Independent monitoring: Threshold-based systems monitor each sensor independently. A temperature slightly above average, combined with vibration slightly below average, might indicate bearing wear — yet neither reading alone trips its threshold.
- No causal chains: A pressure drop in Sensor A causes flow changes in Sensor B 30 seconds later. Flat models cannot capture these causal propagation patterns.
- Late detection: By the time a single sensor breaches its threshold, the fault has often progressed to where unplanned downtime is unavoidable. Early detection requires multi-sensor pattern analysis.
- No process topology: Sensors on the same machine or same process stage have stronger correlations. The physical topology matters for anomaly interpretation.
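The "independent monitoring" failure mode can be made concrete with a toy two-sensor example. This is an illustrative sketch (the sensor values and correlation are invented): each reading sits well inside its own 3-sigma band, but the pair violates the learned joint distribution, which a Mahalanobis-distance check catches.

```python
# Hypothetical illustration: two sensors each within their individual
# 3-sigma bands, yet jointly anomalous. Values and correlation are invented.
import numpy as np

rng = np.random.default_rng(0)

# Normal operation: temperature and vibration are positively correlated.
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
normal = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

# Bearing-wear-like reading: temperature slightly high, vibration slightly
# low -- each inside its own band, but against the learned correlation.
reading = np.array([1.5, -1.5])

# Per-sensor thresholding (flat monitoring) sees nothing.
per_sensor_alarm = bool(np.any(np.abs(reading) > 3.0))

# Mahalanobis distance accounts for the joint distribution.
inv_cov = np.linalg.inv(np.cov(normal.T))
mahal = float(np.sqrt(reading @ inv_cov @ reading))

print(per_sensor_alarm)  # False: no individual threshold breached
print(mahal > 3.0)       # True: jointly far outside normal behaviour
```

A GNN generalizes this idea from one hand-specified sensor pair to learned patterns across the whole sensor graph.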
The relational schema
Node types:

```
Sensor  (id, type, unit, normal_range, machine_id)
Machine (id, type, age, maintenance_history)
Process (id, stage, product_type, cycle_time)
```

Edge types:

```
Sensor  --[on_machine]--> Machine
Sensor  --[correlated]--> Sensor   (pearson_r, lag_seconds)
Sensor  --[upstream_of]--> Sensor  (process_flow_order)
Machine --[in_process]--> Process
```

Sensors are connected by physical proximity (same machine), statistical correlation, and process flow order.
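To make the schema concrete, here is a minimal sketch of the `x_dict` / `edge_index_dict` inputs that a PyG hetero model consumes, built from plain tensors (in a real pipeline these would come from a `HeteroData` object). The sizes — 4 sensors on 2 machines, 8 features per sensor — are toy assumptions:

```python
# Sketch: the dict-of-tensors inputs a PyG heterogeneous model's forward()
# expects, mirroring the schema above. All shapes and ids are toy values.
import torch

x_dict = {
    'sensor': torch.randn(4, 8),    # 4 sensors, 8 windowed features each
    'machine': torch.randn(2, 3),   # 2 machines, 3 metadata features each
}

edge_index_dict = {
    # sensors 0,1 sit on machine 0; sensors 2,3 on machine 1
    ('sensor', 'on_machine', 'machine'):
        torch.tensor([[0, 1, 2, 3],
                      [0, 0, 1, 1]]),
    # a statistically correlated pair, stored in both directions
    ('sensor', 'correlated', 'sensor'):
        torch.tensor([[0, 1],
                      [1, 0]]),
    # process flow: sensor 0 is upstream of sensor 2
    ('sensor', 'upstream_of', 'sensor'):
        torch.tensor([[0],
                      [2]]),
}
```

Edge attributes such as `(pearson_r, lag_seconds)` would be stored alongside each edge type in the same fashion.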
PyG architecture: GNN autoencoder for anomaly scoring
```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv, HeteroConv, Linear

class AnomalyGNN(torch.nn.Module):
    def __init__(self, sensor_dim, hidden_dim=64, heads=4):
        super().__init__()
        self.sensor_lin = Linear(sensor_dim, hidden_dim)
        self.machine_lin = Linear(-1, hidden_dim)  # lazy: infer machine dim

        # Encoder: learn normal patterns from graph context.
        # add_self_loops=False is required for the bipartite
        # sensor -> machine edge type (and is safest under HeteroConv).
        self.conv1 = HeteroConv({
            ('sensor', 'correlated', 'sensor'): GATConv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
            ('sensor', 'upstream_of', 'sensor'): GATConv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
            ('sensor', 'on_machine', 'machine'): GATConv(
                (hidden_dim, hidden_dim), hidden_dim // heads, heads=heads,
                add_self_loops=False),
        }, aggr='sum')
        self.conv2 = HeteroConv({
            ('sensor', 'correlated', 'sensor'): GATConv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
            ('sensor', 'upstream_of', 'sensor'): GATConv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
        }, aggr='sum')

        # Decoder: predict expected sensor readings
        self.decoder = torch.nn.Sequential(
            Linear(hidden_dim, 32),
            torch.nn.ReLU(),
            Linear(32, sensor_dim),
        )

    def forward(self, x_dict, edge_index_dict):
        x_dict['sensor'] = self.sensor_lin(x_dict['sensor'])
        x_dict['machine'] = self.machine_lin(x_dict['machine'])
        x_dict = {k: F.relu(v) for k, v in
                  self.conv1(x_dict, edge_index_dict).items()}
        x_dict = self.conv2(x_dict, edge_index_dict)
        # Reconstruct expected sensor readings
        return self.decoder(x_dict['sensor'])

    def anomaly_score(self, actual, predicted):
        # Per-sensor reconstruction error = anomaly score
        return (actual - predicted).pow(2).mean(dim=-1)
```

The GNN autoencoder is trained on normal operating data only: it learns to predict each sensor's readings from its graph context. At inference, high reconstruction error signals an anomaly. No labeled anomaly data is needed.
Expected performance
Anomaly detection is measured by precision at a fixed recall and by detection lead time, not AUROC:
- Threshold monitoring: ~60% precision at 90% recall, detection at threshold breach
- Isolation Forest (flat features): ~70% precision at 90% recall
- GNN autoencoder: ~85% precision at 90% recall, 2-6 hours early detection
- KumoRFM (supervised failures): ~87% precision at 90% recall
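The precision-at-fixed-recall numbers above can be computed by sweeping the score threshold and reading off precision at the first operating point that reaches the target recall. A self-contained sketch on synthetic scores and labels (the fault rate and score distributions are invented):

```python
# Sketch: precision at a fixed recall (here 90%) from anomaly scores.
# Labels and scores below are synthetic, for illustration only.
import numpy as np

def precision_at_recall(scores, labels, target_recall=0.9):
    """Sort by descending score, then return precision at the first
    cutoff whose recall reaches target_recall."""
    order = np.argsort(-scores)
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                      # true positives per cutoff
    recall = tp / labels.sum()
    precision = tp / (np.arange(len(labels)) + 1)
    idx = np.searchsorted(recall, target_recall)  # first recall >= target
    return float(precision[idx])

rng = np.random.default_rng(1)
labels = (rng.random(2000) < 0.05).astype(int)   # ~5% fault windows
scores = labels * rng.normal(2.0, 1.0, 2000) + rng.normal(0.0, 1.0, 2000)
print(round(precision_at_recall(scores, labels), 2))
```

A perfect scorer yields precision 1.0 at any recall; the comparison in the list above holds recall fixed at 90% so the methods differ only in false alarms.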
Or use KumoRFM in one line
```
PREDICT failure_risk FOR machine
USING sensor, machine, process, reading_history
```

One PQL query. KumoRFM captures inter-sensor dependencies and temporal patterns for predictive maintenance.