
Node Regression: Predicting Continuous Values for Graph Nodes

Node regression is the continuous-valued counterpart to node classification. It predicts a numerical score, amount, or probability for each node using both its own features and information from its graph neighborhood.

PyTorch Geometric

TL;DR

  • Node regression predicts a continuous value per node, using message passing to incorporate neighborhood information. Same architecture as classification, different output head and loss.
  • Enterprise applications: revenue forecasting per customer, credit risk scores, demand prediction per product, customer lifetime value, continuous fraud probability scores.
  • Architecture: GNN encoder (e.g., SAGEConv layers) + linear head with 1 output + MSE loss. The GNN encoder is identical to classification; only the output and loss differ.
  • On RelBench, GNN-based regression on relational graphs outperforms flat-table methods across enterprise forecasting tasks because multi-hop message passing captures cross-table predictive signals.
  • In PyG: replace softmax + cross-entropy with Linear(hidden, 1) + MSE loss. Everything else (GNN layers, training loop, neighbor sampling) stays the same.

Node regression predicts a continuous numerical value for each node in a graph by combining the node's own features with information aggregated from its neighborhood through message passing. It is the regression counterpart to node classification. The GNN architecture is identical; only the output layer (linear with 1 output instead of num_classes outputs) and loss function (MSE instead of cross-entropy) differ. Node regression powers enterprise forecasting tasks: revenue prediction, demand estimation, risk scoring, and lifetime value calculation.

Why it matters for enterprise data

Most enterprise prediction tasks are regression problems: “How much will this customer spend next quarter?” “What is this product's expected demand?” “What is this loan's probability of default?” These are continuous values, not categories.

Flat-table regression uses only the entity's own features. Node regression on a relational graph adds the entity's full context: a customer's predicted revenue incorporates their order history, the products they bought, the categories trending among similar customers, and their interaction with support. This cross-table signal is captured automatically through 2-3 layers of message passing.

How node regression works

node_regression.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class NodeRegressor(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, hidden_dim)
        # Output head: 1 value per node
        self.head = torch.nn.Linear(hidden_dim, 1)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.3, training=self.training)
        x = self.conv2(x, edge_index)
        return self.head(x).squeeze(-1)  # [num_nodes]

# `data` is assumed to be a torch_geometric.data.Data object
# providing x, edge_index, y, and train_mask
model = NodeRegressor(in_dim=16, hidden_dim=64)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(200):
    model.train()
    optimizer.zero_grad()
    pred = model(data.x, data.edge_index)
    # MSE loss on labeled nodes only
    loss = F.mse_loss(pred[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

# Inference: predict continuous value for all nodes
model.eval()
predictions = model(data.x, data.edge_index)  # continuous scores

Identical to node classification except: Linear(hidden, 1) instead of Linear(hidden, num_classes), and mse_loss instead of cross_entropy.

Concrete example: customer lifetime value prediction

A subscription business wants to predict 12-month customer lifetime value (CLV):

  • Customer nodes: features = [tenure_months, plan_tier, monthly_charge]
  • Order nodes: features = [amount, item_count, discount_applied]
  • Product nodes: features = [price, category, margin]
  • Edges: customer → order (placed), order → product (contains)
  • Target: total revenue from each customer over the next 12 months (continuous $)

After 2 SAGEConv layers, each customer's embedding captures:

  • Their own subscription level and tenure
  • Their purchasing patterns (average order value, frequency)
  • The products they buy (high-margin vs. low-margin, growing vs. declining categories)

The regression head predicts a dollar amount. The model learns that customers buying growing-category, high-margin products have higher CLV, even if their current spending is moderate.

Limitations and what comes next

  1. Target distribution: Enterprise regression targets (revenue, claim amounts) are often heavily right-skewed. Log-transforming targets before training and exponentiating predictions at inference often improves performance markedly.
  2. Temporal leakage: When predicting future values (next-quarter revenue), the graph must only include edges from before the prediction date. Future edges leak the answer. Temporal splits are essential.
  3. Uncertainty quantification: Point predictions (single values) are often insufficient for enterprise decision-making. Extending node regression to produce prediction intervals requires ensemble methods or probabilistic GNNs.
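
Limitation 1 is usually a few lines of code; a minimal sketch of log-space training (the helper names `skewed_mse_loss` and `to_dollars` are illustrative, not a PyG API):

```python
import torch
import torch.nn.functional as F

def skewed_mse_loss(pred_log, target):
    """MSE in log space: compare predictions against log1p-transformed targets.

    log1p is used instead of log so zero-valued targets
    (e.g. zero revenue) stay well-defined.
    """
    return F.mse_loss(pred_log, torch.log1p(target))

def to_dollars(pred_log):
    """Map log-space predictions back to the original (dollar) scale."""
    return torch.expm1(pred_log)

# Heavily right-skewed revenue targets
target = torch.tensor([0.0, 50.0, 120.0, 9800.0])
pred_log = torch.log1p(target)  # a perfect model in log space
loss = skewed_mse_loss(pred_log, target)
print(to_dollars(pred_log))     # recovers the dollar amounts, up to float rounding
```

The model's output head is unchanged; only the loss and the inference-time inverse transform are wrapped.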

Frequently asked questions

What is node regression?

Node regression is the task of predicting a continuous numerical value for each node in a graph. Like node classification, it uses message passing to aggregate neighborhood information, but the output is a real number (or vector of real numbers) per node rather than a class label. The model is trained with a regression loss (MSE or MAE) instead of cross-entropy.

What are enterprise examples of node regression?

Revenue forecasting (predicted next-quarter revenue per customer), credit score estimation (predicted default probability as a continuous score), demand prediction (predicted units sold per product), customer lifetime value (predicted total future spend), and risk scoring (continuous fraud probability per transaction). Any continuous-valued entity-level prediction on relational data maps to node regression.

How is node regression different from standard regression?

Standard regression uses each entity's own features. Node regression adds information from the entity's relational neighborhood. A product's demand prediction uses not just the product's attributes but also the purchasing patterns of its customers, the performance of competing products, and seasonal signals from the retail calendar, all captured through message passing on the relational graph.

What loss function is used for node regression?

Mean Squared Error (MSE) is the default. Mean Absolute Error (MAE) is more robust to outliers. Huber loss combines both: MSE for small errors, MAE for large errors. For enterprise tasks with skewed targets (revenue, claim amounts), log-transforming the target before training often improves performance.
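
All three losses are available in PyTorch; a quick comparison on a batch with one large outlier error (values illustrative):

```python
import torch
import torch.nn.functional as F

pred   = torch.tensor([100.0, 210.0, 5000.0])
target = torch.tensor([110.0, 200.0,  300.0])  # last prediction is a large miss

mse   = F.mse_loss(pred, target)                # squares the outlier, which dominates
mae   = F.l1_loss(pred, target)                 # linear in the error: more robust
huber = F.huber_loss(pred, target, delta=50.0)  # quadratic below delta, linear above

print(mse.item(), mae.item(), huber.item())
```

The ordering MAE < Huber < MSE on this batch shows how each loss weights the single outlier; `delta` sets where Huber switches from quadratic to linear.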

Can I use the same GNN architecture for regression and classification?

Yes, the GNN encoder (message passing layers) is identical. Only the output head differs: use a linear layer with 1 output and MSE loss for regression instead of num_classes outputs with cross-entropy for classification. Everything else (architecture, hyperparameters, training procedure) is the same.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.