
FiLMConv: Conditioning Graph Convolution with Feature Modulation

FiLMConv applies feature-wise linear modulation to GNNs: each target node generates per-feature scale and shift parameters that modulate its incoming neighbor messages. It is more expressive than scalar attention (GATConv) and more efficient than full per-edge weight matrices (NNConv).

PyTorch Geometric

TL;DR

  • FiLMConv generates per-feature scale (gamma) and shift (beta) parameters from the target node's features. Each feature dimension of every incoming message is independently modulated.
  • More expressive than GATConv (scalar per edge) but more efficient than NNConv (full weight matrix per edge). It occupies the sweet spot of edge-conditioned expressiveness.
  • Borrowed from computer vision's FiLM conditioning and applied to graphs for structured reasoning, program analysis, and scene understanding.
  • Supports optional relation types via the num_relations parameter, making it applicable to heterogeneous graphs without separate type-specific layers.

Original Paper

GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation

Brockschmidt (2019); published at ICML 2020


What FiLMConv does

FiLMConv modulates each message on a per-feature basis, conditioned on the target node:

  1. For each target node i, generate scale (gamma) and shift (beta) from i's own features
  2. Apply the modulation to each transformed neighbor message: m_ij = gamma_i * (W * h_j) + beta_i
  3. Aggregate the modulated messages across all neighbors

The math (simplified)

FiLMConv formula
# Generate modulation parameters from the target node i
gamma_i = W_gamma · h_i + b_gamma   # scale (per feature)
beta_i  = W_beta  · h_i + b_beta    # shift (per feature)

# Modulate each neighbor's transformed features
m_ij = gamma_i * (W · h_j) + beta_i

# Aggregate
h_i' = AGG({ m_ij : j in N(i) })
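The formula above can be sketched in a few lines of plain PyTorch. This is a hand-rolled single layer for illustration, not PyG's FiLMConv; names like `lin` and `film`, and the mean aggregation, are assumptions:

```python
import torch

torch.manual_seed(0)
d_in, d_out, num_nodes = 4, 8, 3

# Message transform W, and a hyper-network emitting [gamma | beta] per node
lin = torch.nn.Linear(d_in, d_out, bias=False)
film = torch.nn.Linear(d_in, 2 * d_out)

x = torch.randn(num_nodes, d_in)
# Edges (source j -> target i): 0->2, 1->2, 0->1
src = torch.tensor([0, 1, 0])
dst = torch.tensor([2, 2, 1])

# gamma/beta come from the *target* node of each edge
gamma, beta = film(x).split(d_out, dim=-1)   # each [num_nodes, d_out]
msgs = gamma[dst] * lin(x)[src] + beta[dst]  # per-edge modulated messages

# Mean-aggregate messages onto their target nodes
out = torch.zeros(num_nodes, d_out).index_add_(0, dst, msgs)
counts = torch.zeros(num_nodes).index_add_(
    0, dst, torch.ones_like(dst, dtype=torch.float))
out = out / counts.clamp(min=1).unsqueeze(-1)
print(out.shape)  # torch.Size([3, 8])
```

Note that node 0 receives no edges, so its output row stays zero: the layer only writes where messages arrive.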

Comparison:
  GATConv:  alpha_ij * W · h_j            (1 scalar per edge)
  FiLMConv: gamma_i * (W · h_j) + beta_i  (2d modulation params, from the target node)
  NNConv:   NN(e_ij) · h_j                (d*d params per edge)

FiLMConv sits between GAT's scalar attention and NNConv's full weight matrix in terms of expressiveness and parameter count.

PyG implementation

film_model.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import FiLMConv

class FiLMNet(torch.nn.Module):
    def __init__(self, in_channels, hidden, out_channels, num_relations=1):
        super().__init__()
        self.conv1 = FiLMConv(in_channels, hidden,
                               num_relations=num_relations)
        self.conv2 = FiLMConv(hidden, out_channels,
                               num_relations=num_relations)

    def forward(self, x, edge_index, edge_type=None):
        x = F.relu(self.conv1(x, edge_index, edge_type))
        x = self.conv2(x, edge_index, edge_type)
        return x

num_classes = 7  # example number of output classes

# Homogeneous graph (no edge types)
model = FiLMNet(64, 64, num_classes)

# Heterogeneous graph (with edge types)
model = FiLMNet(64, 64, num_classes, num_relations=5)

FiLMConv optionally takes edge_type for heterogeneous graphs. With num_relations=1, it operates on homogeneous graphs.

When to use FiLMConv

  • Structured reasoning tasks. Program analysis, scene graph understanding, and logical reasoning where the relationship between nodes should modulate feature transformation.
  • When you need more than scalar attention. If GATConv's single attention weight per edge is too coarse, FiLMConv provides per-feature modulation.
  • Multi-relational graphs without HGTConv complexity. FiLMConv supports num_relations natively, providing a simpler alternative to full heterogeneous layers.

When not to use FiLMConv

  • When scalar attention suffices. If GATConv achieves your target accuracy, FiLMConv adds unnecessary parameters.
  • Very large graphs. Per-feature modulation increases memory usage. For billion-edge graphs, simpler layers with sampling are more practical.

Frequently asked questions

What is FiLMConv in PyTorch Geometric?

FiLMConv implements GNN-FiLM from Brockschmidt (2019). It applies feature-wise linear modulation to graph convolution: instead of using a fixed transformation alone, the target node generates per-feature scale and shift parameters that modulate each incoming neighbor message. This conditions the transformation on the receiving node (and, optionally, the relation type of each edge).

What is feature-wise linear modulation (FiLM)?

FiLM is a conditioning technique from computer vision. It applies a learned scale (gamma) and shift (beta) to each feature channel: output = gamma * input + beta. In GNN-FiLM, gamma and beta are generated from the target node's features, so each node adapts how its incoming messages are transformed.
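FiLM itself is a single line of tensor arithmetic. A minimal illustration with hand-picked gamma and beta (in practice both come from a learned conditioning network):

```python
import torch

features = torch.tensor([[1.0, 2.0, 3.0]])  # one sample, three channels
gamma    = torch.tensor([2.0, 0.0, 1.0])    # per-channel scale
beta     = torch.tensor([0.5, 1.0, 0.0])    # per-channel shift

out = gamma * features + beta
print(out)  # tensor([[2.5000, 1.0000, 3.0000]])
```

Note the middle channel: a zero scale erases the input entirely and replaces it with the shift, which is why FiLM can gate features, not just rescale them.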

How does FiLMConv differ from GATConv?

GATConv learns a scalar attention weight per edge (how much to listen to each neighbor). FiLMConv learns per-feature scale and shift parameters, generated by the target node, that control how each feature dimension of an incoming message is transformed. FiLMConv modulates the entire feature vector, not just its magnitude.

When should I use FiLMConv?

Use FiLMConv for heterogeneous or multi-relational graphs where the relationship between nodes should modulate the feature transformation, not just weight the aggregation. It is particularly effective for program analysis, scene graphs, and other structured reasoning tasks.

Is FiLMConv more expressive than GATConv?

In a sense, yes. GATConv learns one scalar per edge (attention weight). FiLMConv generates 2*d modulation parameters (d scales + d shifts) per target node and relation, giving it finer control over how each feature dimension of incoming messages is transformed. However, this comes with more parameters and higher computational cost.
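The gap is easy to make concrete by counting edge-level modulation values for a hidden size of d = 64 (pure arithmetic, independent of any library; note that FiLMConv's 2d values are generated per target node, so they are shared across that node's incoming edges within a relation):

```python
d = 64

gat_per_edge    = 1      # one scalar attention weight
film_per_edge   = 2 * d  # d scales + d shifts
nnconv_per_edge = d * d  # a full d x d weight matrix

print(gat_per_edge, film_per_edge, nnconv_per_edge)  # 1 128 4096
```

At d = 64 the ordering 1 < 128 < 4096 shows why FiLMConv sits between the two: far richer conditioning than a scalar, at a small fraction of NNConv's per-edge cost.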

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.