Why are foreign keys natural graph edges?

A foreign key is a reference from one row to another. It says 'this order belongs to this customer' or 'this product is in this category.' That reference IS an edge: it connects two entities with a typed relationship. The edge type is determined by the foreign key column name and the tables it connects.

Are foreign key edges directed or undirected?

Foreign keys create directed edges: the child table points to the parent table (order -> customer). For GNN message passing, however, edges are typically made bidirectional. Information flows from customer to order (what kind of customer placed this order?) and from order to customer (what orders has this customer placed?). PyG supports this in three ways: the ToUndirected transform, which adds a reverse edge type for every relation in a heterogeneous graph; the to_undirected() utility for homogeneous graphs; or adding reverse edge types by hand.

What about composite foreign keys?

Composite foreign keys (referencing multiple columns) create the same edge structure: one edge between the referencing row and the referenced row. The composite key identifies a unique target node. For example, a composite FK on (store_id, department_id) creates one edge to the specific store-department combination.
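Here is a minimal sketch of how a composite FK resolves to a single target node. It assumes the parent table's composite primary key is (store_id, department_id) and that each parent row gets a contiguous node index. The tables sales and store_departments and their contents are hypothetical.

```python
# Hypothetical parent table: store_departments, composite PK (store_id, department_id)
store_departments = [
    (1, 10),  # node 0
    (1, 20),  # node 1
    (2, 10),  # node 2
]

# Hypothetical child table: sales, composite FK (store_id, department_id)
sales = [
    {"sale_id": 500, "store_id": 1, "department_id": 20},
    {"sale_id": 501, "store_id": 2, "department_id": 10},
    {"sale_id": 502, "store_id": 1, "department_id": 20},
]

# Map each composite key tuple to exactly one target node index.
parent_index = {key: i for i, key in enumerate(store_departments)}

src, dst = [], []
for child_row, sale in enumerate(sales):
    key = (sale["store_id"], sale["department_id"])
    src.append(child_row)          # one sales node...
    dst.append(parent_index[key])  # ...points to one store-department node

edge_index = [src, dst]
print(edge_index)  # [[0, 1, 2], [1, 2, 1]]
```

The tuple used as a dictionary key makes the two columns act as one reference. Each child row still produces exactly one edge.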

What happens when a foreign key is NULL?

A NULL foreign key means no relationship exists: the order has no assigned customer yet, or the product has no category. In the graph, this simply means no edge is created. The node exists but is disconnected from that particular relationship type. GNNs handle disconnected nodes naturally; they just receive no messages along that edge type.

Foreign Key as Edge: How PK-FK Relationships Create Graph Structure | Kumo.ai

Every foreign key reference in a relational database is a graph edge. This is the atomic insight that connects relational databases to graph neural networks. When the orders table has a customer_id column referencing the customers table, every non-NULL value in that column creates a directed edge from an order node to a customer node. The graph does not need to be constructed or designed. It already exists in the schema.

Anatomy of a FK edge

A single foreign key relationship creates:

Source node: the row containing the FK value (child table row)
Target node: the row referenced by the FK value (parent table row)
Edge type: named by the relationship, e.g., “order placed_by customer” or more precisely “order.customer_id -> customer.customer_id”
Direction: from child to parent (the FK points to the PK)

fk_to_edges.py

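# orders table
# order_id | customer_id | amount | date
#    1001  |     42      | 67.50  | 2024-01-15
#    1002  |     42      | 123.00 | 2024-01-20
#    1003  |     17      | 45.99  | 2024-01-21

# This creates 3 edges:
# order_1001 -> customer_42
# order_1002 -> customer_42
# order_1003 -> customer_17

# In PyG edge_index format, entries must be contiguous 0-based row indices
# into each node type's feature matrix, so primary keys are mapped first.
import torch

order_ids = [1001, 1002, 1003]   # orders.order_id (PK)
customer_fk = [42, 42, 17]       # orders.customer_id (FK)
customer_ids = [17, 42]          # customers.customer_id (PK), one row each

order_idx = {pk: i for i, pk in enumerate(order_ids)}
customer_idx = {pk: i for i, pk in enumerate(customer_ids)}

edge_index = torch.tensor([
    [order_idx[o] for o in order_ids],       # source (order nodes):    [0, 1, 2]
    [customer_idx[c] for c in customer_fk],  # target (customer nodes): [1, 1, 0]
])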

Three rows with customer_id FK create three edges. Customer 42 has degree 2 (two orders). Customer 17 has degree 1.
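You can check the degrees above by counting how often each customer appears as an edge target. This small sketch uses the raw IDs from the example table.

```python
from collections import Counter

# (order_id, customer_id) pairs from the orders table above
fk_pairs = [(1001, 42), (1002, 42), (1003, 17)]

# Each non-NULL FK value is one edge, so a customer's degree is the
# number of orders that reference it.
in_degree = Counter(customer_id for _, customer_id in fk_pairs)
print(dict(in_degree))  # {42: 2, 17: 1}
```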

Bidirectional message passing

Foreign keys are inherently directional: the child references the parent. But for GNN message passing, we want information to flow both ways:

Forward (FK direction): order -> customer. The order sends its features (amount, date) to the customer. After aggregation, the customer knows about its orders.
Reverse: customer -> order. The customer sends its features (age, location) to the order. After aggregation, each order knows about the customer who placed it.

In PyG, this is handled by adding reverse edge types. For a heterogeneous graph, each FK creates both a forward and a reverse edge type. This doubles the edge count but enables full bidirectional information flow.
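Adding a reverse edge type amounts to swapping the source and target rows of the forward edge_index. The sketch below uses plain Python lists in place of tensors. The relation name placed_by and the dictionary layout are illustrative assumptions. The rev_ prefix is similar in spirit to the relation names PyG's ToUndirected transform generates, but this is not that transform.

```python
# Forward edges (FK direction): order -> customer, with contiguous node indices.
edge_index_dict = {
    ("order", "placed_by", "customer"): [[0, 1, 2], [1, 1, 0]],
}

def add_reverse_edges(edge_index_dict):
    """Return a new dict with a reverse edge type for every forward edge type."""
    out = dict(edge_index_dict)
    for (src_type, rel, dst_type), (src, dst) in edge_index_dict.items():
        # Reverse type: swap the node types and the two index rows.
        out[(dst_type, "rev_" + rel, src_type)] = [list(dst), list(src)]
    return out

both = add_reverse_edges(edge_index_dict)
print(both[("customer", "rev_placed_by", "order")])  # [[1, 1, 0], [0, 1, 2]]
```

The total edge count goes from 3 to 6, which is the doubling described above.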

Multiple FKs per table

Tables often have multiple foreign keys. An order_items table references both orders (which order) and products (which product). Each FK creates its own edge type:

order_item.order_id → order.order_id: edge type “belongs_to_order”
order_item.product_id → product.product_id: edge type “of_product”

This creates a path: customer → order → order_item → product. That is three hops, so three layers of message passing propagate product information all the way to the customer node. The first hop (product → order_item) runs against the FK direction, so it relies on the reverse edge type. The GNN learns which product-level features matter for customer-level predictions without any manual feature engineering.
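The toy sketch below uses hypothetical data and three rounds of plain sum aggregation, with no learned weights (unlike a real GNN layer). It shows a product-level feature, price, reaching the customer node only on the third hop.

```python
# Hypothetical toy data, keyed by primary key.
product_price = {"P1": 10.0, "P2": 5.0, "P3": 2.0}
order_items = [  # (item_id, order_id FK, product_id FK)
    (1, 1001, "P1"),
    (2, 1001, "P2"),
    (3, 1002, "P2"),
    (4, 1003, "P3"),
]
order_customer = {1001: 42, 1002: 42, 1003: 17}  # orders.customer_id FK

# Hop 1: product -> order_item (reverse of order_item.product_id)
item_msg = {item: product_price[prod] for item, _, prod in order_items}

# Hop 2: order_item -> order (forward along order_item.order_id), sum aggregation
order_msg = {}
for item, order, _ in order_items:
    order_msg[order] = order_msg.get(order, 0.0) + item_msg[item]

# Hop 3: order -> customer (forward along orders.customer_id), sum aggregation
customer_msg = {}
for order, customer in order_customer.items():
    customer_msg[customer] = customer_msg.get(customer, 0.0) + order_msg.get(order, 0.0)

print(customer_msg)  # {42: 20.0, 17: 2.0}
```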

Handling NULL foreign keys

Not every FK value is populated. An order might have coupon_id = NULL (no coupon applied). In the graph, this simply means no edge exists along that relationship for that node. The order node still exists and participates in message passing through its other edges (customer_id, etc.). It just receives no messages along the coupon edge type.
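A minimal sketch of edge construction for a nullable FK follows. The rows and the coupon index are hypothetical, and Python's None stands in for SQL NULL.

```python
# Hypothetical orders rows; coupon_id is a nullable FK (None stands in for NULL).
orders = [
    {"order_id": 1001, "customer_id": 42, "coupon_id": 7},
    {"order_id": 1002, "customer_id": 42, "coupon_id": None},
    {"order_id": 1003, "customer_id": 17, "coupon_id": None},
]
coupon_index = {7: 0}  # coupon PK -> node index

src, dst = [], []
for row_idx, row in enumerate(orders):
    if row["coupon_id"] is None:
        continue  # NULL FK: the order node still exists, it just gets no coupon edge
    src.append(row_idx)
    dst.append(coupon_index[row["coupon_id"]])

coupon_edge_index = [src, dst]
print(coupon_edge_index)  # [[0], [0]]
```

All three orders remain nodes. Only order 1001 gets an edge of the coupon type.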

GNNs handle this gracefully. An aggregation function (sum, mean, max) over an empty set of messages typically returns a zero vector, as in scatter-based implementations. Some designs substitute a learnable default instead. The model can learn that “no coupon edge” is itself informative (full-price purchases might correlate differently with churn than discounted purchases).
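The sketch below is a plain-Python mean aggregation along a reverse coupon -> order edge type, with hypothetical features. Orders with no coupon edge receive the zero vector, matching the zero-fill behaviour of scatter-style aggregation; a learnable default would be a different design choice. In typical layers the order's own features are also transformed and combined with this message, so the zero vector is a usable signal rather than missing data.

```python
def mean_aggregate(messages, edge_index, num_dst, dim):
    """Mean-aggregate source messages per target node.

    Targets with no incoming edges get a zero vector, mirroring the
    zero-fill of scatter-style aggregation over an empty neighborhood.
    """
    sums = [[0.0] * dim for _ in range(num_dst)]
    counts = [0] * num_dst
    for s, d in zip(*edge_index):
        counts[d] += 1
        for k in range(dim):
            sums[d][k] += messages[s][k]
    return [
        [v / counts[d] for v in sums[d]] if counts[d] else [0.0] * dim
        for d in range(num_dst)
    ]

coupon_features = [[0.2, 1.0]]      # one coupon node, 2 hypothetical features
coupon_to_order = [[0], [0]]        # coupon 0 -> order 0 (reverse of orders.coupon_id)
result = mean_aggregate(coupon_features, coupon_to_order, num_dst=3, dim=2)
print(result)  # [[0.2, 1.0], [0.0, 0.0], [0.0, 0.0]]
```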

Self-referential foreign keys

Some tables reference themselves: an employees table with a manager_id FK pointing back to employees.employee_id. This creates a directed graph within a single node type: employee -[reports_to]-> employee. The result is an organizational hierarchy graph where GNNs can propagate information up and down the reporting chain.
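A minimal sketch of a self-referential FK follows, using a hypothetical employees table. Source and target share one node type, and a NULL manager_id (the top of the hierarchy) produces no edge. The helper chain_of_command is illustrative only and assumes the hierarchy contains no cycles.

```python
# Hypothetical employees table: (employee_id, manager_id); None = no manager.
employees = [
    (1, None),  # top of the hierarchy
    (2, 1),
    (3, 1),
    (4, 2),
]
index = {emp_id: i for i, (emp_id, _) in enumerate(employees)}

# Edge type ("employee", "reports_to", "employee"): one node type on both ends.
src, dst = [], []
for emp_id, manager_id in employees:
    if manager_id is not None:
        src.append(index[emp_id])
        dst.append(index[manager_id])

reports_to = [src, dst]
print(reports_to)  # [[1, 2, 3], [0, 0, 1]]

def chain_of_command(emp_id):
    """Follow reports_to edges upward (assumes the hierarchy has no cycles)."""
    managers = dict(employees)
    chain = []
    while managers[emp_id] is not None:
        emp_id = managers[emp_id]
        chain.append(emp_id)
    return chain

print(chain_of_command(4))  # [2, 1]
```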

From schema to graph: automation

The FK-to-edge mapping is entirely automatable. Given a database connection:

Read the schema: extract table names, column types, PK and FK constraints
Create node types: one per table
Create nodes: one per row, with column values as features
Create edge types: one per FK relationship (plus reverse)
Create edges: one per non-NULL FK value
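The five steps can be sketched in plain Python. The sketch makes several assumptions:

- It uses a hypothetical in-memory schema description. schema, tables and schema_to_graph are illustrative names, not a Kumo or PyG API.
- FKs are single-column.
- Feature values are already numeric.

Lists stand in for tensors, and loading the result into a PyG HeteroData object would be a separate step.

```python
# Hypothetical in-memory schema; a real pipeline would read PK/FK constraints
# from the database catalog (e.g. information_schema) instead.
schema = {
    "customer": {"pk": "customer_id", "fks": {}},
    "order": {"pk": "order_id", "fks": {"customer_id": "customer"}},
}
tables = {
    "customer": [{"customer_id": 17, "age": 34}, {"customer_id": 42, "age": 51}],
    "order": [
        {"order_id": 1001, "customer_id": 42, "amount": 67.50},
        {"order_id": 1002, "customer_id": 42, "amount": 123.00},
        {"order_id": 1003, "customer_id": None, "amount": 45.99},
    ],
}

def schema_to_graph(schema, tables):
    # Node types and nodes: one type per table, one node per row (PK -> index).
    node_index = {
        t: {row[schema[t]["pk"]]: i for i, row in enumerate(rows)}
        for t, rows in tables.items()
    }
    # Node features: non-key column values (keys carry structure, not features).
    features = {}
    for t, rows in tables.items():
        key_cols = {schema[t]["pk"], *schema[t]["fks"]}
        features[t] = [[v for c, v in row.items() if c not in key_cols] for row in rows]
    # Edge types and edges: one type per FK plus its reverse, one edge per non-NULL value.
    edges = {}
    for child, meta in schema.items():
        for fk_col, parent in meta["fks"].items():
            src, dst = [], []
            for i, row in enumerate(tables[child]):
                if row[fk_col] is not None:
                    src.append(i)
                    dst.append(node_index[parent][row[fk_col]])
            edges[(child, fk_col, parent)] = [src, dst]
            edges[(parent, "rev_" + fk_col, child)] = [dst, src]
    return features, edges

features, edges = schema_to_graph(schema, tables)
print(edges[("order", "customer_id", "customer")])  # [[0, 1], [1, 1]]
```

Order 1003 has a NULL customer_id. It still becomes a node with features, but it gets no edge.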

No human decisions are required. No domain knowledge is needed. The graph structure is fully determined by the schema. This automation is what makes schema-agnostic encoding possible: a single GNN architecture can process any database.

Key Takeaways

1. Every non-NULL foreign key value creates a directed edge from the child row-node to the parent row-node. The edge type is determined by the FK relationship (orders.customer_id -> customers).
2. Bidirectional message passing requires adding reverse edges. Forward: order sends features to customer. Reverse: customer sends features to order. Both directions carry useful information for GNN learning.
3. Multiple FKs per table create multi-hop paths. Customer -> order -> order_item -> product -> category is a 4-hop path formed by 4 FK relationships. GNNs traverse these automatically.
4. NULL FK values mean no edge. The node exists but receives no messages along that edge type. GNNs handle this gracefully and can learn that the absence of a relationship is itself informative.
5. The FK-to-edge mapping is fully automatable from the database schema. No domain knowledge needed. This automation is the foundation of schema-agnostic relational deep learning.
