Foreign Key as Edge: How Primary-Foreign Key Relationships Create Graph Structure

Every foreign key in your database is a graph edge hiding in plain sight. When order #1001 references customer_id = 42, that is a directed edge from the order node to the customer node. Multiply this across every FK in every table and you have a rich heterogeneous graph.

TL;DR

  1. A foreign key reference IS a graph edge. It connects two row-nodes (child row to parent row) with a typed, directed relationship. The edge type is determined by the FK column and the tables it connects.
  2. Edge direction follows the FK: order -> customer (the order references the customer). For GNN message passing, edges are made bidirectional so information flows both ways: what orders a customer has AND what customer placed an order.
  3. Multiple FKs in one table create multiple edge types. An order_items table with order_id FK and product_id FK creates two edge types: order_item -> order and order_item -> product.
  4. NULL FKs mean no edge: the node exists but is disconnected from that relationship type. GNNs handle this naturally; the node receives no messages along that edge type.
  5. This FK-as-edge mapping is the atomic building block of relational deep learning. Every cross-table pattern that a GNN learns flows along these FK edges.

Every foreign key reference in a relational database is a graph edge. This is the atomic insight that connects relational databases to graph neural networks. When the orders table has a customer_id column referencing the customers table, every non-NULL value in that column creates a directed edge from an order node to a customer node. The graph does not need to be designed by hand. Its structure is already implied by the schema.

Anatomy of a FK edge

A single foreign key relationship creates:

  • Source node: the row containing the FK value (child table row)
  • Target node: the row referenced by the FK value (parent table row)
  • Edge type: named by the relationship, e.g., “order placed_by customer” or more precisely “order.customer_id -> customer.customer_id”
  • Direction: from child to parent (the FK points to the PK)
fk_to_edges.py
# orders table
# order_id | customer_id | amount | date
#    1001  |     42      | 67.50  | 2024-01-15
#    1002  |     42      | 123.00 | 2024-01-20
#    1003  |     17      | 45.99  | 2024-01-21

# This creates 3 edges:
# order_1001 -> customer_42
# order_1002 -> customer_42
# order_1003 -> customer_17

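# In PyG edge_index format. edge_index holds node positions (0..N-1 within
# each node type), not raw key values, so map keys to contiguous indices first.
import torch

order_ids = [1001, 1002, 1003]  # primary keys of the order rows
customer_fk = [42, 42, 17]      # customer_id value on each order row

# Assumes the customers table holds only the referenced rows 17 and 42.
customer_index = {cid: i for i, cid in enumerate(sorted(set(customer_fk)))}

edge_index = torch.tensor([
    list(range(len(order_ids))),               # source: order nodes 0, 1, 2
    [customer_index[c] for c in customer_fk],  # target: customer nodes 1, 1, 0
], dtype=torch.long)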

Three rows with a customer_id FK create three edges. Customer 42 has in-degree 2 (two orders point to it). Customer 17 has in-degree 1.

Bidirectional message passing

Foreign keys are inherently directional: the child references the parent. But for GNN message passing, we want information to flow both ways:

  • Forward (FK direction): order -> customer. The order sends its features (amount, date) to the customer. After aggregation, the customer knows about its orders.
  • Reverse: customer -> order. The customer sends its features (age, location) to the order. After aggregation, each order knows about the customer who placed it.

In PyG, this is handled by adding reverse edge types, for example with the ToUndirected transform. For a heterogeneous graph, each FK creates both a forward and a reverse edge type. This doubles the edge count but enables full bidirectional information flow.
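
A minimal sketch of adding reverse edge types, written in plain Python so it runs without dependencies. The (source type, relation, target type) keys mirror how PyG names heterogeneous edge types. The relation name placed_by and the helper add_reverse_edges are illustrative assumptions, not library names.

```python
# Edge store keyed by (source type, relation, target type).
# Each value is [source indices, target indices], like the two rows of edge_index.
edges = {
    ("order", "placed_by", "customer"): [[0, 1, 2], [1, 1, 0]],
}

def add_reverse_edges(edges):
    """Return a copy of `edges` with one reversed edge type per original type."""
    out = dict(edges)
    for (src, rel, dst), (src_idx, dst_idx) in edges.items():
        # Swap the endpoint types and the two index rows.
        out[(dst, f"rev_{rel}", src)] = [list(dst_idx), list(src_idx)]
    return out

bidirectional = add_reverse_edges(edges)
total_edges = sum(len(rows[0]) for rows in bidirectional.values())
```

The forward type holds three edges and the reverse type holds the same three with endpoints swapped, so the total is six. On a real HeteroData object, the ToUndirected transform plays the role of this helper.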

Multiple FKs per table

Tables often have multiple foreign keys. An order_items table references both orders (which order) and products (which product). Each FK creates its own edge type:

  • order_item.order_id → order.order_id: edge type “belongs_to_order”
  • order_item.product_id → product.product_id: edge type “of_product”

Combined with the order.customer_id FK, and with reverse edges added, this creates a path: customer → order → order_item → product. Three hops. Three layers of message passing propagate product information all the way to the customer node. The GNN can learn which product-level features matter for customer-level predictions without any manual feature engineering.
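
The three-hop path can be traced by hand. The sketch below is plain Python over made-up rows: it builds one edge type per FK column and then follows the hops from a product back to the customers its features can reach. The table contents, the placed_by relation name, and the step helper are assumptions for illustration.

```python
# Hypothetical rows; keys are already contiguous node indices for brevity.
orders = [{"order": 0, "customer": 0}, {"order": 1, "customer": 1}]
order_items = [
    {"item": 0, "order": 0, "product": 0},
    {"item": 1, "order": 0, "product": 1},
    {"item": 2, "order": 1, "product": 1},
]

# One edge type per FK column, stored as (source index, target index) pairs.
fk_edges = {
    ("order", "placed_by", "customer"): [(o["order"], o["customer"]) for o in orders],
    ("order_item", "belongs_to_order", "order"): [(i["item"], i["order"]) for i in order_items],
    ("order_item", "of_product", "product"): [(i["item"], i["product"]) for i in order_items],
}

def step(frontier, edge_type, reverse=False):
    """Advance a set of node indices one hop along an edge type."""
    pairs = fk_edges[edge_type]
    if reverse:
        pairs = [(dst, src) for src, dst in pairs]
    return {dst for src, dst in pairs if src in frontier}

# Which customers can product 1's features reach in three hops?
items = step({1}, ("order_item", "of_product", "product"), reverse=True)
reached_orders = step(items, ("order_item", "belongs_to_order", "order"))
reached_customers = step(reached_orders, ("order", "placed_by", "customer"))
```

The first hop runs against the FK direction (product to order_item), which is why reverse edge types are needed for product information to travel toward customers.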

Handling NULL foreign keys

Not every FK value is populated. An order might have coupon_id = NULL (no coupon applied). In the graph, this simply means no edge exists along that relationship for that node. The order node still exists and participates in message passing through its other edges (customer_id, etc.). It just receives no messages along the coupon edge type.

GNNs handle this gracefully. Aggregating (sum, mean, max) over an empty set of messages typically yields a zero vector, and some architectures substitute a learnable default instead. The model can learn that “no coupon edge” is itself informative (full-price purchases might correlate differently with churn than discounted purchases).
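
As a small illustration, the sketch below skips NULL values when building edges and returns a zero vector when an order has no coupon neighbor. It is plain Python with made-up rows and features. The helper mean_from_coupons is hypothetical and only imitates mean aggregation; it is not a PyG function.

```python
# Hypothetical orders; coupon_id is None where no coupon was applied.
orders = [
    {"order": 0, "coupon_id": 7},
    {"order": 1, "coupon_id": None},
    {"order": 2, "coupon_id": 7},
]
coupon_index = {7: 0}               # coupon key -> node index
coupon_features = {0: [0.25, 1.0]}  # made-up 2-dimensional coupon features

# Only non-NULL FK values become edges (order -> coupon).
edges = [(o["order"], coupon_index[o["coupon_id"]])
         for o in orders if o["coupon_id"] is not None]

def mean_from_coupons(order, dim=2):
    """Mean of the coupon features reaching `order` over the reverse edge."""
    msgs = [coupon_features[c] for o, c in edges if o == order]
    if not msgs:
        return [0.0] * dim  # empty neighborhood -> zero vector
    return [sum(col) / len(msgs) for col in zip(*msgs)]
```

Order 1 still exists as a node. It simply contributes no pair to edges, so its aggregated coupon message is all zeros.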

Self-referential foreign keys

Some tables reference themselves: an employees table with a manager_id FK pointing back to employees.employee_id. This creates a directed graph within a single node type: employee -[reports_to]-> employee. The result is an organizational hierarchy graph where GNNs can propagate information up and down the reporting chain.
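
A minimal sketch of a self-referential FK, in plain Python with a made-up employees table. Source and target indices come from the same node type. The chain_of_command helper is hypothetical and assumes the reporting data contains no cycles.

```python
# Hypothetical employees table: employee_id -> manager_id (None at the top).
employees = {10: None, 11: 10, 12: 10, 13: 11}

# Map primary keys to contiguous node indices.
index = {emp_id: i for i, emp_id in enumerate(sorted(employees))}

# Both ends of every edge are "employee" nodes.
reports_to = [(index[e], index[m]) for e, m in employees.items() if m is not None]

def chain_of_command(emp_id):
    """Follow reports_to edges upward and return the manager keys in order."""
    chain = []
    manager = employees[emp_id]
    while manager is not None:
        chain.append(manager)
        manager = employees[manager]
    return chain
```

Employee 13 reaches the top of the hierarchy in two hops (13 to 11 to 10), so two message-passing layers are enough for information from employee 10 to reach employee 13 over reverse edges.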

From schema to graph: automation

The FK-to-edge mapping is entirely automatable. Given a database connection:

  1. Read the schema: extract table names, column types, PK and FK constraints
  2. Create node types: one per table
  3. Create nodes: one per row, with column values as features
  4. Create edge types: one per FK relationship (plus reverse)
  5. Create edges: one per non-NULL FK value

As long as the PK and FK constraints are declared, no manual graph design or domain knowledge is needed: the graph structure is fully determined by the schema. This automation is what makes schema-agnostic encoding possible: a single GNN architecture can process any database.
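
The steps above can be sketched against an in-memory SQLite database using only the Python standard library. This is an illustration under assumptions, not a production loader: it handles single-column primary and foreign keys only, assumes each FK references the parent's primary key, and omits feature encoding and reverse edge types. The tables, rows, and the single_pk helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (17, 'Berlin'), (42, 'Munich');
    INSERT INTO orders VALUES (1001, 42, 67.50), (1002, 42, 123.00),
                              (1003, 17, 45.99), (1004, NULL, 10.00);
""")

def single_pk(table):
    """Name of the table's primary-key column (single-column PKs only)."""
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return next(name for _, name, _, _, _, pk in cols if pk)

# Steps 1-3: read the table names, then index every row by its primary key.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
node_index = {}
for t in tables:
    pk = single_pk(t)
    keys = conn.execute(f"SELECT {pk} FROM {t} ORDER BY {pk}").fetchall()
    node_index[t] = {key: i for i, (key,) in enumerate(keys)}

# Steps 4-5: one edge type per FK constraint, one edge per non-NULL FK value.
edges = {}
for child in tables:
    for fk in conn.execute(f"PRAGMA foreign_key_list({child})").fetchall():
        parent, fk_col = fk[2], fk[3]  # referenced table, referencing column
        pk = single_pk(child)
        rows = conn.execute(
            f"SELECT {pk}, {fk_col} FROM {child} "
            f"WHERE {fk_col} IS NOT NULL ORDER BY {pk}").fetchall()
        edges[(child, fk_col, parent)] = [
            (node_index[child][c], node_index[parent][p]) for c, p in rows]
```

Order 1004 has a NULL customer_id, so it appears in node_index but contributes no edge. Other database engines expose the same metadata through their own catalogs, so the PRAGMA calls here are specific to SQLite.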

Frequently asked questions

Why are foreign keys natural graph edges?

A foreign key is a reference from one row to another. It says 'this order belongs to this customer' or 'this product is in this category.' That reference IS an edge: it connects two entities with a typed relationship. The edge type is determined by the foreign key column name and the tables it connects.

Are foreign key edges directed or undirected?

Foreign keys create directed edges (child table points to parent table: order -> customer). However, for GNN message passing, edges are typically made bidirectional: information flows from customer to order (what kind of customer placed this order?) AND from order to customer (what orders has this customer placed?). PyG handles this with to_undirected() for homogeneous graphs, or by adding reverse edge types (for example via the ToUndirected transform) for heterogeneous ones.

What about composite foreign keys?

Composite foreign keys (referencing multiple columns) create the same edge structure: one edge between the referencing row and the referenced row. The composite key identifies a unique target node. For example, a composite FK on (store_id, department_id) creates one edge to the specific store-department combination.
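
A short sketch of the composite case, in plain Python with made-up tables: the whole key tuple is mapped to a single parent node index, so each child row still produces exactly one edge. The shifts table and its values are assumptions for illustration.

```python
# Hypothetical parent table keyed by (store_id, department_id).
store_departments = [(1, "grocery"), (1, "pharmacy"), (2, "grocery")]
# Hypothetical child rows: (shift_id, store_id, department_id).
shifts = [(500, 1, "pharmacy"), (501, 2, "grocery"), (502, 1, "pharmacy")]

# The whole key tuple identifies one parent node.
parent_index = {key: i for i, key in enumerate(store_departments)}
shift_index = {shift_id: i for i, (shift_id, _, _) in enumerate(shifts)}

# One edge per child row, regardless of how many columns the FK spans.
edges = [(shift_index[sid], parent_index[(store, dept)])
         for sid, store, dept in shifts]
```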

What happens when a foreign key is NULL?

A NULL foreign key means no relationship exists: the order has no assigned customer yet, or the product has no category. In the graph, this simply means no edge is created. The node exists but is disconnected from that particular relationship type. GNNs handle disconnected nodes naturally; they just receive no messages along that edge type.
