Berlin Tech Meetup: The Future of Relational Foundation Models, Systems, and Real-World Applications

Register now:
PyG/Guide7 min read

Why Every Relational Database is a Graph: Rows are Nodes, Foreign Keys are Edges

The mapping is not a metaphor. It is a mathematical equivalence. Every relational database schema defines a heterogeneous graph where tables are node types, rows are nodes, foreign keys are edges, and columns are features. GNNs exploit this structure directly.

PyTorch Geometric

TL;DR

  • 1Every relational database is already a graph. Tables map to node types. Rows map to nodes. Foreign keys map to edges. Column values map to node features. No data transformation is needed.
  • 2The graph is heterogeneous: different tables produce different node types (customer, order, product) with different feature sets. Different foreign keys produce different edge types (places, contains, belongs-to).
  • 3This mapping is lossless: the graph retains all information from the original database. Every query you can express in SQL has a corresponding graph traversal. But the graph also enables operations SQL cannot express efficiently: multi-hop pattern matching, structural similarity, neighborhood aggregation.
  • 4Many-to-many relationships (junction tables) become either intermediate nodes with edges to both tables, or direct edges between the two related tables.
  • 5This insight is the foundation of relational deep learning: if every database is a graph, then GNNs can learn from any database without custom feature engineering.

Every relational database is a graph. This is not an analogy. It is a direct mathematical mapping. Take any relational database schema with tables and foreign key constraints. Each table defines a node type. Each row in a table is a node of that type. Each foreign key relationship between two tables defines an edge type. Each row referencing another row via a foreign key creates an edge between the two corresponding nodes.

The resulting graph is heterogeneous (multiple node types, multiple edge types) and carries rich features (column values on each node). This graph already exists implicitly in every SQL database. Making it explicit unlocks graph neural network-based learning.

The mapping in detail

database_to_graph.sql
-- Relational database schema
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR, age INT, city VARCHAR
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    amount DECIMAL, order_date DATE
);

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name VARCHAR, price DECIMAL, category VARCHAR
);

CREATE TABLE order_items (
    order_id INT REFERENCES orders(order_id),
    product_id INT REFERENCES products(product_id),
    quantity INT
);

-- Graph interpretation:
-- Node types: customer, order, product, order_item
-- Edge types: customer -[places]-> order
--             order -[contains]-> order_item
--             order_item -[of_product]-> product
-- Node features: column values of each row

The database schema IS the graph schema. Primary keys identify nodes. Foreign keys define edges. The mapping is automatic.

Tables become node types

Each table in the schema corresponds to a node type in the graph. The customers table produces customer nodes. The orders table produces order nodes. The products table produces product nodes. Each node type has its own set of features (the table's columns).

Rows become nodes

Each row in a table becomes a specific node. Customer Alice (row 1 in customers) becomes node Alice with features {name: “Alice”, age: 32, city: “NYC”}. Order #1001 becomes node 1001 with features {amount: 67.50, date: “2024-01-15”}.

Foreign keys become edges

Each foreign key relationship creates edges between nodes. If order #1001 has customer_id = 1 (Alice), there is an edge from Alice to order #1001 of type “places.” If order_item row references order #1001 and product #55, there are edges connecting these three nodes.

Many-to-many relationships

Many-to-many relationships use junction tables. Students enroll in courses through a student_courses table with student_id and course_id foreign keys. Two graph representations are possible:

  • Junction node: the enrollment becomes a node with edges to both student and course. Use this when the junction carries features (enrollment date, grade, status).
  • Direct edge: skip the junction and create a direct student-enrolled-in-course edge. Use this when the junction is featureless (pure relationship).

What the graph enables that SQL cannot

SQL can express joins and aggregations. But some patterns are natural in graphs and extremely awkward in SQL:

  • Multi-hop path queries: “Find customers who bought products from the same category as products returned by customers who previously churned.” This is a 4-hop path. In SQL, it requires multiple self-joins and subqueries. In a graph, it is 4 layers of message passing.
  • Structural similarity: “Find customers with similar purchasing patterns.” In the graph, customers with similar neighborhoods (same products, same categories) will have similar GNN embeddings. No manual similarity metric needed.
  • Neighborhood aggregation: “Compute a feature for each customer that summarizes their orders, their products, and the behavior of other customers who bought those products.” This is one GNN forward pass. In SQL, it is a complex chain of GROUP BY operations.

Scale considerations

Enterprise databases are large: millions of rows become millions of nodes. The graph representation does not require materializing the full adjacency matrix. PyG uses sparse edge index representations (two arrays of source and target node IDs) that scale linearly with the number of foreign key references, not quadratically with the number of rows.

For very large databases, graph mini-batching and graph partitioning enable training on subgraphs without loading the entire database into memory.

Frequently asked questions

How does a relational database map to a graph?

Each table becomes a node type. Each row in a table becomes a node of that type. The row's column values become the node's features. Each foreign key relationship between tables becomes an edge type. A row referencing another row via a foreign key creates a directed edge between the two corresponding nodes. The result is a heterogeneous graph with multiple node and edge types.

Do you need to transform the data to create the graph?

No transformation is needed beyond reading the schema. The graph structure is implicit in the relational schema: primary keys identify nodes, foreign keys define edges, and column values provide features. Tools like PyG's data loaders can read directly from SQL databases and construct the graph representation. The mapping is mechanical and lossless.

What about many-to-many relationships?

Many-to-many relationships in relational databases use junction tables (e.g., student_courses with student_id and course_id). In the graph, the junction table can become either: (1) a node type with edges to both related tables, or (2) direct edges between the two related tables (skipping the junction node). The choice depends on whether the junction table carries meaningful features (enrollment date, grade).

How does this differ from a knowledge graph?

A knowledge graph stores (subject, predicate, object) triples with semantic meaning: (Einstein, bornIn, Ulm). A relational-database-as-graph preserves the full relational structure: every column is a feature, every row is a node, every foreign key is an edge. The relational graph is richer (dense features per node) but less semantically curated than a knowledge graph.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.