After defining all Table objects, construct a Graph over these tables. A Graph connects the Tables by their primary key / foreign key relationships.
Creating a Graph
# Assumes `import kumoai as kumo` and that the Table objects `customer`,
# `stock`, and `transaction` have already been defined.
graph = kumo.Graph(
    # These are the tables that participate in the graph: the keys of this
    # dictionary are the names of the tables, and the values are the Table
    # objects that correspond to these names:
    tables={
        'customer': customer,
        'stock': stock,
        'transaction': transaction,
    },
    # These are the edges that define the primary key / foreign key
    # relationships between the tables defined above. Here, `src_table`
    # is the table that has the foreign key `fkey`, which maps to the
    # table `dst_table`'s primary key:
    edges=[
        dict(src_table='transaction', fkey='StockCode', dst_table='stock'),
        dict(src_table='transaction', fkey='CustomerID', dst_table='customer'),
    ],
)
# Validate the graph's correctness:
graph.validate(verbose=True)
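To make the edge definition concrete, the standalone sketch below shows the kind of consistency check that a primary key / foreign key edge implies: the source table must exist and contain the foreign key column, and the destination table must exist and have a primary key. It uses plain dictionaries rather than the Kumo SDK. The names `TABLES`, `EDGES`, and `check_edges`, the `Description` and `Quantity` columns, and the choice to give `transaction` no primary key are all hypothetical, and this is not a description of how `graph.validate()` is implemented.

```python
# Hypothetical stand-ins for Table objects: each entry records a table's
# columns and its primary key. These are NOT Kumo SDK types.
TABLES = {
    "customer": {"columns": ["CustomerID"], "primary_key": "CustomerID"},
    "stock": {"columns": ["StockCode", "Description"], "primary_key": "StockCode"},
    "transaction": {
        "columns": ["StockCode", "CustomerID", "Quantity"],
        "primary_key": None,  # assumption: this table has no primary key
    },
}

# Same edge shape as in the Graph constructor above.
EDGES = [
    dict(src_table="transaction", fkey="StockCode", dst_table="stock"),
    dict(src_table="transaction", fkey="CustomerID", dst_table="customer"),
]


def check_edges(tables, edges):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for edge in edges:
        src, fkey, dst = edge["src_table"], edge["fkey"], edge["dst_table"]
        if src not in tables:
            problems.append(f"unknown source table: {src}")
            continue
        if dst not in tables:
            problems.append(f"unknown destination table: {dst}")
            continue
        # The foreign key must be a column of the source table.
        if fkey not in tables[src]["columns"]:
            problems.append(f"{src} has no column {fkey}")
        # The destination table needs a primary key for the edge to target.
        if tables[dst]["primary_key"] is None:
            problems.append(f"{dst} has no primary key for {fkey} to reference")
    return problems


print(check_edges(TABLES, EDGES))
```

An empty list means every edge in the sketch points from an existing foreign key column to a table with a primary key.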
Editing a Graph
Tables and edges can be added to or removed from a graph after it has been created:
# Add a table to an existing graph:
graph.add_table('new_table_name', new_table)
# Remove a table:
graph.remove_table('table_name')
# Add an edge between two tables:
graph.link(kumo.Edge('src_table', 'fkey_column', 'dst_table'))
# Remove an edge:
graph.unlink('src_table', 'fkey_column', 'dst_table')
Snapshotting a Graph
The Graph.snapshot() method ingests all tables in the graph so that multiple model training runs use the same version of the data, even if the underlying source data changes. A snapshot is also required before you can view edge health statistics, which report the number of matches between primary and foreign keys across all edges.
# Snapshot the graph to lock in the current data:
graph.snapshot()
# View edge health statistics after snapshotting:
stats = graph.get_edge_stats()
print(stats)
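As a rough illustration of what "number of matches between primary and foreign keys" means for a single edge, the sketch below counts matched and unmatched key values in plain Python. The function `edge_match_stats`, the field names in its result, and the sample key values are hypothetical. The actual structure returned by `graph.get_edge_stats()` is not described here and may differ.

```python
def edge_match_stats(fkey_values, pkey_values):
    """Count how many foreign key values find a matching primary key.

    Missing foreign keys (None) are ignored rather than counted as
    unmatched; that is an assumption made for this sketch.
    """
    pkeys = set(pkey_values)
    present = [value for value in fkey_values if value is not None]
    matched = sum(1 for value in present if value in pkeys)
    return {
        "num_fkeys": len(present),
        "num_matched": matched,
        "num_unmatched": len(present) - matched,
        # Primary keys that no foreign key value refers to.
        "num_unreferenced_pkeys": len(pkeys - set(present)),
    }


# Hypothetical key columns for the transaction -> stock edge.
stock_codes = ["A1", "B2", "C3"]
transaction_stock_codes = ["A1", "A1", "B2", "Z9", None]

example_stats = edge_match_stats(transaction_stock_codes, stock_codes)
print(example_stats)
```

A high unmatched count on an edge usually points to a mismatched column, inconsistent key formatting, or rows missing from the destination table.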