A Kumo Graph is a fundamental concept in the SDK. It links multiple Table objects (each created from a SourceTable) into a relational schema that captures how the tables relate to one another for a specific business problem. Graphs are used as input to predictive queries and training jobs.

Column

The metadata for a single column in a Table is represented by a Column object. Columns can be fetched from a table with Table.column() and modified by adjusting their properties. Related: Dtype, Stype.

from kumoai.graph import Column

col = Column(name="order_date", stype="timestamp", dtype="date")
name
str
required
The name of this column.
stype
Union[Stype, str]
default:"None"
The semantic type. Can be specified as a string — see Stype for valid values.
dtype
Union[Dtype, str]
default:"None"
The data type. Can be specified as a string — see Dtype for valid values.
timestamp_format
Union[str, TimestampUnit]
default:"None"
For timestamp columns, the format string used to parse values. If not specified, Kumo infers the format.
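
As a hedged sketch of the fetch-and-modify pattern described for Column: the helper below looks a column up with Table.column() and changes its semantic type. The helper name `set_semantic_type` is hypothetical, and assigning to `stype` assumes the property is writable and accepts a string, which the description above suggests but does not spell out.

```python
def set_semantic_type(table, column_name, stype):
    """Set the semantic type of one column on a Kumo Table-like object.

    Hypothetical helper: it relies only on has_column(), column(), and a
    writable `stype` attribute on the returned Column.
    """
    if not table.has_column(column_name):
        raise KeyError(f"Table has no column named {column_name!r}")
    column = table.column(column_name)
    column.stype = stype  # assumption: the stype property accepts a string
    return column
```

A call such as `set_semantic_type(orders_table, "order_date", "timestamp")` would then return the updated Column, where `orders_table` is an existing Table.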

Table

A Table represents the full metadata for a table in a Kumo Graph. Unlike a SourceTable (which is just a reference to data behind a connector), a Table specifies selected columns, their data and semantic types, and relational constraint information (primary key, time column, end time column).

from kumoai.graph import Table

# src is assumed to be a SourceTable obtained from a connector.
table = Table.from_source_table(
    source_table=src,
    primary_key="order_id",
    time_column="order_date",
)
source_table
SourceTable
required
The source table this Kumo table is created from.
columns
Sequence[Union[SourceColumn, Column]]
default:"None"
The columns to include. Defaults to all columns from the source table. Each column must have its dtype and stype specified.
primary_key
Union[Column, str]
default:"None"
The primary key column, if present. Must exist in columns.
time_column
Union[Column, str]
default:"None"
The time column, if present. Must exist in columns.
end_time_column
Union[Column, str]
default:"None"
The end time column, if present. Must exist in columns.

from_source_table() staticmethod

Convenience constructor that creates a Table from a SourceTable.
source_table
SourceTable
required
The source table to create from.
column_names
List[str]
default:"None"
Column names to include. All columns are included if not specified.
primary_key
str
default:"None"
The primary key column name.
time_column
str
default:"None"
The time column name.
end_time_column
str
default:"None"
The end time column name.
Returns Table

columns property

Returns List[Column] — All columns in this table.

primary_key property

Returns Optional[Column] — The primary key column, or None.

time_column property

Returns Optional[Column] — The time column, or None.

end_time_column property

Returns Optional[Column] — The end time column, or None.

column()

Returns the named column.
name
str
required
The column name.
Returns Column

has_column()

name
str
required
The column name.
Returns bool: True if the column exists in this table.

add_column()

Adds a Column to this table.

remove_column()

name
str
required
The column name to remove.
Returns Table

infer_metadata()

Infers any missing dtype and stype values from the source table.
verbose
bool
default:"True"
Whether to print progress output.
Returns Table

validate()

Validates the table configuration for use with Kumo.
verbose
bool
default:"True"
Whether to print validation output.
Returns Table
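
The constructor and the two methods above can be combined into one step. This is a minimal sketch, not a prescribed workflow: the function name `build_orders_table` and the column names are illustrative, and `src` is assumed to be a SourceTable that contains those columns. Only calls documented on this page are used.

```python
def build_orders_table(src):
    """Create, complete, and validate a Table from a SourceTable.

    `src` is assumed to be a SourceTable with the columns named below;
    the column names themselves are illustrative.
    """
    # Imported lazily so this sketch can be defined without the SDK installed.
    from kumoai.graph import Table

    table = Table.from_source_table(
        source_table=src,
        column_names=["order_id", "user_id", "order_date"],
        primary_key="order_id",
        time_column="order_date",
    )
    # Both methods are documented as returning a Table, so the result is
    # reassigned rather than relying on in-place mutation.
    table = table.infer_metadata(verbose=False)
    table = table.validate(verbose=False)
    return table
```

Passing `verbose=False` suppresses the progress and validation output, which is usually preferable inside a helper.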

get_stats()

Fetches column statistics from a snapshot of this table.
wait_for
str
default:"minimal"
The snapshot wait level.
Returns pd.DataFrame

save()

Saves the table to Kumo and returns its ID.
name
str
default:"None"
Optional name to save the table under.
Returns str

load() classmethod

Loads a previously saved table.
table_id_or_template
str
required
The table ID or named template.
Returns Table

Prints the full table definition with placeholder names.

Graph

A Graph represents a full relational schema over a set of Table objects, including the primary key / foreign key relationships between them. Once a graph is created, you are ready to write a PredictiveQuery and train a model.

from kumoai.graph import Graph

# users_table and orders_table are assumed to be existing Table objects.
graph = Graph(
    tables={"users": users_table, "orders": orders_table},
    edges=[("orders", "user_id", "users")],
)
tables
Dict[str, Table]
default:"None"
Tables in the graph, keyed by unique table name.
edges
Iterable[EdgeLike]
default:"None"
Foreign key relationships between tables. Each edge specifies (src_table, fkey, dst_table).

id property

Returns str — A unique identifier derived from the graph’s schema. Two graphs with any difference in their tables or columns are guaranteed to have distinct IDs.
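
To illustrate what a schema-derived identifier can look like, here is a stand-alone sketch that hashes a canonical description of tables and edges. It is not Kumo's algorithm, which this page does not describe: the function name `schema_fingerprint`, the input shapes, the inclusion of edges, and the use of SHA-256 are all assumptions. A hash also makes collisions extremely unlikely rather than strictly impossible.

```python
import hashlib
import json


def schema_fingerprint(tables, edges):
    """Return a deterministic ID for a schema description.

    `tables` maps table name -> {column name -> (dtype, stype)};
    `edges` is an iterable of (src_table, fkey, dst_table) tuples.
    Both shapes are illustrative, not Kumo's internal representation.
    """
    canonical = {
        "tables": {
            name: {col: list(types) for col, types in sorted(cols.items())}
            for name, cols in sorted(tables.items())
        },
        # Sort edges so that declaration order does not affect the ID.
        "edges": sorted(list(edge) for edge in edges),
    }
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


tables = {
    "users": {"user_id": ("int", "ID")},
    "orders": {"order_id": ("int", "ID"), "user_id": ("int", "ID")},
}
edges = [("orders", "user_id", "users")]

# Identical schemas produce identical IDs.
same = schema_fingerprint(tables, edges) == schema_fingerprint(dict(tables), list(edges))
# Adding a single column changes the ID.
changed_tables = {**tables, "orders": {**tables["orders"], "amount": ("float", "numerical")}}
different = schema_fingerprint(tables, edges) != schema_fingerprint(changed_tables, edges)
print(same, different)  # True True
```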

snapshot_id property

Returns Optional[GraphSnapshotID] — The snapshot ID, if available.

tables property

Returns Dict[str, Table]

edges property

Returns List[Edge]

table()

name
str
required
The table name.
Returns Table

has_table()

name
str
required
The table name.
Returns bool

add_table()

name
str
required
The name to register the table under.
table
Table
required
The table to add.
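
A small sketch of how has_table(), add_table(), and table() can be combined to register a table only once. The helper name `ensure_table` is hypothetical, and the sketch assumes add_table() accepts `name` and `table` as keyword arguments, matching the parameter names listed above.

```python
def ensure_table(graph, name, table):
    """Register `table` under `name` unless the graph already has that name.

    Hypothetical helper built only on has_table(), add_table(), and table().
    """
    if not graph.has_table(name):
        graph.add_table(name=name, table=table)
    # Return whichever table is registered under `name` after the check.
    return graph.table(name)
```

Note that an existing table with the same name is kept as-is; the sketch does not compare it with the one passed in.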

remove_table()

name
str
required
The name of the table to remove.

Adds a foreign key edge to the graph.
edge
Edge
required
The edge to add.

infer_metadata()

Infers missing metadata in all tables in the graph.
verbose
bool
default:"True"
Whether to print progress output.
Returns Graph

Automatically detects foreign key relationships between tables.
verbose
bool
default:"True"
Whether to print progress output.
Returns Graph

validate()

Validates the graph structure before use with a predictive query.
verbose
bool
default:"True"
Whether to print validation output.
Returns Graph

get_table_stats()

Fetches statistics for all tables in the graph.
wait_for
Optional[str]
default:"None"
The snapshot wait level.
Returns Dict[str, pd.DataFrame]

get_edge_stats()

Returns GraphHealthStats — Health statistics for each edge in the graph.

visualize()

Exports the graph structure as a Graphviz diagram.
path
str
default:"None"
Output path for the diagram.
show_cols
bool
default:"True"
Whether to include column names in the diagram.

save()

Saves the graph to Kumo.
name
Optional[str]
default:"None"
Optional name for the saved graph.
skip_validation
bool
default:"False"
Whether to skip validation before saving.
Returns str
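
Putting the Graph constructor, validate(), and save() together, a minimal sketch might look like the following. The function name `build_and_save_graph` is hypothetical, the table and column names are carried over from the constructor example, and the two arguments are assumed to be Table objects that already validate on their own.

```python
def build_and_save_graph(users_table, orders_table, name=None):
    """Build a two-table graph, validate it, and save it to Kumo.

    Returns the string that Graph.save() returns.
    """
    # Imported lazily so this sketch can be defined without the SDK installed.
    from kumoai.graph import Graph

    graph = Graph(
        tables={"users": users_table, "orders": orders_table},
        # orders.user_id is the foreign key into the users primary key.
        edges=[("orders", "user_id", "users")],
    )
    # validate() is documented as returning a Graph, so keep the result.
    graph = graph.validate(verbose=False)
    return graph.save(name=name)
```

Validating explicitly before save() surfaces schema problems at a known point in the code; save() is left at its default of `skip_validation=False`.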

load() classmethod

Loads a previously saved graph.
graph_id_or_template
str
required
The graph ID or named template.
Returns Graph

Prints the full graph definition with placeholder names.

Edge

Represents a foreign key relationship between two tables. Edges are always bidirectional within Kumo.

from kumoai.graph import Edge

edge = Edge(src_table="orders", fkey="user_id", dst_table="users")
src_table, fkey, dst_table = edge  # supports unpacking
src_table
str
required
The source table name. This table must have a foreign key column fkey that links to the destination table’s primary key.
fkey
str
required
The name of the foreign key column in the source table.
dst_table
str
required
The destination table name. Must have a primary key that the source table’s foreign key references.
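
The two constraints stated above (the source table must contain the foreign key column, and the destination table must have a primary key) can be checked with a stand-alone sketch. `EdgeSketch`, `TableSketch`, and `edge_problems` are hypothetical stand-ins written for illustration; they are not SDK classes, and this is not how Kumo validates a graph internally.

```python
from typing import Dict, List, NamedTuple, Optional


class EdgeSketch(NamedTuple):
    """Stand-in for kumoai.graph.Edge; not the SDK class."""
    src_table: str
    fkey: str
    dst_table: str


class TableSketch(NamedTuple):
    """Stand-in holding only the metadata the check needs."""
    columns: List[str]
    primary_key: Optional[str]


def edge_problems(edge, tables: Dict[str, TableSketch]) -> List[str]:
    """List the reasons why `edge` is not a valid foreign key link."""
    src_table, fkey, dst_table = edge  # works for EdgeSketch and plain tuples
    problems = [
        f"unknown table: {name}"
        for name in dict.fromkeys((src_table, dst_table))
        if name not in tables
    ]
    if problems:
        return problems
    if fkey not in tables[src_table].columns:
        problems.append(f"{src_table} has no column {fkey}")
    if tables[dst_table].primary_key is None:
        problems.append(f"{dst_table} has no primary key")
    return problems


tables = {
    "users": TableSketch(columns=["user_id", "name"], primary_key="user_id"),
    "orders": TableSketch(columns=["order_id", "user_id"], primary_key="order_id"),
}
print(edge_problems(EdgeSketch("orders", "user_id", "users"), tables))  # []
print(edge_problems(("orders", "customer_id", "users"), tables))
# ['orders has no column customer_id']
```

Because the stand-in is a NamedTuple, it unpacks into three names in the same way the Edge example does, and a plain 3-tuple can be passed wherever an edge is expected.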

GraphHealthStats

Contains edge-level health statistics computed as part of a Graph snapshot. Index with an Edge object to retrieve per-edge statistics.

stats = graph.get_edge_stats()
edge_stats = stats[Edge("orders", "user_id", "users")]
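
This page does not list which statistics GraphHealthStats holds. As one plausible example of an edge-level health measure, the stand-alone sketch below computes the fraction of foreign key values that match a primary key and stores it in a dictionary keyed by edge. The metric, `EdgeSketch`, and `fkey_match_rate` are assumptions made for illustration, not SDK behavior.

```python
from typing import Iterable, NamedTuple


class EdgeSketch(NamedTuple):
    """Stand-in for kumoai.graph.Edge; not the SDK class."""
    src_table: str
    fkey: str
    dst_table: str


def fkey_match_rate(fkey_values: Iterable, pkey_values: Iterable) -> float:
    """Fraction of non-null foreign key values found among the primary keys."""
    pkeys = set(pkey_values)
    present = [value for value in fkey_values if value is not None]
    if not present:
        return 0.0
    return sum(value in pkeys for value in present) / len(present)


# Toy data: one order points at user 7, which does not exist.
orders_user_id = [1, 2, 2, 7, None]
users_user_id = [1, 2, 3]

stats = {
    EdgeSketch("orders", "user_id", "users"): fkey_match_rate(
        orders_user_id, users_user_id
    ),
}
# Index by an equal edge object, mirroring the lookup shown above.
print(stats[EdgeSketch("orders", "user_id", "users")])  # 0.75
```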