KumoRFM (Kumo Relational Foundation Model) provides a powerful interface for querying relational data using a pre-trained foundation model. Unlike traditional ML approaches that require feature engineering and model training, KumoRFM generates predictions directly from raw relational data using PQL queries.
Overview
KumoRFM consists of three main components:
- LocalTable: A pandas.DataFrame wrapper that manages metadata including semantic types, primary keys, and time columns.
- Graph: A collection of LocalTable objects with edges defining relationships between tables.
- KumoRFM: The main interface for querying the foundation model.
Workflow
- Load relational data into pandas.DataFrame objects.
- Create LocalTable objects (or use Graph.from_data() directly).
- Build a Graph defining the relationships between tables.
- Initialize KumoRFM with your graph.
- Execute predictive queries to get predictions, explanations, or evaluations.
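The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not official usage: the `rfm` argument is a placeholder for whichever module exposes `Graph` and `KumoRFM` in your installation (the import path is not given on this page), the toy `users` and `orders` tables are invented for illustration, and the call shapes `Graph.from_data(tables)`, `KumoRFM(graph)`, and `predict(query)` are inferred from the names documented here.

```python
import pandas as pd

# Step 1: load relational data into DataFrames (toy data, invented for illustration).
users = pd.DataFrame({"user_id": [1, 2, 3], "age": [34, 27, 45]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "user_id": [1, 1, 2, 3],
    "amount": [20.0, 35.5, 12.0, 80.0],
    "created_at": pd.to_datetime(
        ["2024-01-05", "2024-02-11", "2024-02-20", "2024-03-02"]),
})
tables = {"users": users, "orders": orders}


def run_workflow(rfm, tables, query):
    """Steps 2-5. `rfm` is a placeholder for the module exposing Graph and KumoRFM."""
    # Steps 2-3: build the graph straight from the DataFrames (assumed call shape).
    graph = rfm.Graph.from_data(tables)
    # Step 4: initialize the model with the graph.
    model = rfm.KumoRFM(graph)
    # Step 5: execute a predictive query and return whatever predict() returns.
    return model.predict(query)
```

Passing the module in as an argument keeps the sketch independent of any particular import path or authentication step, both of which are outside what this page documents.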
Query Language
KumoRFM uses Predictive Query Language (PQL). For a full introduction see the Querying guide, Prediction Types, and Filters and Operators.
The KumoRFM PQL syntax requires specifying the entity to predict for:
- A single entity: users.user_id=1
- A tuple of entities: users.user_id IN (1, 2, 3)
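The two entity forms above can be generated from a list of IDs. The helper below is a hypothetical convenience function, not part of KumoRFM; it only builds the entity part of a query and handles numeric IDs (how string IDs should be quoted is not covered here).

```python
def entity_clause(column, ids):
    """Build the entity part of a PQL query for one ID or several.

    `column` is a qualified primary-key reference such as "users.user_id".
    """
    ids = list(ids)
    if not ids:
        raise ValueError("at least one entity ID is required")
    if len(ids) == 1:
        # Single entity form: users.user_id=1
        return f"{column}={ids[0]}"
    # Multiple entities form: users.user_id IN (1, 2, 3)
    return f"{column} IN ({', '.join(str(i) for i in ids)})"


print(entity_clause("users.user_id", [1]))        # users.user_id=1
print(entity_clause("users.user_id", [1, 2, 3]))  # users.user_id IN (1, 2, 3)
```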
Table
Abstract base class for tables in a KumoRFM graph. Implemented by LocalTable.
LocalTable
A single in-memory table backed by a pandas.DataFrame, with metadata support for primary keys, time columns, and semantic types.
The DataFrame backing this table.
A unique name for this table within the graph.
primary_key property
Returns Optional[str] — The primary key column name.
Set via table.primary_key = "column_name".
time_column property
Returns Optional[str] — The time column name.
Set via table.time_column = "column_name".
infer_metadata()
Automatically infers dtype and stype for all columns.
Returns LocalTable
metadata property
Returns Dict — Full column metadata dictionary.
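Before setting primary_key or time_column by hand, it can help to check that the chosen columns are plausible candidates. The checks below use only pandas and reflect common expectations (a primary key is unique and non-null, a time column holds datetimes); they are assumptions for illustration, and the library's own validation rules may differ.

```python
import pandas as pd


def is_valid_primary_key(df, column):
    """A primary key candidate should have no missing values and no duplicates."""
    return bool(df[column].notna().all() and df[column].is_unique)


def is_valid_time_column(df, column):
    """A time column candidate should hold datetime values."""
    return bool(pd.api.types.is_datetime64_any_dtype(df[column]))


# Toy table, invented for illustration.
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "user_id": [1, 1, 2],
    "amount": [20.0, 35.5, 12.0],
    "created_at": pd.to_datetime(["2024-01-05", "2024-02-11", "2024-02-20"]),
})

print(is_valid_primary_key(orders, "order_id"))    # True
print(is_valid_primary_key(orders, "user_id"))     # False (duplicate values)
print(is_valid_time_column(orders, "created_at"))  # True
```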
Graph
A collection of LocalTable objects with edges defining foreign key relationships — analogous to a relational database schema.
The tables in the graph.
Foreign key relationships as (src_table, fkey, dst_table) tuples.
from_data() classmethod
Creates a Graph directly from a dictionary of DataFrames.
Mapping of table name to DataFrame.
Optional edges to add. Inferred automatically if not specified.
Whether to automatically infer column metadata.
Whether to print progress output.
Returns Graph
from_sqlite() classmethod
Creates a Graph from a SQLite database.
The SQLite connection: a path string, Path, connection config dict, or ADBC connection object.
Tables to include. Includes all tables if not specified.
Optional edges. Inferred from foreign key constraints if not specified.
Whether to automatically infer column metadata.
Returns Graph
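To see what "inferred from foreign key constraints" can draw on, the sketch below reads declared foreign keys from a SQLite database using only the standard-library sqlite3 module and reports them as (src_table, fkey, dst_table) tuples. It does not call from_sqlite(), whose internals are not described here; the schema and the helper name `declared_edges` are invented for illustration.

```python
import sqlite3

# In-memory database with one declared foreign key (toy schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),
        amount REAL
    );
""")


def declared_edges(conn):
    """List declared foreign keys as (src_table, fkey, dst_table) tuples."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    edges = []
    for table in tables:
        # Row layout: (id, seq, table, from, to, on_update, on_delete, match)
        for row in conn.execute(f"PRAGMA foreign_key_list('{table}')"):
            edges.append((table, row[3], row[2]))
    return edges


edges = declared_edges(conn)
print(edges)  # [('orders', 'user_id', 'users')]
```

If a database has no declared constraints, a listing like this comes back empty, which is the situation where passing edges explicitly matters.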
from_snowflake() classmethod
Creates a Graph from a Snowflake database.
The Snowflake connection object or credentials dict.
Tables to include. Includes all tables if not specified.
The Snowflake database name.
The Snowflake schema name.
Optional edges.
Whether to automatically infer column metadata.
Returns Graph
add_table()
The table to add.
link()
Adds a foreign key edge.
The source table name (the one with the foreign key).
The foreign key column name in the source table.
The destination table name (the one with the primary key).
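An edge is only useful if the foreign key values actually point at rows in the destination table. The pandas-only check below is a hypothetical pre-flight helper to run before link(), not a KumoRFM function; it lists foreign key values that have no matching primary key. The tables and the `dst_pkey` argument are assumptions made for the example.

```python
import pandas as pd


def dangling_keys(tables, src_table, fkey, dst_table, dst_pkey):
    """Return foreign key values in src_table that have no match in dst_table."""
    src = tables[src_table][fkey].dropna()
    dst = tables[dst_table][dst_pkey]
    return sorted(src[~src.isin(dst)].unique().tolist())


# Toy tables, invented for illustration; order 12 points at a missing user.
users = pd.DataFrame({"user_id": [1, 2, 3]})
orders = pd.DataFrame({"order_id": [10, 11, 12], "user_id": [1, 2, 99]})
tables = {"users": users, "orders": orders}

missing = dangling_keys(tables, "orders", "user_id", "users", "user_id")
print(missing)  # [99]
```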
unlink()
Removes a foreign key edge.
infer_metadata()
Returns Graph
infer_links()
Automatically detects foreign key relationships.
Returns Graph
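How infer_links() detects relationships is not specified here. As a rough intuition only, the sketch below uses a deliberately naive stand-in heuristic: a column is treated as a foreign key when its name equals another table's primary key name. This is not the library's algorithm, and `guess_links` is a hypothetical name.

```python
def guess_links(columns, primary_keys):
    """Guess (src_table, fkey, dst_table) edges by matching column names.

    columns: dict of table name -> list of column names
    primary_keys: dict of table name -> primary key column name
    """
    edges = []
    for src_table, cols in columns.items():
        for dst_table, pkey in primary_keys.items():
            # A table does not link to itself in this simple heuristic.
            if src_table != dst_table and pkey in cols:
                edges.append((src_table, pkey, dst_table))
    return edges


columns = {
    "users": ["user_id", "age"],
    "orders": ["order_id", "user_id", "amount"],
}
primary_keys = {"users": "user_id", "orders": "order_id"}

print(guess_links(columns, primary_keys))  # [('orders', 'user_id', 'users')]
```

Name matching misses foreign keys with different names (for example `buyer_id` referring to `users.user_id`), which is one reason to review inferred edges with print_links() and correct them with link() and unlink().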
validate()
Validates the graph before use with KumoRFM.
Returns Graph
print_metadata()
Prints metadata for all tables in the graph.
print_links()
Prints all edges in the graph.
visualize()
Renders an interactive visualization of the graph schema.
KumoRFM
The main interface to the Kumo Relational Foundation Model. Generates predictions for any relational dataset without training.
The relational graph to query over.
Whether to print progress output during inference.
If True, optimizes the underlying data backend for repeated querying (e.g. creates missing indices on transactional databases). Requires write access to the data backend.
predict()
Returns predictions for a PQL query.
A PQL query string specifying the prediction task and target entities.
Specific entity indices to predict for. Predicts for all entities if None.
If True or an ExplainConfig, returns an Explanation object instead of a plain DataFrame.
If True, includes entity embeddings in the output DataFrame.
The prediction anchor time. Uses the most recent available time if None. Pass 'entity' to use each entity’s own timestamp.
The inference run mode controlling the speed vs. accuracy trade-off.
Per-hop neighbor counts for subgraph sampling. Uses defaults if None.
Number of hops for subgraph sampling.
Number of lag timesteps for temporal context.
Random seed for reproducibility.
Whether to print progress output.
Returns Union[pd.DataFrame, Explanation]
evaluate()
Evaluates a PQL query against labeled data and returns metric scores.
The PQL query string. The target entities must have ground-truth labels.
Metrics to compute. Uses task-appropriate defaults if None.
The evaluation anchor time.
The inference run mode.
Number of hops for subgraph sampling.
Returns pd.DataFrame: Metric scores.
retry() context manager
Context manager that retries failed queries up to num_retries times.
Maximum number of retry attempts on failure.
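A `with` block cannot re-run its own body, so a retry context manager usually works by setting state that query methods consult while the block is active. The toy client below sketches that general pattern with the standard library only. It is not KumoRFM's implementation; `RetryingClient`, `query`, and `flaky_send` are hypothetical names, and only `num_retries` comes from the description above.

```python
from contextlib import contextmanager


class RetryingClient:
    """Toy client showing how a retry() context manager can scope retries."""

    def __init__(self, send):
        self._send = send        # callable that performs one query attempt
        self._num_retries = 0    # no retries outside a retry() block

    @contextmanager
    def retry(self, num_retries):
        previous = self._num_retries
        self._num_retries = num_retries
        try:
            yield self
        finally:
            self._num_retries = previous  # restore on exit, even after errors

    def query(self, text):
        last_error = None
        for _ in range(self._num_retries + 1):  # first attempt plus retries
            try:
                return self._send(text)
            except Exception as error:
                last_error = error
        raise last_error


attempts = []


def flaky_send(text):
    """Fails on the first two attempts, then succeeds."""
    attempts.append(text)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return f"ok: {text}"


client = RetryingClient(flaky_send)
with client.retry(num_retries=2):
    result = client.query("some query")
print(result, len(attempts))  # ok: some query 3
```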
batch_mode() context manager
Context manager that batches multiple predictions together for efficiency.
Number of entities per batch. 'max' uses the largest batch size supported by the model.
Number of retry attempts per batch on failure.
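Batching by a fixed number of entities amounts to splitting the entity list into consecutive chunks. The helper below illustrates that idea in plain Python; it is a hypothetical function, not the library's batching code, and it does not model the 'max' setting because the model's largest supported batch size is not stated here.

```python
def split_into_batches(entity_ids, batch_size):
    """Split entity IDs into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(entity_ids)
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


print(split_into_batches(range(1, 8), 3))  # [[1, 2, 3], [4, 5, 6], [7]]
```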
ExplainConfig
Configuration for explainability output.
If True, skips generating a human-readable natural language summary of the explanation.
Explanation
The result of a predict() call with explain=True. Contains both the prediction scores and a natural language explanation.
prediction
Type pd.DataFrame — Prediction scores, one row per entity.
summary
Type str — Human-readable explanation of the most important features.
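Because predict() can return either a plain DataFrame or an Explanation, calling code may want to normalize the two cases. The sketch below does so using only the documented `prediction` and `summary` attributes. `FakeExplanation` is a stand-in defined for the example, not the library class, and the column names in the toy frame are invented.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class FakeExplanation:
    """Stand-in with the two documented attributes; not the library class."""
    prediction: pd.DataFrame
    summary: str


def unpack(result):
    """Return (prediction_frame, summary_or_None) for either return type."""
    if isinstance(result, pd.DataFrame):
        return result, None
    return result.prediction, result.summary


# Toy prediction frame; column names are invented for illustration.
frame = pd.DataFrame({"entity": [1], "score": [0.5]})

plain_frame, plain_summary = unpack(frame)
explained_frame, explained_summary = unpack(
    FakeExplanation(prediction=frame, summary="example summary"))
print(plain_summary, explained_summary)  # None example summary
```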