KumoRFM (Kumo Relational Foundation Model) lets you query relational data with a pre-trained foundation model. Unlike traditional ML approaches, which require feature engineering and model training, KumoRFM generates predictions directly from raw relational data using PQL queries.

Overview

KumoRFM consists of three main components:
  1. LocalTable — A pandas.DataFrame wrapper that manages metadata including semantic types, primary keys, and time columns.
  2. Graph — A collection of LocalTable objects with edges defining relationships between tables.
  3. KumoRFM — The main interface for querying the foundation model.

Workflow

  1. Load relational data into pandas.DataFrame objects.
  2. Create LocalTable objects (or use Graph.from_data() directly).
  3. Build a Graph defining the relationships between tables.
  4. Initialize KumoRFM with your graph.
  5. Execute predictive queries to get predictions, explanations, or evaluations.

import pandas as pd
from kumoai.rfm import Graph, KumoRFM

# users_df and orders_df are pandas.DataFrame objects loaded beforehand (step 1).

graph = Graph.from_data({
    "users": users_df,
    "orders": orders_df,
})
graph.link("orders", "user_id", "users")

rfm = KumoRFM(graph)
result = rfm.predict("PREDICT COUNT(orders.*, 0, 30, days)>0 FOR users.user_id IN (1, 2, 3)")

Query Language

KumoRFM uses Predictive Query Language (PQL). For a full introduction see the Querying guide, Prediction Types, and Filters and Operators. The KumoRFM PQL syntax requires specifying the entity to predict for:
PREDICT <aggregation_expression> FOR <entity_specification>
Entities can be specified as:
  • A single entity: users.user_id=1
  • A tuple of entities: users.user_id IN (1, 2, 3)
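
When the entity IDs come from application code, the query string can be assembled programmatically. The following is a minimal sketch: `format_entity_spec` and `build_query` are hypothetical helpers, not part of kumoai, and only integer IDs are handled because this page does not show how string IDs are quoted.

```python
from typing import Sequence


def format_entity_spec(column: str, ids: Sequence[int]) -> str:
    """Render `col=1` for a single ID or `col IN (1, 2, 3)` for several."""
    if not ids:
        raise ValueError("at least one entity ID is required")
    if len(ids) == 1:
        return f"{column}={ids[0]}"
    return f"{column} IN ({', '.join(str(i) for i in ids)})"


def build_query(target: str, column: str, ids: Sequence[int]) -> str:
    """Combine the aggregation expression with the entity specification."""
    return f"PREDICT {target} FOR {format_entity_spec(column, ids)}"


target = "COUNT(orders.*, 0, 30, days)>0"
single = build_query(target, "users.user_id", [1])
several = build_query(target, "users.user_id", [1, 2, 3])
print(single)
print(several)
```

The resulting strings have the same shape as the queries used elsewhere on this page and can be passed to `rfm.predict()`.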

Table

Abstract base class for tables in a KumoRFM graph. Implemented by LocalTable.

LocalTable

A single in-memory table backed by a pandas.DataFrame, with metadata support for primary keys, time columns, and semantic types.
from kumoai.rfm import LocalTable

table = LocalTable(df=users_df, name="users")
table.infer_metadata()
table.primary_key = "user_id"
df
pd.DataFrame
required
The DataFrame backing this table.
name
str
required
A unique name for this table within the graph.

primary_key property

Returns Optional[str] — The primary key column name. Set via table.primary_key = "column_name".
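
Before setting a primary key by hand, it can help to confirm that the column is actually usable as one. The sketch below checks that a column is non-null and unique over plain Python rows; `is_primary_key_candidate` is a hypothetical helper and says nothing about how kumoai itself validates or infers primary keys.

```python
from typing import Any, Dict, List


def is_primary_key_candidate(rows: List[Dict[str, Any]], column: str) -> bool:
    """Return True if `column` is non-null and unique across all rows."""
    values = [row.get(column) for row in rows]
    if any(value is None for value in values):
        return False
    return len(set(values)) == len(values)


users = [
    {"user_id": 1, "country": "DE"},
    {"user_id": 2, "country": "US"},
    {"user_id": 3, "country": "US"},
]
user_id_ok = is_primary_key_candidate(users, "user_id")
country_ok = is_primary_key_candidate(users, "country")
print(user_id_ok, country_ok)
```

With a pandas.DataFrame, the equivalent check is `df[column].notna().all() and df[column].is_unique`.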

time_column property

Returns Optional[str] — The time column name. Set via table.time_column = "column_name".

infer_metadata()

Automatically infers the data type (dtype) and semantic type (stype) of every column. Returns LocalTable.

metadata property

Returns Dict — Full column metadata dictionary.

Graph

A collection of LocalTable objects with edges defining foreign key relationships — analogous to a relational database schema.
from kumoai.rfm import Graph

# From DataFrames directly:
graph = Graph.from_data({
    "users": users_df,
    "orders": orders_df,
})

# Manual construction:
graph = Graph(tables=[users_table, orders_table])
graph.link("orders", "user_id", "users")
graph.validate()
tables
Sequence[Table]
required
The tables in the graph.
edges
Sequence[EdgeLike]
default:"None"
Foreign key relationships as (src_table, fkey, dst_table) tuples.
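
A typo in an edge tuple is easy to make, so a quick sanity check before building the graph can save a round trip. This is a minimal sketch under stated assumptions: `check_edges` is a hypothetical helper operating on plain column-name sets, and it is not a substitute for `graph.validate()`.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (src_table, fkey, dst_table)


def check_edges(columns: Dict[str, Set[str]], edges: List[Edge]) -> List[str]:
    """Return human-readable problems; an empty list means the edges look sane."""
    problems = []
    for src_table, fkey, dst_table in edges:
        if src_table not in columns:
            problems.append(f"unknown source table: {src_table}")
        elif fkey not in columns[src_table]:
            problems.append(f"{src_table} has no column {fkey}")
        if dst_table not in columns:
            problems.append(f"unknown destination table: {dst_table}")
    return problems


columns = {
    "users": {"user_id", "signup_date"},
    "orders": {"order_id", "user_id", "amount"},
}
good = check_edges(columns, [("orders", "user_id", "users")])
bad = check_edges(columns, [("orders", "customer_id", "users")])
print(good)
print(bad)
```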

from_data() classmethod

Creates a Graph directly from a dictionary of DataFrames.
df_dict
Dict[str, pd.DataFrame]
required
Mapping of table name to DataFrame.
edges
Sequence[EdgeLike]
default:"None"
Optional edges to add. Inferred automatically if not specified.
infer_metadata
bool
default:"True"
Whether to automatically infer column metadata.
verbose
bool
default:"True"
Whether to print progress output.
Returns Graph

from_sqlite() classmethod

Creates a Graph from a SQLite database.
connection
Union[AdbcSqliteConnection, SqliteConnectionConfig, str, Path, dict]
required
The SQLite connection — a path string, Path, connection config dict, or ADBC connection object.
tables
Sequence[Union[str, dict]]
default:"None"
Tables to include. Includes all tables if not specified.
edges
Sequence[EdgeLike]
default:"None"
Optional edges. Inferred from foreign key constraints if not specified.
infer_metadata
bool
default:"True"
Whether to automatically infer column metadata.
Returns Graph
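
Since edges are inferred from foreign key constraints when not specified, it is worth checking which constraints the database actually declares. The sketch below uses Python's built-in `sqlite3` module and SQLite's `PRAGMA foreign_key_list` to list them as (src_table, fkey, dst_table) tuples. The schema and the `declared_edges` helper are illustrative assumptions; the code does not call kumoai.

```python
import sqlite3

# In-memory database with one declared foreign key (orders.user_id -> users).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        signup_date TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),
        order_date TEXT
    );
    """
)


def declared_edges(connection):
    """List declared foreign keys as (src_table, fkey, dst_table) tuples."""
    tables = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )
    ]
    edges = []
    for table in tables:
        # Row layout: (id, seq, referenced_table, from_column, to_column, ...)
        for row in connection.execute(f"PRAGMA foreign_key_list('{table}')"):
            edges.append((table, row[3], row[2]))
    return edges


edges = declared_edges(conn)
print(edges)
```

If the list comes back empty, the database declares no constraints, and passing `edges` explicitly is the safer choice.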

from_snowflake() classmethod

Creates a Graph from a Snowflake database.
connection
Union[SnowflakeConnection, dict, None]
default:"None"
The Snowflake connection object or credentials dict.
tables
Sequence[Union[str, dict]]
default:"None"
Tables to include. Includes all tables if not specified.
database
str
default:"None"
The Snowflake database name.
schema
str
default:"None"
The Snowflake schema name.
edges
Sequence[EdgeLike]
default:"None"
Optional edges.
infer_metadata
bool
default:"True"
Whether to automatically infer column metadata.
Returns Graph

add_table()

table
Table
required
The table to add.

link()

Adds a foreign key edge.
src_table
str
required
The source table name (the one with the foreign key).
fkey
str
required
The foreign key column name in the source table.
dst_table
str
required
The destination table name (the one with the primary key).

unlink()

Removes a foreign key edge.
src_table
str
required
fkey
str
required
dst_table
str
required

infer_metadata()

verbose
bool
default:"True"
Returns Graph

infer_links()

Automatically detects foreign key relationships.
verbose
bool
default:"True"
Returns Graph

validate()

Validates the graph before use with KumoRFM. Returns Graph

print_metadata()

Prints metadata for all tables in the graph.

print_links()

Prints all edges in the graph.

visualize()

Renders an interactive visualization of the graph schema.

KumoRFM

The main interface to the Kumo Relational Foundation Model. Generates predictions for any relational dataset without training.
from kumoai.rfm import KumoRFM

rfm = KumoRFM(graph)
result = rfm.predict("PREDICT COUNT(orders.*, 0, 30, days)>0 FOR users.user_id IN (1, 2, 3)")
graph
Graph
required
The relational graph to query over.
verbose
bool
default:"True"
Whether to print progress output during inference.
optimize
bool
default:"False"
If True, optimizes the underlying data backend for repeated querying (e.g. creates missing indices on transactional databases). Requires write access to the data backend.

predict()

Returns predictions for a PQL query.
result = rfm.predict(
    "PREDICT COUNT(orders.*, 0, 30, days)>0 FOR users.user_id IN (1, 2, 3)"
)
# Returns a DataFrame with columns: entity_id, prediction_score

result_with_explain = rfm.predict(query, explain=True)
prediction_df, summary_text = result_with_explain
query
str
required
A PQL query string specifying the prediction task and target entities.
indices
Sequence[Union[str, float, int]]
default:"None"
Specific entity indices to predict for. Predicts for all entities if None.
explain
Union[bool, ExplainConfig, dict]
default:"False"
If True or an ExplainConfig, returns an Explanation object instead of a plain DataFrame.
return_embeddings
bool
default:"False"
If True, includes entity embeddings in the output DataFrame.
anchor_time
Union[pd.Timestamp, Literal['entity']]
default:"None"
The prediction anchor time. Uses the most recent available time if None. Pass 'entity' to use each entity’s own timestamp.
run_mode
Union[RunMode, str]
default:"RunMode.FAST"
The inference run mode controlling speed vs. accuracy trade-off.
num_neighbors
List[int]
default:"None"
Per-hop neighbor counts for subgraph sampling. Uses defaults if None.
num_hops
int
default:"2"
Number of hops for subgraph sampling.
lag_timesteps
int
default:"0"
Number of lag timesteps for temporal context.
random_seed
Optional[int]
default:"fixed seed"
Random seed for reproducibility.
verbose
bool
default:"True"
Whether to print progress output.
Returns Union[pd.DataFrame, Explanation]

evaluate()

Evaluates a PQL query against labeled data and returns metric scores.
metrics = rfm.evaluate("PREDICT COUNT(orders.*, 0, 30, days)>0 FOR users.user_id IN (1, 2)")
query
str
required
The PQL query string. The target entities must have ground-truth labels.
metrics
List[str]
default:"None"
Metrics to compute. Uses task-appropriate defaults if None.
anchor_time
Union[pd.Timestamp, Literal['entity']]
default:"None"
The evaluation anchor time.
run_mode
Union[RunMode, str]
default:"RunMode.FAST"
The inference run mode.
num_hops
int
default:"2"
Number of hops for subgraph sampling.
verbose
bool
default:"True"
Returns pd.DataFrame — Metric scores.
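
Ground-truth labels exist only when the data extends far enough past the anchor time to cover the target window. The sketch below derives labels for `COUNT(orders.*, 0, 30, days)>0` from historical order timestamps. It assumes the window covers the 30 days after the anchor time, with an exclusive lower bound and an inclusive upper bound; these boundary rules and the `count_in_window` helper are illustrative assumptions, not documented behavior.

```python
from datetime import datetime, timedelta
from typing import List


def count_in_window(events: List[datetime], anchor: datetime,
                    start_days: int, end_days: int) -> int:
    """Count events in (anchor + start_days, anchor + end_days]."""
    lower = anchor + timedelta(days=start_days)
    upper = anchor + timedelta(days=end_days)
    return sum(1 for ts in events if lower < ts <= upper)


orders_by_user = {
    1: [datetime(2024, 1, 5), datetime(2024, 2, 20)],
    2: [datetime(2023, 12, 1)],
}
anchor = datetime(2024, 1, 1)

# Label is True when the user placed at least one order inside the window.
labels = {
    user_id: count_in_window(timestamps, anchor, 0, 30) > 0
    for user_id, timestamps in orders_by_user.items()
}

# Latest anchor whose full 30-day window is still covered by the data.
last_event = max(ts for timestamps in orders_by_user.values() for ts in timestamps)
latest_anchor = last_event - timedelta(days=30)
print(labels, latest_anchor)
```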

retry() context manager

Context manager that retries failed queries up to num_retries times.
with rfm.retry(num_retries=3):
    result = rfm.predict(query)
num_retries
int
default:"1"
Maximum number of retry attempts on failure.
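
A context manager cannot re-run the statements in its body, so one plausible way to implement this pattern is to have the context manager set a retry budget that each call inside the block consults. The sketch below illustrates that idea with a hypothetical `FlakyClient` class; it is not kumoai's implementation, and the choice of `ConnectionError` as the retryable error is an assumption.

```python
from contextlib import contextmanager


class FlakyClient:
    """Hypothetical stand-in for a model client; not part of kumoai."""

    def __init__(self, failures_before_success: int):
        self._remaining_failures = failures_before_success
        self._num_retries = 0
        self.calls = 0

    @contextmanager
    def retry(self, num_retries: int = 1):
        # Set the retry budget for calls made inside the `with` block,
        # then restore the previous value on exit.
        previous = self._num_retries
        self._num_retries = num_retries
        try:
            yield
        finally:
            self._num_retries = previous

    def _request(self, query: str) -> str:
        self.calls += 1
        if self._remaining_failures > 0:
            self._remaining_failures -= 1
            raise ConnectionError("transient failure")
        return f"ok: {query}"

    def predict(self, query: str) -> str:
        attempts = 1 + self._num_retries  # first try plus retries
        for attempt in range(attempts):
            try:
                return self._request(query)
            except ConnectionError:
                if attempt == attempts - 1:
                    raise  # budget exhausted: surface the error


client = FlakyClient(failures_before_success=2)
with client.retry(num_retries=3):
    result = client.predict("q")
print(result, client.calls)
```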

batch_mode() context manager

Context manager that batches multiple predictions together for efficiency.
with rfm.batch_mode(batch_size=32):
    result = rfm.predict(query)
batch_size
Union[int, Literal['max']]
default:"'max'"
Number of entities per batch. 'max' uses the largest batch size supported by the model.
num_retries
int
default:"1"
Number of retry attempts per batch on failure.
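
Conceptually, batching splits the requested entities into chunks of at most `batch_size`. The sketch below shows that splitting in plain Python. `iter_batches` and its `max_supported` parameter are hypothetical; the actual limit used for 'max' is determined by the model and is not stated on this page.

```python
from typing import Iterator, List, Sequence, Union


def iter_batches(ids: Sequence[int], batch_size: Union[int, str],
                 max_supported: int) -> Iterator[List[int]]:
    """Yield consecutive chunks of `ids`; 'max' means `max_supported` per chunk."""
    if batch_size == "max":
        size = max_supported
    else:
        size = min(int(batch_size), max_supported)
    if size <= 0:
        raise ValueError("batch size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


ids = list(range(10))
small = list(iter_batches(ids, 4, max_supported=8))
largest = list(iter_batches(ids, "max", max_supported=8))
print(small)
print(largest)
```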

ExplainConfig

Configuration for explainability output.
from kumoai.rfm import ExplainConfig

result = rfm.predict(query, explain=ExplainConfig(skip_summary=False))
skip_summary
bool
default:"False"
If True, skips generating a human-readable natural language summary of the explanation.

Explanation

The result of a predict() call with explain=True. Contains both the prediction scores and a natural language explanation.
explanation = rfm.predict(query, explain=True)

prediction_df = explanation.prediction  # pd.DataFrame
summary_text = explanation.summary      # str

# Supports unpacking:
prediction_df, summary_text = explanation

# Renders nicely in Jupyter:
explanation.print()

prediction

Type pd.DataFrame — Prediction scores, one row per entity.

summary

Type str — Human-readable explanation of the most important features.

print()

Prints the prediction DataFrame and explanation summary to stdout.
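
An object can support both attribute access and tuple unpacking, as Explanation does, by defining `__iter__`. The following is a minimal sketch of that pattern with a hypothetical `ExplanationSketch` class; it uses a list of dicts in place of a DataFrame and placeholder values, and it is not the library's implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List


@dataclass
class ExplanationSketch:
    """Hypothetical result object; not part of kumoai."""

    prediction: List[Dict[str, Any]]
    summary: str

    def __iter__(self) -> Iterator[Any]:
        # Yielding the fields in order enables `a, b = obj` unpacking.
        yield self.prediction
        yield self.summary


explanation = ExplanationSketch(
    prediction=[{"entity": 1, "score": 0.5}],  # placeholder row
    summary="placeholder summary",
)
prediction, summary = explanation
print(prediction, summary)
```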