kumoai.rfm

KumoRFM (Kumo Relational Foundation Model) is an interface for querying relational data with a pre-trained foundation model. Unlike traditional ML approaches that require feature engineering and model training, KumoRFM generates predictions directly from raw relational data using Predictive Query Language (PQL) queries.

Overview

KumoRFM consists of three main components:

LocalTable — A pandas.DataFrame wrapper that manages metadata including semantic types, primary keys, and time columns.
Graph — A collection of LocalTable objects with edges defining relationships between tables.
KumoRFM — The main interface for querying the foundation model.

Workflow

1. Load relational data into pandas.DataFrame objects.
2. Create LocalTable objects (or use Graph.from_data() directly).
3. Build a Graph defining the relationships between tables.
4. Initialize KumoRFM with your graph.
5. Execute predictive queries to get predictions, explanations, or evaluations.

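import pandas as pd
from kumoai.rfm import Graph, KumoRFM

# Placeholder inputs: the file names are illustrative, load your own data here.
users_df = pd.read_csv("users.csv")
orders_df = pd.read_csv("orders.csv")

# Build the graph from the DataFrames (column metadata is inferred by default).
graph = Graph.from_data({
    "users": users_df,
    "orders": orders_df,
})
# Declare orders.user_id as a foreign key pointing at the users table.
graph.link("orders", "user_id", "users")

# Query the foundation model: will each user place an order in the next 30 days?
rfm = KumoRFM(graph)
result = rfm.predict("PREDICT COUNT(orders.*, 0, 30, days)>0 FOR users.user_id IN (1, 2, 3)")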

Query Language

KumoRFM uses Predictive Query Language (PQL). For a full introduction see the Querying guide, Prediction Types, and Filters and Operators. The KumoRFM PQL syntax requires specifying the entity to predict for:

PREDICT <aggregation_expression> FOR <entity_specification>

Entities can be specified as:

A single entity: users.user_id=1
A tuple of entities: users.user_id IN (1, 2, 3)
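The two entity forms above can be assembled programmatically when the IDs come from code rather than being typed by hand. The sketch below is a minimal illustration, not part of the kumoai API: `entity_spec` and `build_query` are hypothetical helper names, and only integer IDs are handled, since the quoting rules for string IDs are not described here.

```python
from typing import Sequence


def entity_spec(column: str, ids: Sequence[int]) -> str:
    """Build the entity part of a query for integer IDs (hypothetical helper)."""
    if not ids:
        raise ValueError("at least one entity ID is required")
    if len(ids) == 1:
        # Single entity form: users.user_id=1
        return f"{column}={ids[0]}"
    # Multiple entities form: users.user_id IN (1, 2, 3)
    return f"{column} IN ({', '.join(str(i) for i in ids)})"


def build_query(target: str, column: str, ids: Sequence[int]) -> str:
    """Assemble PREDICT <aggregation_expression> FOR <entity_specification>."""
    return f"PREDICT {target} FOR {entity_spec(column, ids)}"


query = build_query("COUNT(orders.*, 0, 30, days)>0", "users.user_id", [1, 2, 3])
single = build_query("COUNT(orders.*, 0, 30, days)>0", "users.user_id", [1])
print(query)
print(single)
```

The resulting string can then be passed to `predict()` or `evaluate()` like any hand-written query.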

`Table`

Abstract base class for tables in a KumoRFM graph. Implemented by LocalTable.

`LocalTable`

A single in-memory table backed by a pandas.DataFrame, with metadata support for primary keys, time columns, and semantic types.

from kumoai.rfm import LocalTable

table = LocalTable(df=users_df, name="users")
table.infer_metadata()
table.primary_key = "user_id"

df

pd.DataFrame

required

The DataFrame backing this table.

name

str

required

A unique name for this table within the graph.

`primary_key` `property`

Returns Optional[str] — The primary key column name. Set via table.primary_key = "column_name".

`time_column` `property`

Returns Optional[str] — The time column name. Set via table.time_column = "column_name".

`infer_metadata()`

Automatically infers dtype and stype for all columns. Returns LocalTable

`metadata` `property`

Returns Dict — Full column metadata dictionary.

`Graph`

A collection of LocalTable objects with edges defining foreign key relationships — analogous to a relational database schema.

from kumoai.rfm import Graph

# From DataFrames directly:
graph = Graph.from_data({
    "users": users_df,
    "orders": orders_df,
})

# Manual construction:
graph = Graph(tables=[users_table, orders_table])
graph.link("orders", "user_id", "users")
graph.validate()

tables

Sequence[Table]

required

The tables in the graph.

edges

Sequence[EdgeLike]

default:"None"

Foreign key relationships as (src_table, fkey, dst_table) tuples.

`from_data()` `classmethod`

Creates a Graph directly from a dictionary of DataFrames.

df_dict

Dict[str, pd.DataFrame]

required

Mapping of table name to DataFrame.

edges

Sequence[EdgeLike]

default:"None"

Optional edges to add. Inferred automatically if not specified.

infer_metadata

bool

default:"True"

Whether to automatically infer column metadata.

verbose

bool

default:"True"

Whether to print progress output.

Returns Graph

`from_sqlite()` `classmethod`

Creates a Graph from a SQLite database.

connection

Union[AdbcSqliteConnection, SqliteConnectionConfig, str, Path, dict]

required

The SQLite connection — a path string, Path, connection config dict, or ADBC connection object.

tables

Sequence[Union[str, dict]]

default:"None"

Tables to include. Includes all tables if not specified.

edges

Sequence[EdgeLike]

default:"None"

Optional edges. Inferred from foreign key constraints if not specified.

infer_metadata

bool

default:"True"

Whether to automatically infer column metadata.

Returns Graph
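Since edges can be inferred from foreign key constraints, it may help to check which constraints a SQLite file actually declares before loading it. The sketch below uses only the standard `sqlite3` module and does not call kumoai. The schema and the `declared_edges` helper are illustrative assumptions, and the output format simply mirrors the (src_table, fkey, dst_table) tuple convention used for edges.

```python
import sqlite3

# In-memory database with one declared foreign key (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        signup_date TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),
        order_date TEXT
    );
    """
)


def declared_edges(connection):
    """List declared foreign keys as (src_table, fkey, dst_table) tuples."""
    tables = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    edges = []
    for table in tables:
        # Row layout: id, seq, table, from, to, on_update, on_delete, match
        for fk in connection.execute(f'PRAGMA foreign_key_list("{table}")'):
            edges.append((table, fk[3], fk[2]))
    return edges


edges = declared_edges(conn)
print(edges)
```

If this list comes back empty, the database declares no constraints, and passing `edges` explicitly is probably the safer route.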

`from_snowflake()` `classmethod`

Creates a Graph from a Snowflake database.

connection

Union[SnowflakeConnection, dict, None]

default:"None"

The Snowflake connection object or credentials dict.

tables

Sequence[Union[str, dict]]

default:"None"

Tables to include. Includes all tables if not specified.

database

str

default:"None"

The Snowflake database name.

schema

str

default:"None"

The Snowflake schema name.

edges

Sequence[EdgeLike]

default:"None"

Optional edges.

infer_metadata

bool

default:"True"

Whether to automatically infer column metadata.

Returns Graph

`add_table()`

Adds a table to the graph.

table

Table

required

The table to add.

`link()`

Adds a foreign key edge.

src_table

str

required

The source table name (the one with the foreign key).

fkey

str

required

The foreign key column name in the source table.

dst_table

str

required

The destination table name (the one with the primary key).
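The direction of an edge matters: the source table holds the foreign key, and the destination table holds the primary key it refers to. A quick way to sanity-check a planned link is to look for foreign key values with no matching primary key. The following sketch works on plain lists of dicts so it runs without any dependencies; `dangling_foreign_keys` is a hypothetical helper, not a kumoai function.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def dangling_foreign_keys(
    src_rows: List[Row], fkey: str, dst_rows: List[Row], pkey: str
) -> Set[object]:
    """Return foreign key values in src_rows that match no primary key in dst_rows."""
    primary_keys = {row[pkey] for row in dst_rows}
    return {
        row[fkey]
        for row in src_rows
        # Missing (None) foreign keys are skipped rather than reported.
        if row[fkey] is not None and row[fkey] not in primary_keys
    }


users = [{"user_id": 1}, {"user_id": 2}]
orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 3},     # no such user
    {"order_id": 12, "user_id": None},  # missing foreign key
]

missing = dangling_foreign_keys(orders, "user_id", users, "user_id")
print(missing)
```

Here `orders` plays the role of `src_table`, `user_id` is the `fkey`, and `users` is the `dst_table`.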

`unlink()`

Removes a foreign key edge.

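src_table

str

required

The source table name (the one with the foreign key).

fkey

str

required

The foreign key column name in the source table.

dst_table

str

required

The destination table name (the one with the primary key).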

`infer_metadata()`

Infers column metadata for all tables in the graph.

verbose

bool

default:"True"

Whether to print progress output.

Returns Graph

`infer_links()`

Automatically detects foreign key relationships.

verbose

bool

default:"True"

Whether to print progress output.

Returns Graph
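How the library detects relationships is not documented here. As a rough mental model only, the sketch below shows one naive heuristic: a column whose name equals another table's primary key is treated as a candidate foreign key. This is an assumption for illustration, not Kumo's inference logic, and `guess_links` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (src_table, fkey, dst_table)


def guess_links(
    columns: Dict[str, List[str]], primary_keys: Dict[str, str]
) -> List[Edge]:
    """Propose an edge wherever a table contains another table's primary key column."""
    edges = []
    for src, cols in sorted(columns.items()):
        for dst, pkey in sorted(primary_keys.items()):
            if src != dst and pkey in cols:
                edges.append((src, pkey, dst))
    return edges


columns = {
    "users": ["user_id", "signup_date"],
    "orders": ["order_id", "user_id", "order_date"],
}
primary_keys = {"users": "user_id", "orders": "order_id"}

candidate_edges = guess_links(columns, primary_keys)
print(candidate_edges)
```

A name-matching heuristic like this misses differently named keys (for example `customer_id` pointing at `users.user_id`), which is one reason to review inferred links with `print_links()` and correct them with `link()` and `unlink()`.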

`validate()`

Validates the graph before use with KumoRFM. Returns Graph

`print_metadata()`

Prints metadata for all tables in the graph.

`print_links()`

Prints all edges in the graph.

`visualize()`

Renders an interactive visualization of the graph schema.

`KumoRFM`

The main interface to the Kumo Relational Foundation Model. Generates predictions for any relational dataset without training.

from kumoai.rfm import KumoRFM

rfm = KumoRFM(graph)
result = rfm.predict("PREDICT COUNT(orders.*, 0, 30, days)>0 FOR users.user_id IN (1, 2, 3)")

graph

Graph

required

The relational graph to query over.

verbose

bool

default:"True"

Whether to print progress output during inference.

optimize

bool

default:"False"

If True, optimizes the underlying data backend for repeated querying (e.g. creates missing indices on transactional databases). Requires write access to the data backend.

`predict()`

Returns predictions for a PQL query.

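query = "PREDICT COUNT(orders.*, 0, 30, days)>0 FOR users.user_id IN (1, 2, 3)"

result = rfm.predict(query)
# Returns a DataFrame with columns: entity_id, prediction_score

# With explain=True, predict() returns an Explanation, which unpacks into
# the prediction DataFrame and a summary string.
result_with_explain = rfm.predict(query, explain=True)
prediction_df, summary_text = result_with_explain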

query

str

required

A PQL query string specifying the prediction task and target entities.

indices

Sequence[Union[str, float, int]]

default:"None"

Specific entity indices to predict for. Predicts for all entities if None.

explain

Union[bool, ExplainConfig, dict]

default:"False"

If True or an ExplainConfig, returns an Explanation object instead of a plain DataFrame.

return_embeddings

bool

default:"False"

If True, includes entity embeddings in the output DataFrame.

anchor_time

Union[pd.Timestamp, Literal['entity']]

default:"None"

The prediction anchor time. Uses the most recent available time if None. Pass 'entity' to use each entity’s own timestamp.

run_mode

Union[RunMode, str]

default:"RunMode.FAST"

The inference run mode controlling speed vs. accuracy trade-off.

num_neighbors

List[int]

default:"None"

Per-hop neighbor counts for subgraph sampling. Uses defaults if None.

num_hops

int

default:"2"

Number of hops for subgraph sampling.

lag_timesteps

int

default:"0"

Number of lag timesteps for temporal context.

random_seed

Optional[int]

default:"fixed seed"

Random seed for reproducibility.

verbose

bool

default:"True"

Whether to print progress output.

Returns Union[pd.DataFrame, Explanation]
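The `num_neighbors` and `num_hops` parameters above control how much of the graph around each entity is sampled. The sketch below illustrates the general idea of per-hop, fanout-capped sampling on a toy adjacency dict. It is a generic illustration under stated assumptions, not the sampler KumoRFM uses: `sample_subgraph`, the node naming scheme, and the uniform random choice are all hypothetical.

```python
import random
from typing import Dict, List, Set


def sample_subgraph(
    adjacency: Dict[str, List[str]],
    seed_node: str,
    num_neighbors: List[int],
    random_seed: int = 0,
) -> Set[str]:
    """Expand from seed_node one hop per entry in num_neighbors, capping each node's fanout."""
    rng = random.Random(random_seed)  # fixed seed makes the sample reproducible
    visited = {seed_node}
    frontier = [seed_node]
    for fanout in num_neighbors:  # len(num_neighbors) is the number of hops
        next_frontier = []
        for node in frontier:
            candidates = sorted(n for n in adjacency.get(node, []) if n not in visited)
            for neighbor in rng.sample(candidates, min(fanout, len(candidates))):
                visited.add(neighbor)
                next_frontier.append(neighbor)
        frontier = next_frontier
    return visited


adjacency = {
    "user:1": ["order:10", "order:11", "order:12"],
    "order:10": ["item:a"],
    "order:11": ["item:b"],
    "order:12": ["item:c"],
}

# Two hops: at most 2 orders for the user, then at most 1 item per order.
sampled = sample_subgraph(adjacency, "user:1", [2, 1], random_seed=42)
print(sorted(sampled))
```

Larger fanouts pull in more context per entity at the cost of more work per prediction, which is the same kind of trade-off `run_mode` describes.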

`evaluate()`

Evaluates a PQL query against labeled data and returns metric scores.

metrics = rfm.evaluate("PREDICT COUNT(orders.*, 0, 30, days)>0 FOR users.user_id IN (1, 2)")

query

str

required

The PQL query string. The target entities must have ground-truth labels.

metrics

List[str]

default:"None"

Metrics to compute. Uses task-appropriate defaults if None.

anchor_time

Union[pd.Timestamp, Literal['entity']]

default:"None"

The evaluation anchor time.

run_mode

Union[RunMode, str]

default:"RunMode.FAST"

The inference run mode.

num_hops

int

default:"2"

Number of hops for subgraph sampling.

verbose

bool

default:"True"

Whether to print progress output.

Returns pd.DataFrame — Metric scores.
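Evaluation needs ground-truth labels, which for a query like `COUNT(orders.*, 0, 30, days)>0` means knowing what actually happened in the 30 days after the anchor time. The sketch below computes such labels by hand with the standard library. The data is made up, `count_in_window` is a hypothetical helper, and the window boundaries (exclusive start, inclusive end) are an assumption rather than something stated in this reference.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def count_in_window(
    timestamps: List[datetime], anchor: datetime, start_days: int, end_days: int
) -> int:
    """Count events after anchor + start_days and up to anchor + end_days."""
    lower = anchor + timedelta(days=start_days)
    upper = anchor + timedelta(days=end_days)
    # Assumption: the window is open at the start and closed at the end.
    return sum(1 for ts in timestamps if lower < ts <= upper)


anchor = datetime(2024, 1, 1)

# Made-up order timestamps per user.
orders_by_user: Dict[int, List[datetime]] = {
    1: [datetime(2024, 1, 10), datetime(2024, 2, 15)],
    2: [datetime(2023, 12, 20)],
}

labels = {
    user_id: count_in_window(stamps, anchor, 0, 30) > 0
    for user_id, stamps in orders_by_user.items()
}
print(labels)
```

Under this reading, an anchor time too close to the end of the data leaves no complete window to label, so an earlier `anchor_time` is likely needed for a meaningful evaluation.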

`retry()` `context manager`

Context manager that retries failed queries up to num_retries times.

with rfm.retry(num_retries=3):
    result = rfm.predict(query)

num_retries

int

default:"1"

Maximum number of retry attempts on failure.
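A context manager cannot re-run the body of its `with` block, so a retry scope like this most plausibly works by changing how calls made inside the block behave. The sketch below shows that pattern on a stand-in class. It is not the kumoai implementation: `FlakyClient`, its simulated failures, and the choice of `ConnectionError` as the retryable error are assumptions for illustration.

```python
from contextlib import contextmanager


class FlakyClient:
    """Hypothetical stand-in for a client whose backend calls can fail transiently."""

    def __init__(self, failures_before_success: int):
        self._remaining_failures = failures_before_success
        self._num_retries = 0  # no retries outside a retry() block
        self.attempts = 0

    @contextmanager
    def retry(self, num_retries: int = 1):
        """Temporarily allow up to num_retries extra attempts per call."""
        previous = self._num_retries
        self._num_retries = num_retries
        try:
            yield
        finally:
            self._num_retries = previous  # restore the old setting on exit

    def predict(self, query: str) -> str:
        for attempt in range(self._num_retries + 1):
            self.attempts += 1
            try:
                return self._call_backend(query)
            except ConnectionError:
                if attempt == self._num_retries:
                    raise  # retries exhausted, surface the error

    def _call_backend(self, query: str) -> str:
        # Simulate a backend that fails a fixed number of times, then succeeds.
        if self._remaining_failures > 0:
            self._remaining_failures -= 1
            raise ConnectionError("transient failure")
        return f"ok: {query}"


client = FlakyClient(failures_before_success=2)
with client.retry(num_retries=3):
    result = client.predict("PREDICT COUNT(orders.*, 0, 30, days)>0 FOR users.user_id=1")
print(result, client.attempts)
```

The call succeeds on the third attempt, and the retry setting reverts once the block exits.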

`batch_mode()` `context manager`

Context manager that batches multiple predictions together for efficiency.

with rfm.batch_mode(batch_size=32):
    result = rfm.predict(query)

batch_size

Union[int, Literal['max']]

default:"'max'"

Number of entities per batch. 'max' uses the largest batch size supported by the model.

num_retries

int

default:"1"

Number of retry attempts per batch on failure.
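To make the effect of `batch_size` concrete, the sketch below splits a list of entity IDs into fixed-size chunks and builds one query per chunk. This is what manual batching would look like without `batch_mode()`. The `chunked` helper is hypothetical, and the code does not reflect how the library batches internally.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of ids with at most batch_size elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        yield list(ids[start:start + batch_size])


user_ids = list(range(1, 8))  # 7 entities
batches = list(chunked(user_ids, batch_size=3))

# One query per batch, using the IN (...) entity form.
queries = [
    "PREDICT COUNT(orders.*, 0, 30, days)>0 "
    f"FOR users.user_id IN ({', '.join(str(i) for i in batch)})"
    for batch in batches
]
print(batches)
print(queries[-1])
```

Note that the last batch may be smaller than `batch_size` when the number of entities is not a multiple of it.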

`ExplainConfig`

Configuration for explainability output.

from kumoai.rfm import ExplainConfig

result = rfm.predict(query, explain=ExplainConfig(skip_summary=False))

skip_summary

bool

default:"False"

If True, skips generating a human-readable natural language summary of the explanation.

`Explanation`

The result of a predict() call with explain=True. Contains both the prediction scores and a natural language explanation.

explanation = rfm.predict(query, explain=True)

prediction_df = explanation.prediction  # pd.DataFrame
summary_text = explanation.summary      # str

# Supports unpacking:
prediction_df, summary_text = explanation

# Renders nicely in Jupyter:
explanation.print()

`prediction`

Type pd.DataFrame — Prediction scores, one row per entity.

`summary`

Type str — Human-readable explanation of the most important features.

`print()`

Prints the prediction DataFrame and explanation summary to stdout.
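An object that supports both attribute access and tuple-style unpacking, as Explanation does, can be modeled with a small class that defines `__iter__`. The sketch below is a stand-in to show the pattern, not the library's class: `ExplanationLike` is a hypothetical name, the prediction is a plain list of dicts instead of a DataFrame, and the values are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class ExplanationLike:
    """Hypothetical stand-in that mimics attribute access plus unpacking."""

    prediction: List[Dict[str, object]]
    summary: str

    def __iter__(self) -> Iterator[object]:
        # Yielding the fields in order is what makes `a, b = obj` work.
        yield self.prediction
        yield self.summary


explanation = ExplanationLike(
    prediction=[{"entity_id": 1, "prediction_score": 0.8}],  # made-up values
    summary="Recent order activity is the main driver.",      # made-up text
)

# Attribute access and unpacking give the same objects.
prediction, summary = explanation
print(prediction is explanation.prediction, summary == explanation.summary)
```

Unpacking is handy for quick scripts, while attribute access keeps longer code readable.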
