KumoRFM can connect directly to Databricks SQL warehouses using Unity Catalog, enabling predictions on enterprise-scale data without moving it out of Databricks.

Installation

The Databricks backend requires the Databricks connector:
pip install "kumoai[databricks]"

Quick Start

Connect using explicit Databricks credentials:
import kumoai.rfm as rfm

graph = rfm.Graph.from_databricks(
    connection={
        "server_hostname": "<workspace>.cloud.databricks.com",
        "http_path": "/sql/1.0/warehouses/<warehouse_id>",
        "access_token": "<access_token>",
    },
    catalog="MY_CATALOG",
    schema="MY_SCHEMA",
)
This will:
  1. Connect to the Databricks SQL warehouse
  2. Discover all tables in the specified catalog and schema
  3. Infer column metadata (data types, semantic types, primary keys, time columns)
  4. Detect foreign key relationships
  5. Print a summary of the inferred metadata and links
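The page does not describe how step 4 decides which columns are foreign keys. The sketch below is a rough illustration only. It proposes a link whenever a column name matches another table's primary key. The function guess_links and the sample schema are hypothetical, and KumoRFM's actual inference may work differently. Links are returned as (source_table, foreign_key_column, destination_table) tuples, the same shape the edges parameter takes.

```python
# Hypothetical naming heuristic for illustration; NOT KumoRFM's link-detection logic.
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source_table, foreign_key_column, destination_table)


def guess_links(
    columns: Dict[str, List[str]],
    primary_keys: Dict[str, str],
) -> List[Edge]:
    """Propose links where a column name matches another table's primary key."""
    edges: List[Edge] = []
    for src_table, cols in columns.items():
        for col in cols:
            for dst_table, pkey in primary_keys.items():
                # A table's own primary key is not a link to itself.
                if dst_table == src_table:
                    continue
                # Case-insensitive name match against the other table's key.
                if col.upper() == pkey.upper():
                    edges.append((src_table, col, dst_table))
    return edges


# Hypothetical schema: column names per table and each table's primary key.
columns = {
    "USERS": ["USER_ID", "SIGNUP_DATE"],
    "ITEMS": ["ITEM_ID", "PRICE"],
    "ORDERS": ["ORDER_ID", "USER_ID", "ITEM_ID", "ORDER_TIME"],
}
primary_keys = {"USERS": "USER_ID", "ITEMS": "ITEM_ID", "ORDERS": "ORDER_ID"}

print(guess_links(columns, primary_keys))
# [('ORDERS', 'USER_ID', 'USERS'), ('ORDERS', 'ITEM_ID', 'ITEMS')]
```

A name-only rule like this is cheap but can miss keys with inconsistent names, such as BUYER_ID pointing at USERS. Real inference would likely also compare data types or value overlap.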

Specifying Tables

Control which tables to include and customize their configuration:
graph = rfm.Graph.from_databricks(
    connection={...},
    catalog="MY_CATALOG",
    schema="MY_SCHEMA",
    tables=[
        "USERS",
        {"name": "ORDERS", "source_name": "ORDERS_SNAPSHOT"},
        {"name": "ITEMS", "schema": "OTHER_SCHEMA"},
    ],
)
Table configuration options:

| Key | Description | Required |
| --- | --- | --- |
| name | The table name used in PQL queries | Yes |
| source_name | The actual table name in Databricks (if different from name) | No |
| catalog | Overrides the default catalog for this table | No |
| schema | Overrides the default schema for this table | No |
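To catch typos in the tables list before connecting, one option is to check each entry against the keys in the table above. The helper normalize_tables below is hypothetical and not part of kumoai. It assumes that a bare string such as "USERS" is shorthand for {"name": "USERS"}, and that source_name defaults to name when it is omitted.

```python
from typing import Dict, List, Union

TableSpec = Union[str, Dict[str, str]]

# Keys from the table configuration options above.
ALLOWED_KEYS = {"name", "source_name", "catalog", "schema"}


def normalize_tables(tables: List[TableSpec]) -> List[Dict[str, str]]:
    """Hypothetical helper: validate table specs and expand them to full dicts."""
    normalized: List[Dict[str, str]] = []
    for spec in tables:
        # Assumption: a bare string means {"name": <string>} with all defaults.
        if isinstance(spec, str):
            spec = {"name": spec}
        unknown = set(spec) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"Unknown table option(s): {sorted(unknown)}")
        if "name" not in spec:
            raise ValueError(f"Table spec is missing required 'name': {spec}")
        entry = dict(spec)
        # Assumption: source_name falls back to name when not given.
        entry.setdefault("source_name", entry["name"])
        normalized.append(entry)
    return normalized


specs = normalize_tables([
    "USERS",
    {"name": "ORDERS", "source_name": "ORDERS_SNAPSHOT"},
    {"name": "ITEMS", "schema": "OTHER_SCHEMA"},
])
for spec in specs:
    print(spec)
```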

Authentication

Personal access token:
graph = rfm.Graph.from_databricks(
    connection={
        "server_hostname": "<workspace>.cloud.databricks.com",
        "http_path": "/sql/1.0/warehouses/<warehouse_id>",
        "access_token": "<personal_access_token>",
    },
    catalog="MY_CATALOG",
    schema="MY_SCHEMA",
)
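Hard-coding a personal access token in source is easy to leak. A common alternative is to read the three connection values from environment variables. The variable names below are assumptions, so use whatever your deployment defines. connection_from_env is a hypothetical helper, not part of kumoai.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names mapped to the connection keys shown above.
ENV_KEYS = {
    "server_hostname": "DATABRICKS_SERVER_HOSTNAME",
    "http_path": "DATABRICKS_HTTP_PATH",
    "access_token": "DATABRICKS_TOKEN",
}


def connection_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the connection dict from environment variables, failing loudly if any is unset."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_KEYS.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return {key: env[var] for key, var in ENV_KEYS.items()}
```

The result can be passed as connection=connection_from_env() to rfm.Graph.from_databricks in place of the literal dictionary. This keeps the token out of source control.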
Existing connection object:
import kumoai.rfm as rfm
from kumoai.rfm.backend.databricks import connect

conn = connect(
    server_hostname="<workspace>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse_id>",
    access_token="<access_token>",
)
graph = rfm.Graph.from_databricks(
    connection=conn,
    catalog="MY_CATALOG",
    schema="MY_SCHEMA",
)

Catalog and Schema

The catalog and schema parameters refer to Unity Catalog naming:
  • catalog: The Unity Catalog catalog (top-level namespace)
  • schema: The schema within the catalog
If both are omitted, KumoRFM uses the current catalog and schema from the active Databricks session. Individual tables can override these defaults using catalog and schema keys in their configuration dictionary.
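The fallback rules above can be pictured as resolving each table to a three-part Unity Catalog name. The sketch below is an assumption about the precedence, not KumoRFM's code. For each part it assumes that a table-level key wins, then the from_databricks argument, then the session's current value. On Databricks SQL, the session values can be read with SELECT current_catalog(), current_schema(). qualified_name and quote_ident are hypothetical helpers.

```python
from typing import Dict, Optional


def quote_ident(ident: str) -> str:
    """Backtick-quote a Databricks SQL identifier, doubling embedded backticks."""
    return "`" + ident.replace("`", "``") + "`"


def qualified_name(
    table: Dict[str, str],
    catalog: Optional[str],
    schema: Optional[str],
    session_catalog: str,
    session_schema: str,
) -> str:
    """Resolve a table spec to a quoted catalog.schema.table name.

    Assumed precedence for each part: table-level key, then the
    from_databricks argument, then the session's current value.
    """
    cat = table.get("catalog") or catalog or session_catalog
    sch = table.get("schema") or schema or session_schema
    # source_name falls back to name when the two are the same.
    src = table.get("source_name") or table["name"]
    return ".".join(quote_ident(part) for part in (cat, sch, src))


print(qualified_name({"name": "ITEMS", "schema": "OTHER_SCHEMA"},
                     "MY_CATALOG", "MY_SCHEMA", "main", "default"))
```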

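Controlling Metadata Inference

Set infer_metadata=False to skip automatic inference when the graph is created, and verbose=False to silence the printed summary. You can then run inference explicitly: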

graph = rfm.Graph.from_databricks(
    connection={...},
    catalog="MY_CATALOG",
    schema="MY_SCHEMA",
    infer_metadata=False,
    verbose=False,
)

graph.infer_metadata()
graph.infer_links()
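This page does not document the rules graph.infer_metadata() applies to primary keys and time columns. As a purely hypothetical illustration of that kind of heuristic, the sketch below works over a small in-memory sample of rows. It treats an ID-like column with unique, non-null values as a primary-key candidate, and a column whose values are all dates or datetimes as a time-column candidate. The function names and sample rows are invented for this example.

```python
from datetime import date, datetime
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def looks_like_id(col: str) -> bool:
    """Name test used by this sketch: exactly ID, or an _ID suffix."""
    name = col.upper()
    return name == "ID" or name.endswith("_ID")


def guess_primary_key(rows: List[Row]) -> Optional[str]:
    """Return the first ID-like column whose sampled values are non-null and unique."""
    if not rows:
        return None
    for col in rows[0]:
        values = [row.get(col) for row in rows]
        if looks_like_id(col) and None not in values and len(set(values)) == len(values):
            return col
    return None


def guess_time_column(rows: List[Row]) -> Optional[str]:
    """Return the first column whose non-null sampled values are all dates."""
    if not rows:
        return None
    for col in rows[0]:
        values = [row.get(col) for row in rows if row.get(col) is not None]
        # datetime is a subclass of date, so this accepts both.
        if values and all(isinstance(v, date) for v in values):
            return col
    return None


# Hypothetical sample of an ORDERS table.
orders = [
    {"ORDER_ID": 1, "USER_ID": 10, "ORDER_TIME": datetime(2024, 1, 5, 9, 30)},
    {"ORDER_ID": 2, "USER_ID": 10, "ORDER_TIME": datetime(2024, 1, 6, 14, 0)},
    {"ORDER_ID": 3, "USER_ID": 11, "ORDER_TIME": datetime(2024, 1, 7, 8, 15)},
]
print(guess_primary_key(orders), guess_time_column(orders))  # ORDER_ID ORDER_TIME
```

Uniqueness in a sample is only evidence, not proof. A column that looks unique in a few rows may repeat across the full table.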

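Manual Edge Specification

To define relationships yourself instead of relying on automatic detection, pass edges as (source_table, foreign_key_column, destination_table) tuples: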

graph = rfm.Graph.from_databricks(
    connection={...},
    catalog="MY_CATALOG",
    schema="MY_SCHEMA",
    edges=[
        ("ORDERS", "USER_ID", "USERS"),
        ("ORDERS", "ITEM_ID", "ITEMS"),
    ],
)
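Because edges are given as plain tuples, a misspelled table or column name is easy to miss. The sketch below shows a hypothetical pre-flight check, check_edges, which is not part of kumoai. It reports edges that reference tables or columns absent from a known column listing. It reads each tuple as (source_table, foreign_key_column, destination_table), which is an interpretation of the example above.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # assumed: (source_table, foreign_key_column, destination_table)


def check_edges(edges: Iterable[Edge], columns: Dict[str, Set[str]]) -> List[str]:
    """Hypothetical check: list problems with edges that reference unknown tables or columns."""
    problems: List[str] = []
    for src, fkey, dst in edges:
        if src not in columns:
            problems.append(f"{src}: unknown source table")
        elif fkey not in columns[src]:
            problems.append(f"{src}.{fkey}: column not found")
        if dst not in columns:
            problems.append(f"{dst}: unknown destination table")
    return problems


# Hypothetical column listing for the tables in the example above.
columns = {
    "USERS": {"USER_ID"},
    "ITEMS": {"ITEM_ID"},
    "ORDERS": {"ORDER_ID", "USER_ID", "ITEM_ID"},
}
edges = [("ORDERS", "USER_ID", "USERS"), ("ORDERS", "ITEM_ID", "ITEMS")]
print(check_edges(edges, columns))  # []
```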