# Databricks Connector

> Connect KumoRFM to data stored in a Databricks SQL warehouse

KumoRFM can connect directly to Databricks SQL warehouses using Unity Catalog, enabling predictions on enterprise-scale data without moving it out of Databricks.

## Installation

The Databricks backend ships as an optional extra of the `kumoai` package. Quote the requirement so that shells such as zsh do not expand the square brackets:

```bash
pip install "kumoai[databricks]"
```

## Quick Start

Connect using explicit Databricks credentials:

```python
import kumoai.rfm as rfm

graph = rfm.Graph.from_databricks(
    connection={
        "server_hostname": "<workspace>.cloud.databricks.com",
        "http_path": "/sql/1.0/warehouses/<warehouse_id>",
        "access_token": "<access_token>",
    },
    catalog="MY_CATALOG",
    schema="MY_SCHEMA",
)
```

This will:

1. Connect to the Databricks SQL warehouse
2. Discover all tables in the specified catalog and schema
3. Infer column metadata (data types, semantic types, primary keys, time columns)
4. Detect foreign key relationships
5. Print a summary of the inferred metadata and links

## Specifying Tables

Pass `tables` to control which tables are included. Each entry is either a table name or a dictionary of per-table options:

```python
graph = rfm.Graph.from_databricks(
    connection={...},
    catalog="MY_CATALOG",
    schema="MY_SCHEMA",
    tables=[
        "USERS",
        {"name": "ORDERS", "source_name": "ORDERS_SNAPSHOT"},
        {"name": "ITEMS", "schema": "OTHER_SCHEMA"},
    ],
)
```

Table configuration options:

| Key           | Description                                                    | Required |
| ------------- | -------------------------------------------------------------- | -------- |
| `name`        | The table name used in PQL queries                             | Yes      |
| `source_name` | The actual table name in Databricks (if different from `name`) | No       |
| `catalog`     | Override the default catalog for this table                    | No       |
| `schema`      | Override the default schema for this table                     | No       |
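
Because a `tables` entry can be either a string or a dictionary, it can help to check entries before opening a connection. The following is a minimal sketch that uses only the keys from the table above. `normalize_table_spec` and `ALLOWED_KEYS` are hypothetical names for illustration, not part of the KumoRFM API, and KumoRFM may validate entries differently.

```python
from typing import Union

# Keys documented in the table configuration options above.
ALLOWED_KEYS = {"name", "source_name", "catalog", "schema"}


def normalize_table_spec(spec: Union[str, dict]) -> dict:
    """Return the dictionary form of a `tables` entry (hypothetical helper)."""
    if isinstance(spec, str):
        # A bare string is shorthand for a dictionary with only `name`.
        return {"name": spec}
    unknown = set(spec) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown table option(s): {sorted(unknown)}")
    if "name" not in spec:
        raise ValueError("Each table dictionary needs a 'name' key")
    return dict(spec)


tables = [
    "USERS",
    {"name": "ORDERS", "source_name": "ORDERS_SNAPSHOT"},
    {"name": "ITEMS", "schema": "OTHER_SCHEMA"},
]
normalized = [normalize_table_spec(t) for t in tables]
```

A misspelled key such as `"source"` then fails locally with a clear message instead of surfacing later as a missing table.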

## Authentication

**Personal access token:**

```python
graph = rfm.Graph.from_databricks(
    connection={
        "server_hostname": "<workspace>.cloud.databricks.com",
        "http_path": "/sql/1.0/warehouses/<warehouse_id>",
        "access_token": "<personal_access_token>",
    },
    catalog="MY_CATALOG",
    schema="MY_SCHEMA",
)
```
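
To keep the token out of source code, the connection dictionary can be built from environment variables. This is a minimal sketch: `connection_from_env` is a hypothetical helper, and the variable names `DATABRICKS_SERVER_HOSTNAME`, `DATABRICKS_HTTP_PATH`, and `DATABRICKS_TOKEN` are assumptions chosen for illustration. KumoRFM is not documented here as reading them automatically.

```python
import os
from typing import Mapping


def connection_from_env(environ: Mapping[str, str] = os.environ) -> dict:
    """Build the `connection` dictionary from environment variables.

    Hypothetical helper; the variable names are an assumption.
    """
    # Maps each connection key used above to the variable that supplies it.
    required = {
        "server_hostname": "DATABRICKS_SERVER_HOSTNAME",
        "http_path": "DATABRICKS_HTTP_PATH",
        "access_token": "DATABRICKS_TOKEN",
    }
    missing = [var for var in required.values() if not environ.get(var)]
    if missing:
        raise RuntimeError(
            f"Missing environment variable(s): {', '.join(missing)}"
        )
    return {key: environ[var] for key, var in required.items()}
```

The result can be passed as `connection=connection_from_env()` in the call above.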

**Existing connection object:**

```python
import kumoai.rfm as rfm
from kumoai.rfm.backend.databricks import connect

conn = connect(
    server_hostname="<workspace>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse_id>",
    access_token="<access_token>",
)
graph = rfm.Graph.from_databricks(
    connection=conn,
    catalog="MY_CATALOG",
    schema="MY_SCHEMA",
)
```

## Catalog and Schema

The `catalog` and `schema` parameters refer to Unity Catalog naming:

* `catalog`: The Unity Catalog catalog (top-level namespace)
* `schema`: The schema within the catalog

If both are omitted, KumoRFM uses the current catalog and schema from the active Databricks session.
Individual tables can override these defaults using `catalog` and `schema` keys in their configuration dictionary.
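
The interaction between the defaults and per-table overrides can be sketched as a small resolution function. This is an illustration only, assuming explicit defaults are given: `qualified_source` is a hypothetical helper, not KumoRFM's implementation, and it does not cover the case where `catalog` and `schema` are omitted and taken from the session.

```python
from typing import Union


def qualified_source(
    spec: Union[str, dict],
    default_catalog: str,
    default_schema: str,
) -> str:
    """Return the three-level name a `tables` entry points to (hypothetical)."""
    if isinstance(spec, str):
        spec = {"name": spec}
    # Per-table keys take precedence over the graph-level defaults.
    catalog = spec.get("catalog", default_catalog)
    schema = spec.get("schema", default_schema)
    # `source_name` falls back to `name` when it is not given.
    source = spec.get("source_name", spec["name"])
    return f"{catalog}.{schema}.{source}"


resolved = [
    qualified_source(t, "MY_CATALOG", "MY_SCHEMA")
    for t in [
        "USERS",
        {"name": "ORDERS", "source_name": "ORDERS_SNAPSHOT"},
        {"name": "ITEMS", "schema": "OTHER_SCHEMA"},
    ]
]
# resolved == [
#     "MY_CATALOG.MY_SCHEMA.USERS",
#     "MY_CATALOG.MY_SCHEMA.ORDERS_SNAPSHOT",
#     "MY_CATALOG.OTHER_SCHEMA.ITEMS",
# ]
```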

## Controlling Metadata Inference

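Set `infer_metadata=False` to skip automatic inference when the graph is created, then run inference explicitly when needed:

```python
graph = rfm.Graph.from_databricks(
    connection={...},
    catalog="MY_CATALOG",
    schema="MY_SCHEMA",
    infer_metadata=False,  # do not infer metadata during construction
    verbose=False,         # do not print the summary
)

# Run inference explicitly once the graph exists.
graph.infer_metadata()
graph.infer_links()
```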

## Manual Edge Specification

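Pass `edges` to declare foreign key relationships explicitly. Each edge is a `(source table, foreign key column, destination table)` tuple:

```python
graph = rfm.Graph.from_databricks(
    connection={...},
    catalog="MY_CATALOG",
    schema="MY_SCHEMA",
    edges=[
        # (source table, foreign key column, destination table)
        ("ORDERS", "USER_ID", "USERS"),
        ("ORDERS", "ITEM_ID", "ITEMS"),
    ],
)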
```
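
A typo in an edge tuple is easier to catch before the graph is built. The sketch below checks that both tables of every edge are among the tables you intend to load. `check_edges` is a hypothetical helper, and the tuple layout `(source table, foreign key column, destination table)` is assumed from the example above.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[str, str, str]  # assumed: (source table, foreign key, destination table)


def check_edges(edges: Iterable[Edge], table_names: Iterable[str]) -> List[str]:
    """Return one message per unknown table reference (hypothetical helper)."""
    known = set(table_names)
    problems = []
    for src_table, fkey, dst_table in edges:
        for table in (src_table, dst_table):
            if table not in known:
                problems.append(
                    f"edge ({src_table}, {fkey}, {dst_table}): "
                    f"unknown table {table!r}"
                )
    return problems


edges = [
    ("ORDERS", "USER_ID", "USERS"),
    ("ORDERS", "ITEM_ID", "ITEMS"),
]
problems = check_edges(edges, ["USERS", "ORDERS", "ITEMS"])
```

An empty `problems` list means every edge refers to a known table. The check does not verify that the foreign key column exists in the source table.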
