# DuckDB Connector

> Connect KumoRFM to data stored in a DuckDB database

KumoRFM can connect directly to DuckDB databases, automatically inferring table metadata and relationships from the schema.

## Installation

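The DuckDB backend requires the ADBC DuckDB driver, which is installed through the `duckdb` extra. Quote the requirement so that shells such as zsh do not interpret the square brackets:

```bash theme={null}
pip install "kumoai[duckdb]"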
```

## Quick Start

Connect to a file-based DuckDB database:

```python theme={null}
import kumoai.rfm as rfm

graph = rfm.Graph.from_duckdb(connection="my_database.duckdb")
```

This will:

1. Connect to the DuckDB database
2. Discover all non-temporary, non-internal tables automatically
3. Infer column metadata (data types, semantic types, primary keys, time columns)
4. Detect foreign key relationships
5. Print a summary of the inferred metadata and links

## Specifying Tables

Control which tables to include and customize their configuration:

```python theme={null}
graph = rfm.Graph.from_duckdb(
    connection="my_database.duckdb",
    tables=[
        "USERS",
        {"name": "ORDERS", "source_name": "ORDERS_SNAPSHOT"},
        {"name": "ITEMS", "primary_key": "ITEM_ID"},
    ],
)
```

Table configuration options:

| Key               | Description                                                           | Required |
| ----------------- | --------------------------------------------------------------------- | -------- |
| `name`            | The table name used in PQL queries                                    | Yes      |
| `source_name`     | The actual table name in the database (if different from `name`)      | No       |
| `primary_key`     | Override the auto-detected primary key                                | No       |
| `time_column`     | The name of the time column for this table                            | No       |
| `end_time_column` | The name of the end time column for this table                        | No       |
| `columns`         | Selected source columns or column specs, including expression columns | No       |
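
Because table specs are plain strings or dicts, a small pre-flight check can catch misspelled keys before they reach `from_duckdb`. The sketch below is a hypothetical helper (`normalize_table_spec` is not part of `kumoai`) and assumes only the key names listed in the table above:

```python
# Hypothetical helper, not part of kumoai: checks table specs against the
# option keys documented above before they are passed to from_duckdb.
ALLOWED_KEYS = {
    "name", "source_name", "primary_key",
    "time_column", "end_time_column", "columns",
}


def normalize_table_spec(spec):
    """Return a dict spec; a bare string is treated as {"name": spec}."""
    if isinstance(spec, str):
        return {"name": spec}
    unknown = set(spec) - ALLOWED_KEYS
    if unknown:
        raise KeyError(f"unknown table option(s): {sorted(unknown)}")
    if "name" not in spec:
        raise KeyError("table spec is missing the required 'name' key")
    return dict(spec)


tables = [
    "USERS",
    {"name": "ORDERS", "source_name": "ORDERS_SNAPSHOT"},
    {"name": "ITEMS", "primary_key": "ITEM_ID"},
]
checked = [normalize_table_spec(t) for t in tables]
```

The original `tables` list can still be passed to `from_duckdb` unchanged; the normalized copies serve only as a validation step.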

## Connection Options

**In-memory database:**

```python theme={null}
import kumoai.rfm as rfm
from kumoai.rfm.backend.duckdb import connect

conn = connect()  # in-memory DuckDB database (starts empty)
graph = rfm.Graph.from_duckdb(connection=conn)
```

**From a file path:**

```python theme={null}
graph = rfm.Graph.from_duckdb(connection="path/to/database.duckdb")
```
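
DuckDB generally creates a new, empty database file when asked to open a path that does not exist, so a mistyped path may yield an empty graph rather than an error. A minimal guard, assuming that behavior also applies to this connector (`require_database_file` is a hypothetical helper, not a `kumoai` function):

```python
from pathlib import Path


def require_database_file(path):
    """Return the path as a string, or raise if no such file exists."""
    db_path = Path(path)
    if not db_path.is_file():
        raise FileNotFoundError(f"DuckDB database not found: {db_path}")
    return str(db_path)
```

The returned string can then be passed as `connection=require_database_file("path/to/database.duckdb")`.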

**From an existing ADBC connection:**

```python theme={null}
import kumoai.rfm as rfm
from kumoai.rfm.backend.duckdb import connect

conn = connect("path/to/database.duckdb")
graph = rfm.Graph.from_duckdb(connection=conn)
```

**From a connection config dict:**

```python theme={null}
graph = rfm.Graph.from_duckdb(
    connection={"uri": "path/to/database.duckdb"},
)
```

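## Controlling Metadata Inference

To defer inference, pass `infer_metadata=False` when creating the graph, then call `infer_metadata()` and `infer_links()` yourself when you are ready. Setting `verbose=False` suppresses the printed summary: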

```python theme={null}
graph = rfm.Graph.from_duckdb(
    connection="my_database.duckdb",
    infer_metadata=False,
    verbose=False,
)

graph.infer_metadata()
graph.infer_links()
```

## Manual Edge Specification

Override automatic link detection by providing edges explicitly. Each edge is a `(source table, foreign key column, destination table)` tuple:

```python theme={null}
graph = rfm.Graph.from_duckdb(
    connection="my_database.duckdb",
    edges=[
        ("ORDERS", "USER_ID", "USERS"),
        ("ORDERS", "ITEM_ID", "ITEMS"),
    ],
)
```
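
A quick consistency check can catch edges that point at tables missing from the graph. The sketch below assumes each edge is a `(source table, foreign key column, destination table)` tuple, as the example above suggests; `find_dangling_edges` is a hypothetical helper rather than a `kumoai` API:

```python
def find_dangling_edges(table_names, edges):
    """Return edges whose source or destination table is not in table_names."""
    known = set(table_names)
    return [
        (src, fkey, dst)
        for src, fkey, dst in edges
        if src not in known or dst not in known
    ]


edges = [
    ("ORDERS", "USER_ID", "USERS"),
    ("ORDERS", "ITEM_ID", "ITEMS"),
]

# ITEMS is deliberately left out of the table list to show a dangling edge.
dangling = find_dangling_edges(["USERS", "ORDERS"], edges)
```

Running the check before calling `from_duckdb` keeps a mistyped table name from surfacing later as a harder-to-read error.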
