# Graph Definitions

> Define graphs for KumoRFM

`Graph` objects represent the relational structure between your tables. **The key to a good graph is having well-prepared tables underneath**: correct dtypes, stypes, primary keys, and time columns in each individual table are essential for the graph to work well.

## Graph Structure and Metadata

A `Graph` holds two types of information:

* **Tables**: The collection of `LocalTable` objects containing your data
* **Edges**: The relational metadata defining how tables connect through primary/foreign key relationships

Edges are the metadata that turns individual tables into a connected relational structure, which lets `KumoRFM` understand and use the relationships in your data.

## Graph Construction Methods

`Graph` provides several factory methods for different data sources:

* `Graph.from_data()` — from pandas DataFrames (see below)
* `Graph.from_sqlite()` — from a SQLite database (see [SQLite Connector](/rfm/connectors/sqlite))
* `Graph.from_snowflake()` — from a Snowflake warehouse (see [Snowflake Connector](/rfm/connectors/snowflake))
* `Graph.from_relbench()` — from RelBench benchmark datasets (see [RelBench](/rfm/examples/relbench))

**From pandas DataFrames**, you can construct a graph in two ways:

```python
import kumoai.rfm as rfm

# Method 1: Utility function (recommended for most cases)
# Automatically creates tables from data frames, infers metadata, and finds links
graph = rfm.Graph.from_data({
    'users': df_users,
    'products': df_products,
    'transactions': df_transactions
})

# Method 2: Manual construction from pre-configured table objects
tables = [users_table, products_table, transactions_table]
graph = rfm.Graph(tables=tables)
graph.infer_links()  # or define links manually
```

**The utility function** `Graph.from_data()` is often preferred because it:

1. Creates `LocalTable` objects from your data frames
2. Calls `infer_metadata()` on each table (see [Table Definitions](/rfm/table-definitions))
3. Automatically infers links between tables based on column names

## Link Inference and Naming Conventions

Link inference is based on column names, making **consistent naming conventions crucial** for automatic graph construction:

```python
# For example, these column patterns create automatic links:
# transactions.user_id -> users.user_id (or users.id)
# orders.product_id -> products.product_id (or products.id)
# reviews.customer_id -> customers.customer_id (or customers.id)

# View inferred edges
for edge in graph.edges:
    print(f"{edge.src_table}.{edge.fkey} -> {edge.dst_table}")
```

**Best practice**: Use one foreign key name per relationship (*e.g.*, always `user_id`, rather than a mix of `user_id`, `uid`, and `customer_id` for the same relationship).
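To make the effect of naming concrete, here is a minimal sketch of name-based matching. It is an illustration only and not KumoRFM's actual inference logic: the `Edge` tuple, the `guess_links` helper, and the simple "strip a trailing s" singularization are all assumptions made for this example.

```python
from typing import Dict, List, NamedTuple, Optional


class Edge(NamedTuple):
    src_table: str
    fkey: str
    dst_table: str


def guess_links(columns: Dict[str, List[str]],
                primary_keys: Dict[str, Optional[str]]) -> List[Edge]:
    """Hypothetical helper: link a column to another table when it either
    equals that table's primary key name (user_id -> users.user_id) or is
    '<singular table name>_id' while the primary key is 'id'."""
    edges = []
    for src, cols in columns.items():
        for col in cols:
            if col == primary_keys.get(src):
                continue  # a table's own primary key is not a foreign key
            for dst, pkey in primary_keys.items():
                if dst == src or pkey is None:
                    continue
                singular = dst[:-1] if dst.endswith("s") else dst
                if col == pkey or (pkey == "id" and col == f"{singular}_id"):
                    edges.append(Edge(src, col, dst))
    return edges


columns = {
    "users": ["user_id", "name"],
    "products": ["id", "title"],
    "transactions": ["transaction_id", "user_id", "product_id", "uid"],
}
primary_keys = {
    "users": "user_id",
    "products": "id",
    "transactions": "transaction_id",
}

edges = guess_links(columns, primary_keys)
for edge in edges:
    print(f"{edge.src_table}.{edge.fkey} -> {edge.dst_table}")
```

In this toy setup `transactions.uid` is not matched to any table, which is the kind of gap that inconsistent naming tends to produce.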

### Manual Link Management

If you cannot rename columns to follow consistent patterns, you can add links manually:

```python
# Add specific edge
graph.link(src_table="transactions", fkey="user_id", dst_table="users")

# Remove edge
graph.unlink(src_table="transactions", fkey="user_id", dst_table="users")
```
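Before adding a link by hand, it can help to confirm that the foreign key values actually occur in the destination table's primary key. The sketch below uses plain Python lists as stand-ins for DataFrame columns, and `dangling_keys` is a hypothetical helper, not part of the `kumoai` API. With pandas, `Series.isin` supports the same kind of check.

```python
from typing import Hashable, Iterable, List


def dangling_keys(fkey_values: Iterable[Hashable],
                  pkey_values: Iterable[Hashable]) -> List[Hashable]:
    """Hypothetical helper: return foreign key values that have no matching
    primary key, ignoring missing (None) entries."""
    known = set(pkey_values)
    missing = {v for v in fkey_values if v is not None and v not in known}
    return sorted(missing)


# Toy columns; in practice these would come from your own tables,
# e.g. the user_id columns of the transactions and users data frames.
transactions_user_id = [201, 202, 203, 299, None]
users_user_id = [201, 202, 203]

orphans = dangling_keys(transactions_user_id, users_user_id)
print(orphans)
```

A non-empty result suggests the column may not be a clean foreign key for that destination table.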

## What Makes a Good Graph

A good `Graph` should have:

* **Well-prepared tables**: Tables are prepared and split up according to best practices (see [Table Definitions](/rfm/table-definitions))
* **Meaningful links**: Edges represent meaningful relationships between tables, not just technical connections
* **Well-defined entities**: Each table represents either a single entity or a single event, not a mix of both
* **Prediction-ready structure**: The graph structure limits which queries can be defined with PQL (see [Make Predictions](/rfm/make-predictions)), so make sure the PQL queries you want to run are possible with your graph structure

## Working Around the Limitations

**Multiple entities in a single table**

Tables that mix data from multiple entities should be split for better graph structure. Think of each table as representing a single entity type or a single event type. Here's an example:

```python
import pandas as pd
import kumoai.rfm as rfm

# Original table mixing transaction, bank, and user data
mixed_data = pd.DataFrame({
    'transaction_id': [1, 2, 3],
    'bank_id': [101, 102, 101],
    'user_id': [201, 202, 203],
    'transaction_amount': [100.0, 250.0, 75.0],
    'transaction_type': ['deposit', 'withdrawal', 'transfer'],
    'bank_name': ['Chase', 'Wells Fargo', 'Chase'],
    'bank_routing': ['123456', '789012', '123456'],
    'user_name': ['Alice', 'Bob', 'Charlie'],
    'user_email': ['alice@email.com', 'bob@email.com', 'charlie@email.com']
})

# Split into three entity-focused tables

# 1. Transactions table (transaction-specific data)
transactions = mixed_data[['transaction_id', 'bank_id', 'user_id', 'transaction_amount', 'transaction_type']].copy()

# 2. Banks table (bank-specific data)
banks = mixed_data[['bank_id', 'bank_name', 'bank_routing']].drop_duplicates()

# 3. Users table (user-specific data)
users = mixed_data[['user_id', 'user_name', 'user_email']].drop_duplicates()

# Create graph with proper entity relationships
graph = rfm.Graph.from_data({
    'transactions': transactions,
    'banks': banks,
    'users': users
})
# Result: transactions.bank_id -> banks.bank_id and transactions.user_id -> users.user_id
```
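Splitting with `drop_duplicates()` only yields one row per bank or user if that entity's attributes are identical on every row where it appears. The sketch below flags keys whose attributes disagree. It works on a list of row dictionaries (the shape `DataFrame.to_dict("records")` produces), and `conflicting_keys` is a hypothetical helper introduced for this example.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def conflicting_keys(rows: List[Dict[str, Any]], key: str,
                     attrs: List[str]) -> Dict[Any, List[Tuple]]:
    """Hypothetical helper: map each key value to its distinct attribute
    combinations, keeping only keys that have more than one."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(tuple(row[a] for a in attrs))
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# Toy rows: bank 101 appears with two different spellings of its name
rows = [
    {"bank_id": 101, "bank_name": "Chase", "bank_routing": "123456"},
    {"bank_id": 102, "bank_name": "Wells Fargo", "bank_routing": "789012"},
    {"bank_id": 101, "bank_name": "Chase Bank", "bank_routing": "123456"},
]

conflicts = conflicting_keys(rows, "bank_id", ["bank_name", "bank_routing"])
print(conflicts)
```

Any key reported here would survive `drop_duplicates()` as several rows, so the resulting table would not have a unique primary key until the conflict is resolved.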

**Many-to-many relationships**

`KumoRFM` only supports primary-foreign key relationships (one-to-many). Many-to-many relationships require a junction table to break them into two one-to-many relationships:

```python
import pandas as pd
import kumoai.rfm as rfm

# Problem: Table with many-to-many data stored as lists/comma-separated values
user_skills_combined = pd.DataFrame({
    'user_id': [1, 2, 3],
    'user_name': ['Alice', 'Bob', 'Charlie'],
    'skills': [['Python', 'SQL'], ['SQL', 'Machine Learning'], ['Python', 'Machine Learning']],
    'proficiency_levels': [['expert', 'beginner'], ['intermediate', 'advanced'], ['expert', 'expert']]
})

# This structure cannot create proper foreign key relationships in KumoRFM

# Solution: Normalize into three tables with junction table

# 1. Users table (entity table)
users = user_skills_combined[['user_id', 'user_name']].copy()

# 2. Skills table (entity table)
all_skills = []
for skill_list in user_skills_combined['skills']:
    all_skills.extend(skill_list)
unique_skills = sorted(set(all_skills))  # sorted so skill IDs are stable across runs

skills = pd.DataFrame({
    'skill_id': range(1, len(unique_skills) + 1),
    'skill_name': unique_skills
})

# 3. Junction table (breaks many-to-many into two one-to-many)
skill_ids = dict(zip(skills['skill_name'], skills['skill_id']))  # skill name -> skill ID lookup
user_skills_records = []
for _, row in user_skills_combined.iterrows():
    for skill, proficiency in zip(row['skills'], row['proficiency_levels']):
        skill_id = skill_ids[skill]
        user_skills_records.append({
            'user_skill_id': len(user_skills_records) + 1,
            'user_id': row['user_id'],
            'skill_id': skill_id,
            'proficiency_level': proficiency
        })

user_skills = pd.DataFrame(user_skills_records)

# Create graph with proper one-to-many relationships
graph = rfm.Graph.from_data({
    'users': users,
    'skills': skills,
    'user_skills': user_skills
})
# Result: user_skills.user_id -> users.user_id and user_skills.skill_id -> skills.skill_id
```

This normalization allows proper foreign key relationships and stores relationship-specific attributes (like `proficiency_level`) in the junction table.
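The same normalization applies when the many-to-many data arrives as comma-separated strings rather than Python lists. The following dependency-free sketch assumes a hypothetical `skills` string column and builds only the ID mapping and the junction rows. Converting the results to DataFrames and passing them to `Graph.from_data()` would follow the pattern shown earlier.

```python
# Toy input: one comma-separated string of skills per user (assumed format)
raw_rows = [
    {"user_id": 1, "skills": "Python, SQL"},
    {"user_id": 2, "skills": "SQL, Machine Learning"},
    {"user_id": 3, "skills": ""},  # user without any skills
]


def split_skills(value: str) -> list:
    """Split a comma-separated string and drop empty entries."""
    return [part.strip() for part in value.split(",") if part.strip()]


# Assign skill IDs in sorted order so the mapping is the same on every run
skill_names = sorted({name for row in raw_rows
                      for name in split_skills(row["skills"])})
skill_ids = {name: i for i, name in enumerate(skill_names, start=1)}

# Junction rows: one row per (user, skill) pair
user_skills = []
for row in raw_rows:
    for name in split_skills(row["skills"]):
        user_skills.append({
            "user_skill_id": len(user_skills) + 1,
            "user_id": row["user_id"],
            "skill_id": skill_ids[name],
        })

print(skill_ids)
print(user_skills)
```

Users with an empty skills string simply contribute no junction rows, while they can still appear in the users table.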

## Graph Utilities

**Visualizing the graph:**

```python
graph.visualize()
```

This displays an interactive visualization of the graph structure showing tables, columns, and edges. It is useful for verifying that links were inferred correctly.

**Validating the graph:**

```python
graph.validate()
```

This checks that the graph meets all requirements for use with `KumoRFM`, including valid primary keys, consistent foreign key types, and proper edge definitions. Always validate before running predictions.
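To illustrate two of the requirement categories named above, here is a rough sketch of checks you could run on your own data beforehand. It does not reproduce what `graph.validate()` does internally. Both helpers are hypothetical, and plain lists stand in for table columns.

```python
from typing import Hashable, List, Optional, Sequence


def primary_key_problems(values: Sequence[Optional[Hashable]]) -> List[str]:
    """Hypothetical helper: list reasons a column cannot serve as a primary key."""
    problems = []
    present = [v for v in values if v is not None]
    if len(present) < len(values):
        problems.append("missing values")
    if len(set(present)) < len(present):
        problems.append("duplicate values")
    return problems


def key_types_match(fkey_values: Sequence[Optional[Hashable]],
                    pkey_values: Sequence[Optional[Hashable]]) -> bool:
    """Hypothetical helper: True if every foreign key value has a type that
    also occurs among the primary key values."""
    fkey_types = {type(v) for v in fkey_values if v is not None}
    pkey_types = {type(v) for v in pkey_values if v is not None}
    return fkey_types <= pkey_types


# A candidate primary key with one duplicate and one missing entry
user_ids = [201, 202, 202, None]
pk_report = primary_key_problems(user_ids)
print(pk_report)

# String IDs on one side and integer IDs on the other
types_ok = key_types_match(["201", "202"], [201, 202, 203])
print(types_ok)
```

Catching issues like these on the raw columns tends to make validation errors easier to trace back to a specific table.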