KumoRFM (Kumo Relational Foundation Model). Understanding these requirements is essential for creating high-quality datasets that maximize KumoRFM’s predictive capabilities.
Introduction
KumoRFM operates on relational data organized as interconnected tables that form a graph structure. The process starts with a set of pandas.DataFrame objects, which are wrapped in LocalTable objects and assembled into a Graph. Careful data preparation at each of these steps improves model performance and the reliability of predictions.
Key Terms and Concepts
Before diving into the technical details, it’s important to understand the key terms used throughout this guide:

- pandas DataFrame: A two-dimensional labeled data structure in `pandas`, similar to a spreadsheet or SQL table. Data frames are the starting point for all `KumoRFM` data preparation workflows. A collection of data frames connected by primary key/foreign key relationships defines a relational database.
- pandas dtype: The data type of a `pandas.Series` or `pandas.DataFrame` column (e.g., `int64`, `float64`, `object`, `bool`). These represent how `pandas` stores and processes the data internally.
- Kumo Dtype (`kumoai.Dtype`): KumoRFM’s representation of physical data storage types (e.g., `Dtype.int`, `Dtype.string`, `Dtype.float`). These are mapped from `pandas` dtypes and determine how data is processed by the foundation model.
- Kumo Stype (`kumoai.Stype`): Semantic types that define how the data should be interpreted by the foundation model (e.g., `Stype.numerical`, `Stype.categorical`, `Stype.ID`). These determine what preprocessing and modeling techniques are applied to each column.
- `LocalTable`: A wrapper around a `pandas.DataFrame` that includes metadata such as column types, the primary key, and the time column. Each table can have at most one primary key and at most one time column, but it can contain many foreign keys (primary keys of other tables). A `LocalTable` is the fundamental building block for defining `KumoRFM` graphs.
- `Graph`: A collection of interconnected `LocalTable` objects representing the relational structure of your data. The `Graph` defines how tables relate to each other through primary/foreign key relationships. How the tables are connected is a modeling decision that matters for the performance of the foundation model.
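The key constraints described above (a primary key that identifies each row, and foreign keys that point to primary keys of other tables) can be checked in plain pandas before any tables are wrapped. The following is a minimal sketch, not part of the `kumoai` API: the tables, column names, and the helpers `is_valid_primary_key` and `orphaned_foreign_keys` are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical example tables; names and columns are illustrative only.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "signup_date": pd.to_datetime(["2024-01-05", "2024-02-11", "2024-03-20"]),
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "user_id": [1, 1, 3, 4],  # user 4 has no matching row in `users`
    "order_time": pd.to_datetime(
        ["2024-02-01", "2024-02-15", "2024-04-02", "2024-04-09"]
    ),
})


def is_valid_primary_key(df: pd.DataFrame, column: str) -> bool:
    """Return True if the column has no missing values and no duplicates."""
    return bool(df[column].notna().all() and df[column].is_unique)


def orphaned_foreign_keys(
    child: pd.DataFrame, fkey: str, parent: pd.DataFrame, pkey: str
) -> set:
    """Return foreign key values in `child` with no matching row in `parent`."""
    child_values = set(child[fkey].dropna().unique().tolist())
    parent_values = set(parent[pkey].unique().tolist())
    return child_values - parent_values


print(is_valid_primary_key(users, "user_id"))
print(orphaned_foreign_keys(orders, "user_id", users, "user_id"))
```

Whether orphaned foreign key values are acceptable depends on the dataset; the check only surfaces them so the decision is made deliberately.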
Choosing a semantic type is a modeling decision. A column with Dtype.string could have Stype.categorical (for category labels) or Stype.text (for natural language), and the two lead to completely different preprocessing. The other important modeling decision is the structure of the graph: it affects both how well KumoRFM performs on the data and which predictions can be defined with PQL (see Make Predictions).
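To make the difference between storage type and semantic type concrete, the sketch below inspects the pandas dtypes of a small table and then applies a rough heuristic to decide whether a string column looks more like category labels or free text. The heuristic, its thresholds, and the helper name `guess_string_semantics` are assumptions made for illustration; they are not how KumoRFM infers semantic types, and the result should always be reviewed by hand.

```python
import pandas as pd

# Hypothetical product table; column names are illustrative only.
products = pd.DataFrame({
    "product_id": [1, 2, 3, 4],
    "price": [9.99, 24.50, 5.00, 12.75],
    "category": ["toys", "books", "toys", "books"],
    "description": [
        "A wooden train set with twelve pieces of track.",
        "An illustrated field guide to garden birds.",
        "A soft plush rabbit suitable for young children.",
        "A paperback collection of short detective stories.",
    ],
})

# pandas dtypes describe how each column is stored, not what it means.
print(products.dtypes)


def guess_string_semantics(
    series: pd.Series,
    max_unique_ratio: float = 0.5,
    max_avg_words: float = 3.0,
) -> str:
    """Heuristically label a string column as "categorical" or "text".

    Columns with few distinct, short values are treated as category labels;
    everything else is treated as natural language.
    """
    values = series.dropna().astype(str)
    if values.empty:
        raise ValueError("cannot guess semantics of an empty column")
    unique_ratio = values.nunique() / len(values)
    avg_words = values.str.split().str.len().mean()
    if unique_ratio <= max_unique_ratio and avg_words <= max_avg_words:
        return "categorical"
    return "text"


for column in ["category", "description"]:
    print(column, guess_string_semantics(products[column]))
```

Both columns are stored as strings, yet the first is a natural candidate for a categorical semantic type and the second for a text semantic type.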
Data Connectors
KumoRFM supports multiple ways to load data into a `Graph`:

| Backend | Best For | Entry Point |
|---|---|---|
| LocalTable (pandas) | Small to medium datasets already in memory | Graph.from_data() |
| SQLite | File-based databases, prototyping | Graph.from_sqlite() |
| Snowflake | Enterprise data warehouses | Graph.from_snowflake() |
| RelBench | Benchmarking and experimentation | Graph.from_relbench() |
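When data lives in a SQLite file but is small enough to fit in memory, one option is to read the tables into pandas first and continue with the in-memory path. The sketch below uses only the standard `sqlite3` module and `pandas.read_sql_query`; it does not call `Graph.from_sqlite()`, whose arguments are not described in this section. The table name, its columns, and the helper `load_tables` are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical "users" table, for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "DE"), (2, "US"), (3, "JP")],
)
conn.commit()


def load_tables(connection: sqlite3.Connection, table_names: list) -> dict:
    """Read each named table fully into memory as a pandas DataFrame."""
    return {
        name: pd.read_sql_query(f'SELECT * FROM "{name}"', connection)
        for name in table_names
    }


frames = load_tables(conn, ["users"])
conn.close()

print(frames["users"])
```

The resulting dictionary maps table names to data frames, which is the starting point described in the introduction.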
Guide Structure
This guide is organized into focused sections for easy navigation:

- Data Types & Semantic Types
- Table Definitions
- Graph Definitions
- Best Practices
- Snowflake
- SQLite
- RelBench