Graph objects represent the relational structure between your tables. The key to a good graph is well-prepared tables underneath: each individual table needs proper dtypes, stypes, primary keys, and time columns before it is added to the graph.
Graph Structure and Metadata
A Graph holds two types of information:
- Tables: The collection of LocalTable objects containing your data
- Edges: The relational metadata defining how tables connect through primary/foreign key relationships
This structure allows KumoRFM to understand and leverage relationships in your data.
Graph Construction Methods
Graph provides several factory methods for different data sources:
- Graph.from_data(): from pandas DataFrames (see below)
- Graph.from_sqlite(): from a SQLite database (see SQLite Connector)
- Graph.from_snowflake(): from a Snowflake warehouse (see Snowflake Connector)
- Graph.from_relbench(): from RelBench benchmark datasets (see RelBench)
Graph.from_data() is often preferred because it:
- Creates LocalTable objects from your data frames
- Calls infer_metadata() on each table (see Table Definitions)
- Automatically infers links between tables based on column names
Link Inference and Naming Conventions
Link inference is based on column names, making consistent naming conventions crucial for automatic graph construction. Use one name per key across all tables (e.g., always user_id, not mixing user_id, uid, customer_id for the same relationship).
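To see why naming matters, here is a minimal sketch of name-based matching in plain Python. It is an illustration only and not KumoRFM's actual inference logic: the `infer_links_by_name` helper, the schema dictionaries, and the edge tuples are all hypothetical. The assumed rule is that a column becomes a foreign key when its name equals another table's primary key.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical schema description: table name -> (primary key or None, column names).
Schema = Dict[str, Tuple[Optional[str], List[str]]]
Edge = Tuple[str, str, str]  # (source table, foreign key column, destination table)


def infer_links_by_name(schema: Schema) -> List[Edge]:
    """Link a column to another table when its name equals that table's primary key."""
    edges: List[Edge] = []
    for src_table, (_, columns) in schema.items():
        for dst_table, (pkey, _) in schema.items():
            if dst_table == src_table or pkey is None:
                continue
            if pkey in columns:
                edges.append((src_table, pkey, dst_table))
    return edges


# Same key name in both tables: the link is found.
consistent: Schema = {
    "users": ("user_id", ["user_id", "age"]),
    "orders": ("order_id", ["order_id", "user_id", "amount"]),
}

# "uid" in orders does not match "user_id" in users: no link is found.
inconsistent: Schema = {
    "users": ("user_id", ["user_id", "age"]),
    "orders": ("order_id", ["order_id", "uid", "amount"]),
}

print(infer_links_by_name(consistent))    # [('orders', 'user_id', 'users')]
print(infer_links_by_name(inconsistent))  # []
```

With the inconsistent schema, a purely name-based rule finds nothing, so the orders-to-users edge would have to be declared explicitly.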
Manual Link Management
If you cannot rename columns to follow consistent patterns, you can add links manually.
What Makes a Good Graph
A good Graph should have:
- Well-prepared tables: Tables should be cleaned and split up according to best practices (see Table Definitions)
- Meaningful links: Edges should represent meaningful relationships between tables, not just technical connections
- Well-defined entities: Each table should represent either a single entity or a single event, not a mix of both
- Prediction-ready structure: The graph structure limits which queries can be defined with PQL (see Make Predictions), so confirm that the PQL queries you want to run are possible with your graph
Working around the limitations
Multiple entities in a single table
Tables that mix data from multiple entities should be split for better graph structure. Think of each table as representing a single entity type or event.
Many-to-many relationships
KumoRFM only supports primary-foreign key relationships (one-to-many). Many-to-many relationships require a junction table to break them into two one-to-many relationships. Attributes of the relationship itself (e.g., proficiency_level) can be stored in the junction table.
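Both workarounds can be sketched together in plain Python. The data below is made up for illustration: a hypothetical flat table mixes employees and skills, and it is split into two entity tables plus a junction table that carries proficiency_level. The `project_unique` helper and all table and column names other than proficiency_level are assumptions, and the resulting lists of rows would still need to be converted to DataFrames before being passed to a Graph.

```python
# Hypothetical denormalized input: each row mixes an employee, a skill,
# and an attribute of their relationship.
flat_rows = [
    {"employee_id": 1, "employee_name": "Ada", "skill_id": 10,
     "skill_name": "SQL", "proficiency_level": 3},
    {"employee_id": 1, "employee_name": "Ada", "skill_id": 11,
     "skill_name": "Python", "proficiency_level": 5},
    {"employee_id": 2, "employee_name": "Bo", "skill_id": 10,
     "skill_name": "SQL", "proficiency_level": 4},
]


def project_unique(rows, key, columns):
    """Keep one row per distinct key value, restricted to the given columns."""
    unique = {}
    for row in rows:
        unique.setdefault(row[key], {c: row[c] for c in columns})
    return list(unique.values())


# One table per entity, each with its own primary key.
employees = project_unique(flat_rows, "employee_id", ["employee_id", "employee_name"])
skills = project_unique(flat_rows, "skill_id", ["skill_id", "skill_name"])

# Junction table: two foreign keys plus the relationship attribute.
# Each foreign key gives a one-to-many link to one entity table.
employee_skills = [
    {
        "employee_id": row["employee_id"],
        "skill_id": row["skill_id"],
        "proficiency_level": row["proficiency_level"],
    }
    for row in flat_rows
]

print(len(employees), len(skills), len(employee_skills))
```

After the split, employee_skills links to employees through employee_id and to skills through skill_id, which are the two one-to-many relationships that replace the original many-to-many one.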
Graph Utilities
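Graph utilities cover visualizing the graph and validating it. Validation checks that the graph meets the requirements of KumoRFM, including valid primary keys, consistent foreign key types, and proper edge definitions. Always validate before running predictions.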
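The three kinds of checks named above can be illustrated with a small, self-contained sketch. It is not KumoRFM's validator: the `check_graph` function, its arguments, and the messages it returns are hypothetical, and it assumes that "valid primary keys" means unique and non-missing values.

```python
from typing import Any, Dict, List, Tuple

Table = List[Dict[str, Any]]
Edge = Tuple[str, str, str]  # (source table, foreign key column, destination table)


def check_graph(tables: Dict[str, Table], pkeys: Dict[str, str],
                edges: List[Edge]) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []

    # 1. Primary keys: assumed to require unique, non-missing values.
    for name, pkey in pkeys.items():
        values = [row.get(pkey) for row in tables[name]]
        if any(v is None for v in values):
            problems.append(f"{name}.{pkey}: missing primary key values")
        if len(set(values)) != len(values):
            problems.append(f"{name}.{pkey}: duplicate primary key values")

    for src, fkey, dst in edges:
        # 2. Edge definitions: both tables must exist and the target needs a primary key.
        if src not in tables or dst not in tables or dst not in pkeys:
            problems.append(f"{src}.{fkey} -> {dst}: unknown table or missing primary key")
            continue
        # 3. Foreign key types must match the primary key types they point to.
        fk_types = {type(r[fkey]) for r in tables[src] if r.get(fkey) is not None}
        pk_types = {type(r[pkeys[dst]]) for r in tables[dst] if r.get(pkeys[dst]) is not None}
        if not fk_types:
            problems.append(f"{src}.{fkey}: foreign key column has no values")
        elif not fk_types <= pk_types:
            problems.append(f"{src}.{fkey} -> {dst}: foreign key type differs from primary key type")
    return problems


tables = {
    "users": [{"user_id": 1}, {"user_id": 2}],
    # The second order stores user_id as a string instead of an integer.
    "orders": [{"order_id": 100, "user_id": 1}, {"order_id": 101, "user_id": "2"}],
}
pkeys = {"users": "user_id", "orders": "order_id"}
edges = [("orders", "user_id", "users")]

problems_before = check_graph(tables, pkeys, edges)
print(problems_before)

tables["orders"][1]["user_id"] = 2  # fix the type mismatch
problems_after = check_graph(tables, pkeys, edges)
print(problems_after)  # []
```

Running checks like these on the raw tables before building the graph makes type mismatches between keys easier to trace back to the table that caused them.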