Model Plan
Kumo simplifies the process of building high-performing Graph Neural Networks (GNNs) with two key tools:
- AutoML: Automatically selects the best GNN architecture, column encodings, and training table generation strategy based on your dataset and predictive query.
- Model Planner: Provides fine-grained control over GNN architecture, column encoding, and training configurations for experienced users who want to optimize performance.
Model Plan
Whenever you write a predictive query, Kumo generates a modeling plan that covers:
Column Encoding
Kumo automates column encoding, transforming raw tabular data into model-ready inputs. The AutoML algorithm analyzes data types, column semantics, and statistical properties to determine the best encoding strategy. Supported encodings include:
- Hash Encoding: For high-cardinality identifiers (e.g., product_code).
- Datetime Encoding: For timestamps.
- Numerical Encoding: For quantities (e.g., num_visits).
- Index Encoding: For boolean values.
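To make the idea of choosing an encoding from a column's type and statistics concrete, here is a minimal sketch of such a heuristic. It is not Kumo's AutoML algorithm: the function name `choose_encoding`, the distinct-ratio threshold, and the choice to map low-cardinality strings to index encoding are all assumptions made for illustration.

```python
from datetime import datetime


def choose_encoding(values, high_cardinality_ratio=0.5):
    """Return an encoding name for a column (hypothetical heuristic, not Kumo's)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None  # nothing to inspect

    # bool is a subclass of int in Python, so check it before the numeric case.
    if all(isinstance(v, bool) for v in non_null):
        return "index"
    if all(isinstance(v, datetime) for v in non_null):
        return "datetime"
    if all(isinstance(v, (int, float)) for v in non_null):
        return "numerical"

    # Remaining columns are treated as identifiers or categories:
    # many distinct values -> hash, few distinct values -> index (assumption).
    distinct_ratio = len(set(non_null)) / len(non_null)
    return "hash" if distinct_ratio >= high_cardinality_ratio else "index"
```

For example, a column of unique product codes would come out as "hash" under this sketch, while a column of visit counts would come out as "numerical".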
Training Table Generation
Kumo automatically generates training tables with properly ordered train, validation, and holdout splits, preventing data leakage in temporal queries. The system optimally samples data to ensure balanced splits, even for complex predictive tasks involving time-based aggregations.
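The ordering property described above can be sketched as follows: sort rows by timestamp, then cut the sequence so that every training row precedes every validation row, which in turn precedes every holdout row. This is an illustration under assumed names (`temporal_split`, `val_fraction`, `holdout_fraction`), not Kumo's training table generator.

```python
from datetime import datetime


def temporal_split(rows, time_key, val_fraction=0.1, holdout_fraction=0.1):
    """Split rows into time-ordered train/validation/holdout lists (hypothetical helper)."""
    ordered = sorted(rows, key=lambda row: row[time_key])
    n = len(ordered)
    n_val = int(n * val_fraction)
    n_holdout = int(n * holdout_fraction)
    n_train = n - n_val - n_holdout

    train = ordered[:n_train]                      # oldest rows
    val = ordered[n_train:n_train + n_val]         # strictly later than train
    holdout = ordered[n_train + n_val:]            # most recent rows
    return train, val, holdout
```

Because the cuts are made on the sorted sequence, no row in an earlier split carries a timestamp later than any row in a later split, which is the condition that keeps future information out of training.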
Architecture Search
Kumo integrates state-of-the-art GNN architectures, including GraphSAGE, GIN, ID-GNN, GCN, PNA, and GAT. Kumo selects the best architecture and hyperparameters for each predictive query, optimizing aspects like:
- Neighborhood sampling method
- Layer connectivity
- Embedding size
- Aggregation methods
Kumo runs multiple experiments (typically 2-8) to find the best configuration, displaying the final architecture and hyperparameters in the UI for transparency and fine-tuning.
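A small search over a handful of experiments can be sketched as a random search: sample a configuration, score it, and keep the best. The search space values, the `run_search` function, and the caller-supplied `evaluate` callback below are hypothetical; Kumo's actual search strategy and option names are not described here.

```python
import random

# Hypothetical search space; the option names and values are illustrative only.
SEARCH_SPACE = {
    "num_neighbors": [[16, 16], [32, 32], [64, 64]],
    "embedding_size": [64, 128, 256],
    "aggregation": ["sum", "mean", "max"],
}


def run_search(evaluate, num_experiments=4, seed=0):
    """Sample configurations at random and return the best (config, score) pair.

    `evaluate` is a caller-supplied function mapping a config dict to a score,
    where higher is better (for example, validation AUROC).
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(num_experiments):
        config = {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

In practice `evaluate` would train a model with the sampled configuration and return its validation metric; here it is left abstract so the loop itself stays visible.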
Fine-Grained Control with Model Plan
For users who need customization, Kumo’s Model Plan offers direct control over model configurations. Use cases include:
Data Split Strategy
Specify exact holdout datasets using TimeRangeSplit, ensuring compatibility with external models or enforcing organizational constraints.
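Conceptually, an explicit time-range split assigns each row to a split according to fixed date boundaries rather than sampled fractions. The sketch below illustrates that idea in plain Python; it does not use or reproduce Kumo's TimeRangeSplit, and the function name and the start-inclusive, end-exclusive convention are assumptions.

```python
from datetime import datetime


def split_by_time_ranges(rows, time_key, ranges):
    """Assign rows to named splits by explicit time ranges (hypothetical helper).

    `ranges` maps a split name to a (start, end) pair; start is inclusive and
    end is exclusive. Rows that fall outside every range are dropped.
    """
    splits = {name: [] for name in ranges}
    for row in rows:
        for name, (start, end) in ranges.items():
            if start <= row[time_key] < end:
                splits[name].append(row)
                break
    return splits
```

Fixing the boundaries this way makes the holdout set reproducible, so results can be compared against a model trained outside Kumo on the same period.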
Faster Job Execution
Skip the full AutoML search by specifying a known architecture, reducing experiment count and improving iteration speed.
Performance Optimization
Adjust hyperparameters beyond AutoML defaults to maximize accuracy, such as increasing channel limits or enabling refit for full dataset training.
Custom Data Encoding
Override AutoML’s encoding choices to:
- Treat missing numerical values as 0
- Use advanced NLP encoding for specific text columns
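As an illustration of the first override, treating missing numerical values as 0 amounts to a fill step applied before the values are encoded. The helper below is a hypothetical sketch of that step, not Kumo's encoder; it treats both `None` and NaN as missing.

```python
import math


def fill_missing(values, fill_value=0.0):
    """Replace None and NaN entries with fill_value and cast the rest to float."""
    filled = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            filled.append(fill_value)
        else:
            filled.append(float(v))
    return filled
```

Whether 0 is a sensible fill depends on the column: it suits counts where missing means "no activity", but it can distort columns where 0 is itself a meaningful measurement.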
Optimization Method Customization
Modify Kumo’s tuning metric (e.g., AUROC, MAE, Loss) or optimize for recommendation objectives like diversity vs. recall.
Embedding Export
Ensure stable embeddings for downstream use in KNN lookups or feature engineering, or disable embedding generation to prioritize accuracy with advanced GNN architectures.
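For the KNN use case, a downstream lookup over exported embeddings can be as simple as ranking stored vectors by cosine similarity to a query vector. The sketch below assumes the embeddings have already been exported into a plain dictionary of entity ID to vector; the function names and data layout are illustrative and not part of Kumo's export format.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn(query, embeddings, k=2):
    """Return the IDs of the k embeddings most similar to the query vector."""
    ranked = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [entity_id for entity_id, _ in ranked[:k]]
```

A lookup like this is only meaningful if the embedding space stays consistent between exports, which is why stability matters for this use case.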