Kumo simplifies the process of building high-performing Graph Neural Networks (GNNs) with two key tools:
AutoML: Automatically selects the best GNN architecture, column encodings, and training table generation strategy based on your dataset and predictive query.
Model Planner: Provides fine-grained control over GNN architecture, column encoding, and training configurations for experienced users who want to optimize performance.
Kumo automates column encoding, transforming raw tabular data into model-ready inputs. The AutoML algorithm analyzes data types, column semantics, and statistical properties to determine the best encoding strategy. Supported encodings include:
Hash Encoding: For high-cardinality identifiers (e.g., product_code).
Datetime Encoding: For timestamps.
Numerical Encoding: For quantities (e.g., num_visits).
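To make the three encodings above concrete, here is a minimal sketch of what each one could look like. It is illustrative only and does not show Kumo's internal implementation; the function names, the bucket count, and the choice of sine/cosine features and standardization are all assumptions.

```python
import hashlib
import math
from datetime import datetime


def hash_encode(value, num_buckets=1024):
    """Map a high-cardinality identifier to a stable bucket index.

    num_buckets is an assumed value chosen for illustration.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def datetime_encode(ts):
    """Encode hour-of-day and day-of-week as sine/cosine pairs.

    Cyclic features keep 23:00 close to 00:00 and Sunday close to Monday.
    """
    hour_angle = 2 * math.pi * ts.hour / 24
    dow_angle = 2 * math.pi * ts.weekday() / 7
    return [
        math.sin(hour_angle),
        math.cos(hour_angle),
        math.sin(dow_angle),
        math.cos(dow_angle),
    ]


def numerical_encode(values):
    """Standardize a numeric column to zero mean and unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance) or 1.0  # avoid dividing by zero for constant columns
    return [(v - mean) / std for v in values]


# Hypothetical column values, named after the examples in the list above.
product_bucket = hash_encode("product_code_A17")
timestamp_features = datetime_encode(datetime(2024, 1, 1, 0, 0))
num_visits_scaled = numerical_encode([1.0, 2.0, 3.0])
```

Hashing keeps the input dimension fixed no matter how many distinct identifiers appear, at the cost of occasional collisions between unrelated values.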
Kumo automatically generates training tables with time-ordered train, validation, and holdout splits, which prevents data leakage in temporal queries. The system samples data to keep the splits balanced, even for complex predictive tasks involving time-based aggregations.
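The idea behind leakage-free temporal splits can be sketched as follows. This is a simplified illustration under stated assumptions, not Kumo's training table generator: the function name, the `timestamp` key, and the two cutoff parameters are hypothetical.

```python
from datetime import datetime


def temporal_split(rows, train_end, val_end, time_key="timestamp"):
    """Split rows by time so that no split sees data from a later period.

    train:      t < train_end
    validation: train_end <= t < val_end
    holdout:    t >= val_end
    """
    splits = {"train": [], "validation": [], "holdout": []}
    for row in sorted(rows, key=lambda r: r[time_key]):
        t = row[time_key]
        if t < train_end:
            splits["train"].append(row)
        elif t < val_end:
            splits["validation"].append(row)
        else:
            splits["holdout"].append(row)
    return splits


# Hypothetical events, one per month, deliberately out of order.
events = [
    {"id": i, "timestamp": datetime(2024, month, 1)}
    for i, month in enumerate([5, 1, 3, 2, 4, 6])
]
splits = temporal_split(events, datetime(2024, 3, 1), datetime(2024, 5, 1))
```

Because every training row precedes every validation row, and every validation row precedes every holdout row, the model is never evaluated on a period it has already seen.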
Kumo integrates state-of-the-art GNN architectures, including GraphSAGE, GIN, ID-GNN, GCN, PNA, and GAT. Kumo selects the best architecture and hyperparameters for each predictive query, optimizing aspects like:
Neighborhood sampling method
Layer connectivity
Embedding size
Aggregation methods
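To illustrate how neighborhood sampling and aggregation interact, the sketch below shows one GraphSAGE-style step with a mean aggregator: sample up to k neighbors, average their features, and concatenate the result with the node's own features. It omits the learned weight matrix and nonlinearity that a real layer applies afterwards, and all names in it are hypothetical rather than part of Kumo.

```python
import random


def sample_neighbors(adjacency, node, k, rng):
    """Return at most k neighbors of node, sampled without replacement."""
    neighbors = adjacency.get(node, [])
    if len(neighbors) <= k:
        return list(neighbors)
    return rng.sample(neighbors, k)


def mean_aggregate(features, adjacency, node, k, rng):
    """Concatenate a node's features with the mean of its sampled neighbors."""
    sampled = sample_neighbors(adjacency, node, k, rng)
    dim = len(features[node])
    if sampled:
        neighbor_mean = [
            sum(features[n][i] for n in sampled) / len(sampled)
            for i in range(dim)
        ]
    else:
        # Isolated node: fall back to a zero vector for the neighbor part.
        neighbor_mean = [0.0] * dim
    return features[node] + neighbor_mean


# Toy graph: node 0 is connected to nodes 1 and 2.
adjacency = {0: [1, 2], 1: [0], 2: [0]}
features = {0: [1.0, 0.0], 1: [2.0, 2.0], 2: [4.0, 0.0]}
rng = random.Random(0)
node0_repr = mean_aggregate(features, adjacency, 0, k=2, rng=rng)
```

Swapping the mean for a sum or max, or changing k, corresponds to the aggregation and sampling choices listed above.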
Kumo runs multiple experiments (typically 2 to 8) to find the best configuration, then displays the final architecture and hyperparameters in the UI so you can inspect and fine-tune them.
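A small experiment loop of this kind can be sketched as below. This is a generic random search over a tiny configuration space, assuming a user-supplied scoring function; it is not Kumo's AutoML algorithm, and the search space keys, `evaluate` callback, and `max_trials` default are hypothetical.

```python
import itertools
import random


def run_experiments(search_space, evaluate, max_trials=8, seed=0):
    """Evaluate up to max_trials configurations and return the best one.

    search_space maps a hyperparameter name to its candidate values.
    evaluate takes a configuration dict and returns a validation score
    where higher is better.
    """
    keys = sorted(search_space)
    candidates = [
        dict(zip(keys, combo))
        for combo in itertools.product(*(search_space[k] for k in keys))
    ]
    random.Random(seed).shuffle(candidates)
    trials = [(evaluate(cfg), cfg) for cfg in candidates[:max_trials]]
    best_score, best_config = max(trials, key=lambda t: t[0])
    return best_config, best_score, trials


def toy_score(cfg):
    """Stand-in for a real training run: favors large embeddings and mean aggregation."""
    penalty = 0.1 if cfg["aggregation"] == "sum" else 0.0
    return cfg["embedding_size"] / 128 - penalty


space = {"embedding_size": [32, 128], "aggregation": ["mean", "sum"]}
best_config, best_score, trials = run_experiments(space, toy_score)
```

Keeping the full list of trials alongside the winner mirrors the transparency goal: every configuration that was tried remains available for inspection.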
You can also keep embeddings stable for downstream use in KNN lookups or feature engineering, or disable embedding generation to prioritize accuracy with advanced GNN architectures.
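As an example of the downstream use mentioned here, the sketch below runs a brute-force KNN lookup over a handful of embeddings using cosine similarity. The embedding values and entity IDs are made up, and the function names are hypothetical; exported embeddings would normally be much larger and served by an approximate nearest-neighbor index.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_lookup(embeddings, query_id, k=2):
    """Return the IDs of the k entities most similar to query_id."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vector), other_id)
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other_id for _, other_id in scored[:k]]


# Made-up 2-dimensional embeddings for four entities.
embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
nearest_to_a = knn_lookup(embeddings, "a", k=2)
```

Lookups like this only stay meaningful across retraining runs if the embedding space is stable, which is why stability matters for KNN and feature engineering use cases.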