Improved Static LP Configuration Support
Advanced users can now define static link prediction modes via ModelPlan, enabling better control over leakage and modeling strategies.
S3 to BigQuery Export Optimization
Batch prediction results are now streamed from S3 to BigQuery to improve performance and prevent memory-related job failures.
Databricks Connector Guardrails
The system now prevents creation of Databricks connectors if backend support isn’t enabled, ensuring early validation and fewer runtime surprises.
Connector Credential Retention on Update
Updating a Databricks connector now preserves existing credentials, fixing a bug that previously caused silent prediction failures.
Table Name Clean-Up and Collision Prevention
Table names with repeated underscores are automatically normalized, and manual entry with multiple underscores is now blocked.
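As a rough illustration of the normalization described above, the sketch below collapses runs of underscores and rejects manual names that contain them. The function names are hypothetical, and the exact rule Kumo applies is not stated in these notes.

```python
import re


def normalize_table_name(name: str) -> str:
    """Collapse every run of two or more underscores into a single underscore."""
    return re.sub(r"_{2,}", "_", name)


def is_valid_manual_name(name: str) -> bool:
    """Reject manually entered names that contain consecutive underscores."""
    return "__" not in name


print(normalize_table_name("user__events___daily"))  # user_events_daily
print(is_valid_manual_name("user__events"))           # False
```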
Graph Link Suggestions After Table Edit
Editing a table node within a graph now immediately triggers updated graph link suggestions, reducing friction in graph setup workflows.
Clear Feedback for Table Graph Errors
Improved graph editing behaviors now include cleaner warnings, better name overflow handling, and consistent layout.
Trainer Stability for Large Embedding Tables
Trainer now exits gracefully when encountering large right-hand-side (RHS) tables that would previously cause CUDA out-of-memory crashes.
PQ Validation Robustness with Multi-word Keywords
Autocomplete logic in the predictive query editor has been upgraded to correctly suggest multi-word keywords like FOR EACH, avoiding syntax corruption.
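One way such a fix can work, sketched under assumptions: match the longest token-aligned suffix of the typed text against each keyword, then replace that whole suffix rather than only the last word. The keyword list and function names here are illustrative, not the editor's actual implementation.

```python
# Illustrative keyword list; only FOR EACH and WHERE appear in these notes.
KEYWORDS = ["FOR EACH", "PREDICT", "WHERE"]


def suggest(text, keywords=KEYWORDS):
    """Return (keyword, chars_to_replace) pairs for the text left of the cursor."""
    upper = text.upper()
    results = []
    for kw in keywords:
        # Try the longest suffix first so "FOR EA" matches "FOR EACH" as a unit.
        for n in range(min(len(kw), len(upper)), 0, -1):
            starts_token = n == len(upper) or upper[-n - 1].isspace()
            if starts_token and kw.startswith(upper[-n:]):
                results.append((kw, n))
                break
    return results


def apply_suggestion(text, keyword, n):
    """Replace the matched suffix (n characters) with the full keyword."""
    return text[:-n] + keyword


query = "PREDICT x FOR EA"
keyword, n = suggest(query)[0]
print(apply_suggestion(query, keyword, n))  # PREDICT x FOR EACH
```

Replacing only the last word ("EA") would instead yield "PREDICT x FOR FOR EACH", which is the kind of corruption the note describes.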
Support for Multi-Class Targets with Empty Strings
Empty-string rows are now excluded from multi-class classification tasks, resolving CUDA indexing errors and ensuring class label consistency.
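A minimal sketch of the idea, assuming labels arrive as plain strings: rows with an empty label are dropped, and the remaining classes are mapped to contiguous indices so that every target falls inside the range [0, number of classes). The function name and return shape are assumptions for illustration.

```python
def encode_labels(labels):
    """Drop empty-string labels and map the remaining classes to 0..K-1."""
    kept = [(row, label) for row, label in enumerate(labels) if label != ""]
    classes = sorted({label for _, label in kept})
    index = {label: k for k, label in enumerate(classes)}
    row_ids = [row for row, _ in kept]
    targets = [index[label] for _, label in kept]
    return row_ids, targets, classes


row_ids, targets, classes = encode_labels(["gold", "", "silver", "gold", ""])
print(row_ids)   # [0, 2, 3]
print(targets)   # [0, 1, 0]
print(classes)   # ['gold', 'silver']
```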
Improved Error Messaging for Static Node Prediction
Predictive query errors now provide helpful suggestions (e.g., use LAST() or aggregate values) when multiple matches are detected per entity.
Better Split Statistics in TimeSplit
The split stats logic now aligns with offset-based boundaries and non-strict modes, producing accurate distribution metrics across partitions.
Validation Skipped for Legacy Trained Queries
Trained models with deprecated or renamed connectors can now be reviewed in the UI without triggering connector re-validation.
Prediction Anchor Time Consistency
The Prediction Overview now accurately displays the backend-provided anchor time value, removing timezone-related confusion during batch prediction reviews.
Improved Prediction on Non-Empty Columns
Users now receive actionable guidance when a prediction job fails because the prediction column is already populated.
Version 2.9 delivers over 40 fixes and improvements across ingestion, training, evaluation, and user experience.
Functionality
PQuery End Time Support: Predictive queries now support the use of end_time_col. For instance, users can write a query containing “WHERE USER_LIST.WITHDRAW_DATE > 2011-10-25”, where WITHDRAW_DATE is the end_time_col.
Optimization
Predictive Query Validation Latency Reduction: The internal PQ validator now runs significantly faster under concurrent load, with latency reduced by up to 25% across 20-100 parallel jobs.
Connector API Call Reductions: Eliminated redundant connector API calls during Prediction creation and Graph View loads to improve page responsiveness.
Enhancement
#ingestion & connectors
Databricks Connector Creation Guardrails: UI now blocks creation of multiple Databricks connectors to avoid backend stability issues.
S3 Path Copy Button: Users can now easily copy S3 source paths directly from connector detail panels.
#training & evaluation
Primary Key Auto-Suggestion: The system suggests likely primary keys based on column heuristics when users omit this field during table creation.
Adaptive Sampling Neighbor Validation: Improved validation logic estimates safe node counts based on NumNeighbors and sampling config to prevent GPU OOM failures during training. Users receive clearer warnings when configurations exceed safe thresholds.
Negative Weight Validation: Training jobs are blocked if weight columns contain negative values, preventing invalid model configurations.
Improved Metrics Table Display: The evaluation metrics table now uses the new UI Catalog Datatable for improved readability, displays baseline model scores for easier comparison, and clearly shows macro-averaged metrics like F1.
Improved Error Readability: PQ validator now displays human-readable operators (e.g., >) instead of internal enum names (e.g., RelOp.GT).
PQ Warning UX: PQ validation warnings are now surfaced even when queries are otherwise valid, ensuring users receive early guidance without blocking execution.
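To make the adaptive sampling neighbor validation item above more concrete, here is a sketch of the standard worst-case estimate for fan-out neighbor sampling: each hop multiplies the frontier by that hop's neighbor count. The function names and the `safe_limit` threshold are assumptions; the estimate Kumo actually uses is not described in these notes.

```python
def max_sampled_nodes(batch_size: int, num_neighbors: list[int]) -> int:
    """Upper bound on nodes in one mini-batch subgraph under fan-out sampling."""
    total = frontier = batch_size
    for fanout in num_neighbors:
        frontier *= fanout   # every frontier node samples up to `fanout` neighbors
        total += frontier
    return total


def check_sampling_config(batch_size, num_neighbors, safe_limit):
    """Return a warning string if the estimate exceeds the (hypothetical) limit."""
    estimate = max_sampled_nodes(batch_size, num_neighbors)
    if estimate > safe_limit:
        return f"Estimated {estimate} sampled nodes exceeds safe limit {safe_limit}"
    return None


print(max_sampled_nodes(512, [16, 16]))                       # 139776
print(check_sampling_config(512, [16, 16], safe_limit=100_000))
```

The bound grows multiplicatively with each hop, which is why a small increase in NumNeighbors can push a configuration past available GPU memory.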
#UI/UX polish
List Tooltip Behavior: Tooltips only show on truncated list items, eliminating unnecessary hover pop-ups.
Graph View Auto-Layout Fixes: Dragged table positions in Graph View no longer auto-reset after idle refreshes.
Version 2.7 delivers major improvements to prediction reliability, graph editing workflows, and UI consistency. It delivers over 70 fixes and enhancements across predictive modeling, training stability, connector flows, and error messaging.
Improvements
Graph creation blocked if no primary key: Graph builder now prevents creation when no primary or foreign key is detected, ensuring better data integrity during setup.
Snowflake access control respected: Users now see only the Snowflake tables they have read access to, reducing ingestion-related permission errors.
Automatic renaming of unnamed columns: Local uploads automatically rename empty or generic column headers (e.g., “Unnamed: 0”) to avoid ingestion failures.
Improved SDK reliability on Snowflake session expiry: The SDK now automatically retries generate_prediction_table() when a Snowflake session expires, preventing job crashes.
Connector errors converted into readable UI failures: Missing or renamed source files now surface clear error messages in the UI instead of returning generic server errors.
Model plan conflict warning for channels and aggregation: Users are now alerted when conflicting legacy and scoped configuration fields are defined in the same model plan.
Improved empty state experience across the UI: Legacy placeholder views were replaced with polished UI Catalog components across connectors, tables, models, and predictions.
Clearer error messaging for failed jobs: Job failure messages now specify the exact failed stage (e.g., Graph Snapshot, Prediction Table), making it easier to identify issues.
Standardized input components across product: Legacy form fields were updated to use consistent UI Catalog components such as TextInput, TextArea, and Label across multiple workflows.
Reduced snapshot polling frequency: Snapshot polling was reduced from every 2 seconds to every 30 seconds to improve system performance and reduce backend load.
Training job settings no longer duplicated: Training configuration details are now shown only once in the job details view, improving visual clarity.
Graph snapshot histogram charts restored: A prior patch was reverted to restore consistent rendering of histogram charts in the Graph Snapshot view.
Tag filtering restored in V1 UI: Tag filters in the prediction jobs list now reliably return matching results.
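The automatic renaming of unnamed columns listed above can be sketched as follows. pandas assigns names of the form "Unnamed: N" to blank CSV headers, so the sketch treats those and empty strings alike. The placeholder prefix and function name are hypothetical; the naming scheme Kumo uses is not given here.

```python
import re

_UNNAMED = re.compile(r"^Unnamed: \d+$")


def _needs_rename(header):
    """True for missing, blank, or pandas-style 'Unnamed: N' headers."""
    return header is None or not header.strip() or bool(_UNNAMED.match(header))


def rename_unnamed_columns(headers, prefix="column_"):
    """Replace unusable headers with unique placeholder names based on position."""
    taken = {h for h in headers if not _needs_rename(h)}
    result = []
    for position, header in enumerate(headers):
        if not _needs_rename(header):
            result.append(header)
            continue
        candidate = f"{prefix}{position}"
        suffix = 1
        while candidate in taken:  # avoid clashing with an existing column
            candidate = f"{prefix}{position}_{suffix}"
            suffix += 1
        taken.add(candidate)
        result.append(candidate)
    return result


print(rename_unnamed_columns(["Unnamed: 0", "user_id", "", "amount"]))
# ['column_0', 'user_id', 'column_2', 'amount']
```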
Deprecation
Multi-file upload rolled back: The “Add Table” button in Local Upload was removed to preserve a linear and streamlined onboarding experience; multi-upload is no longer supported in the connector panel.
Version 2.6 brings a rich mix of platform stability enhancements, error message clarity improvements, and UI polish across the Kumo experience. This release delivers over 50 improvements across graph building, model training, prediction, and table ingestion.
Improvements
TimeSplit training logic and timestamp consistency: TimeSplit models now correctly apply boundary logic, and selected timestamp units are preserved across the pipeline to ensure reliable splits, fingerprints, and cache behavior.
Improved timestamp handling and recognition: Users can now assign Timestamp types to string fields during table registration, and anchor times are displayed consistently in both UI and logs, eliminating manual fixes and formatting mismatches.
Clearer error messaging across workflows: Users now see precise messages for connector path issues, missing or renamed files, type mismatches in graph validation, and filtered training datasets with zero rows.
UI enhancements to graph creation and display: Placeholder graph names have been removed in favor of manual naming with validation, and graph creation now blocks if no primary/foreign key is detected to ensure data integrity.
Batch prediction improvements: Prediction jobs now display custom tags, clearly identify UI-triggered jobs, exclude training weights to avoid memory overload, and hide ROC curves in degenerate cases for clarity.
Consistent and polished input and list views: Form components and list pages have been updated for consistency in spacing, styling, and layout, and dropdowns were reorganized to surface high-priority actions like “Create New Graph.”
Stable handling of schema and data changes: The UI now gracefully handles schema edits such as column renaming or removal, and ingestion jobs properly report on deleted files or empty row counts with actionable feedback.
Parquet and CSV handling refinements: Header logic is now correctly scoped to file type, .parquet folder names no longer cause errors, and string-based column preview widths remain stable during data load.
S3 and connector performance improvements: S3 file discovery supports max_items to avoid full scans, and updated connector panel messaging provides better guidance during empty or misconfigured states.
Improved subgraph and explainability display: The XAI subgraph view now offers more complete and stable rendering of model behavior, improving interpretability.
Miscellaneous usability and editor updates: YAML editors now support tab indentation, type names are human-readable in error messages, and dropdown behaviors and job tagging are more consistent across the UI.
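The max_items behavior mentioned under S3 and connector performance can be illustrated with a lazy listing that stops as soon as the cap is reached. This is a sketch only: `discover_files` and `fake_listing` are hypothetical stand-ins, and no real S3 call is made. With boto3, a comparable cap is available through a paginator's `PaginationConfig={"MaxItems": n}`, though these notes do not say how Kumo implements it.

```python
from itertools import islice


def discover_files(listing, max_items=None):
    """Return at most max_items keys, pulling from the listing lazily."""
    return list(islice(listing, max_items))


def fake_listing(total, fetched):
    """Stand-in for a paged object listing; records every key actually fetched."""
    for i in range(total):
        fetched.append(i)
        yield f"data/part-{i:05d}.parquet"


fetched = []
keys = discover_files(fake_listing(1_000_000, fetched), max_items=3)
print(keys)          # first three keys only
print(len(fetched))  # 3, the remaining objects were never listed
```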
This release enhanced predictive query capabilities by upgrading Predictive Query Language from v1 to v2. It also delivered substantial workflow optimizations by integrating new temporal split functionality and enhancing job management.
Improvements
Integrated UI Catalog components into Kumo: Unified the app’s visual language by integrating Sidebar, ListOverview, status cards, and table search for a more consistent user experience.
Generate Prediction button added to key views: Enabled quick access to prediction workflows from the Prediction List and Training Job Overview pages.
Improved job orchestration for predictions: Enhanced logic for using parent sources, waiting on child jobs, and propagating job failure states correctly.
Reduced excessive graph snapshot refreshes: Prevented redundant refreshes to reduce load and improve responsiveness in graph workflows.
Async validation in graph workflows: Moved table.validate() into an async graph.validate() pipeline to improve performance and prevent blocking.
Timestamp unit handling standardized: Inferred and preserved timestamp units across ingestion, graph refresh, and execution to ensure consistency.
Split stats UI fixed and improved: Resolved broken displays in split table stats, restoring clarity and usability for time-based training workflows.
Enhanced XAI subgraph and score display: Fixed local score display and improved completeness of subgraph views for explainability workflows.
Batch prediction tagging and metadata exposed: Surfaced custom tags and UI-sourced job indicators to improve traceability in experiments.
Prediction query engine enhancements: Added AutoML pruning, adaptive sampling, RFM query flag, and simplified syntax (e.g., “IN” over “IS IN”).
Improved error messaging throughout the product: Delivered clearer messages for missing PKs, dtype mismatches, empty files, and normalized headers.
Robust handling of renamed/missing columns: Enabled stable behavior in workflows when table schemas change mid-pipeline.
Snowflake and S3 integration improvements: Enhanced error logging, credential handling, and S3 directory scanning for better reliability and performance.
Removed deprecated model training APIs: Finalized migration to kumo-ml, added Trainer(checkpoint_path) support, and dropped legacy task dependencies.
Updated Optuna and Databricks compatibility: Upgraded to Optuna 4.3.0 and updated Databricks table versioning to align with current backend infrastructure.
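The async validation item above, which moves table.validate() into an async graph.validate() pipeline, might look roughly like the sketch below. The `Table` and `Graph` classes here are simplified stand-ins written for illustration and are not the Kumo SDK classes; the checks they perform are assumptions.

```python
import asyncio


class Table:
    """Hypothetical stand-in for a table with a blocking validate() method."""

    def __init__(self, name, columns, primary_key=None):
        self.name = name
        self.columns = columns
        self.primary_key = primary_key

    def validate(self):
        """Synchronous check that returns a list of problems (empty if valid)."""
        problems = []
        if self.primary_key is not None and self.primary_key not in self.columns:
            problems.append(f"primary key '{self.primary_key}' not in columns")
        return problems


class Graph:
    """Hypothetical stand-in that validates all of its tables concurrently."""

    def __init__(self, tables):
        self.tables = tables

    async def validate(self):
        # Run each blocking table.validate() in a worker thread so the
        # event loop stays responsive; gather() preserves input order.
        results = await asyncio.gather(
            *(asyncio.to_thread(table.validate) for table in self.tables)
        )
        return {table.name: res for table, res in zip(self.tables, results)}


graph = Graph([
    Table("users", ["user_id", "age"], primary_key="user_id"),
    Table("orders", ["order_id"], primary_key="id"),
])
print(asyncio.run(graph.validate()))
```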
We’re thrilled to announce a major update to the Kumo platform, featuring a fully redesigned interface. This update brings a more streamlined, modern look and feel to the platform, making it easier for ML engineers and data scientists to train and run models.
What’s New:
Redesigned Navigation & Layout. A cleaner layout and intuitive navigation bar help you find the right features faster—reducing clicks and saving you time when setting up data connections or reviewing model outputs.
Powerful new Python SDK. Designed for using Kumo in your favorite IDE or notebook, the SDK integrates seamlessly with the UI, enabling robust, flexible, and interoperable workflows between code and visual interactions. See the SDK Reference.
Enhanced Workflows. An intuitive experience that helps you select graphs and train models faster, with quicker iterations.
Improvements
Reduced the time between AutoML trials to a minimum, significantly speeding up execution, especially for workflows with many trials.
Improved encoding efficiency: Raw data is now encoded and hashed upon graph snapshotting, leading to improved security and faster execution.
Relative time is now computed for all timestamp columns, independent of whether they were assigned as a designated time column.
Kumo can now gracefully handle timestamps outside of the UNIX/int64 range.
Introduced job queuing for individual workflows such as training table generation and prediction table generation.
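For the item on timestamps outside the UNIX/int64 range, the sketch below shows where that limit comes from: nanoseconds since the epoch stored in a signed 64-bit integer only cover roughly the years 1677 to 2262. Returning `None` for out-of-range values is an assumption made for illustration; these notes do not say how Kumo treats such values.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def to_epoch_ns(ts: datetime):
    """Return nanoseconds since the UNIX epoch, or None if int64 would overflow."""
    micros = (ts - EPOCH) // timedelta(microseconds=1)  # exact integer division
    nanos = micros * 1000
    return nanos if INT64_MIN <= nanos <= INT64_MAX else None


print(to_epoch_ns(datetime(1970, 1, 1, 0, 0, 1)))  # 1000000000
print(to_epoch_ns(datetime(9999, 12, 31)))         # None
```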