Version 2.12 delivers over 70 improvements across training stability, batch prediction, forecasting, ingestion, connectors, error handling, and UI polish.
Functionality
Optimization
Enhancement
Ingestion & Connectors
Duplicate Column Validation for CSVs
When uploading CSV files, the system now detects duplicate column names and blocks ingestion with a descriptive error banner in the UI.
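A minimal sketch of this kind of header check, using only the Python standard library. The function name `find_duplicate_columns` and the sample data are illustrative assumptions, not the platform's actual implementation.

```python
import csv
import io
from collections import Counter


def find_duplicate_columns(csv_text: str) -> list[str]:
    """Return column names that appear more than once in the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    counts = Counter(name.strip() for name in header)
    # Counter keeps first-seen order, so duplicates are reported in header order.
    return [name for name, n in counts.items() if n > 1]


# Hypothetical upload with a repeated "id" column.
sample = "id,amount,id,created_at\n1,9.5,1,2024-01-01\n"
duplicates = find_duplicate_columns(sample)
if duplicates:
    print(f"Duplicate column names found: {duplicates}")
```

A caller could block ingestion whenever the returned list is non-empty and show the names in the error banner.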
Retry Logic for Databricks SSL Errors
The Databricks connector retry logic now catches SSLError, ensuring transient SSL failures trigger retries instead of causing immediate job failure.
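The retry pattern can be sketched as follows. This assumes the standard-library `ssl.SSLError`. The connector may catch a different `SSLError` class, and `call_with_retries` and `flaky_query` are hypothetical names used only for illustration.

```python
import ssl
import time


def call_with_retries(operation, max_attempts=3, base_delay=1.0):
    """Run `operation`, retrying with exponential backoff on ssl.SSLError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ssl.SSLError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))


# Simulated operation that fails twice with a transient SSL error.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ssl.SSLError("transient handshake failure")
    return "ok"


result = call_with_retries(flaky_query, base_delay=0.0)
```

Only the SSL error is retried here, so any other exception still fails the job immediately.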
Exact Stats in Graph Snapshots
Graph Snapshot views always show available exact column stats after ingestion, falling back to sampled stats only if none exist.
Edge Stats Versioning Fix
Graph snapshot edge stats now reference the correct snapshot version instead of latest, ensuring accurate row/match counts across environments.
Training & Evaluation
Snowflake Stage Mount V2 Adoption
Added support for Stage Mount V2 to bypass Snowflake’s path length limitation without relying on credential vending. Controlled by FSUTIL_SNOWFLAKE_USE_STAGE_MOUNT=, this change improves reliability of training and batch prediction pipelines.
Training Stability for OOM on Minimum Batch Size
Training jobs no longer fail outright when batch size reduction hits the minimum threshold. Instead, jobs continue if at least one experiment succeeds, ensuring partial results are preserved.
UI Task Type for Standalone Train Table Jobs
Standalone train table jobs triggered via SDK now correctly display task type (e.g., Regression, Classification) in the UI, ensuring parity with training jobs.
Weight Column Flag Fix
has_weight_col is now set only when both weight_mode is non-null and a custom training table includes a weight column. Fixes missing majority_sampling_ratio display in Oscilar classification jobs.
RHS Node Count Validation
Training jobs now validate RHS node counts during train table generation, failing early if limits are exceeded (10M+), with clear error messages.
Unsupported Field Validation
Model plan validation now treats unsupported fields as blocking errors (not warnings), ensuring invalid configs fail fast and consistently across SDK and UI.
Predictive Query
Improved Forecasting with ASSUMING Clause
Forecasting predictive queries using ASSUMING filters now run successfully without producing invalid step-size errors (e.g., 604800000000000). Training table generation handles entity filters consistently across WHERE and ASSUMING clauses, ensuring valid autoregressive labels.
Batch Prediction & XAI
UI Boolean Value Rendering in XAI
Boolean fields in local XAI tables now display correctly as true/false instead of appearing blank, with MISSING reserved for nulls.
UI Stats Labeling
Batch prediction stats are now explicitly labeled as “estimated” to avoid confusion with exact row counts.
Output Size Estimation Improvements
Batch prediction output size is now estimated across 5 sample files instead of just the first partition, preventing misleading “0 rows exported” results when later partitions contain data.
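To illustrate why sampling several files helps, here is a rough sketch. The evenly spaced sampling and the scaling by file count are assumptions made for the example, and `estimate_total_rows` is a hypothetical helper, not the platform's estimator.

```python
def estimate_total_rows(num_files, count_rows, sample_size=5):
    """Estimate total rows from the mean row count of up to `sample_size` files."""
    if num_files == 0:
        return 0
    k = min(sample_size, num_files)
    # Evenly spaced indices so that later partitions are represented too.
    indices = [i * num_files // k for i in range(k)]
    sampled_total = sum(count_rows(i) for i in indices)
    return round(sampled_total / k * num_files)


# Hypothetical per-partition row counts; the first partition is empty.
partitions = [0, 0, 120, 80, 0, 100, 0, 0, 90, 110]

first_only = partitions[0] * len(partitions)  # looks at the first partition only
estimate = estimate_total_rows(len(partitions), lambda i: partitions[i])
```

With this data the first-partition approach reports 0 rows, while the sampled estimate is non-zero. It remains an estimate, which is why the UI labels the figure accordingly.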
UI/UX Polish
Graph Visualization Stability
Fixed issues with graph layout where node width/height mutations caused nodes to spread or reset during zoom/pan. Graph positions and zoom are now stable across reloads.
Error Messages
Empty CSV Handling
Uploading an empty CSV now produces a clear error message (“Failed to fetch column data for source table: table might be empty”) instead of an ambiguous internal error.
Improved File Format Error Messaging
Invalid file types selected in S3 connectors (e.g., .yml, .txt) now show “Unsupported file type (.csv or .parquet required)” instead of misleading “file not found” errors.
BigQuery Connector Error Handling
Errors from BigQuery ingestion (e.g., resource timeouts) now surface as clear, customer-facing messages in the UI with query duration logged.
Error Messaging for Spark OOM
Training jobs failing due to Spark OutOfMemory now surface descriptive UI errors, advising users to resize clusters, instead of vague fallback messages.
Improved Validation Messaging for FK/PK Errors
When a foreign key is linked to multiple primary keys, users now see a clear error message referencing the table/column, without exposing internal IDs.
Version 2.11 delivers over 60 improvements across ingestion, training workflows, validation feedback, and UI experience enhancements.
Functionality
Prediction Window Minutes Support
Predictive queries now support defining windows in minutes rather than just days, enabling high-frequency models (e.g., 15-minute intervals) ideal for streaming use cases.
Optimization
Reduced Duplicate API Calls
Redundant API calls in the Prediction creation and Training job views have been removed, significantly improving performance and reducing backend load.
Enhancement
Ingestion & Connectors
Streaming Table Support
Databricks STREAMING_TABLE types are now fully supported and treated like views to prevent unsupported operations like time travel.
Training & Evaluation
Experiment Metric Logging Fixes
TrainingProgressLogger now correctly logs training metrics after computation, ensuring that visualizations for train/validation metrics are accurate and complete from epoch 0.
Predictive Query
Improved Future-Looking Error Handling
Predictive queries that use future-looking filters now raise hard validation errors instead of being misclassified as warnings, preventing silent training failures.
Missing Threshold for Binary Prediction
Users are now clearly warned when binary_classification_threshold is missing during SDK-based predictions, making it easier to debug.
Batch Prediction
Support Embedding-Only Predictions
The system now allows running predictions with only embedding output selected — prediction values are no longer required.
UI/UX Polish
Train From Graph Shortcut
A “Train a model” button has been added to the Graph page, allowing users to directly launch model creation with the selected graph preloaded.
Dropdown Sort Prefix + Cleanup
The Tables and Graphs list pages now feature a clean “Sort by:” dropdown supporting Most Recent, A–Z, Z–A, and Least Recent, with better padding and prefix rendering.
Fixed Scroll Behavior in Prediction + Graph Views
Scroll logic across the New Prediction and Graph canvas views has been improved to avoid layout shifts, locked scrollbars, and sticky header bugs.
QueryCard Copy Button
A copy-to-clipboard button is now available for Predictive Queries on all job detail pages (Training, Prediction, Train Table), improving usability.
Display Inferred Task Type in New Model
After validating a Predictive Query, the New Model page now shows the inferred task type (e.g., Binary Classification) with tooltip and documentation link.
Error Messages
Warnings and Errors Shown Separately
The PQ validation panel now clearly separates blocking errors and non-blocking warnings into distinct sections with color-coded styling.
Blocked Underscore Naming Errors
Users now receive clear error messages when entering table, graph, or connector names with multiple underscores, preventing invalid inputs.
Better Split Error Feedback
When using temporal splits on static prediction tasks, users now receive a helpful message advising that a create_date is required on the target table.
Improved Validation Messaging
Training job failures now surface specific backend error messages (including validation and connector errors), replacing vague fallback errors like “Training failed.” Table refresh workflow timeouts also now show actionable context.
Improved Validation for File Connectors
File-based training tables now properly resolve the connector source path and no longer fail with “FileNotFoundError.”
Cleaner Error Messaging on Refresh Failures
Timeout or failure in table refresh (e.g., missing S3 files or BigQuery hangs) now results in a helpful error banner pointing to upstream causes.
Version 2.10 delivers over 50 updates across data ingestion, prediction accuracy, system stability, and UI polish.
Functionality
Improved Static LP Configuration Support
Advanced users can now define static link prediction modes via ModelPlan, enabling better control over leakage and modeling strategies.
Optimization
S3 to BigQuery Export Optimization
Batch prediction results are now streamed from S3 to BigQuery to improve performance and prevent memory-related job failures.
Enhancement
Ingestion & Connectors
Databricks Connector Guardrails
The system now prevents creation of Databricks connectors if backend support isn’t enabled, ensuring early validation and fewer runtime surprises.
Connector Credential Retention on Update
Updating a Databricks connector now preserves existing credentials, fixing a bug that previously caused silent prediction failures.
Table Name Clean-Up and Collision Prevention
Table names with repeated underscores are automatically normalized, and manual entry with multiple underscores is now blocked.
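The two behaviors, normalizing generated names and rejecting manual ones, might look like the sketch below. Both function names are hypothetical, and the exact normalization rules used by the platform are not stated in these notes.

```python
import re


def normalize_table_name(name: str) -> str:
    """Collapse every run of two or more underscores into a single underscore."""
    return re.sub(r"_{2,}", "_", name)


def has_repeated_underscores(name: str) -> bool:
    """True if a manually entered name should be rejected."""
    return "__" in name


auto_name = normalize_table_name("user__events___daily")
manual_ok = not has_repeated_underscores("user_events")
manual_blocked = has_repeated_underscores("user__events")
```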
Graph Link Suggestions After Table Edit
Editing a table node within a graph now immediately triggers updated graph link suggestions, reducing friction in graph setup workflows.
Clear Feedback for Table Graph Errors
Improved graph editing behaviors now include cleaner warnings, better name overflow handling, and consistent layout.
Training & Evaluation
Trainer Stability for Large Embedding Tables
Trainer now exits gracefully when encountering large right-hand-side (RHS) tables that would previously cause CUDA out-of-memory crashes.
PQ Validation Robustness with Multi-word Keywords
Autocomplete logic in the predictive query editor has been upgraded to correctly suggest multi-word keywords like FOR EACH, avoiding syntax corruption.
Support for Multi-Class Targets with Empty Strings
Empty-string rows are now excluded from multi-class classification tasks, resolving CUDA indexing errors and ensuring class label consistency.
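One way to picture the fix: filter out empty-string labels before building the class index, so every remaining label maps to a valid, contiguous class ID. This is a simplified sketch with a hypothetical helper name, not the trainer's actual preprocessing.

```python
def build_class_index(labels):
    """Drop empty-string labels, then map remaining labels to contiguous IDs."""
    kept = [label for label in labels if label != ""]
    classes = sorted(set(kept))
    class_to_id = {cls: idx for idx, cls in enumerate(classes)}
    encoded = [class_to_id[label] for label in kept]
    return encoded, class_to_id


# Hypothetical target column containing two empty-string rows.
labels = ["gold", "", "silver", "gold", ""]
encoded, class_to_id = build_class_index(labels)
```

Because IDs are assigned only to labels that survive the filter, no encoded value can fall outside the range of known classes.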
Improved Error Messaging for Static Node Prediction
Predictive query errors now provide helpful suggestions (e.g., use LAST() or aggregate values) when multiple matches are detected per entity.
Better Split Statistics in TimeSplit
The split stats logic now aligns with offset-based boundaries and non-strict modes, producing accurate distribution metrics across partitions.
Predictive Query
Validation Skipped for Legacy Trained Queries
Trained models with deprecated or renamed connectors can now be reviewed in the UI without triggering connector re-validation.
Batch Prediction
Prediction Anchor Time Consistency
The Prediction Overview now accurately displays the backend-provided anchor time value, removing timezone-related confusion during batch prediction reviews.
Improved Prediction on Non-Empty Columns
Users now receive actionable guidance when a prediction job fails because the prediction column is already populated.
Version 2.9 delivers over 40 fixes and improvements across ingestion, training, evaluation, and user experience.
Functionality
PQuery End Time Support: Predictive queries now support end_time_col. For instance, users can write a query with “WHERE USER_LIST.WITHDRAW_DATE > 2011-10-25”, where WITHDRAW_DATE is the end_time_col.
Optimization
Predictive Query Validation Latency Reduction: The internal PQ validator now runs significantly faster under concurrent load, with latency reduced by up to 25% across 20-100 parallel jobs.
Connector API Call Reductions: Eliminated redundant connector API calls during Prediction creation and Graph View loads to improve page responsiveness.
Enhancement
Ingestion & Connectors
Databricks Connector Creation Guardrails: UI now blocks creation of multiple Databricks connectors to avoid backend stability issues.
S3 Path Copy Button: Users can now easily copy S3 source paths directly from connector detail panels.
Training & Evaluation
Primary Key Auto-Suggestion: The system suggests likely primary keys based on column heuristics when users omit this field during table creation.
Adaptive Sampling Neighbor Validation: Improved validation logic estimates safe node counts based on NumNeighbors and sampling config to prevent GPU OOM failures during training. Users receive clearer warnings when configurations exceed safe thresholds.
Negative Weight Validation: Training jobs are blocked if weight columns contain negative values, preventing invalid model configurations.
Improved Metrics Table Display: The evaluation metrics table now uses the new UI Catalog Datatable for improved readability, displays baseline model scores for easier comparison, and clearly shows macro-averaged metrics like F1.
Improved Error Readability: PQ validator now displays human-readable operators (e.g., >) instead of internal enum names (e.g., RelOp.GT).
PQ Warning UX: PQ validation warnings are now surfaced even when queries are otherwise valid, ensuring users receive early guidance without blocking execution.
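The "Improved Error Readability" item above can be illustrated with a small sketch. The `RelOp` enum here is a hypothetical stand-in that mirrors the name mentioned in the note, and its members other than `GT` are assumed for the example.

```python
from enum import Enum


class RelOp(Enum):
    """Hypothetical stand-in for the validator's internal operator enum."""
    GT = ">"
    GE = ">="
    LT = "<"
    LE = "<="
    EQ = "="
    NE = "!="


def format_condition(column: str, op: RelOp, value: str) -> str:
    """Render a condition with the operator symbol, not the enum name."""
    return f"{column} {op.value} {value}"


internal_name = str(RelOp.GT)  # what leaks into a message if the enum is printed directly
readable = format_condition("USER_LIST.WITHDRAW_DATE", RelOp.GT, "2011-10-25")
```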
UI/UX Polish
List Tooltip Behavior: Tooltips only show on truncated list items, eliminating unnecessary hover pop-ups.
Graph View Auto-Layout Fixes: Dragged table positions in Graph View no longer auto-reset after idle refreshes.
Version 2.7 delivers major improvements to prediction reliability, graph editing workflows, and UI consistency. It delivers over 70 fixes and enhancements across predictive modeling, training stability, connector flows, and error messaging.
Improvements
Graph creation blocked if no primary key: Graph builder now prevents creation when no primary or foreign key is detected, ensuring better data integrity during setup.
Snowflake access control respected: Users now see only the Snowflake tables they have read access to, reducing ingestion-related permission errors.
Automatic renaming of unnamed columns: Local uploads automatically rename empty or generic column headers (e.g., “Unnamed: 0”) to avoid ingestion failures.
Improved SDK reliability on Snowflake session expiry: The SDK now automatically retries generate_prediction_table() when a Snowflake session expires, preventing job crashes.
Connector errors converted into readable UI failures: Missing or renamed source files now surface clear error messages in the UI instead of returning generic server errors.
Model plan conflict warning for channels and aggregation: Users are now alerted when conflicting legacy and scoped configuration fields are defined in the same model plan.
Improved empty state experience across the UI: Legacy placeholder views were replaced with polished UI Catalog components across connectors, tables, models, and predictions.
Clearer error messaging for failed jobs: Job failure messages now specify the exact failed stage (e.g., Graph Snapshot, Prediction Table), making it easier to identify issues.
Standardized input components across product: Legacy form fields were updated to use consistent UI Catalog components such as TextInput, TextArea, and Label across multiple workflows.
Reduced snapshot polling frequency: Snapshot polling was reduced from every 2 seconds to every 30 seconds to improve system performance and reduce backend load.
Training job settings no longer duplicated: Training configuration details are now shown only once in the job details view, improving visual clarity.
Graph snapshot histogram charts restored: A prior patch was reverted to restore consistent rendering of histogram charts in the Graph Snapshot view.
Tag filtering restored in V1 UI: Tag filters in the prediction jobs list now reliably return matching results.
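As an illustration of the "Automatic renaming of unnamed columns" item above, the sketch below replaces empty or "Unnamed: N" headers with generated names. The `column_<position>` naming scheme and the helper name are assumptions, since the notes do not state which names the platform assigns.

```python
import re


def rename_unnamed_columns(header):
    """Replace empty or 'Unnamed: N' headers with position-based names."""
    taken = set(header)
    renamed = []
    for position, name in enumerate(header):
        stripped = name.strip()
        if stripped == "" or re.fullmatch(r"Unnamed: \d+", stripped):
            candidate = f"column_{position}"
            while candidate in taken:  # avoid clashing with a real column
                candidate += "_x"
            taken.add(candidate)
            renamed.append(candidate)
        else:
            renamed.append(name)
    return renamed


# Hypothetical header from a local upload.
header = ["Unnamed: 0", "user_id", "", "amount"]
cleaned = rename_unnamed_columns(header)
```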
Deprecation
Multi-file upload rolled back: The “Add Table” button in Local Upload was removed to preserve a linear and streamlined onboarding experience; multi-upload is no longer supported in the connector panel.
Version 2.6 brings a rich mix of platform stability enhancements, error message clarity improvements, and UI polish across the Kumo experience. This release delivers over 50 improvements across graph building, model training, prediction, and table ingestion.
Improvements
TimeSplit training logic and timestamp consistency: TimeSplit models now correctly apply boundary logic, and selected timestamp units are preserved across the pipeline to ensure reliable splits, fingerprints, and cache behavior.
Improved timestamp handling and recognition: Users can now assign Timestamp types to string fields during table registration, and anchor times are displayed consistently in both UI and logs, eliminating manual fixes and formatting mismatches.
Clearer error messaging across workflows: Users now see precise messages for connector path issues, missing or renamed files, type mismatches in graph validation, and filtered training datasets with zero rows.
UI enhancements to graph creation and display: Placeholder graph names have been removed in favor of manual naming with validation, and graph creation now blocks if no primary/foreign key is detected to ensure data integrity.
Batch prediction improvements: Prediction jobs now display custom tags, clearly identify UI-triggered jobs, exclude training weights to avoid memory overload, and hide ROC curves in degenerate cases for clarity.
Consistent and polished input and list views: Form components and list pages have been updated for consistency in spacing, styling, and layout, and dropdowns were reorganized to surface high-priority actions like “Create New Graph.”
Stable handling of schema and data changes: The UI now gracefully handles schema edits such as column renaming or removal, and ingestion jobs properly report on deleted files or empty row counts with actionable feedback.
Parquet and CSV handling refinements: Header logic is now correctly scoped to file type, .parquet folder names no longer cause errors, and string-based column preview widths remain stable during data load.
S3 and connector performance improvements: S3 file discovery supports max_items to avoid full scans, and updated connector panel messaging provides better guidance during empty or misconfigured states.
Improved subgraph and explainability display: The XAI subgraph view now offers more complete and stable rendering of model behavior, improving interpretability.
Miscellaneous usability and editor updates: YAML editors now support tab indentation, type names are human-readable in error messages, and dropdown behaviors and job tagging are more consistent across the UI.
This release enhanced predictive query capabilities by upgrading Predictive Query Language from v1 to v2. It also delivered substantial workflow optimizations by integrating new temporal split functionality and enhancing job management.
Improvements
Integrated UI Catalog components into Kumo: Unified the app’s visual language by integrating Sidebar, ListOverview, status cards, and table search for a more consistent user experience.
Generate Prediction button added to key views: Enabled quick access to prediction workflows from the Prediction List and Training Job Overview pages.
Improved job orchestration for predictions: Enhanced logic for using parent sources, waiting on child jobs, and propagating job failure states correctly.
Reduced excessive graph snapshot refreshes: Prevented redundant refreshes to reduce load and improve responsiveness in graph workflows.
Async validation in graph workflows: Moved table.validate() into an async graph.validate() pipeline to improve performance and prevent blocking.
Timestamp unit handling standardized: Inferred and preserved timestamp units across ingestion, graph refresh, and execution to ensure consistency.
Split stats UI fixed and improved: Resolved broken displays in split table stats, restoring clarity and usability for time-based training workflows.
Enhanced XAI subgraph and score display: Fixed local score display and improved completeness of subgraph views for explainability workflows.
Batch prediction tagging and metadata exposed: Surfaced custom tags and UI-sourced job indicators to improve traceability in experiments.
Prediction query engine enhancements: Added AutoML pruning, adaptive sampling, RFM query flag, and simplified syntax (e.g., “IN” over “IS IN”).
Improved error messaging throughout the product: Delivered clearer messages for missing PKs, dtype mismatches, empty files, and normalized headers.
Robust handling of renamed/missing columns: Enabled stable behavior in workflows when table schemas change mid-pipeline.
Snowflake and S3 integration improvements: Enhanced error logging, credential handling, and S3 directory scanning for better reliability and performance.
Removed deprecated model training APIs: Finalized migration to kumo-ml, added Trainer(checkpoint_path) support, and dropped legacy task dependencies.
Updated Optuna and Databricks compatibility: Upgraded to Optuna 4.3.0 and updated Databricks table versioning to align with current backend infrastructure.
We’re thrilled to announce a major update to the Kumo platform, featuring a fully redesigned interface. This update brings a more streamlined, modern look and feel to the platform, making it easier for ML engineers and data scientists to train and run models.
What’s New:
Redesigned Navigation & Layout. A cleaner layout and intuitive navigation bar help you find the right features faster—reducing clicks and saving you time when setting up data connections or reviewing model outputs.
Powerful new Python SDK. Designed for using Kumo in your favorite IDE or notebook, the SDK integrates seamlessly with the UI, enabling robust, flexible, and interoperable workflows between code and visual interactions. See the SDK Reference.
Enhanced Workflows. An intuitive experience that helps you select graphs and train models faster, with quicker iterations.
Improvements
Reduced the time between AutoML trials to a minimum, significantly speeding up execution, especially for workflows with many trials.
Improved encoding efficiency: Raw data is now encoded and hashed upon graph snapshotting, leading to improved security and faster execution.
Relative time is now computed for all timestamp columns, independent of whether they were assigned as a designated time column.
Kumo can now gracefully handle timestamps outside the UNIX/int64 range.
Introduced job queuing for individual workflows such as training table generation and prediction table generation.