v1.44 (2024-12-18)

Improvements

UI Enhancements

  • BP Job Creation: Fixed warnings during job creation.

Training & Database Optimizations

  • Temporal Queries: Enhanced training table generation performance when static entity filters are applied.

  • Fine-grained sampling config: Number of neighbors can now be specified per edge type.
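As a rough illustration of what per-edge-type neighbor counts mean, the sketch below resolves a fan-out for each edge type at each hop and falls back to a default when no override is given. The function name, the dictionary layout, and the edge-type names are hypothetical and are not the Kumo model plan syntax.

```python
from typing import Dict, List


def resolve_num_neighbors(
    default_per_hop: List[int],
    overrides: Dict[str, List[int]],
    edge_types: List[str],
) -> Dict[str, List[int]]:
    """Return the neighbor count per hop for every edge type.

    Hypothetical helper: edge types without an override use the default.
    """
    resolved = {}
    for edge_type in edge_types:
        per_hop = overrides.get(edge_type, default_per_hop)
        # Every edge type must define a count for each hop.
        if len(per_hop) != len(default_per_hop):
            raise ValueError(f"{edge_type}: expected {len(default_per_hop)} hops")
        resolved[edge_type] = list(per_hop)
    return resolved


# Hypothetical edge types, named purely for illustration.
fanout = resolve_num_neighbors(
    default_per_hop=[16, 8],
    overrides={"orders.user_id->users": [64, 32]},
    edge_types=["orders.user_id->users", "orders.item_id->items"],
)
print(fanout)
```

Here only the overridden edge type samples more neighbors, while the other edge type keeps the default counts.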

API and Framework Updates

  • Kumo-ML Update: Upgraded to the latest version for enhanced stability.

  • Session Creation Retry: Enabled automatic retries in Databricks mode.
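For readers unfamiliar with the pattern, automatic retries around a flaky call can be sketched as follows. This is a generic retry loop with exponential backoff, not Kumo's implementation. The helper name, the exception type, and the delay values are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(create: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call `create` until it succeeds or `attempts` calls have failed."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return create()
        except ConnectionError:
            # Re-raise once the last allowed attempt has failed.
            if attempt == attempts - 1:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")


# Hypothetical session factory that fails twice before succeeding.
calls = {"n": 0}


def flaky_create_session() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "session-ok"


session = with_retries(flaky_create_session, attempts=3, base_delay=0.0)
```

Only errors treated as transient (here `ConnectionError`) are retried. Any other exception propagates immediately.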

Spark and SPCS

  • Spark Tracking: Enabled by default for streamlined debugging.

  • SPCS Efficiency: Enhanced file copy processes for faster data handling.

Bug Fixes

  • Worker Health Check: Improved warning messages and thresholds.

  • Subgraph Size: Enhanced error messaging for large subgraph scenarios.

  • XAI Table Display: Fixed table name visibility issues in the Explainability UI.

v1.43 (2024-12-05)

New Features

SDK

Kumo version v1.43 introduces the Kumo Python SDK, a fully fledged job-centric, composable, and interactive programmatic interface to the Kumo machine learning platform. The SDK allows users to perform EDA, create tables, graphs & queries, train jobs, evaluate results, and orchestrate production jobs all in one notebook environment. Key features of the SDK include:

  • A Python-friendly object model representing the key components of a relational deep learning model: Connector, Table, Graph, PredictiveQuery, and Trainer.

  • A composable interface allowing users to inspect, evaluate, and modify intermediate artifacts in the Kumo pipeline (e.g., training and prediction tables, holdout dataframes, and more).

  • A new user interface to display all launched jobs, monitor job progress, and visualize outputs including training progress, evaluations, and explainability.

The SDK can be installed with:

pip install kumoai==0.2.1 --extra-index-url=https://sdk-pkg.kumoai.cloud

Documentation: Kumo SDK Docs

Improvements

UI and Jobs Page Enhancements

  • Adjusted search text box corners and reduced row padding.

  • Improved dropdown designs, tags theme, and column interactions.

  • Enhanced column width and hover behaviors for table elements.

  • Renamed components for better clarity (e.g., “Jobs Overview” updated).

  • Improved styling and content for job training details and related pages.

Snowflake Native App Improvements

  • Kumo’s Snowflake Native app is now more cost-efficient! The control plane runs on a CPU-only compute pool, utilizing a GPU compute pool only during model training, resulting in nearly 3X to 4X cost savings.

  • Improved reliability for Snowflake stage operations.

Performance and Reliability Fixes

  • Call Cache Directory Creation: Transitioned file I/O operations to Temporal activities to prevent deadlocks.

  • Batch Prediction Optimization: Fixed global materialization logic in batch jobs.

  • Prediction Anchor Time Visibility: Updated prediction table to display anchor time only when available.

Model Plan Improvements

  • Added support for finer-grained control of sampling, allowing granularity at the level of fkeys and hops.

Graph and Visualization Improvements

  • Updated styles, icons, and spacing for better clarity.

  • Resolved visualization issues like “fitView” bugs and enhanced graph interactivity.

  • Added related jobs and nested jobs enhancements.

  • Improved graph snapshots and introduced spacing refinements.

Training Table Improvements

  • Made the “Timeframe Chart” default on page load and removed redundant elements.

  • Added new columns and updated styling to match current requirements.

  • Set more intuitive defaults for regression task parameters.

Bug Fixes

  • UI Interaction Refinements: Fixed flickering divider issues and softened shadows.

  • Resolved inconsistencies in column width and hover effects.

  • Updated progress bar behavior and fixed reload issues.

Breaking Changes

  • AdvancedAutoTrainerOptions has been deprecated in favor of ModelPlan.

  • Removed the max_target_neighbors_per_entity option.

v1.42 (2024-11-20)

Improvements

Enhanced Migration Documentation and Database Improvements

  • Updated Migration Readme: Simplified guidance for migration processes, ensuring clear and actionable steps for smoother transitions.

  • Schema Migration Enhancements: Improved ML database migrations by fixing database connection strings for greater reliability and ease of use.

  • Spark Resource Table Addition: Enabled advanced data analytics by adding the spark_resource table to the production ML database, enhancing data processing capabilities.

Predictive (and Forecasting) Query Refinements

  • Predictive Query User Experience: Removed unnecessary pop-ups from the Predictive Query overview, streamlining user interactions and reducing distractions.

Performance and Efficiency Upgrades

  • Batch Prediction Optimization: Reduced prediction time by 20-50% for large-scale partitioned batch predictions, saving time on data processing.

  • Categorical Data Analysis: Fixed missing percentage stats for categorical columns with more than 20 categories, providing comprehensive insights.

Explainability and UI Improvements

  • XAI Enhancements: Simplified explainability visuals by updating graph details and tips for easier interpretation of predictive models.

    • Updated graph origins to display prediction averages.

    • Streamlined visuals by removing population fraction indicators from graphs and showing them as text instead.

Bug Fixes and Reliability

  • Job Type Search Fix: Resolved job type filtering issues, ensuring accurate search results when filtering by job attributes.

  • Split Horizon Adjustment: Corrected computations to ensure consistent and accurate results in horizon splitting.

Graph and Node Visualization Improvements

  • Node Graph Adjustments: Improved the visibility of nodes in large graphs by dynamically resizing node dimensions based on graph size.

v1.41 (2024-11-04)

Improvements

  • Improved Reliability for Backend Storage Systems: Expanded compatibility and enhanced reliability for diverse backend storage systems, including AWS S3, Databricks Unity Catalog, and Snowpark Container Services (SPCS) stages.

  • Optimized Efficiency in Training Data Materialization for Small Graphs: Processing of small graph data is significantly more efficient, reducing materialization time by up to 25%.

  • Entity selection in XAI for multi-class classification will now provide the top-k entities for which the model predictions are correct or wrong with high confidence.

  • Support for focal loss in binary/multi-label tasks.

  • Support for 6-hop neighbor sampling.

  • Early validation for Batch Prediction outputs written to warehouses.

  • Baseline comparisons are now available in the Snowflake Native App for all task types.

  • Added graph-based visualizations to display entity-level explainability for binary classification tasks.

  • Several UI improvements, including fixes for stats scale issues and evaluation charts.

  • Performance improvements for materialization (~30% reduction) by using intermediate Snowflake tables instead of external Parquet files in the Snowflake Native app.
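For reference, the focal loss mentioned above down-weights examples the model already classifies confidently. A minimal sketch of the standard binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), follows. The function name is hypothetical, and the defaults gamma = 2 and alpha = 0.25 are commonly used values, not necessarily the ones Kumo applies.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one example with predicted positive probability p and label y."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    # p_t is the probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t balances the positive and negative classes.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, 1)  # confident and correct: strongly down-weighted
hard = binary_focal_loss(0.30, 1)  # poorly classified: keeps most of its loss
print(easy, hard)
```

With gamma = 0 the modulating factor disappears and the expression reduces to class-weighted cross-entropy.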

Breaking Changes

  • Support for the old flat model plan YAML configuration has been fully removed.

v1.40 (2024-10)

Improvements

  • Various improvements to the experience for writing Predictive Queries including Autocomplete and inline error hints!

  • CPU requirements have been reduced substantially, lowering the likelihood of CPU OOMs. Table loading times in the trainer for SPCS are reduced by 30-40%.

  • SPCS deployment no longer requires Snowflake connector credentials. Kumo uses the Snowflake-provided OAuth token already available in SPCS.

  • Improved reliability for Databricks native deployments around UCV file uploads/downloads and session management.

  • Various UI improvements to support Databricks connector.

Breaking Changes

  • Batch Prediction output format transformations have been deprecated. To post-process predictions, we recommend using a distributed data processing platform like Databricks or AWS EMR Studio in your secure environment.

  • Batch Prediction data distribution drift statistics have been deprecated. We recommend using your MLOps platform to make sure that changes in the distribution of your data are intended.

v1.39 (2024-09)

New Features

  • New model plan option - sample_from_entity_table (default: True): This new option for static predictive queries allows customization of neighborhood sampling behavior. If set to False, it disallows sampling of entities in the entity table other than the seed entity itself. This is useful when entities represent candidates or hypothetical examples and information flow between different candidates should be restricted.

  • Support for global baselines for all problem types, which now enables generating baselines for static link and node prediction.
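The effect of sample_from_entity_table described above can be pictured with a small filter over sampled neighbors. This is only a sketch of the described behavior, not Kumo's sampler. The function name, the (table, row id) node representation, and the table names are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[str, int]  # (table name, row id), a hypothetical representation


def filter_neighbors(
    seed: Node,
    candidates: List[Node],
    entity_table: str,
    sample_from_entity_table: bool = True,
) -> List[Node]:
    """Drop entity-table rows other than the seed when the option is False."""
    if sample_from_entity_table:
        return list(candidates)
    # Keep nodes from other tables, plus the seed entity itself.
    return [n for n in candidates if n[0] != entity_table or n == seed]


seed = ("candidates", 7)
neighbors = [("candidates", 7), ("candidates", 9), ("jobs", 3)]
restricted = filter_neighbors(seed, neighbors, "candidates", sample_from_entity_table=False)
print(restricted)
```

With the option disabled, the other row of the entity table (row 9) is excluded, so no information flows between candidates.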

Improvements

  • Baselines are now also supported for temporal multilabel classification and ranking problems, and for all static problems.

  • The baseline trigger has moved from the model planner to a separate, more visible button on the model planner page.

  • Sped up batch prediction table generation for large numbers of entities and timeframes. For some queries, this brings BP table generation time down from over 1 hour to under 20 minutes.

  • Connector authentication for Snowflake Native app: When Kumo runs as a Snowflake native app, users no longer need to provide their credentials when creating a Snowflake connector; Kumo uses the built-in OAuth token in SPCS to connect to the customer’s warehouse. This change also ensures that all traffic between Kumo and Snowflake happens within Snowflake’s private network, and Kumo no longer requires egress rules to connect to the customer’s Snowflake account. This change does require privileges to be granted to the Kumo native app before Kumo can access any data in the customer’s Snowflake account.

v1.38 (2024-07-15)

  • Baselines now supported in SPCS.

  • Encrypted keys now supported for Snowflake connector.

  • Backend performance enhancements.

  • Various minor fixes and UI improvements.

v1.37 (2024-06-02)

  • Backend performance enhancements (SaaS).

  • Various minor fixes and UI improvements.

v1.36 (2024-05-27)

  • Baselines page now displays a warning when a feature is not available.

  • Users are now alerted if multi-class classifications only have two classes.

  • Enhancements to in-app pQuery documentation and improved tooltips.

  • Various minor fixes and UI improvements.

v1.35 (2024-05-13)

  • Kumo table and view creation now streamlined in a unified “Add Table/View” page.

  • Newly refined UI across the Kumo SaaS app.

  • Various minor fixes and UI improvements.

v1.34 (2024-04-29)

  • Multi-label ranking is now available in PQLv2.

  • Encoder use can now be specified for autoregressive labels in regression and forecasting tasks (by specifying past_encoder in the model plan).

  • Various backend performance enhancements and improvements.

  • Various minor fixes and UI improvements.

v1.33 (2024-04-11)

  • Enhanced monitoring for batch predictions to detect unusual gaps in fact tables.

  • For classification, link prediction, and regression tasks, heuristic baselines are now available for comparison against Kumo results.

  • Various backend performance enhancements and improvements.

  • Various minor fixes and UI improvements.

v1.32 (2024-03-25)

  • Data distribution drift statistics now available for batch predictions.

  • Row-level explainability (XAI) metrics now available via the explorer tab.

  • Enhanced datatype changes are now available during preprocessing when creating tables.

  • When setting up dimension tables, an end date can now be set to restrict training and batch predictions to a specific timeframe.

  • Various minor fixes and UI improvements.

v1.31 (2024-03-11)

  • For ranking tasks (i.e., pQueries using LIST_DISTINCT with RANK TOP K), target item limit increased from 1M to 10M.

  • For certain types of pQueries (e.g., link prediction tasks), an Explorer section is available for evaluating predictions against historical and ground truth data.

  • Various minor fixes and UI improvements.

v1.30 (2024-02-26)

  • Improvements for supporting extensive batch prediction jobs.

  • Various minor fixes and UI improvements.

v1.29 (2024-02-15)

  • Improvements to the AWS S3 connector add CSV/Parquet support and allow scaling to more tables.

  • Various minor fixes and UI improvements.

v1.28 (2024-02-01)

  • Various backend improvements to performance during training.

  • Various minor fixes and UI improvements.

v1.27 (2024-01-15)

  • Additional features and syntax available for link prediction tasks.

  • MLOps monitoring dashboards available for batch prediction jobs.

  • Various minor fixes and UI improvements.