06/27/2023
Bringing Predictive AI to Enterprise Data with Snowpark Container Services
Author: Ivaylo Bahtchevanov
We’re excited to announce our partnership with Snowflake, the Data Cloud, with the shared goal of democratizing machine learning (ML) in the enterprise. Our integration with Snowpark Container Services (in private preview) enables anyone to bring state-of-the-art deep learning directly into their Snowflake account, where the governed data lives.
This partnership will drastically accelerate the ability for developers, ML practitioners, analysts, and business owners to power their applications with best-in-class machine learning capabilities.
Integrating Kumo’s ML with Snowpark Container Services
Now you can combine the capabilities of the Kumo platform with the security and reliability of Snowflake to automate and supercharge the entire ML lifecycle. No more pipeline building, feature engineering, or training bottlenecks.
Using Snowpark Container Services, which enables developers to deploy container images in Snowflake, Snowflake customers can run the Kumo platform entirely within their account, eliminating the need to move data across platforms or copy data outside of the Snowflake governance boundary. From within Snowpark Container Services, Kumo can operate directly on raw Snowflake tables, generate predictions at scale, then store them in Snowflake as additional tables.
Snowpark Container Services gives developers deployment flexibility, letting them specify compute and storage parameters. With Snowflake’s access management capabilities, the same platform can serve many stakeholders simultaneously.
Why It Matters
Kumo presents an entirely new approach to performing machine learning at scale, one that drastically simplifies the end-to-end process and accelerates time-to-value.
The representation learning approaches that eliminated the need for extensive feature engineering and training set generation in computer vision and natural language processing are now available for relational data. Kumo works directly on raw enterprise relational data and does not require ML pipelines to generate a training set before learning.
In short, Kumo can operate directly on your raw data, and by integrating Snowflake and Kumo, enterprises can query the future as easily as they have been querying the past using SQL.
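To make "querying the future" concrete, here is a minimal sketch of what it looks like once predictions are stored as an additional table next to the raw data. It is an illustration under stated assumptions, not Kumo's or Snowflake's API: an in-memory SQLite database stands in for a Snowflake account, and the table names, the churn_predictions schema, and the scores are all hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for a Snowflake account.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 40.0), (11, 1, 60.0), (12, 2, 15.0);
""")


def write_predictions(conn, rows):
    """Store predictions as an additional table beside the raw tables."""
    conn.execute(
        "CREATE TABLE churn_predictions (customer_id INTEGER, churn_probability REAL)"
    )
    conn.executemany("INSERT INTO churn_predictions VALUES (?, ?)", rows)


# Hypothetical scores; in a real deployment these would come from the model.
write_predictions(conn, [(1, 0.12), (2, 0.81)])

# Querying the past: historical spend per customer.
past = conn.execute(
    "SELECT customer_id, SUM(amount) FROM orders "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()

# Querying the future: the same SQL style, joined against the predictions table.
at_risk = conn.execute("""
    SELECT c.name, p.churn_probability
    FROM customers c
    JOIN churn_predictions p ON p.customer_id = c.customer_id
    WHERE p.churn_probability > 0.5
""").fetchall()

print(past)
print(at_risk)
```

Because the predictions are an ordinary table, they can be joined, filtered, and governed like any other data in the account.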
Let’s discuss how this new approach is different from the traditional way of performing machine learning.
Machine Learning – the Old Way
Today, enterprises dedicate millions of dollars in resources to store terabytes of data and build predictive analytics to power business decisions. Many tools, platforms, and libraries exist to perform machine learning—but traditional ML approaches have significant drawbacks and bottlenecks that make the end-to-end process very long, complex, and brittle.
Feature Engineering is expensive and error-prone
The feature engineering process is often manual and time-consuming, and it can introduce bias. What’s more, for each new machine learning task, the data team performs significant repetitive work and builds a separate dedicated pipeline.
Traditional models accept only fixed-size input, which requires the data to be preprocessed and stored in tabular form. This transformation enforces dimensional limitations on the representation of the data, discarding most of the structural and contextual information and the complex interactions between entities.
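A small sketch of that information loss, using hypothetical order data and two hand-picked aggregate features (order count and total spend). The feature choice is an assumption for illustration only; real pipelines use many more features, but the same collapse applies to whatever the chosen features do not capture.

```python
from collections import defaultdict

# Hypothetical raw relational data: (customer_id, product_id, amount).
orders = [
    ("alice", "laptop", 900.0),
    ("alice", "mouse", 100.0),
    ("bob", "desk", 500.0),
    ("bob", "chair", 500.0),
]


def flatten_to_features(orders):
    """Collapse each customer's orders into a fixed-width row:
    (order_count, total_spend)."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for customer_id, _product_id, amount in orders:
        counts[customer_id] += 1
        totals[customer_id] += amount
    return {customer: (counts[customer], totals[customer]) for customer in counts}


features = flatten_to_features(orders)

# Both customers collapse to the same row, even though they bought
# entirely different products: the customer-product links are gone.
print(features["alice"], features["bob"])
print(features["alice"] == features["bob"])
```

After flattening, a model cannot tell these two customers apart, because which products each customer bought is no longer in the input.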
Models are rigid, with limited adaptability to new predictions or use cases
Each new use case or prediction requires an entirely new data structure, pipeline, and model to power the prediction. Creating new data pipelines, setting up the infrastructure, and building new models every time you have a new problem to solve is expensive and time-consuming. This limits the number of experiments a given company can perform, and forces teams to adopt more generic models over highly personalized ones.
Operational complexity of the machine learning lifecycle
Models become more complex and expensive over time. People add features and seldom remove old ones, creating significant bloat and increasing the requirements to maintain pipelines. When a data scientist leaves, they often take the knowledge with them, resulting in abandoned pipelines. Moreover, complex deployment involves duct-taping together multiple tools across the lifecycle for feature engineering, feature stores, retraining orchestration pipelines, monitoring, experimentation, and other production tooling.
Machine Learning the New Way – the Kumo Approach
Kumo enables a new world where building ML is entirely about defining the task by formulating the problem statement, and Kumo takes care of the rest.
What enables this new world? Snowflake tables are inherently connected by the interactions and relationships between entities, so the enterprise data in Snowflake is best represented as a graph.
Using graph representational learning, Kumo is able to operate directly on raw data, abstracting away the feature engineering and pipeline creation process. Kumo leverages the natural relational structure of the data to maximize signal to improve accuracy and performance.
You can think of Kumo as a transformer for your enterprise relational data that learns and makes predictions off of the entire graph at once.
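As a rough illustration of how relational tables map onto a graph, the sketch below turns each row into a node and each foreign-key reference into an edge. This is not Kumo's implementation; the table names, key columns, and the build_graph helper are hypothetical and chosen only to show the idea.

```python
# Hypothetical tables, each a list of rows.
tables = {
    "customers": [{"customer_id": 1}, {"customer_id": 2}],
    "products": [{"product_id": 7}],
    "orders": [
        {"order_id": 100, "customer_id": 1, "product_id": 7},
        {"order_id": 101, "customer_id": 2, "product_id": 7},
    ],
}
primary_keys = {
    "customers": "customer_id",
    "products": "product_id",
    "orders": "order_id",
}
# Each entry: (child table, foreign-key column, parent table).
foreign_keys = [
    ("orders", "customer_id", "customers"),
    ("orders", "product_id", "products"),
]


def build_graph(tables, primary_keys, foreign_keys):
    """Rows become nodes keyed by (table, primary key);
    foreign-key references become undirected edges."""
    graph = {}
    for name, rows in tables.items():
        for row in rows:
            graph.setdefault((name, row[primary_keys[name]]), set())
    for child, column, parent in foreign_keys:
        for row in tables[child]:
            src = (child, row[primary_keys[child]])
            dst = (parent, row[column])
            graph[src].add(dst)
            graph.setdefault(dst, set()).add(src)
    return graph


graph = build_graph(tables, primary_keys, foreign_keys)

# The product node links to both orders, which in turn link to both customers.
print(sorted(graph[("products", 7)]))
```

In this toy graph the two customers are connected through the product they both ordered, the kind of relationship a flattened feature table tends to lose.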
Kumo ensures scalability and performance for terabytes of data in Snowflake.
So what does this mean for your business?
- Deliver More Models: Build your graph once by connecting your Snowflake tables, then use it to generate any number of predictions for any number of use cases.
- Better Performance: Kumo leverages the latest approaches and identifies the best model and parameters for your specific problem and corresponding graph. Kumo can also power existing models by feeding trained embeddings directly into established models to improve accuracy.
- Cheaper and Easier: Since Kumo operates directly on your raw tables, you simplify infrastructure and optimize costs, with no need for ML pipelines, feature engineering, feature stores, or production tooling.
- Scalable and Performant: Kumo can rapidly train and predict on graphs at massive scale, up to tens of terabytes of data.
- Turn-Key: Kumo is a single platform that can manage your entire ML lifecycle, with the fastest time-to-ROI and payback.
Specific Applications of Kumo
Kumo is currently used by enterprises to power their growth and GTM use cases. Build your graph once, and immediately query the future and generate highly accurate predictions for any of the following common applications:
- Optimizing customer loyalty and retention
- Personalizing experiences for users and recommending relevant products and content
- Powering cross-selling and up-selling strategies
- Predicting future purchases/activities and identifying potential high value customers
- Optimizing customer outreach and notifications strategies
- Performing entity resolution for search and retrieval
- Detecting fraud and abuse
Who Benefits
- Data Scientists & ML Engineers: Deploy more models quickly, improve performance with minimal time-to-ROI, power existing models.
- App Developers: Use ready-made predictions in workflows without building ML pipelines.
- Analysts: Access out-of-the-box predictions on relational tables.
- Line-of-Business Owners: Understand customers better for growth and go-to-market strategies.
How to Get Started
To learn more about using Kumo in Snowflake, watch our YouTube instructional video on running an end-to-end machine learning process on your data.
Contact us at https://kumo.ai/apply to get started!