How Predictive Query Training Works
During the training process, Kumo creates a table of historical data slices to use as examples. Each example specifies a historical context (e.g., all historical data relevant to customer A up to July 3, 2018) and a target (e.g., customer A will spend $30 in the next 2 months). These training tables are materialized one timeframe at a time, starting with the most recent examples.
Training and Validation
Kumo starts the training process by partitioning your historical training examples into three sets:
- Holdout Data Split: The most recent timeframe(s) of training examples, kept entirely out of the model training process and used to evaluate how well the model generalizes to future unseen data.
- Validation Data Split: The second-to-most recent timeframe(s) of training examples, used during the neural architecture search to determine which candidate model is promoted to evaluation on the holdout data split.
- Training Data Split: All remaining earlier timeframe(s) of training examples, used for training each of the models created during the experimentation process.
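The recency-based partitioning described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Kumo's implementation: the function name `split_timeframes` and the parameters `n_holdout` and `n_validation` are hypothetical, and the sketch assumes each timeframe is identified by a single date.

```python
from datetime import date


def split_timeframes(timeframes, n_holdout=1, n_validation=1):
    """Partition timeframes by recency (hypothetical helper, not a Kumo API).

    The most recent timeframe(s) become the holdout split, the next most
    recent become the validation split, and everything earlier is training.
    """
    if n_holdout < 1 or n_validation < 1:
        raise ValueError("each split needs at least one timeframe")
    ordered = sorted(timeframes)  # oldest first
    reserved = n_holdout + n_validation
    if len(ordered) <= reserved:
        raise ValueError("not enough timeframes to leave any training data")
    return {
        "training": ordered[:-reserved],
        "validation": ordered[-reserved:-n_holdout],
        "holdout": ordered[-n_holdout:],
    }


# Five bi-monthly timeframes, given in arbitrary order.
timeframes = [date(2018, m, 1) for m in (5, 1, 9, 3, 7)]
splits = split_timeframes(timeframes)
for name, frames in splits.items():
    print(name, [d.isoformat() for d in frames])
```

Splitting by time rather than at random keeps the holdout set strictly in the future relative to the training data, which matches how the model is used at prediction time.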
Model Planning
After writing your predictive query, the next step is to configure/confirm your model plan. Under “Run Mode”, you can set the run mode for your model plan. Select the run mode that best suits your particular scenario:
- Normal: Default value.
- Fast: Speeds up the search process—typically about 4x faster than using the normal mode.
- Best: Typically takes 4x the time used by the normal mode.

Keep in mind that there is a trade-off between search time and optimal search results.



Training Your Predictive Query
Once you click the Start Training button, Kumo immediately launches a training job that finds the optimal set of ML parameters for your pQuery. Depending on the size of your graph (i.e., the combined size of its underlying tables), this job usually takes between 1 and 10 hours. You can quickly check the status of your training job by clicking on the relevant training job under the Models tab.
Limiting Your Training Window
In some cases, you may want to limit your training window. For example, upon inspecting the time ranges in your data, you may notice that your dataset contains multiple years of data. This may result in prolonged target generation times due to shifting target distributions over time. To mitigate this, you can use the train_start_offset model planner training parameter to define the numerical offset from the most recent entry to use to generate training data labels, and train_end_offset to define the numerical offset from the most recent entry to not use to generate training data labels. Together, these model planner training parameters let you limit your learning interval and which labels are generated.
For example, we may want to only use training examples for customers that churned in the last year, but those customers may have 10 years of data that we will use for training the model:
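As a rough illustration of how the two offsets bound the label window, the Python sketch below computes the window and filters candidate label dates against it. It is not Kumo's implementation or model plan syntax: the helper names are hypothetical, and it assumes the offsets are expressed in days and counted back from the most recent entry.

```python
from datetime import date, timedelta


def label_window(most_recent, train_start_offset, train_end_offset=0):
    """Return (start, end) of the label window (hypothetical helper).

    Assumption: both offsets are in days, counted back from most_recent.
    Labels older than train_start_offset or newer than train_end_offset
    are excluded.
    """
    if train_start_offset <= train_end_offset:
        raise ValueError("train_start_offset must exceed train_end_offset")
    start = most_recent - timedelta(days=train_start_offset)
    end = most_recent - timedelta(days=train_end_offset)
    return start, end


def in_label_window(label_date, window):
    """True if a label's date falls inside the window (inclusive)."""
    start, end = window
    return start <= label_date <= end


# Only generate labels from the last year of data.
most_recent = date(2018, 7, 3)
window = label_window(most_recent, train_start_offset=365, train_end_offset=0)

candidate_labels = [date(2010, 1, 1), date(2017, 12, 1), date(2018, 6, 1)]
kept = [d for d in candidate_labels if in_label_window(d, window)]
print(window, kept)
```

Note that the window restricts only which labels are generated. The historical context behind each kept label can still reach back across the full history, as in the 10-year example above.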
**NOTE:** train_start_offset and train_end_offset only apply to temporal queries, such as those that use a temporal aggregation like SUM().
To learn more, please refer to train_start_offset and train_end_offset in the PQuery Reference.
Analyzing Your Training Results
Kumo provides a full suite of tools and metrics for understanding how your training results are generated. To access a predictive query’s experiment monitoring metrics, click on the training job in the **Models** tab to view the results of the neural architecture search experiments.

Experiment Monitoring Metrics
During the training process, Kumo automatically defines a search space of potential graph neural network (GNN) model architectures and hyperparameters, then intelligently selects a subset of specific architecture and hyperparameter configurations to run experiments with.
Note: predictive query training sessions in progress may not display all experiment monitoring metrics.
Training Data Statistics
When you open a specific training job, the details of the training, validation, and test data are available in the TRAINING TABLE GENERATION tab under related jobs.
