This page describes configuration options for KumoRFM, including run modes, temporal behavior, inference behavior, batch prediction, retry handling, and size limits.
Run Modes
The run_mode parameter controls the trade-off between prediction quality and speed by adjusting how much context data is sampled.
| Run Mode | Context Size | Neighbor Sampling | Use Case |
|---|---|---|---|
| DEBUG | 100 | [16, 16, 4, 4, 1, 1] | Quick iteration, testing queries |
| FAST | 1,000 | [32, 32, 8, 8, 4, 4] | Default. Good balance of speed and quality |
| NORMAL | 5,000 | [64, 64, 8, 8, 4, 4] | Higher quality predictions |
| BEST | 10,000 | [64, 64, 8, 8, 4, 4] | Maximum quality |
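If you choose a run mode programmatically, for example from a budget on context size, the table can be mirrored as a small lookup. This is a minimal sketch: RunModeSpec, RUN_MODES, and pick_run_mode are hypothetical helpers rather than part of the KumoRFM SDK, and the values are copied by hand from the table, so they will not track future changes to the run modes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunModeSpec:
    # Values copied by hand from the run mode table; not read from the SDK.
    context_size: int
    neighbor_sampling: tuple[int, ...]


RUN_MODES = {
    "DEBUG": RunModeSpec(100, (16, 16, 4, 4, 1, 1)),
    "FAST": RunModeSpec(1_000, (32, 32, 8, 8, 4, 4)),
    "NORMAL": RunModeSpec(5_000, (64, 64, 8, 8, 4, 4)),
    "BEST": RunModeSpec(10_000, (64, 64, 8, 8, 4, 4)),
}


def pick_run_mode(max_context_size: int) -> str:
    """Return the run mode with the largest context size that fits the budget.

    Hypothetical helper: falls back to "DEBUG" if no run mode fits.
    """
    fitting = [
        name for name, spec in RUN_MODES.items()
        if spec.context_size <= max_context_size
    ]
    if not fitting:
        return "DEBUG"
    # Per the table, a larger context size corresponds to higher quality.
    return max(fitting, key=lambda name: RUN_MODES[name].context_size)
```

For example, a budget of 6,000 context examples selects "NORMAL". The selection relies on the table's ordering, where a larger context size corresponds to higher quality.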
Temporal and Context Timing
Use these parameters in KumoRFM.predict() and KumoRFM.evaluate() when you need to control the prediction timestamp or the historical examples used as model context.
| Option | Default | Description |
|---|---|---|
| anchor_time | None | The anchor timestamp for the prediction. If set to None, KumoRFM uses the maximum timestamp in the data. If set to "entity", KumoRFM uses each entity’s own timestamp. |
| context_anchor_time | None | The maximum anchor timestamp for context examples. If set to None, anchor_time determines the anchor time for context examples. |
| use_prediction_time | False | Whether to use the anchor timestamp as an additional feature during prediction. KumoRFM enforces this automatically for time series forecasting tasks. |
| lag_timesteps | 0 | Number of past timesteps to include as lagged target features for temporal predictive queries. |
Use anchor_time when you want to predict as of a specific point in time.
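The three cases in the anchor_time row can be summarized as a small resolution function. This sketch assumes a plain mapping from entity IDs to their own timestamps and only illustrates the documented semantics; resolve_anchor_times is a hypothetical helper, not how KumoRFM computes anchors internally.

```python
from datetime import datetime
from typing import Mapping, Union

AnchorTime = Union[None, str, datetime]


def resolve_anchor_times(
    anchor_time: AnchorTime,
    entity_timestamps: Mapping[str, datetime],
    all_timestamps: list[datetime],
) -> dict[str, datetime]:
    """Map each entity to the timestamp it would be predicted as of.

    Hypothetical helper mirroring the documented anchor_time options.
    """
    if anchor_time is None:
        # None: use the maximum timestamp found anywhere in the data.
        latest = max(all_timestamps)
        return {entity: latest for entity in entity_timestamps}
    if anchor_time == "entity":
        # "entity": each entity is anchored at its own timestamp.
        return dict(entity_timestamps)
    if isinstance(anchor_time, datetime):
        # Explicit timestamp: the same anchor for every entity.
        return {entity: anchor_time for entity in entity_timestamps}
    raise ValueError(f"Unsupported anchor_time: {anchor_time!r}")
```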
Use context_anchor_time when the prediction date and the cutoff for context data should differ.
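The fallback rule for context_anchor_time can be sketched as choosing an upper bound for the anchor times of context examples. The helpers context_cutoff and eligible_context_anchors are hypothetical, and the sketch deliberately ignores how KumoRFM accounts for the prediction horizon when it builds context labels; it only shows which timestamp acts as the cutoff.

```python
from datetime import datetime
from typing import Optional


def context_cutoff(
    anchor_time: datetime, context_anchor_time: Optional[datetime] = None
) -> datetime:
    # None falls back to the prediction anchor, as the option table describes.
    return anchor_time if context_anchor_time is None else context_anchor_time


def eligible_context_anchors(
    candidates: list[datetime],
    anchor_time: datetime,
    context_anchor_time: Optional[datetime] = None,
) -> list[datetime]:
    """Keep candidate context anchors that do not exceed the cutoff (hypothetical)."""
    cutoff = context_cutoff(anchor_time, context_anchor_time)
    return [t for t in candidates if t <= cutoff]
```

For instance, you might predict as of June 2024 while restricting context examples to anchors up to March 2024 if more recent labels are still incomplete.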
Use lag_timesteps when recent historical target values should be available to the model as additional context. For example, lag_timesteps=3 adds the previous three target windows as lagged features.
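To make the idea of lagged target windows concrete, the following sketch computes one value per past window. It assumes a SUM target, fixed-length windows, and half-open intervals (start, end], with lag 1 being the window that ends at the anchor time; lagged_window_sums is a hypothetical helper, and KumoRFM's exact windowing may differ.

```python
from datetime import datetime, timedelta


def lagged_window_sums(
    events: list[tuple[datetime, float]],
    anchor: datetime,
    window: timedelta,
    lag_timesteps: int,
) -> list[float]:
    """Sum event values in each of the `lag_timesteps` windows before `anchor`.

    Hypothetical sketch: assumes a SUM target and windows (start, end],
    where lag i covers (anchor - i * window, anchor - (i - 1) * window].
    """
    lags = []
    for i in range(1, lag_timesteps + 1):
        end = anchor - (i - 1) * window
        start = anchor - i * window
        lags.append(sum((v for t, v in events if start < t <= end), 0.0))
    return lags
```

With a 7-day window, an anchor of 2024-01-31, and lag_timesteps=3, the three lags cover the seven days ending 2024-01-31, 2024-01-24, and 2024-01-17.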
Inference Configuration
The inference_config parameter controls inference-time model behavior, including ensembling. You can pass either a dictionary or a configuration object from kumoapi.rfm.
When you pass a dictionary, KumoRFM casts it based on the task type:
- Classification tasks use ClassificationInferenceConfig.
- Regression and forecasting tasks use RegressionInferenceConfig.
If you do not pass inference_config, KumoRFM selects defaults automatically based on the task type.
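When inference_config is a dictionary, the casting rule amounts to a dispatch on task type plus validation. The following is a minimal sketch of that rule only: ClassificationConfigSketch, RegressionConfigSketch, and cast_inference_config are hypothetical local stand-ins, not the kumoapi.rfm classes. Field names and the num_estimators, target_transforms, and output_type defaults come from the option tables in this section; the False defaults for the shuffle flags are assumptions.

```python
from dataclasses import dataclass, field, fields
from typing import Optional, Union


@dataclass
class ClassificationConfigSketch:
    # Hypothetical stand-in for kumoapi.rfm.ClassificationInferenceConfig.
    # The False defaults for the shuffle flags are assumptions.
    num_estimators: int = 1
    column_shuffle: bool = False
    category_shuffle: bool = False
    hop_shuffle: bool = False
    class_shuffle: bool = False


@dataclass
class RegressionConfigSketch:
    # Hypothetical stand-in for kumoapi.rfm.RegressionInferenceConfig.
    num_estimators: int = 1
    column_shuffle: bool = False
    category_shuffle: bool = False
    hop_shuffle: bool = False
    target_transforms: list = field(default_factory=lambda: ["quantile"])
    output_type: str = "median"


ConfigSketch = Union[ClassificationConfigSketch, RegressionConfigSketch]


def cast_inference_config(config: Optional[dict], task_type: str) -> ConfigSketch:
    """Cast a plain dict to the config class that matches the task type."""
    if task_type == "classification":
        cls = ClassificationConfigSketch
    elif task_type in ("regression", "forecasting"):
        cls = RegressionConfigSketch
    else:
        raise ValueError(f"Unknown task type: {task_type!r}")
    config = config or {}  # None means "use the defaults"
    unknown = set(config) - {f.name for f in fields(cls)}
    if unknown:
        raise ValueError(f"Unsupported options for {task_type}: {sorted(unknown)}")
    cfg = cls(**config)
    if not 1 <= cfg.num_estimators <= 4:
        raise ValueError("num_estimators must be between 1 and 4")
    return cfg
```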
Common options:
| Option | Description |
|---|---|
| num_estimators | Number of estimators to ensemble. Defaults to 1 and must be between 1 and 4. |
| column_shuffle | Whether to shuffle column order across estimators. |
| category_shuffle | Whether to shuffle categories within categorical columns across estimators. |
| hop_shuffle | Whether to shuffle subgraph depth across estimators. |
Classification options:
| Option | Description |
|---|---|
| class_shuffle | Whether to shuffle class order across estimators. |
Regression and forecasting options:
| Option | Description |
|---|---|
| target_transforms | Target preprocessing transforms to vary across estimators. Supported values are "clip", "power", "quantile", and None. Defaults to ["quantile"]. |
| output_type | How to summarize the output distribution. Supported values are "median", "mean", and "quantiles". Defaults to "median". |
When output_type="quantiles", the prediction output contains 27 quantile columns instead of a single TARGET_PRED column.
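The output_type options are different summaries of the same predictive distribution. As a rough sketch, assuming the distribution is represented by a list of samples, Python's statistics module can produce each summary. summarize_distribution is a hypothetical helper, and the n=28 call yields 27 evenly spaced cut points only to mirror the column count; the quantile levels KumoRFM actually reports are not specified here.

```python
import statistics


def summarize_distribution(samples: list[float], output_type: str = "median"):
    """Summarize samples from a predictive distribution (hypothetical helper)."""
    if output_type == "median":
        return statistics.median(samples)
    if output_type == "mean":
        return statistics.fmean(samples)
    if output_type == "quantiles":
        # n=28 returns 27 evenly spaced cut points; the real levels may differ.
        return statistics.quantiles(samples, n=28)
    raise ValueError(f"Unsupported output_type: {output_type!r}")
```

For a skewed sample such as [1, 2, 3, 4, 100], the median (3) stays close to the bulk of the values while the mean (22) is pulled toward the outlier.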
Output and Collection Controls
These options control what predict() returns and how KumoRFM collects valid context labels.
| Option | Default | Description |
|---|---|---|
| return_embeddings | False | Whether to include embeddings for each prediction example in the output DataFrame. |
| explain | False | Whether to return an Explanation object instead of a plain prediction DataFrame. Explainability currently supports single-entity predictions with run_mode="FAST". See Prediction Explainability. |
| max_pq_iterations | 10 | Maximum number of iterations used to collect valid labels. Increase this when a predictive query has strict entity filters and KumoRFM needs to sample more entities to find enough valid labels. |
| random_seed | fixed seed | Manual seed for pseudo-random sampling. |
| verbose | True | Whether to print progress output during prediction or evaluation. |
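The max_pq_iterations option bounds a loop of the following general shape: sample a round of entities, keep those whose labels are valid, and stop once enough labels are found or the iteration budget runs out. collect_valid_labels is a hypothetical sketch of that pattern, not KumoRFM's sampler; is_valid stands in for the predictive query's entity filter.

```python
import random
from typing import Callable, Iterable


def collect_valid_labels(
    entity_ids: Iterable[int],
    is_valid: Callable[[int], bool],
    needed: int,
    batch_size: int,
    max_pq_iterations: int = 10,
    seed: int = 0,
) -> list[int]:
    """Sample entities in rounds until `needed` valid labels are found (hypothetical)."""
    rng = random.Random(seed)
    remaining = list(entity_ids)
    rng.shuffle(remaining)
    collected: list[int] = []
    for _ in range(max_pq_iterations):
        if len(collected) >= needed or not remaining:
            break
        # Take the next round of sampled entities and keep those passing the filter.
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        collected.extend(e for e in batch if is_valid(e))
    return collected[:needed]
```

With a filter that only 2% of entities pass, a small budget can end before enough labels are found, which is the situation where raising max_pq_iterations helps.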
Batch Mode
For predictions over many entities, use KumoRFM.batch_mode() to automatically split the workload into batches.
- batch_size: The number of entities per batch. Set to "max" (the default) to use the maximum applicable batch size for the task type.
- num_retries: The number of retries for batches that fail due to server issues.
| Task Type | Max Prediction Size | Max Test Size |
|---|---|---|
| Classification / Regression / Forecasting | 1,000 | 2,000 |
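Batch mode's splitting step can be sketched as chunking the entity list by a resolved batch size. In this sketch, split_into_batches and MAX_BATCH_SIZE are hypothetical, the limits are copied from the table, and mapping "Max Test Size" to evaluation is an assumption.

```python
from typing import Iterable, Union

# Limits copied from the table; "evaluate" for "Max Test Size" is an assumption.
MAX_BATCH_SIZE = {"predict": 1_000, "evaluate": 2_000}


def split_into_batches(
    entity_ids: Iterable[int],
    batch_size: Union[int, str] = "max",
    mode: str = "predict",
) -> list[list[int]]:
    """Chunk entity IDs into batches no larger than the mode's limit (hypothetical)."""
    limit = MAX_BATCH_SIZE[mode]
    size = limit if batch_size == "max" else int(batch_size)
    if not 1 <= size <= limit:
        raise ValueError(f"batch_size must be between 1 and {limit} for {mode}")
    ids = list(entity_ids)
    return [ids[i:i + size] for i in range(0, len(ids), size)]
```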
Retry
Use KumoRFM.retry() to automatically retry queries that fail due to transient server issues.
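The retry idea can be sketched as a wrapper that re-invokes a call on transient failures with exponential backoff. TransientServerError, call_with_retry, and the backoff schedule are hypothetical; KumoRFM.retry() has its own error handling, which this sketch does not reproduce.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientServerError(Exception):
    """Hypothetical stand-in for a transient server-side failure."""


def call_with_retry(
    fn: Callable[[], T],
    num_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying up to `num_retries` times on transient errors."""
    for attempt in range(num_retries + 1):
        try:
            return fn()
        except TransientServerError:
            if attempt == num_retries:
                raise  # Out of retries: surface the last error.
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Injecting sleep makes the wrapper testable without real delays, and exponential backoff gives a struggling server progressively more time to recover.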
Size Limits
KumoRFM enforces a 30 MB context size limit per prediction. If a prediction exceeds it, the error message suggests:
- Reducing the number of tables in the graph
- Reducing the number of columns (e.g., large text columns)
- Adjusting the neighborhood configuration
- Using a lower run mode
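A rough pre-flight check can flag oversized context before a request is sent. This sketch estimates size as the length of the rows serialized to JSON, which is only a proxy: KumoRFM measures context size in its own way, and reading 30 MB as 30 MiB is an assumption. check_context_size and context_size_bytes are hypothetical helpers.

```python
import json

CONTEXT_LIMIT_BYTES = 30 * 1024 * 1024  # Assumption: "30 MB" read as 30 MiB.


def context_size_bytes(rows: list[dict]) -> int:
    # Rough proxy: size of the rows serialized as UTF-8 JSON.
    return len(json.dumps(rows).encode("utf-8"))


def check_context_size(rows: list[dict], limit: int = CONTEXT_LIMIT_BYTES) -> int:
    """Return the estimated size, or raise if it exceeds the limit (hypothetical)."""
    size = context_size_bytes(rows)
    if size > limit:
        raise ValueError(
            f"Estimated context is {size / 2**20:.1f} MiB (limit {limit / 2**20:.1f} MiB); "
            "try fewer tables, fewer or shorter columns, a smaller neighborhood, "
            "or a lower run mode."
        )
    return size
```

In rows dominated by long text, dropping that column shrinks the estimate substantially.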
The optimize parameter of KumoRFM can speed up sampling from database backends by creating indices.
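What optimize does internally is not described here, but the general effect of an index on sampling can be shown with Python's built-in sqlite3 module. In this hypothetical example, a neighbor lookup (all orders for one user) switches from a full table scan to an index search once the foreign-key column is indexed; the orders table and the idx_orders_user_id index are illustrative names only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (user_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)


def plan(sql: str, params: tuple = ()) -> list[str]:
    # EXPLAIN QUERY PLAN rows have four columns; the last is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Fetching one user's orders is the kind of lookup neighbor sampling repeats.
neighbor_query = "SELECT order_id, amount FROM orders WHERE user_id = ?"
before = plan(neighbor_query, (42,))  # Full table scan without an index.
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
after = plan(neighbor_query, (42,))  # Index search after indexing user_id.
```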