A Kumo PredictiveQuery is a declarative syntax for describing a machine learning task. Predictive queries generate training and prediction tables which, together with a Graph, can be used to fit or predict a model.

Enums

RunMode

Defines the training budget for AutoML.
Value | Description
FAST | Speeds up the search process; approximately 4× faster than NORMAL.
NORMAL | The default mode.
BEST | Approximately 4× more thorough than NORMAL.
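One rough way to read the three modes is as a relative search budget. The sketch below is only an illustration: `RunModeSketch` is a hypothetical stand-in (the SDK import path for `RunMode` is not shown here), and the multipliers treat the "approximately 4×" figures as effort ratios relative to NORMAL, which is an interpretation rather than a documented guarantee.

```python
from enum import Enum


class RunModeSketch(Enum):
    """Hypothetical stand-in mirroring the three documented RunMode values."""
    FAST = "FAST"
    NORMAL = "NORMAL"
    BEST = "BEST"


# Assumed relative search effort, with NORMAL as the baseline of 1.0.
RELATIVE_EFFORT = {
    RunModeSketch.FAST: 0.25,   # "approximately 4x faster than NORMAL"
    RunModeSketch.NORMAL: 1.0,  # the default mode
    RunModeSketch.BEST: 4.0,    # "approximately 4x more thorough" (assumed as 4x effort)
}


def estimate_effort(normal_hours: float, mode: RunModeSketch) -> float:
    """Scale a NORMAL-mode time estimate by the assumed relative effort."""
    return normal_hours * RELATIVE_EFFORT[mode]


print(estimate_effort(2.0, RunModeSketch.FAST))
```

In practice the actual wall-clock difference depends on the data and the task, so these ratios are best treated as a planning heuristic.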

Predictive Query

PredictiveQuery

Defines a machine learning task using PQL (Predictive Query Language), a concise SQL-like syntax. For details on writing PQL, see the Predictive Query guide.
from kumoai.pquery import PredictiveQuery

# Assumes `graph` is an existing Graph object.
pq = PredictiveQuery(
    graph=graph,
    query="RANK BY COUNT(orders.*, 0, 30, days) FOR EACH users.user_id",
)
graph (Graph, required): The Graph this predictive query is defined over.
query (str, required): The PQL query string.

id property

Returns str — The unique ID for this predictive query.

train_table property

Returns Union[TrainingTable, TrainingTableJob] — The training table most recently generated by this query.

prediction_table property

Returns Union[PredictionTable, PredictionTableJob] — The prediction table most recently generated by this query.

get_task_type()

Returns TaskType — The detected task type (classification, regression, ranking, etc.).

validate()

Validates the PQL syntax of this query.
verbose (bool, default: True): Whether to print validation output.
Returns PredictiveQuery

suggest_training_table_plan()

Generates a recommended TrainingTableGenerationPlan for this query.
Returns TrainingTableGenerationPlan

generate_training_table()

Generates a training table from this predictive query.
plan (TrainingTableGenerationPlan, required): The plan specifying time windows, splits, and other generation parameters.
non_blocking (bool, default: False): If True, returns a TrainingTableJob immediately rather than blocking.
Returns Union[TrainingTable, TrainingTableJob]
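Because the return type depends on `non_blocking`, calling code may want to normalize it to a finished table. The following is a minimal sketch, assuming that job objects expose `result()` (as documented for TrainingTableJob) and that table objects do not. `resolve`, `FakeTable`, and `FakeJob` are hypothetical names used only so the example runs without a Kumo connection.

```python
from typing import Any


def resolve(table_or_job: Any) -> Any:
    """Return the finished table whether given a table or a job handle.

    Duck-typed on purpose: job objects are documented to expose result(),
    while table objects are assumed not to.
    """
    result = getattr(table_or_job, "result", None)
    return result() if callable(result) else table_or_job


# Hypothetical stand-ins for a generated table and a pending job.
class FakeTable:
    pass


class FakeJob:
    def __init__(self, table: FakeTable) -> None:
        self._table = table

    def result(self) -> FakeTable:
        # A real job would block here until generation completes.
        return self._table


table = FakeTable()
assert resolve(table) is table           # blocking call: already a table
assert resolve(FakeJob(table)) is table  # non-blocking call: wait on the job
```

With the real SDK, the same helper could wrap a call such as `pq.generate_training_table(plan=plan, non_blocking=True)`.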

suggest_prediction_table_plan()

Generates a recommended PredictionTableGenerationPlan.
Returns PredictionTableGenerationPlan

generate_prediction_table()

Generates a prediction table from this predictive query.
plan (PredictionTableGenerationPlan, required): The plan specifying the anchor time and other generation parameters.
non_blocking (bool, default: False): If True, returns a PredictionTableJob immediately rather than blocking.
Returns Union[PredictionTable, PredictionTableJob]

suggest_model_plan()

Generates a recommended ModelPlan for this query.
Returns ModelPlan

suggest_distilled_model_plan()

Generates a recommended DistilledModelPlan for online serving distillation.
base_model_id (str, required): The training job ID of the base GNN model to distill from.
run_mode (RunMode, default: RunMode.NORMAL): The AutoML run mode.
Returns DistilledModelPlan

fit()

Trains a model on this predictive query using the auto-suggested plans.
non_blocking (bool, default: False): If True, returns a TrainingJob immediately rather than blocking.
Returns Union[TrainingJobResult, TrainingJob]
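Since `validate()` is documented to return the PredictiveQuery, validation and training can be chained. The sketch below shows that pattern using only the two documented methods. `quick_fit` and `RecordingQuery` are hypothetical names, and the stand-in class only records calls so the example runs offline. How validation failures are reported is not covered here.

```python
from typing import Any, List


def quick_fit(pq: Any, wait: bool = True) -> Any:
    """Validate the query, then train with the auto-suggested plans."""
    checked = pq.validate(verbose=False)  # documented to return the query
    # wait=True maps to the blocking default (non_blocking=False).
    return checked.fit(non_blocking=not wait)


class RecordingQuery:
    """Hypothetical stand-in that records which methods were called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def validate(self, verbose: bool = True) -> "RecordingQuery":
        self.calls.append(f"validate(verbose={verbose})")
        return self

    def fit(self, non_blocking: bool = False) -> str:
        self.calls.append(f"fit(non_blocking={non_blocking})")
        return "job" if non_blocking else "result"


query = RecordingQuery()
outcome = quick_fit(query)
print(outcome, query.calls)
```

Passing `wait=False` corresponds to `non_blocking=True`, in which case the caller receives a job handle rather than a finished result.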

generate_baseline()

Generates baseline metrics for comparison.
metrics (List[str], required): The metrics to compute for the baseline.
train_table (Union[TrainingTable, TrainingTableJob], required): The training table to use.
Returns Union[BaselineJob, BaselineJobResult]

save()

Saves this predictive query to Kumo.
name (str, default: None): Optional name for the saved query.
Returns PredictiveQueryID

load() classmethod

Loads a predictive query from its ID or a named template.
pq_id_or_template (str, required): The predictive query ID or template name.
Returns PredictiveQuery

load_from_training_job() classmethod

Loads the predictive query associated with an existing training job.
training_job_id (str, required): The training job ID.
Returns PredictiveQuery

TrainingTableGenerationPlan

Configuration for training table generation. Specifies time windows, train/validation/test splits, and other generation parameters. Obtain a recommended plan via PredictiveQuery.suggest_training_table_plan().

PredictionTableGenerationPlan

Configuration for prediction table generation. Specifies the anchor time and other parameters. Obtain a recommended plan via PredictiveQuery.suggest_prediction_table_plan().

Training Table

TrainingTable

A training dataset generated from a PredictiveQuery. Can be initialized from the job ID of a completed training table generation job.
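# Assumes `pq` is a PredictiveQuery and `plan` is a TrainingTableGenerationPlan.
train_table = pq.generate_training_table(plan=plan)
df = train_table.data_df()  # the generated training data as a pandas DataFrame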
job_id (GenerateTrainTableJobID, required): The ID of the completed training table generation job.

data_df()

Returns pd.DataFrame — The generated training data.

data_urls()

Returns List[str] — Download URLs for the training table data.

export()

Exports the training table to an external connector.
output_config (TrainingTableExportConfig, required): The output destination configuration.
non_blocking (bool, default: True): If True, returns an ArtifactExportJob immediately.
Returns Union[ArtifactExportJob, ArtifactExportResult]

TrainingTableJob

Represents an ongoing training table generation job.

result()

Blocks until complete and returns the TrainingTable.
Returns TrainingTable

status()

Returns JobStatusReport — Current job status.

cancel()

Cancels the training table generation job.
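The three job methods can be combined into a bounded wait: poll `status()`, return `result()` once the job is finished, and call `cancel()` if a deadline passes. The sketch below assumes nothing about the fields of JobStatusReport, so the caller supplies an `is_done` predicate. `wait_or_cancel` and `FakeJob` are hypothetical names, and the stand-in job returns plain strings as its status.

```python
import time
from typing import Any, Callable


def wait_or_cancel(
    job: Any,
    is_done: Callable[[Any], bool],
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll job.status() until is_done(report) is true, else cancel on timeout."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_done(job.status()):
            return job.result()
        sleep(poll_s)
    job.cancel()
    raise TimeoutError(f"job did not finish within {timeout_s} seconds")


class FakeJob:
    """Hypothetical stand-in: reports RUNNING for a few polls, then DONE."""

    def __init__(self, polls_until_done: int) -> None:
        self._remaining = polls_until_done
        self.cancelled = False

    def status(self) -> str:
        self._remaining -= 1
        return "DONE" if self._remaining <= 0 else "RUNNING"

    def result(self) -> str:
        return "table"

    def cancel(self) -> None:
        self.cancelled = True


job = FakeJob(polls_until_done=2)
print(wait_or_cancel(job, lambda s: s == "DONE", timeout_s=5, poll_s=0))
```

The `sleep` and `clock` parameters exist only to make the helper easy to test. With a real job, the predicate would inspect whatever the returned JobStatusReport exposes.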

Prediction Table

PredictionTable

A prediction dataset generated from a PredictiveQuery. Can be initialized from a job ID or a custom data path on supported object storage.
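# Assumes `pq` is a PredictiveQuery and `plan` is a PredictionTableGenerationPlan.
pred_table = pq.generate_prediction_table(plan=plan)
df = pred_table.data_df()  # the prediction table data as a pandas DataFrame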
job_id (GeneratePredictionTableJobID, default: None): The ID of the completed prediction table generation job. Leave None when using table_data_path.
table_data_path (str, default: None): Path to custom prediction table data on S3 (s3://...) or a Databricks UC Volume (dbfs:/Volumes/...). Leave None when using job_id.
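The two parameters read as alternatives, so a client-side check before constructing a PredictionTable can catch mistakes early. The following is a sketch under that assumption. `check_prediction_table_source` is a hypothetical helper, not part of the SDK, and the SDK itself may validate these arguments differently.

```python
from typing import Optional

# Path prefixes named in the parameter description above.
SUPPORTED_PREFIXES = ("s3://", "dbfs:/Volumes/")


def check_prediction_table_source(
    job_id: Optional[str] = None,
    table_data_path: Optional[str] = None,
) -> str:
    """Return "job" or "path" after checking that exactly one source is set."""
    if (job_id is None) == (table_data_path is None):
        raise ValueError("Provide exactly one of job_id or table_data_path.")
    if job_id is not None:
        return "job"
    if not table_data_path.startswith(SUPPORTED_PREFIXES):
        raise ValueError(
            f"table_data_path should start with one of {SUPPORTED_PREFIXES}."
        )
    return "path"


print(check_prediction_table_source(table_data_path="s3://bucket/predictions"))
```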

anchor_time property

Returns Optional[datetime] — The anchor time for the generated prediction table, or None for custom-specified data.

data_df()

Returns pd.DataFrame — The prediction table data.

data_urls()

Returns List[str] — Download URLs for the prediction table data.

PredictionTableJob

Represents an ongoing prediction table generation job.

id property

Returns GeneratePredictionTableJobID — The unique job ID.

result()

Blocks until complete and returns the PredictionTable.
Returns PredictionTable

status()

Returns JobStatusReport

cancel()

Cancels the prediction table generation job.

future()

Returns Future[PredictionTable] — The underlying future object.
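A future allows reacting to completion without blocking the calling thread. The sketch below assumes the returned object behaves like Python's `concurrent.futures.Future`, which is not confirmed here. `on_table_ready` is a hypothetical helper, and a manually created Future stands in for the one a real job would return.

```python
from concurrent.futures import Future
from typing import Any, Callable, List


def on_table_ready(future: Future, callback: Callable[[Any], None]) -> None:
    """Invoke callback(table) once the future completes successfully."""

    def _done(done_future: Future) -> None:
        # Skip cancelled or failed futures; only pass along real results.
        if done_future.cancelled() or done_future.exception() is not None:
            return
        callback(done_future.result())

    future.add_done_callback(_done)


# Stand-in for the future a prediction table job would expose.
pending: Future = Future()
received: List[Any] = []
on_table_ready(pending, received.append)

pending.set_result("prediction-table")  # simulates the job finishing
print(received)
```

Errors and cancellations are silently ignored in this sketch. A fuller version would likely route them to a separate handler.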

load_config()

Returns GeneratePredictionTableRequest — The full configuration for this job.