PQL (Predictive Query Language) is a SQL-like syntax for expressing ML prediction tasks on relational databases. Instead of writing feature engineering pipelines, training code, and model selection logic, you write a single PQL statement that specifies WHAT you want to predict, for WHICH entities, and optionally WHEN and WHERE. The underlying system handles feature extraction (via graph construction), model selection, training, and inference automatically.

How does PQL relate to SQL?

SQL retrieves data that exists: SELECT amount FROM orders WHERE customer_id = 42. PQL predicts data that does not yet exist: PREDICT churn FOR customers WHERE customer_id = 42. SQL operates on rows and columns. PQL operates on the relational graph derived from the database schema. The syntax is deliberately SQL-like so that anyone who knows SQL can express prediction tasks.

What prediction tasks can PQL express?

PQL can express any prediction task on relational data: classification (PREDICT churn FOR customers), regression (PREDICT lifetime_value FOR customers), ranking (PREDICT next_purchase FOR customers RANKED BY products), link prediction (PREDICT will_connect FOR customer, product PAIRS), and time-series forecasting (PREDICT demand FOR products AT next_30_days).

What happens under the hood when you run a PQL query?

The system: (1) parses the PQL statement to identify the target, entity, and constraints, (2) constructs the temporal heterogeneous graph from the database schema, (3) selects or adapts a pre-trained model (KumoRFM), (4) performs schema-agnostic encoding of all table columns, (5) runs GNN message passing with temporal filtering, (6) generates predictions for the specified entities. Steps 2-6 are fully automatic.

Predictive Query Language (PQL): SQL-Like Syntax for Prediction Tasks | Kumo.ai

PQL (Predictive Query Language) lets you express ML prediction tasks in SQL-like syntax. Instead of spending months building feature engineering pipelines, selecting models, writing training loops, and deploying inference systems, you write one statement that declares what you want to predict. The underlying system constructs the graph from your database schema, encodes all columns, runs the GNN, and returns predictions.

SQL retrieves. PQL predicts.

The analogy is precise:

SQL: SELECT amount FROM orders WHERE customer_id = 42 retrieves data that exists in the database.
PQL: PREDICT churn FOR customers WHERE customer_id = 42 generates a prediction about data that does not yet exist (will this customer churn?).

Both operate on relational databases. SQL is the interface for data retrieval. PQL is the interface for prediction. Anyone who knows SQL can express prediction tasks in PQL.
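
To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table contents and the cutoff date are made up for illustration, and nothing here is taken from Kumo's own tooling. It shows that the retrieval question is answered by rows already stored, while the question a churn prediction asks concerns rows that have not been written yet.

```python
import sqlite3

# Hypothetical in-memory database with a tiny orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(42, 19.99, "2024-01-05"), (42, 5.00, "2024-02-10"), (7, 12.50, "2024-01-20")],
)

# Retrieval: the answer already exists as stored rows.
amounts = [
    row[0]
    for row in conn.execute(
        "SELECT amount FROM orders WHERE customer_id = 42 ORDER BY order_date"
    )
]

# Prediction target: as of the (assumed) cutoff, no rows exist for the future
# window, so SQL alone has nothing to return. A model has to estimate it.
cutoff = "2024-03-01"
future_orders = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 42 AND order_date >= ?",
    (cutoff,),
).fetchone()[0]

print(amounts, future_orders)
```

The second query returns a count of zero not because the customer has churned, but because the future has not happened yet. That gap is what a prediction statement is meant to fill.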

PQL syntax

pql_examples.pql

-- Classification: will customers churn?
PREDICT churn
FOR customers
WHERE signup_date < '2024-01-01'

-- Regression: what is the expected lifetime value?
PREDICT lifetime_value
FOR customers

-- Ranking: what will each customer buy next?
PREDICT next_purchase
FOR customers
RANKED BY products

-- Link prediction: which customers will buy which products?
PREDICT will_purchase
FOR customer, product PAIRS

-- Time-series: forecast demand for each product
PREDICT demand
FOR products
AT next_30_days

Each PQL statement specifies WHAT to predict (target), WHO to predict for (entity), and optional constraints (WHERE, RANKED BY, AT). The system handles the rest.
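
The statement shape described above can be sketched as a small parser. This is a toy that assumes the grammar is exactly what the five examples show (PREDICT, FOR, then optional WHERE, RANKED BY, and AT, in that order). The ParsedQuery class and parse_statement function are hypothetical names for illustration, not part of any Kumo API, and the real PQL grammar may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedQuery:
    target: str                      # WHAT to predict
    entity: str                      # WHO to predict for (kept as raw text)
    where: Optional[str] = None      # optional filter
    ranked_by: Optional[str] = None  # optional candidate table
    at: Optional[str] = None         # optional forecast horizon


# Clause keywords exactly as they appear in the examples (assumed grammar).
_PATTERN = re.compile(
    r"^PREDICT\s+(?P<target>\w+)\s+"
    r"FOR\s+(?P<entity>.+?)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?"
    r"(?:\s+RANKED\s+BY\s+(?P<ranked_by>\w+))?"
    r"(?:\s+AT\s+(?P<at>\w+))?$",
    re.IGNORECASE,
)


def parse_statement(text: str) -> ParsedQuery:
    # Drop "--" comment lines, then collapse whitespace so clauses may span lines.
    lines = [ln for ln in text.splitlines() if not ln.strip().startswith("--")]
    flat = " ".join(" ".join(lines).split())
    match = _PATTERN.match(flat)
    if match is None:
        raise ValueError(f"not a recognised statement: {flat!r}")
    return ParsedQuery(**match.groupdict())


query = parse_statement(
    "-- Classification: will customers churn?\n"
    "PREDICT churn\nFOR customers\nWHERE signup_date < '2024-01-01'"
)
print(query)
```

Keeping the entity as raw text means a statement such as PREDICT will_purchase FOR customer, product PAIRS still parses; splitting the pair into its two sides is left out to keep the sketch short.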

What happens under the hood

When you execute a PQL query, the system performs five steps automatically:

1. Parse: extract the prediction target (churn), entity type (customers), and constraints (signup_date filter).
2. Graph construction: read the database schema, build the temporal heterogeneous graph. Tables become node types, foreign keys become edges, timestamps enable temporal filtering.
3. Encoding: schema-agnostic encoding converts all column values to the universal representation format.
4. Inference: the pre-trained model (KumoRFM) runs message passing on the graph, propagating information across tables to build rich entity embeddings.
5. Prediction: the model generates predictions for the specified entities and returns results in a familiar tabular format.
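
The graph construction step can be illustrated with a minimal sketch in plain Python. The toy schema, the build_graph and neighbors_before functions, and the "id" and "ts" column names are all assumptions made for this example; the production system's data structures are not described in this article and are likely quite different.

```python
from collections import defaultdict

# Hypothetical toy database: table name -> rows.
tables = {
    "customers": [{"id": 1}, {"id": 2}],
    "orders": [
        {"id": 10, "customer_id": 1, "ts": "2024-01-05"},
        {"id": 11, "customer_id": 1, "ts": "2024-03-20"},
        {"id": 12, "customer_id": 2, "ts": "2024-02-14"},
    ],
}
# (child table, foreign-key column, parent table)
foreign_keys = [("orders", "customer_id", "customers")]


def build_graph(tables, foreign_keys):
    # Each row becomes a node keyed by (table, primary key);
    # the table name plays the role of the node type.
    nodes = {(name, row["id"]): row for name, rows in tables.items() for row in rows}
    # Each foreign-key reference becomes an edge from the parent row to the child row.
    edges = defaultdict(list)
    for child, column, parent in foreign_keys:
        for row in tables[child]:
            edges[(parent, row[column])].append((child, row["id"]))
    return nodes, edges


def neighbors_before(nodes, edges, node, cutoff):
    # Temporal filtering: keep only neighbours timestamped strictly before the
    # cutoff, so no information from after the prediction time leaks in.
    # Rows without a timestamp are treated as always visible.
    return [n for n in edges.get(node, []) if nodes[n].get("ts", "") < cutoff]


nodes, edges = build_graph(tables, foreign_keys)
visible = neighbors_before(nodes, edges, ("customers", 1), "2024-03-01")
print(visible)
```

Comparing ISO-formatted date strings works here because they sort lexicographically in date order; a real system would use proper timestamp types.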

Prediction types

Classification

Binary or multi-class prediction for entities. Will this customer churn? Is this transaction fraudulent? What risk tier does this loan fall into? PQL returns a probability score per entity.

Regression

Numerical prediction for entities. What is this customer's expected lifetime value? What will next month's revenue be? PQL returns a numerical estimate.

Ranking

Score entities against candidates. What products should we recommend to this customer? PQL returns a ranked list of candidates per entity.
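
Turning per-candidate scores into a ranked list can be sketched as follows. The scores are invented for illustration and the rank_candidates function is a hypothetical helper; how the model actually produces the scores is outside the scope of this sketch.

```python
import heapq


def rank_candidates(scores_by_entity, k):
    """Return the k highest-scoring candidates per entity, best first."""
    ranked = {}
    for entity, scores in scores_by_entity.items():
        best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
        ranked[entity] = [candidate for candidate, _ in best]
    return ranked


# Made-up scores: entity -> {candidate: score}, higher means more likely.
scores_by_entity = {
    "customer_1": {"shoes": 0.91, "socks": 0.40, "hat": 0.75},
    "customer_2": {"shoes": 0.10, "socks": 0.88, "hat": 0.35},
}
recommendations = rank_candidates(scores_by_entity, k=2)
print(recommendations)
```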

Link prediction

Predict connections between entities. Which customer-product pairs will result in purchases? This powers recommendation and matchmaking.

Why PQL matters for enterprises

The bottleneck in enterprise ML is not model accuracy. It is time-to-production. A typical enterprise ML project takes 6-12 months from business question to deployed model. Most of that time is spent on data preparation and feature engineering.

PQL compresses this to hours. The business analyst identifies the prediction task (“predict churn for customers”), writes it in PQL, and gets predictions. The graph construction, feature extraction, and model inference happen automatically.

Traditional pipeline: 6-12 months (schema study + feature engineering + model training + deployment)
PQL pipeline: hours to days (write PQL + validate predictions)

Key Takeaways

1. PQL is a SQL-like syntax for prediction tasks. PREDICT churn FOR customers replaces months of feature engineering, model selection, and training pipeline development.
2. PQL supports classification, regression, ranking, link prediction, and time-series forecasting. Any prediction on relational data fits the PQL syntax.
3. Under the hood, PQL triggers automatic graph construction, schema-agnostic encoding, and GNN inference. The entire ML pipeline is hidden behind a familiar SQL-like interface.
4. PQL makes relational deep learning accessible to SQL-literate analysts. No Python, no PyTorch, no data science expertise required for generating enterprise predictions.
5. The enterprise value is time-to-production: traditional ML takes 6-12 months per prediction task. PQL compresses this to hours by automating the feature engineering and model training pipeline.
