PQL (Predictive Query Language) lets you express ML prediction tasks in SQL-like syntax. Instead of spending months building feature engineering pipelines, selecting models, writing training loops, and deploying inference systems, you write one statement that declares what you want to predict. The underlying system constructs the graph from your database schema, encodes all columns, runs the GNN, and returns predictions.
SQL retrieves. PQL predicts.
The analogy is precise:
- SQL: `SELECT amount FROM orders WHERE customer_id = 42` retrieves data that exists in the database.
- PQL: `PREDICT churn FOR customer WHERE customer_id = 42` generates a prediction about data that does not yet exist (will this customer churn?).
Both operate on relational databases. SQL is the interface for data retrieval. PQL is the interface for prediction. Anyone who knows SQL can express prediction tasks in PQL.
PQL syntax
```sql
-- Classification: will customers churn?
PREDICT churn
FOR customers
WHERE signup_date < '2024-01-01'

-- Regression: what is the expected lifetime value?
PREDICT lifetime_value
FOR customers

-- Ranking: what will each customer buy next?
PREDICT next_purchase
FOR customers
RANKED BY products

-- Link prediction: which customers will buy which products?
PREDICT will_purchase
FOR customer, product PAIRS

-- Time-series: forecast demand for each product
PREDICT demand
FOR products
AT next_30_days
```

Each PQL statement specifies WHAT to predict (the target), WHO to predict for (the entity), and optional constraints (WHERE, RANKED BY, AT). The system handles the rest.
What happens under the hood
When you execute a PQL query, the system performs five steps automatically:
- Parse: extract the prediction target (churn), entity type (customers), and constraints (signup_date filter).
- Graph construction: read the database schema and build the temporal heterogeneous graph. Tables become node types, foreign keys become edges, and timestamps enable temporal filtering.
- Encoding: schema-agnostic encoding converts all column values to the universal representation format.
- Inference: the pre-trained model (KumoRFM) runs message passing on the graph, propagating information across tables to build rich entity embeddings.
- Prediction: the model generates predictions for the specified entities and returns results in a familiar tabular format.
Prediction types
Classification
Binary or multi-class prediction for entities. Will this customer churn? Is this transaction fraudulent? What risk tier does this loan fall into? PQL returns a probability score per entity (one score per class for multi-class tasks).
Regression
Numerical prediction for entities. What is this customer's expected lifetime value? What will next month's revenue be? PQL returns a numerical estimate.
Ranking
Score entities against candidates. What products should we recommend to this customer? PQL returns a ranked list of candidates per entity.
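A ranking result can be consumed as scored (entity, candidate) pairs. A small sketch of keeping the top-k candidates per entity (the scores here are made up; the tuple layout is an assumption about the output format):

```python
from collections import defaultdict

def top_k(scores: list[tuple[int, int, float]], k: int) -> dict[int, list[int]]:
    """scores: (customer_id, product_id, score) triples.
    Returns the k highest-scoring product ids per customer."""
    by_customer: dict[int, list[tuple[float, int]]] = defaultdict(list)
    for customer, product, score in scores:
        by_customer[customer].append((score, product))
    return {c: [p for _, p in sorted(pairs, reverse=True)[:k]]
            for c, pairs in by_customer.items()}

scores = [(42, 1, 0.9), (42, 2, 0.4), (42, 3, 0.7), (7, 1, 0.2), (7, 2, 0.8)]
print(top_k(scores, 2))  # {42: [1, 3], 7: [2, 1]}
```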
Link prediction
Predict connections between entities. Which customer-product pairs will result in purchases? This powers recommendation and matchmaking.
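A link-prediction result is naturally a list of scored pairs, and downstream code typically thresholds them to decide which connections to act on. An illustrative sketch (pair layout and threshold are assumptions):

```python
def predicted_links(pair_scores: list[tuple[int, int, float]],
                    threshold: float = 0.5) -> list[tuple[int, int]]:
    """Keep (customer_id, product_id) pairs whose predicted purchase
    probability clears the threshold."""
    return [(c, p) for c, p, score in pair_scores if score >= threshold]

pairs = [(42, 1, 0.91), (42, 2, 0.33), (7, 5, 0.58)]
print(predicted_links(pairs))  # [(42, 1), (7, 5)]
```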
Why PQL matters for enterprises
The bottleneck in enterprise ML is not model accuracy. It is time-to-production. A typical enterprise ML project takes 6-12 months from business question to deployed model. Most of that time is spent on data preparation and feature engineering.
PQL compresses this to hours. The business analyst identifies the prediction task (“predict churn for customers”), writes it in PQL, and gets predictions. The graph construction, feature extraction, and model inference happen automatically.
- Traditional pipeline: 6-12 months (schema study + feature engineering + model training + deployment)
- PQL pipeline: hours to days (write PQL + validate predictions)