Every enterprise ML project follows the same pattern. A business stakeholder asks a question: "Which customers will churn next month?" A data scientist then spends 2-4 weeks building a pipeline: extracting data from multiple tables, writing joins, computing hundreds of features, selecting a model, training it, evaluating it, and deploying it. The question takes 10 seconds to ask. The answer takes weeks to build.
PQL eliminates the gap between question and answer. It is a query language that lets you describe a prediction task the same way you would describe it to a colleague: "Predict which active members will stop visiting in the next 30 days." Kumo's engine takes that description and handles everything else, from feature discovery to model training to scored output.
The headline result: SAP SALT benchmark
The SAP SALT benchmark is an enterprise-grade evaluation in which business analysts and data scientists attempt prediction tasks on real SAP data. It measures how accurately different approaches predict business outcomes on production-quality databases with multiple related tables.
| approach | accuracy | what_it_means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |
SAP SALT benchmark: KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points on real enterprise prediction tasks.
KumoRFM scores 91% where PhD-level data scientists with weeks of feature engineering and hand-tuned XGBoost score 75%. The 16 percentage point gap is the value of reading relational data natively instead of flattening it into a single table.
PQL syntax: how it works
A PQL query has three parts: what to predict, who to predict for, and optionally, which entities to filter. The syntax is intentionally similar to SQL so that anyone who can write a SELECT statement can write a prediction task.
Anatomy of a PQL query
Every PQL query follows this structure:
- PREDICT — defines the target variable using an aggregation function (COUNT, SUM, AVG, MAX, MIN, LIST_DISTINCT) over a forward time window
- FOR EACH — specifies the entity to generate predictions for (e.g., each customer, each account, each article)
- WHERE (optional) — filters entities using backward time windows to focus on recently-active entities
- ASSUMING (optional) — defines a counterfactual condition for "what-if" predictions
PQL examples: 7 prediction tasks in 2-3 lines each
The following examples show real PQL queries for common enterprise prediction tasks. Each query replaces an entire ML pipeline that would otherwise require data extraction, feature engineering, model selection, and training.
1. Churn prediction
Predict which recently-active members will stop visiting in the next 30 days. The backward window (-60, 0, days) in the WHERE clause filters to members who have visited in the last 60 days, eliminating noise from already-inactive accounts.
PQL Query
PREDICT COUNT(VISITS.*, 0, 30, days) = 0 FOR EACH MEMBERS.MEMBER_ID WHERE COUNT(VISITS.*, -60, 0, days) > 0
Binary classification: predicts whether a recently-active member will have zero visits in the next 30 days. The backward window eliminates already-churned members from the prediction set, focusing model capacity on at-risk members who are still reachable.
Output
| member_id | churn_probability | last_visit | risk_tier |
|---|---|---|---|
| M-4401 | 0.87 | 3 days ago | High |
| M-4402 | 0.12 | 1 day ago | Low |
| M-4403 | 0.64 | 18 days ago | Medium |
| M-4404 | 0.93 | 42 days ago | High |
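To make the window semantics concrete, here is roughly what this query's target and filter compute, sketched by hand in plain Python over a hypothetical list of VISITS rows. This is an illustration of the label logic only, not how Kumo's engine is implemented; in a traditional pipeline this step would be SQL or pandas.

```python
from datetime import date, timedelta

# Hypothetical VISITS rows: (member_id, visit_date).
visits = [
    ("M-4401", date(2024, 5, 20)),
    ("M-4402", date(2024, 6, 25)),
    ("M-4403", date(2024, 4, 1)),   # last visit long before the lookback window
]

anchor = date(2024, 6, 30)  # prediction time; both windows are relative to it

def churn_labels(visits, anchor, horizon=30, lookback=60):
    """Hand-computes PREDICT COUNT(VISITS.*, 0, 30, days) = 0
    WHERE COUNT(VISITS.*, -60, 0, days) > 0."""
    members = {m for m, _ in visits}
    labels = {}
    for m in members:
        dates = [d for mid, d in visits if mid == m]
        # Backward window: at least one visit in the last 60 days.
        recently_active = any(
            anchor - timedelta(days=lookback) <= d <= anchor for d in dates
        )
        if not recently_active:
            continue  # the WHERE clause drops already-inactive members
        # Forward window: zero visits in the next 30 days means churn (label 1).
        future = [d for d in dates if anchor < d <= anchor + timedelta(days=horizon)]
        labels[m] = int(len(future) == 0)
    return labels

print(churn_labels(visits, anchor))
```

Note that M-4403 is excluded entirely rather than labeled: the backward window scopes the prediction set, not just the features.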
2. Fraud detection
Predict which accounts will have suspicious transaction volume exceeding $10,000 in the next 7 days. This binary classification flags accounts likely to experience high-value anomalous activity.
PQL Query
PREDICT SUM(TRANSACTIONS.AMOUNT, 0, 7, days) > 10000 FOR EACH ACCOUNTS.ACCOUNT_ID
Binary classification: predicts whether an account's total transaction amount will exceed $10,000 in the next 7 days. The model learns spending patterns across the full relational graph, including merchant types, transaction frequency, and geographic patterns.
Output
| account_id | fraud_probability | current_7d_avg | alert_level |
|---|---|---|---|
| A-7701 | 0.91 | $1,200 | Critical |
| A-7702 | 0.03 | $8,400 | Normal |
| A-7703 | 0.78 | $950 | High |
| A-7704 | 0.15 | $11,200 | Low (normal high-volume) |
3. Lead scoring
Predict which leads will convert to an order in the next 30 days. This replaces manual lead scoring models that require CRM feature engineering.
PQL Query
PREDICT COUNT(ORDERS.*, 0, 30, days) > 0 FOR EACH LEADS.LEAD_ID
Binary classification: predicts whether a lead will generate at least one order in the next 30 days. The model reads across leads, contacts, activities, and historical opportunities to discover buying signals that manual scoring misses.
Output
| lead_id | conversion_probability | lead_source | priority |
|---|---|---|---|
| L-2201 | 0.84 | Webinar | Hot |
| L-2202 | 0.31 | Cold email | Warm |
| L-2203 | 0.07 | Trade show | Cold |
| L-2204 | 0.92 | Inbound demo | Hot |
4. Demand forecasting
Predict the total quantity sold for each article over the next 3 months. This is a regression task, not binary classification, because the target is a continuous value.
PQL Query
PREDICT SUM(TRANSACTIONS.QUANTITY, 0, 3, months) FOR EACH ARTICLES.ARTICLE_ID
Regression: predicts the total units sold for each article in the next 3 months. The model learns seasonal patterns, cross-product cannibalization, and supply chain signals from the full relational structure.
Output
| article_id | predicted_quantity | current_quarterly_avg | trend |
|---|---|---|---|
| SKU-001 | 4,200 | 3,800 | Up 10.5% |
| SKU-002 | 890 | 1,200 | Down 25.8% |
| SKU-003 | 12,400 | 11,900 | Up 4.2% |
| SKU-004 | 150 | 600 | Down 75.0% |
5. Product recommendations
Predict the top 5 products each customer is most likely to order in the next 30 days. LIST_DISTINCT returns a ranked list of distinct values, and RANK TOP N limits the output to the top candidates.
PQL Query
PREDICT LIST_DISTINCT(ORDERS.PRODUCT_ID, 0, 30, days) RANK TOP 5 FOR EACH CUSTOMERS.CUSTOMER_ID
Ranked recommendation: predicts the top 5 most likely products for each customer to order in the next 30 days. The model learns from purchase history, product co-occurrence, customer similarity, and temporal patterns across the full product catalog.
Output
| customer_id | rank_1 | rank_2 | rank_3 | rank_4 | rank_5 |
|---|---|---|---|---|---|
| C-101 | P-44 (0.89) | P-12 (0.74) | P-87 (0.61) | P-03 (0.55) | P-91 (0.42) |
| C-102 | P-22 (0.93) | P-44 (0.81) | P-56 (0.67) | P-78 (0.52) | P-33 (0.41) |
| C-103 | P-07 (0.76) | P-19 (0.71) | P-44 (0.58) | P-62 (0.44) | P-15 (0.38) |
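The RANK TOP N step can be pictured as a grouped sort over scored candidates. The sketch below assumes a hypothetical dict of per-(customer, product) probabilities, as an engine might score them before truncating to the top 5; the scoring itself is the hard part PQL automates.

```python
# Hypothetical (customer_id, product_id) -> predicted purchase probability.
scores = {
    ("C-101", "P-44"): 0.89, ("C-101", "P-12"): 0.74, ("C-101", "P-87"): 0.61,
    ("C-101", "P-03"): 0.55, ("C-101", "P-91"): 0.42, ("C-101", "P-50"): 0.17,
}

def rank_top_n(scores, n=5):
    """Group candidates by customer and keep the n highest scores,
    mimicking LIST_DISTINCT(...) RANK TOP n over a scored candidate set."""
    by_customer = {}
    for (cust, prod), p in scores.items():
        by_customer.setdefault(cust, []).append((prod, p))
    return {
        cust: sorted(prods, key=lambda x: x[1], reverse=True)[:n]
        for cust, prods in by_customer.items()
    }

top5 = rank_top_n(scores)
print(top5["C-101"])  # highest-probability product first; P-50 falls off
```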
6. Customer lifetime value (LTV)
Predict the total revenue from each customer over the next 365 days. This regression task drives budget allocation for acquisition and retention spend.
PQL Query
PREDICT SUM(ORDERS.AMOUNT, 0, 365, days) FOR EACH CUSTOMERS.CUSTOMER_ID
Regression: predicts the total order revenue per customer over the next year. The model captures purchase frequency trends, average order value trajectories, product mix evolution, and cross-sell patterns across the full customer-order-product graph.
Output
| customer_id | predicted_ltv | historical_ltv | segment |
|---|---|---|---|
| C-101 | $14,200 | $11,800 | High-value growth |
| C-102 | $2,100 | $4,500 | Declining |
| C-103 | $8,700 | $8,200 | Stable |
| C-104 | $22,400 | $19,100 | High-value growth |
7. Counterfactual prediction (ASSUMING clause)
Predict whether a user will make a purchase in the next 4 days, assuming they receive a push notification today. The ASSUMING clause is what makes this query fundamentally different from all the others. It enables counterfactual reasoning: predicting outcomes under hypothetical interventions.
PQL Query
PREDICT COUNT(PURCHASES.*, 1, 4, days) > 0 FOR EACH USERS.USER_ID ASSUMING COUNT(NOTIFICATIONS.* WHERE NOTIFICATIONS.TYPE = 'PUSH', 0, 1, days) > 0
Counterfactual binary classification: predicts purchase probability assuming a push notification is sent, even for users who have never received one. This enables causal uplift modeling without holdout experiments. Compare the 'with notification' prediction to a baseline query without the ASSUMING clause to estimate the incremental lift of the intervention.
Output
| user_id | prob_with_push | prob_without_push | incremental_lift |
|---|---|---|---|
| U-501 | 0.72 | 0.31 | +41 points |
| U-502 | 0.18 | 0.16 | +2 points (low lift) |
| U-503 | 0.85 | 0.44 | +41 points |
| U-504 | 0.09 | 0.08 | +1 point (no effect) |
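The incremental-lift column comes from running the query twice, once with and once without the ASSUMING clause, and differencing the probabilities per user. A minimal sketch, assuming the two runs return the hypothetical dicts below:

```python
# Hypothetical outputs of two PQL runs: with and without the ASSUMING clause.
with_push = {"U-501": 0.72, "U-502": 0.18, "U-503": 0.85, "U-504": 0.09}
baseline  = {"U-501": 0.31, "U-502": 0.16, "U-503": 0.44, "U-504": 0.08}

def incremental_lift(treated, control, min_lift=0.10):
    """Per-user uplift in percentage points, plus a flag for users whose
    lift clears a targeting threshold (min_lift is an assumed cutoff)."""
    report = {}
    for user, p_treated in treated.items():
        lift = p_treated - control[user]
        report[user] = (round(lift * 100), lift >= min_lift)
    return report

for user, (points, target) in sorted(incremental_lift(with_push, baseline).items()):
    print(f"{user}: +{points} points, target={target}")
```

In practice you would send the notification only to high-lift users like U-501 and U-503; U-504 converts (or not) regardless of the push.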
PQL vs Python: what PQL replaces
To understand why PQL matters, you need to see what the equivalent Python pipeline looks like. A single PQL query replaces 6 distinct pipeline stages, each requiring specialized code.
| pipeline_stage | PQL | Python_equivalent | time_cost |
|---|---|---|---|
| Define prediction target | PREDICT clause (1 line) | Target variable logic + label extraction (15-30 lines) | 10 min vs seconds |
| Data extraction & joins | Handled automatically | SQL queries + pandas merges across 3-8 tables (40-80 lines) | 2-4 hours vs 0 |
| Feature computation | Handled automatically | Aggregations, time windows, encodings (100-200 lines) | 4-6 hours vs 0 |
| Feature selection | Handled automatically | Correlation analysis, importance ranking (30-50 lines) | 1-2 hours vs 0 |
| Model selection & training | Handled automatically | Try XGBoost, LightGBM, neural nets, tune hyperparameters (50-80 lines) | 2-3 hours vs 0 |
| Evaluation & deployment | Handled automatically | Cross-validation, metrics, model serialization (40-60 lines) | 1-2 hours vs 0 |
| Total | 2-3 lines of PQL | 275-500 lines of Python | 12+ hours vs minutes |
The total row is the point: PQL collapses the entire pipeline into a declarative query. The reduction is not about writing less code; it is about eliminating six pipeline stages that each require specialized expertise.
Traditional Python pipeline
- 275-500 lines of code across 6 stages
- Requires SQL, pandas, scikit-learn, XGBoost expertise
- 12+ hours of data scientist time per prediction task
- Manual feature engineering limits accuracy to available features
- Each new prediction task requires a new pipeline
- Counterfactual predictions require holdout experiments
PQL query
- 2-3 lines of declarative syntax
- Requires only SQL-level knowledge
- Minutes from query to scored predictions
- Automatic feature discovery across full relational graph
- Each new task is a new query, not a new pipeline
- Counterfactual predictions via ASSUMING clause
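For a feel of what the first three stages look like by hand, here is a heavily compressed sketch over hypothetical CUSTOMERS and ORDERS tables, in plain Python rather than SQL + pandas. A real pipeline would compute hundreds of features like these, then still need feature selection, model training, and evaluation on top.

```python
from datetime import date, timedelta

# Stages 1-2 (extraction & joins): in a real pipeline, SQL queries and
# merges pull these rows from 3-8 related tables.
customers = [{"customer_id": "C-101"}, {"customer_id": "C-102"}]
orders = [
    {"customer_id": "C-101", "amount": 120.0, "order_date": date(2024, 6, 1)},
    {"customer_id": "C-101", "amount": 80.0,  "order_date": date(2024, 6, 20)},
    {"customer_id": "C-102", "amount": 40.0,  "order_date": date(2024, 3, 5)},
]

anchor = date(2024, 6, 30)

# Stage 3 (feature computation): hand-written backward-window aggregations,
# two of the hundreds of features typically built before any modeling.
def features(customers, orders, anchor, lookback=90):
    start = anchor - timedelta(days=lookback)
    rows = []
    for c in customers:
        cid = c["customer_id"]
        recent = [o for o in orders
                  if o["customer_id"] == cid and start <= o["order_date"] <= anchor]
        rows.append({
            "customer_id": cid,
            "order_count_90d": len(recent),
            "revenue_90d": sum(o["amount"] for o in recent),
        })
    return rows

for row in features(customers, orders, anchor):
    print(row)
# Stages 4-6 (feature selection, model training, evaluation/deployment)
# would follow; a PQL query specifies all six stages declaratively.
```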
PQL's backward window: eliminating noise from dead accounts
One of PQL's most powerful features is the backward time window in the WHERE clause. This filters the entity set before prediction, ensuring you only predict for entities that are relevant and actionable.
Consider the churn prediction example:
WHERE COUNT(VISITS.*, -60, 0, days) > 0
This backward window (-60, 0, days) means "only include members who had at least one visit in the last 60 days." Without this filter, your churn model would waste capacity predicting that already-inactive accounts will remain inactive, which is trivially true and operationally useless.
In traditional ML, this filtering requires a separate data preprocessing step: query the database for recently-active entities, build the feature table only for those entities, then train the model. In PQL, it is one line in the query. The engine handles the filtering automatically and ensures that the training data, features, and predictions are all scoped to the right entity set.
Supported aggregations and task types
PQL supports a range of aggregation functions, each mapping to a specific type of prediction task.
| aggregation | example_usage | task_type | typical_use_case |
|---|---|---|---|
| COUNT(...) = 0 | COUNT(VISITS.*, 0, 30, days) = 0 | Binary classification | Churn, inactivity |
| COUNT(...) > 0 | COUNT(ORDERS.*, 0, 30, days) > 0 | Binary classification | Conversion, lead scoring |
| SUM(...) > N | SUM(TXN.AMOUNT, 0, 7, days) > 10000 | Binary classification | Fraud, anomaly detection |
| SUM(...) | SUM(ORDERS.AMOUNT, 0, 365, days) | Regression | LTV, revenue forecasting |
| AVG(...) | AVG(RATINGS.SCORE, 0, 90, days) | Regression | Satisfaction prediction |
| COUNT(...) | COUNT(TICKETS.*, 0, 30, days) | Regression | Support volume forecasting |
| MAX(...) | MAX(TRANSACTIONS.AMOUNT, 0, 30, days) | Regression | Peak transaction prediction |
| MIN(...) | MIN(RESPONSE_TIME.SECONDS, 0, 7, days) | Regression | SLA prediction |
| LIST_DISTINCT(...) RANK TOP N | LIST_DISTINCT(ORDERS.PRODUCT_ID, 0, 30, days) RANK TOP 5 | Ranked recommendation | Product, content recommendations |
PQL aggregation functions map directly to ML task types. Adding a comparison operator (= 0, > N) converts a regression target into a binary classification target. LIST_DISTINCT with RANK TOP N produces ranked recommendation lists.
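The mapping in the table follows a simple rule, sketched here as a hypothetical helper (this function is illustrative, not part of PQL or Kumo's API): a comparison operator turns the target binary, RANK TOP turns it into a ranking, and a bare numeric aggregation is a regression.

```python
def infer_task_type(aggregation, has_comparison=False, has_rank_top=False):
    """Map a PQL target expression to its ML task type, per the table above:
    comparisons -> binary classification, LIST_DISTINCT ... RANK TOP N ->
    ranked recommendation, bare numeric aggregations -> regression."""
    if has_rank_top and aggregation == "LIST_DISTINCT":
        return "ranked recommendation"
    if has_comparison:
        return "binary classification"
    if aggregation in {"COUNT", "SUM", "AVG", "MAX", "MIN"}:
        return "regression"
    raise ValueError(f"unsupported aggregation: {aggregation}")

print(infer_task_type("COUNT", has_comparison=True))            # churn-style target
print(infer_task_type("SUM"))                                   # LTV-style target
print(infer_task_type("LIST_DISTINCT", has_rank_top=True))      # recommendations
```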
Why PQL changes who can do ML
The traditional ML pipeline requires at least three skill sets: SQL for data extraction, Python/pandas for feature engineering, and ML framework expertise for modeling. Most organizations have far more SQL-proficient analysts than ML engineers.
PQL reduces the required skill set to SQL-level knowledge. A data analyst who writes SQL queries daily can write PQL queries and produce predictions that match or exceed what a dedicated ML team builds in weeks. This is not about simplifying ML. It is about making the prediction itself the interface, rather than the pipeline.
| role | can_write_SQL | can_build_ML_pipeline | can_write_PQL | time_to_prediction |
|---|---|---|---|---|
| Data analyst | Yes | No | Yes | Minutes |
| Business analyst | Yes | No | Yes | Minutes |
| Data engineer | Yes | Sometimes | Yes | Minutes |
| Data scientist | Yes | Yes | Yes | Minutes (vs weeks) |
| ML engineer | Yes | Yes | Yes | Minutes (vs weeks) |
Data analysts and business analysts can use PQL directly. They represent the largest pool of data-literate professionals in most enterprises, yet they have been locked out of ML by the Python pipeline requirement.
PQL in practice: from question to prediction
A typical PQL workflow has three steps:
- Connect your database. Point Kumo at your relational database (or data warehouse tables). Kumo reads the schema, foreign keys, and data types automatically.
- Write the PQL query. Define what you want to predict, for which entities, with optional filters and counterfactual conditions.
- Get scored predictions. Kumo's engine discovers features across the full relational graph, trains a model, and returns scored predictions for every entity in the query scope.
There is no feature engineering step. There is no model selection step. There is no training configuration to specify. The PQL query is the complete task specification, and the engine handles everything else.