PREDICT <Aggregation_Function>(<target_table>.<column_name>, <start>, <end>) <comparison_operator> <constant>
(Required)
Description
A pQuery starts with the PREDICT clause that defines a required specification of a “target” which represents the value you want to predict for an entity described in FOR EACH clause. If you want to predict some aggregation of values over time (e.g. sum of purchase values over the next 30 days), then you must choose the proper aggregation formula and the column representing the value you want to aggregate. This column must come from a fact table that includes a foreign key column to the entity ID specified in FOR EACH clause.Example 1
The syntax (and some examples) for this part of the query is as follows:Example 2
The syntax (and some examples) for this part of the query is as follows:- Sum (numerical columns):
SUM
- Average (numerical columns):
AVG
- Maximum (numerical columns):
MAX
- Minimum (numerical columns):
MIN
- Count (all columns):
COUNT
- Cardinality/count of distinct values (all columns):
COUNT_DISTINCT
- List of all distinct values (all columns):
LIST_DISTINCT
- The latest/most recent value in a specified timeframe (all columns except foreign keys):
LAST
- The oldest/least recent value in a specified timeframe (all columns except foreign keys):
FIRST
LIST_DISTINCT
for a smooth user experience we only recommend using it if the number of possible values to return is less than 1000.
Within the aggregation function inputs, the start
and end
parameters refer to the time period you want to aggregate facts across, as calculated in days. For example: 10
for start
and 30
for end
implies that you want to aggregate across all facts from 10 days later (excluding the 10th day) to 30 days later (including the 30th day). Both start
and end
should be non-negative integers and end
should be strictly greater than start
.
In general for the target formula, you should input a facts table that has historic fact data over a timeframe that is at least an order of magnitude more than the prediction horizon in your target formula. For example, if you are predicting over the next 30 days, especially to take into account seasonality, you’d probably be best off including at least 1 year of rows in your fact table.
Note it is also optionally possible to convert your prediction target into a conditional statement prediction problem (e.g. a prediction of whether a particular target expression is true or false) by adding a comparison operator.
Example 3
The syntax (and some examples) for this part of the query is as follows:- Greater than (numeric, timestamp columns):
>
- Less than (numeric, timestamp columns):
<
- Equal to (all column types):
=
- Not equal to (all column types):
!=
- Greater than or equal to (numeric, timestamp columns):
>=
- Less than or equal to (numeric, timestamp columns):
<=
- Starts with a particular value (text columns):
STARTS WITH
- Ends with a particular value (text columns):
ENDS WITH
- Contains a particular value (text columns):
CONTAINS