metrics: List[str] (Optional)

Description

This field specifies the metrics used to evaluate your model. Available metrics vary by task type; specifying a metric that is incompatible with the task results in a validation error.

Supported Task Types

  • All

Available Extra Metrics

Binary Classification: acc, auroc, auprc, ap, f1, ndcg, ndcg@k, precision, precision@k, recall, recall@k (for k = 1, 10, and 100)
Multiclass Classification: acc, f1, precision, recall
Multilabel Classification: acc, f1, precision, recall; auroc, auprc, and ap are supported only with the suffixes _macro, _micro, and _per_label
Multilabel Ranking: f1@k, map@k, mrr@k, ndcg@k, precision@k, recall@k, hit_ratio@k (for k = 1, 10, and 100)
Link Prediction: f1@k, map@k, mrr@k, ndcg@k, precision@k, recall@k, hit_ratio@k, coverage@k, avg_popularity@k, personalization@k, diversity[col_name]@k (for k = 1, 10, and 100)
Regression: mae, mape, mse, rmse, smape
Forecasting: mae, mse, rmse, smape, mape, neg_binomial, normal, lognormal

Example

In the case of link prediction, the default metrics are map@1, map@10, and map@100, but you can use:

metrics: [map@12]

to report map@12.
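
Since the field accepts a list of strings, several metrics can be requested at once; for instance (the particular metrics chosen here are illustrative and must be supported by your task type):

metrics: [map@10, ndcg@10, hit_ratio@10]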

Metric Description

  • precision@k, i.e. the proportion of recommendations within the top-k that are actually relevant. A higher precision indicates the model’s ability to surface relevant items early in the ranking.

  • recall@k, i.e. the proportion of relevant items that appear within the top-k. A higher recall indicates the model’s ability to retrieve a larger proportion of relevant items.

  • map@k (Mean Average Precision) averages the precision at each position where a relevant item appears within the top-k, so it takes the order of relevant items into account. map@k can provide a more comprehensive view of ranking quality than precision alone.

  • ndcg@k (Normalized Discounted Cumulative Gain) accounts for the position of relevant items by considering relevance scores, giving higher weight to more relevant items appearing at the top.

  • mrr@k (Mean Reciprocal Rank), i.e. the mean, across users, of the reciprocal rank of the first relevant item within the top-k (or zero if no relevant item appears).

  • hit_ratio@k, i.e. the percentage of users for whom at least one relevant item is present within the top-k recommendations.

  • coverage@k, i.e. the percentage of unique items recommended across all users within the top-k. Higher coverage indicates a wider exploration of the item catalog.

  • avg_popularity@k provides insight into the model’s tendency to recommend popular items, computed by averaging the training-set popularity scores of the items within the top-k recommendations.

  • personalization@k, i.e. the dissimilarity of recommendations across different users. Higher personalization suggests that the model tailors recommendations to individual user preferences rather than providing generic results. Dissimilarity is defined by the average inverse cosine similarity between users’ lists of recommendations.

  • diversity[item_col_name]@k computes the diversity of recommendations with respect to an item category, i.e. the pairwise inequality of recommended items’ categories. The category is defined by a categorical column in the item table, e.g., diversity[product_type]@10.
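
To make these definitions concrete, here is a minimal Python sketch of several of the top-k metrics above (a simplified illustration of the formulas, not the platform's actual implementation). Recommendations are given as ranked lists per user; relevance is given as a set of item IDs per user:

```python
def precision_at_k(recs, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    top = recs[:k]
    return sum(1 for item in top if item in relevant) / k

def recall_at_k(recs, relevant, k):
    """Fraction of relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    top = recs[:k]
    return sum(1 for item in top if item in relevant) / len(relevant)

def mrr_at_k(recs, relevant, k):
    """Reciprocal rank of the first relevant item in the top-k (0 if none)."""
    for rank, item in enumerate(recs[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def hit_ratio_at_k(all_recs, all_relevant, k):
    """Fraction of users with at least one relevant item in their top-k."""
    hits = sum(
        1
        for recs, rel in zip(all_recs, all_relevant)
        if any(item in rel for item in recs[:k])
    )
    return hits / len(all_recs)

def coverage_at_k(all_recs, catalog_size, k):
    """Fraction of the item catalog that appears in any user's top-k."""
    unique = {item for recs in all_recs for item in recs[:k]}
    return len(unique) / catalog_size
```

For example, with recommendations ["a", "b", "c", "d"] and relevant set {"b", "d"}, both precision@2 and recall@2 equal 0.5, since exactly one of the two relevant items appears in the top two positions.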