metrics: List[str] (Optional)

Description

This field specifies the metrics used to evaluate your model. Available metrics vary by task type; specifying a metric that is incompatible with the task results in a validation error.

Supported Task Types

  • All

Available Extra Metrics

Binary Classification: acc, auroc, auprc, ap, f1, ndcg, ndcg@k, precision, precision@k, recall, recall@k (for k = 1, 10, and 100)
Multiclass Classification: acc, f1, precision, recall
Multilabel Classification: acc, f1, precision, recall; auroc, auprc, and ap are supported only with the suffixes _macro, _micro, and _per_label
Multilabel Ranking: f1@k, map@k, mrr@k, ndcg@k, precision@k, recall@k, hit_ratio@k (for k = 1, 10, and 100)
Link Prediction: f1@k, map@k, mrr@k, ndcg@k, precision@k, recall@k, hit_ratio@k, coverage@k, avg_popularity@k, personalization@k, diversity[col_name]@k (for k = 1, 10, and 100)
Regression: mae, mape, mse, rmse, smape
Forecasting: mae, mse, rmse, smape, mape, neg_binomial, normal, lognormal

Example

In the case of link prediction, the default metrics are map@1, map@10, and map@100, but you can use:

metrics: [map@12]

to report map@12.
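
Since the field accepts a list of strings, several metrics can be requested at once; for instance (the particular metrics chosen here are illustrative and must be supported by your task type):

metrics: [map@10, ndcg@10, hit_ratio@10]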

Metric Description

  • precision@k, i.e. the proportion of recommendations within the top-k that are actually relevant. A higher precision indicates the model’s ability to surface relevant items early in the ranking.

  • recall@k, i.e. the proportion of relevant items that appear within the top-k. A higher recall indicates the model’s ability to retrieve a larger proportion of relevant items.

  • map@k (Mean Average Precision) averages the precision at each position where a relevant item appears within the top-k, so it takes the order of relevant items into account. map@k can provide a more comprehensive view of ranking quality than precision alone.

  • ndcg@k (Normalized Discounted Cumulative Gain) accounts for the position of relevant items by considering relevance scores, giving higher weight to more relevant items appearing at the top.

  • mrr@k (Mean Reciprocal Rank), i.e. the mean, across users, of the reciprocal rank of the first relevant item within the top-k (or zero if no relevant item appears).

  • hit_ratio@k, i.e. the percentage of users for whom at least one relevant item is present within the top-k recommendations.

  • coverage@k, i.e. the percentage of unique items recommended across all users within the top-k. Higher coverage indicates a wider exploration of the item catalog.

  • avg_popularity@k provides insight into the model’s tendency to recommend popular items, computed by averaging the training-set popularity scores of the items within the top-k recommendations.

  • personalization@k, i.e. the dissimilarity of recommendations across different users. Higher personalization suggests that the model tailors recommendations to individual user preferences rather than providing generic results. Dissimilarity is defined by the average inverse cosine similarity between users’ lists of recommendations.

  • diversity[item_col_name]@k computes the diversity of recommendations with respect to an item category, i.e. the pairwise inequality of recommended items’ categories. The category is defined by a categorical column in the item table, e.g., diversity[product_type]@10.
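
To make these definitions concrete, here is a minimal Python sketch of several of the top-k metrics above (a simplified illustration of the formulas, not the platform's actual implementation). Recommendations are given as ranked lists per user; relevance is given as a set of item IDs per user:

```python
def precision_at_k(recs, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    top = recs[:k]
    return sum(1 for item in top if item in relevant) / k

def recall_at_k(recs, relevant, k):
    """Fraction of relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    top = recs[:k]
    return sum(1 for item in top if item in relevant) / len(relevant)

def mrr_at_k(recs, relevant, k):
    """Reciprocal rank of the first relevant item in the top-k (0 if none)."""
    for rank, item in enumerate(recs[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def hit_ratio_at_k(all_recs, all_relevant, k):
    """Fraction of users with at least one relevant item in their top-k."""
    hits = sum(
        1
        for recs, rel in zip(all_recs, all_relevant)
        if any(item in rel for item in recs[:k])
    )
    return hits / len(all_recs)

def coverage_at_k(all_recs, catalog_size, k):
    """Fraction of the item catalog that appears in any user's top-k."""
    unique = {item for recs in all_recs for item in recs[:k]}
    return len(unique) / catalog_size
```

For example, with recommendations ["a", "b", "c", "d"] and relevant set {"b", "d"}, both precision@2 and recall@2 equal 0.5, since exactly one of the two relevant items appears in the top two positions.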