When the target expression uses a comparison operator such as `=`, `>`, or `<`, the task will be defined as a binary classification task.
The prediction output for a binary classification task is the probability that the target value belongs to the positive class. A classification threshold determines how high the predicted probability must be for it to be considered a positive class prediction. Kumo uses accuracy, AUROC, and AUPRC as the evaluation metrics for this task, and provides a confusion matrix and a cumulative gain chart to help understand the results.
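The role of the classification threshold can be shown with a minimal pure-Python sketch. The probabilities and threshold values below are illustrative, not Kumo defaults.

```python
def apply_threshold(probabilities, threshold=0.5):
    """Label a prediction positive (1) when its probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]

probs = [0.15, 0.48, 0.52, 0.91]
print(apply_threshold(probs))        # default threshold of 0.5
print(apply_threshold(probs, 0.9))   # stricter threshold: fewer positives
```

Raising the threshold makes the model more conservative about positive predictions, which is the lever behind the precision/recall trade-off discussed below.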
AUROC scores range from 0 to 1: a perfect predictor yields an AUROC of 1, while a predictor that makes completely random predictions yields an AUROC of 0.5. An AUROC of 0.8 is typically considered very good, but what counts as good enough will depend on your specific business problem.
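AUROC has a useful probabilistic reading: it is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (ties count half). A minimal sketch, using illustrative labels and scores rather than Kumo output:

```python
def auroc(labels, scores):
    """AUROC as the probability a random positive outscores a random negative
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # one pos/neg pair misordered
```

A predictor that assigns random scores wins roughly half of these pairwise comparisons, which is why its AUROC sits at 0.5.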
AUPRC scores also range from 0 to 1, with a perfect predictor scoring 1. The baseline for a better-than-random AUPRC score is different for each problem; specifically, it depends on how imbalanced the data is. For example, if your problem has a 10% positive class and a 90% negative class, then a predictor that makes random predictions would achieve an AUPRC of roughly 0.1, the positive-class prevalence.
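The 0.1 baseline can be checked with a quick sketch: when a predictor labels everything positive, its precision collapses to the positive-class prevalence. The data below is synthetic and purely illustrative.

```python
labels = [1] * 10 + [0] * 90   # 10% positive class, 90% negative class
preds = [1] * 100              # naive predictor: everything is positive

tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
precision = tp / (tp + fp)     # equals the positive-class prevalence
print(precision)
```

This is why a raw AUPRC number only means something relative to the prevalence baseline of your own dataset.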
AUPRC is typically considered the better metric when you have imbalanced data (e.g., you are trying to make good predictions for a very rare class), since it is much more sensitive to changes in performance on that rare class. With a metric like accuracy, by contrast, you could achieve a very high score simply by predicting that a rare positive class never occurs.
Depending on your business focus, changing the threshold above which predictions are considered positive will change the accuracy, specificity, sensitivity, precision, and recall scores; a single AUROC or AUPRC score, by contrast, captures overall performance across all thresholds.
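The threshold-dependence of precision and recall can be sketched directly. The labels and scores below are illustrative; sweeping the threshold trades precision against recall while the underlying scores, and hence AUROC/AUPRC, stay fixed.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall of the positive class at a given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(y == 1 and p == 1 for y, p in zip(labels, preds))
    fp = sum(y == 0 and p == 1 for y, p in zip(labels, preds))
    fn = sum(y == 1 and p == 0 for y, p in zip(labels, preds))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

labels = [0, 0, 1, 0, 1, 1]
scores = [0.2, 0.4, 0.45, 0.6, 0.7, 0.9]
for t in (0.3, 0.5, 0.8):
    print(t, precision_recall(labels, scores, t))
```

A low threshold catches every positive (high recall) at the cost of more false positives (low precision); a high threshold does the opposite.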
|  | Predicted negative class | Predicted positive class |
| --- | --- | --- |
| Actual negative class | TN | FP |
| Actual positive class | FN | TP |
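The four cells of the confusion matrix can be tallied directly from labels and thresholded predictions. A minimal sketch with illustrative data, following the same orientation as the table (rows are actual classes, columns are predicted classes):

```python
def confusion_counts(labels, preds):
    """Return (tn, fp, fn, tp) counts for binary labels and predictions."""
    tn = sum(y == 0 and p == 0 for y, p in zip(labels, preds))  # true negatives
    fp = sum(y == 0 and p == 1 for y, p in zip(labels, preds))  # false positives
    fn = sum(y == 1 and p == 0 for y, p in zip(labels, preds))  # false negatives
    tp = sum(y == 1 and p == 1 for y, p in zip(labels, preds))  # true positives
    return tn, fp, fn, tp

print(confusion_counts([0, 0, 1, 1, 0], [0, 1, 1, 0, 0]))
```

Accuracy is (TP + TN) / (TP + TN + FP + FN), precision is TP / (TP + FP), and recall is TP / (TP + FN), so all the threshold-dependent metrics above are derived from these four counts.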