Binary Classification
Binary classification tasks are problems whose target values fall into two categories: one positive class and one negative class. For example, in a churn prediction problem, you want to predict whether a customer will leave your service in the near future, so the labels can be defined as:

- Positive Class: the customer discontinues your service.
- Negative Class: the customer continues your service.
If the prediction target is defined by a comparison operator such as =, >, or <, the task will be defined as a binary classification task.
The prediction output for a binary classification task is the probability that the target value belongs to the positive class. A classification threshold determines how high the predicted probability must be for it to be considered a positive class prediction. Kumo uses accuracy, AUROC, and AUPRC as the evaluation metrics for this task, and provides a confusion matrix and a cumulative gain chart to help understand the results.
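As a minimal sketch (assuming NumPy-style arrays of predicted probabilities; the values below are made up), converting probabilities into class labels with a threshold looks like this:

```python
import numpy as np

# Hypothetical predicted probabilities of the positive class (e.g., churn).
y_prob = np.array([0.92, 0.35, 0.61, 0.08, 0.77])

# A threshold of 0.5 is a common default; raising it makes
# positive-class predictions more conservative.
threshold = 0.5
y_pred = (y_prob >= threshold).astype(int)
print(y_pred)  # [1 0 1 0 1]
```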
Accuracy
The accuracy tells you the overall percentage of correct predictions. The accuracy score is given by the total number of correct predictions divided by the total number of samples.
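For illustration, with hypothetical label arrays, the accuracy score can be computed by hand or with scikit-learn:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 1, 0, 1, 0])   # actual labels (made up)
y_pred = np.array([1, 0, 1, 1, 1, 0])   # thresholded predictions

# Accuracy = number of correct predictions / number of samples.
print(accuracy_score(y_true, y_pred))   # 5 correct out of 6 -> ~0.833
print((y_true == y_pred).mean())        # same computation by hand
```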
AUROC
The AUROC stands for the area under the receiver operating characteristic (ROC) curve, which plots sensitivity (the true positive rate) against 1 − specificity (the false positive rate) at different classification thresholds. Sensitivity is calculated by:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

and specificity is calculated by:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

Because the AUROC score ranges between 0 and 1, a predictor that makes completely random predictions yields an AUROC score of 0.5. An AUROC of 0.8 is typically considered very good, but what is considered good enough will depend on your specific business problem.
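To make the 0.5 baseline concrete, here is a small sketch using scikit-learn's roc_auc_score on synthetic data; a random scorer lands near 0.5, while a scorer correlated with the labels lands well above it:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=10_000)   # random binary ground truth
y_random = rng.random(10_000)              # scores carrying no signal

# A completely random predictor scores close to 0.5.
print(roc_auc_score(y_true, y_random))

# Scores correlated with the labels push the AUROC well above 0.5.
y_signal = y_true + rng.normal(0.0, 1.0, 10_000)
print(roc_auc_score(y_true, y_signal))
```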
AUPRC
The AUPRC stands for the area under the precision-recall curve, which plots precision against recall at different classification thresholds. Precision is calculated by:

$$\text{Precision} = \frac{TP}{TP + FP}$$

and recall is the same quantity as sensitivity:

$$\text{Recall} = \frac{TP}{TP + FN}$$

Like AUROC, the best possible AUPRC score is 1. The baseline for determining what makes a better-than-random AUPRC score is different for each problem; specifically, it depends on how imbalanced the data is. For example, if your problem has a 10% positive class and a 90% negative class, then a predictor that arbitrarily makes positive predictions would achieve an AUPRC of only 0.1.
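This baseline is easy to verify empirically. The following sketch (synthetic data with roughly 10% positives) shows a random scorer's average precision, scikit-learn's estimate of AUPRC, landing near 0.1:

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
# Imbalanced labels: ~10% positive class.
y_true = (rng.random(100_000) < 0.10).astype(int)
y_random = rng.random(100_000)   # scores carrying no signal

# Baseline AUPRC for a random scorer ~= the positive rate (~0.1).
print(average_precision_score(y_true, y_random))
```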
AUPRC is typically considered a better metric when you have imbalanced data (e.g., you are trying to make good predictions for a very rare class), since it is much more sensitive to changes in performance for predicting that rare class. On the other hand, with a metric like accuracy, you could achieve very high metrics by just predicting that a rare positive class never happens.
Depending on your business focus, changing the threshold above which predictions are considered positive will shift the accuracy, specificity, sensitivity, precision, and recall scores, but a single AUROC or AUPRC score summarizes how good the overall performance is across all thresholds.
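A short synthetic sketch of that trade-off: the AUROC is one threshold-free number, while precision and recall shift as the threshold moves:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=5_000)
# Synthetic probabilities that separate the classes reasonably well.
y_prob = np.clip(0.5 * y_true + 0.5 * rng.random(5_000), 0.0, 1.0)

# One threshold-free summary of ranking quality...
print("AUROC:", roc_auc_score(y_true, y_prob))

# ...versus threshold-dependent precision and recall.
for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_prob >= threshold).astype(int)
    print(threshold,
          precision_score(y_true, y_pred),
          recall_score(y_true, y_pred))
```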
Confusion Matrix
Kumo provides a confusion matrix that presents the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) in a single table. The confusion matrix can be read as follows:

| | Predicted Negative Class | Predicted Positive Class |
|---|---|---|
| Actual Negative Class | TN | FP |
| Actual Positive Class | FN | TP |
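For reference, scikit-learn's confusion_matrix uses this same layout, with rows for actual classes and columns for predicted classes (the label arrays below are made up):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred, labels=[0, 1]))
# [[2 1]
#  [1 2]]
```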
Cumulative Gain Chart
The cumulative gain chart shows what percentage of a class you capture (the "gain") by targeting a given share of the population, ranked from highest to lowest predicted probability. For example, suppose that the chart below is the result of a churn prediction query:

[Cumulative gain chart for a churn prediction query]
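Since the chart is fundamentally a ranking computation, here is a minimal NumPy sketch of how such a curve can be derived (the cumulative_gain helper and the toy data are illustrative, not a Kumo API):

```python
import numpy as np

def cumulative_gain(y_true, y_prob):
    """Fraction of all positives captured within the top-k ranked samples."""
    order = np.argsort(y_prob)[::-1]      # highest predicted probability first
    captured = np.cumsum(y_true[order])   # positives accumulated so far
    return captured / y_true.sum()        # gain at each targeting depth

y_true = np.array([1, 0, 1, 0, 0, 1, 0, 0])
y_prob = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])

# Targeting the 3 highest-scored customers captures 2 of the 3 churners.
print(cumulative_gain(y_true, y_prob))
# [0.333 0.333 0.667 0.667 0.667 1.0 1.0 1.0] (approximately)
```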
Multiclass Classification
Kumo uses the accuracy score as the metric in a multiclass classification problem. The accuracy score presents the percentage of correct class predictions and is given by the total number of correct class predictions divided by the total number of samples.
Multilabel Classification
For a multilabel classification problem, Kumo calculates the same precision and recall scores for each class as in the binary classification case. You can also look at the F1 score, which takes both precision and recall into consideration and is calculated by:

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
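As an illustrative sketch with scikit-learn (the multilabel indicator arrays below are made up), per-class precision, recall, and F1 can be computed like this:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Multilabel targets: each column is one label, 1 = label applies.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [1, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [1, 0, 1]])

# Per-class precision, recall, and F1 (average=None keeps classes separate).
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average=None, zero_division=0
)
print(precision, recall, f1)
```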