A configurable evaluator for multiclass classification tasks. The function
reproduces the behaviour of Python's multiclass_metrics_fn, covering classical
discrimination metrics (ROC‑AUC with various averaging strategies, F1, accuracy, …)
as well as calibration metrics (ECE / adaptive ECE / classwise ECE).
Supported metric names:
"roc_auc_macro_ovo" – ROC AUC, macro averaged over one-vs-one
"roc_auc_macro_ovr" – ROC AUC, macro averaged over one-vs-rest
"roc_auc_weighted_ovo" – ROC AUC, weighted averaged over one-vs-one
"roc_auc_weighted_ovr" – ROC AUC, weighted averaged over one-vs-rest
"accuracy" – Overall accuracy
"balanced_accuracy" – Balanced accuracy (useful for imbalanced datasets)
"f1_micro" – F1 score, micro averaged
"f1_macro" – F1 score, macro averaged
"f1_weighted" – F1 score, weighted averaged
"jaccard_micro" – Jaccard index, micro averaged
"jaccard_macro" – Jaccard index, macro averaged
"jaccard_weighted" – Jaccard index, weighted averaged
"cohen_kappa" – Cohen's κ
"brier_top1" – Brier score between top prediction and true label
"ECE" – Expected Calibration Error (equal‑width bins)
"ECE_adapt" – Adaptive ECE (equal‑size bins)
"cwECEt" – Classwise ECE with threshold
"cwECEt_adapt" – Classwise adaptive ECE with threshold
"hits@n" – Computes HITS@1, HITS@5, HITS@10
"mean_rank" – Computes mean rank and mean reciprocal rank
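Because the function mirrors Python's multiclass_metrics_fn, most discrimination metrics above correspond to standard scikit-learn calls. The sketch below illustrates a few of them in Python (assuming scikit-learn; note that the Python side uses 0-based labels while the R interface described here is 1-based, and the "brier_top1" and "mean_rank" formulas are one common reading of the descriptions above, not the package's exact code):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             roc_auc_score)

# Toy problem: 3 classes, 6 samples (0-based labels on the Python side).
y_true = np.array([0, 1, 2, 2, 1, 0])
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.4, 0.3],   # misclassified: argmax is class 1, true class is 2
    [0.2, 0.6, 0.2],
    [0.5, 0.3, 0.2],
])
y_pred = y_prob.argmax(axis=1)

acc     = accuracy_score(y_true, y_pred)                      # "accuracy"
f1m     = f1_score(y_true, y_pred, average="macro")           # "f1_macro"
auc_ovr = roc_auc_score(y_true, y_prob,
                        multi_class="ovr", average="macro")   # "roc_auc_macro_ovr"
kappa   = cohen_kappa_score(y_true, y_pred)                   # "cohen_kappa"

# "brier_top1": squared gap between the top-class probability and an
# indicator of whether that top class was correct.
brier_top1 = np.mean((y_prob.max(axis=1) - (y_pred == y_true)) ** 2)

# "mean_rank": rank of the true class when probabilities are sorted
# descending (ties counted pessimistically here), plus reciprocal rank.
ranks = (y_prob >= y_prob[np.arange(len(y_true)), y_true][:, None]).sum(axis=1)
mean_rank = ranks.mean()
mrr = (1.0 / ranks).mean()
```

On this toy data `acc` is 5/6 and `auc_ovr` is exactly 1.0 (every positive score outranks every negative score in each one-vs-rest split), which makes it easy to sanity-check a port against the Python original.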
Arguments
- y_true
Integer (or numeric) vector of ground-truth class labels in 1, 2, ..., K, using R's 1-based indexing.
- y_prob
Numeric matrix of predicted probabilities with shape (n_samples, n_classes).
- metrics
Character vector listing which metrics to compute. Default is
c("accuracy", "f1_macro", "f1_micro").
- y_predset
Optional matrix for prediction-set metrics. Default is NULL.
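For the calibration metrics, "ECE" is the standard top-1 expected calibration error over equal-width confidence bins, while "ECE_adapt" instead uses equal-size (quantile) bins. The sketch below spells out the equal-width definition in Python (an illustration of the formula, assuming NumPy; not the package's exact implementation, and bin-edge conventions may differ):

```python
import numpy as np

def ece_equal_width(y_true, y_prob, n_bins=10):
    """Top-1 ECE with equal-width bins: sum over bins of
    (bin weight) * |mean confidence - mean accuracy|."""
    conf = y_prob.max(axis=1)                                # top-1 confidence
    correct = (y_prob.argmax(axis=1) == y_true).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        # first bin is closed on the left so conf == 0 is not dropped
        mask = (conf > lo) & (conf <= hi) if i > 0 else (conf >= lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

# Overconfident model: always 90% confident but only 50% accurate,
# so the single occupied bin contributes |0.9 - 0.5| = 0.4.
y_true = np.array([0, 0])
y_prob = np.array([[0.9, 0.1], [0.1, 0.9]])
ece = ece_equal_width(y_true, y_prob)
```

The adaptive variant replaces `np.linspace` edges with quantiles of `conf`, so every bin holds roughly the same number of samples; the per-bin gap computation is unchanged.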