AutoML’25: SmartCal: A Novel Automated Approach to Classifier Probability Calibration

on July 29, 2025

Submitted by Abdelrahman et al. as part of the AutoML’25 conference

Imagine developing a highly accurate machine learning (ML) model for medical diagnosis that consistently underestimates rare but critical conditions. Lives could be put at risk due to such miscalibrations in predictive probabilities. Similarly, consider a fraud detection system that frequently flags legitimate transactions due to poor probability calibration, causing unnecessary frustration and inefficiency.

These scenarios highlight the importance of reliable, well-calibrated probability estimates in classification tasks across healthcare, finance, and autonomous driving systems. Unfortunately, traditional calibration methods like Platt scaling and temperature scaling often fail to generalize well across diverse datasets, leaving practitioners guessing about the best approach.
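To make one of these traditional methods concrete, here is a minimal sketch of temperature scaling, which divides a classifier's logits by a single scalar T fitted on held-out data. The function names, the grid of candidate temperatures, and the toy data are assumptions for illustration. Practical implementations usually fit T with a gradient-based optimizer rather than a grid search.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, shifted by the max for stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def negative_log_likelihood(logit_rows, labels, temperature):
    """Average negative log-probability assigned to the true class."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature from a grid that minimizes held-out NLL."""
    # Hypothetical default grid: 0.25, 0.30, ..., 5.00
    grid = grid or [0.25 + 0.05 * i for i in range(96)]
    return min(grid, key=lambda t: negative_log_likelihood(logit_rows, labels, t))

# Toy held-out set: the model is ~98% confident in class 0 every time,
# but class 0 is correct only 3 times out of 4 (overconfidence).
held_out_logits = [[4.0, 0.0]] * 4
held_out_labels = [0, 0, 0, 1]

T = fit_temperature(held_out_logits, held_out_labels)
print("fitted temperature:", T)
print("calibrated confidence:", softmax([4.0, 0.0], T)[0])
```

On this toy data the fitted temperature is greater than 1, which softens the overconfident predictions toward the observed 75% accuracy. A single global parameter like this cannot correct miscalibration that varies across the confidence range, which is one reason no single method fits every dataset.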

Figure: Distribution of optimal calibration algorithms across datasets, illustrating the need for an adaptive approach.

SmartCal: Automated and Intelligent Calibration

To tackle this issue, we present SmartCal, an innovative Automated Machine Learning (AutoML) framework designed to automatically identify the optimal post-hoc calibration strategy from among 12 popular methods. Our solution is inspired by the realization that no single calibration method universally excels. Instead, different datasets and classifiers require tailored calibration approaches.

At the heart of SmartCal is a meta-model trained on a large-scale knowledge base comprising 165 diverse datasets (160 tabular, 3 image, and 2 language datasets) and 13 different classifiers. The meta-model leverages dataset-specific meta-features and classifier prediction characteristics to predict the most suitable calibration method for new tasks.

Once promising candidates are identified, SmartCal uses Bayesian optimization to tune their hyperparameters efficiently. In our experiments, this combination outperformed both random search and standard calibration baselines.
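The general shape of such a search (a shortlist of candidate methods, a fixed evaluation budget, and a validation score to minimize) can be sketched as follows. This is not SmartCal's interface or implementation. The shortlist is hand-written instead of coming from a meta-model, plain random sampling stands in for Bayesian optimization, the Brier score is used as the validation metric, and all function names, parameter ranges, and the synthetic data are assumptions for illustration.

```python
import math
import random

def brier_score(probs, labels):
    """Mean squared gap between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def temperature_scale(probs, temperature):
    """Rescale binary probabilities by dividing their logit by a temperature."""
    out = []
    for p in probs:
        p = min(max(p, 1e-6), 1 - 1e-6)  # keep the logit finite
        logit = math.log(p / (1 - p))
        out.append(1 / (1 + math.exp(-logit / temperature)))
    return out

def histogram_binning(fit_probs, fit_labels, n_bins):
    """Fit equal-width bins; return a function mapping p to its bin's positive rate."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for p, y in zip(fit_probs, fit_labels):
        i = min(int(p * n_bins), n_bins - 1)
        sums[i] += y
        counts[i] += 1

    def apply(probs):
        result = []
        for p in probs:
            i = min(int(p * n_bins), n_bins - 1)
            # Fall back to the raw probability for bins with no fitting data.
            result.append(sums[i] / counts[i] if counts[i] else p)
        return result
    return apply

def make_overconfident_data(n, seed):
    """Synthetic binary task whose reported probabilities are too extreme."""
    rng = random.Random(seed)
    probs, labels = [], []
    for _ in range(n):
        q = rng.uniform(0.05, 0.95)              # true positive probability
        labels.append(1 if rng.random() < q else 0)
        sharpened = 3 * math.log(q / (1 - q))    # exaggerate the logit
        probs.append(1 / (1 + math.exp(-sharpened)))
    return probs, labels

def budgeted_search(fit_data, val_data, shortlist, budget, seed=0):
    """Sample (method, hyperparameter) pairs; keep the best validation score."""
    rng = random.Random(seed)
    (fit_probs, fit_labels), (val_probs, val_labels) = fit_data, val_data
    best = ("uncalibrated", None, brier_score(val_probs, val_labels))
    for _ in range(budget):
        method = rng.choice(shortlist)
        if method == "temperature":
            param = rng.uniform(0.5, 5.0)
            calibrated = temperature_scale(val_probs, param)
        else:  # "histogram_binning"
            param = rng.randint(2, 20)
            calibrated = histogram_binning(fit_probs, fit_labels, param)(val_probs)
        score = brier_score(calibrated, val_labels)
        if score < best[2]:
            best = (method, param, score)
    return best

fit_data = make_overconfident_data(400, seed=1)
val_data = make_overconfident_data(400, seed=2)
print(budgeted_search(fit_data, val_data, ["temperature", "histogram_binning"], budget=20))
```

A model-based optimizer would replace the random sampling step by proposing the next configuration from the scores observed so far, which is where the efficiency gain over random search comes from.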

Validating SmartCal: Experimental Results

We conducted comprehensive experiments to assess SmartCal’s effectiveness. On a benchmark of 30 unseen datasets, SmartCal significantly outperformed commonly used methods such as Temperature Scaling and Beta Calibration.

SmartCal achieved up to 50% lower Expected Calibration Error (ECE) than traditional methods and performed consistently better under varying computational budgets. A Friedman/Nemenyi test confirmed that these gains are statistically significant.
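For readers unfamiliar with the metric, ECE is commonly computed by grouping predictions into confidence bins and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below is a minimal illustration with equal-width bins. The function name, the 10-bin default, and the toy inputs are assumptions for illustration and not necessarily the configuration used in the SmartCal experiments.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: size-weighted average of |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

# Toy example: four predictions at 90% confidence (three correct)
# and two predictions at 60% confidence (one correct).
confidences = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
correct = [1, 1, 1, 0, 1, 0]
print(round(expected_calibration_error(confidences, correct), 4))  # 0.1333
```

In the toy example, the 90% bin has 75% accuracy (gap 0.15, weight 4/6) and the 60% bin has 50% accuracy (gap 0.10, weight 2/6), giving 0.10 + 0.0333 ≈ 0.1333. A perfectly calibrated model would score 0.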

Below is the performance comparison of Expected Calibration Error across different methods:

Method	Avg. ECE ± Std. Dev
SmartCal (10 iters)	0.0301 ± 0.0314
SmartCal (30 iters)	0.0240 ± 0.0267
Random Search (10 iters)	0.0382 ± 0.0732
Random Search (30 iters)	0.0408 ± 0.0673
Temperature Scaling	0.0743 ± 0.0626
Beta Calibration	0.0405 ± 0.0406

Table: Average Expected Calibration Error (ECE) over 30 benchmark datasets. The first four rows report SmartCal and Random Search under two iteration budgets (N = 10 and N = 30); the last two rows report Temperature Scaling and Beta Calibration as baselines.

Why Choose SmartCal?

Adaptivity: Automatically identifies the best calibration method tailored to your dataset.
Efficiency: Bayesian optimization intelligently explores calibration methods, reducing computational overhead.
User-friendly: A single, unified interface integrates 12 calibration algorithms, simplifying the calibration process.

Try SmartCal Today!

We believe SmartCal can significantly ease the calibration challenges faced by ML practitioners, researchers, and businesses alike. To explore our framework and start achieving superior calibration performance effortlessly, visit our GitHub repository.

Empower your machine learning models with SmartCal, where calibration meets intelligence.

Acknowledgment

This work was supported by the project “Increasing the knowledge intensity of Ida-Viru entrepreneurship” co-funded by the European Union. You can visit our Data Systems Research Group page at the University of Tartu for more relevant work on Automated Machine Learning.

Camera-ready version of the paper at AutoML’25
