Having recently returned from the 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26) in Singapore, I was curious to see the current trajectory of Automated Machine Learning (AutoML) research within our community. Since AutoML touches nearly every area of AI research, it was great to see so many AutoML papers at AAAI. Here is a short overview of the most notable ones I found there:
HyperSHAP: Shapley Values and Interactions for Explaining Hyperparameter Optimization (Disclaimer: our own paper)
Everyone runs ablation or sensitivity studies, and perhaps even HPO. But how can this be done efficiently, soundly, and less ad hoc? HyperSHAP addresses the “black-box” nature of modern hyperparameter optimization by introducing a game-theoretic explainability framework based on Shapley values. By providing an additive decomposition of performance metrics, the method lets researchers quantify both the individual contributions and the interactions of hyperparameters, turning opaque configurations into actionable insights about model tunability and optimizer behavior.
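To make the idea concrete, here is a minimal, self-contained sketch of a plain Shapley-value computation in the hyperparameter setting: hyperparameters act as players, and the value of a coalition is the validation score reached when only those hyperparameters are tuned. The coalition values below are made-up numbers and the exhaustive enumeration only works for a handful of hyperparameters; HyperSHAP's actual tooling, approximations, and interaction indices are not shown here.

```python
# Minimal sketch of Shapley-value attribution for hyperparameters.
# Players are hyperparameters; the value of a coalition S is the validation
# score reached when only the hyperparameters in S are tuned (toy numbers).
from itertools import combinations
from math import factorial

players = ["learning_rate", "max_depth", "n_estimators"]

# Hypothetical coalition values: best validation accuracy when tuning exactly
# these hyperparameters, with everything else left at its default.
value = {
    frozenset(): 0.70,
    frozenset({"learning_rate"}): 0.78,
    frozenset({"max_depth"}): 0.74,
    frozenset({"n_estimators"}): 0.72,
    frozenset({"learning_rate", "max_depth"}): 0.83,
    frozenset({"learning_rate", "n_estimators"}): 0.80,
    frozenset({"max_depth", "n_estimators"}): 0.76,
    frozenset(players): 0.85,
}

def shapley(player):
    """Exact Shapley value: weighted average marginal contribution."""
    n = len(players)
    others = [p for p in players if p != player]
    total = 0.0
    for k in range(n):
        for subset in combinations(others, k):
            s = frozenset(subset)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (value[s | {player}] - value[s])
    return total

for p in players:
    print(f"{p}: {shapley(p):+.4f}")
# The three attributions sum to value[full set] - value[empty set],
# i.e. the additive decomposition mentioned above.
```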
Neural Architecture and Hyperparameter Selection Through Meta-Learning on Time Series
To mitigate the prohibitive computational overhead of searching high-dimensional design spaces in time-series tasks, this paper introduces a meta-learning framework that leverages a joint representation of neural architectures and dataset characteristics. By using a performance-prediction surrogate trained on historical search data, the authors report significant empirical gains: up to a 60% performance improvement on classification benchmarks while reducing the computational budget to roughly 10% of that required by traditional HPO methods such as HEBO.
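As a rough illustration of the surrogate idea (not the paper's actual model, features, or search space), the sketch below trains a regressor on historical (dataset meta-features, configuration) → performance records and uses it to rank candidate configurations for a new dataset. All data here is synthetic.

```python
# Sketch of a meta-learned performance predictor: train on historical search
# records, then score candidate configurations for a new dataset cheaply.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Historical records: 4 dataset meta-features + 3 configuration dimensions.
X_history = rng.random((500, 7))
y_history = rng.random(500)            # observed accuracies (placeholder)

surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(X_history, y_history)

# New dataset: compute its meta-features once, then rank candidates.
new_meta = rng.random(4)
candidates = rng.random((64, 3))       # candidate architecture/HP encodings
features = np.hstack([np.tile(new_meta, (len(candidates), 1)), candidates])
scores = surrogate.predict(features)

best = candidates[np.argmax(scores)]
print("Predicted-best configuration:", np.round(best, 3))
```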
DA-DFGAS: Differentiable Federated Graph Neural Architecture Search with Distribution-Aware Attentive Aggregation
Addressing the dual challenges of data silos and heterogeneity in graph-structured data, the authors propose DA-DFGAS, a federated graph neural architecture search algorithm designed for decentralized environments. The framework achieves model personalization through a directed tree topology and a self-attention mechanism that captures distributional variations across clients, while a bi-level global-local optimization strategy balances global consistency with local adaptability. Empirical results highlight its superiority, yielding up to a 5% accuracy improvement over existing federated baselines.
PSEO: Optimizing Post-hoc Stacking Ensemble Through Hyperparameter Tuning
This paper addresses a critical inefficiency in the Combined Algorithm Selection and Hyperparameter Optimization (CASH) pipeline: the reliance on rigid, fixed strategies for post-hoc ensemble construction. The authors propose PSEO, a framework that treats the ensemble phase itself as an optimization problem. By formulating base model selection through binary quadratic programming to balance diversity and performance, and searching a dedicated hyperparameter space for multi-layer stacking strategies, PSEO dynamically adapts to task-specific characteristics. The framework’s efficacy is underscored by its top-tier performance across 80 public datasets, outperforming 16 state-of-the-art AutoML and ensemble methods.
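The selection step can be pictured as a binary quadratic objective that rewards individual validation performance and penalizes redundancy between the chosen models. The toy sketch below brute-forces such an objective over eight hypothetical base models; PSEO's actual formulation, similarity measure, and solver differ.

```python
# Toy base-model selection via a binary quadratic objective:
# linear reward for per-model performance, quadratic penalty for redundancy.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n_models = 8
perf = rng.uniform(0.7, 0.9, n_models)           # per-model validation scores
sim = rng.uniform(0.0, 1.0, (n_models, n_models))
sim = (sim + sim.T) / 2                           # symmetric similarity matrix
np.fill_diagonal(sim, 0.0)
lam = 0.05                                        # diversity trade-off weight

def objective(z):
    z = np.asarray(z, dtype=float)
    return perf @ z - lam * z @ sim @ z

best = max(itertools.product([0, 1], repeat=n_models), key=objective)
print("Selected base models:", [i for i, zi in enumerate(best) if zi])
```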
MetaGameBO: Hierarchical Game-Theoretic Driven Robust Meta-Learning for Bayesian Optimization
MetaGameBO addresses a critical failure mode in traditional meta-learned Bayesian optimization: the tendency to optimize for average-case performance at the expense of robustness on “outlier” tasks. By reformulating meta-learning as a hierarchical game-theoretic optimization problem using Conditional Value-at-Risk (CVaR) and diversity-aware sampling, the framework ensures robust generalization even under significant distribution shifts. The authors provide rigorous theoretical convergence guarantees, backed by empirical results showing an 88.6% reduction in tail risk compared to existing state-of-the-art methods.
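For intuition, the snippet below contrasts the mean task loss with Conditional Value-at-Risk on a handful of hypothetical meta-training tasks; the numbers are made up, and the paper's hierarchical game formulation and diversity-aware sampling are not reproduced here.

```python
# CVaR in one picture: average the worst (1 - alpha) fraction of task losses
# instead of the overall mean, so outlier tasks cannot be ignored.
import numpy as np

task_losses = np.array([0.10, 0.12, 0.11, 0.13, 0.09, 0.55, 0.60])  # two outliers
alpha = 0.7  # CVaR level: average over the worst 30% of tasks

def cvar(losses, alpha):
    var = np.quantile(losses, alpha)     # Value-at-Risk threshold
    return losses[losses >= var].mean()  # mean of the tail beyond it

print("mean loss:", round(task_losses.mean(), 3))
print(f"CVaR at alpha={alpha}:", round(cvar(task_losses, alpha), 3))
# Optimizing the mean can tolerate the two outlier tasks;
# optimizing CVaR forces the meta-learner to handle them.
```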
Function-on-Function Bayesian Optimization
Traditional Bayesian optimization is largely limited to scalar outputs, failing to address emerging complex systems where both inputs and outputs are functions. This paper introduces Function-on-Function Bayesian Optimization (FFBO), underpinned by a novel Function-on-Function Gaussian Process (FFGP) model that utilizes separable operator-valued kernels to model correlations directly within functional spaces. By implementing a weighted operator-based scalarization for the acquisition function and a scalable Functional Gradient Ascent (FGA) algorithm, the authors provide a mathematically rigorous framework for optimizing high-dimensional, sensing-rich systems that outpaces existing surrogate models.
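One rough way to picture the separable-kernel idea (this is not FFGP itself) is a plain Gaussian-process regression whose covariance factorizes into an input-function kernel times an output-location kernel, assembled as a Kronecker product. Everything below, from the discretization to the toy targets and length scales, is an illustrative assumption.

```python
# Separable covariance for function-on-function regression:
# k((x, s), (x', s')) = k_in(x, x') * k_out(s, s'), with x a discretized input
# function and s a location in the output domain.
import numpy as np

def rbf(a, b, ls):
    """Squared-exponential kernel between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

rng = np.random.default_rng(0)
n_train, n_grid, n_loc = 20, 16, 10
X_tr = rng.random((n_train, n_grid))       # training input functions on a grid
X_te = rng.random((1, n_grid))             # one new input function
S = np.linspace(0, 1, n_loc)[:, None]      # output-domain locations

K_out = rbf(S, S, ls=0.2)
K_tr = np.kron(rbf(X_tr, X_tr, ls=2.0), K_out)       # separable => Kronecker
K_cross = np.kron(rbf(X_te, X_tr, ls=2.0), K_out)

# Toy targets: each output curve depends on the mean of the input function.
f = lambda x, s: np.sin(6 * s * x.mean())
Y_tr = np.array([[f(x, s) for s in S.ravel()] for x in X_tr])

alpha = np.linalg.solve(K_tr + 1e-3 * np.eye(K_tr.shape[0]), Y_tr.ravel())
y_pred = K_cross @ alpha                   # predicted output curve at X_te
print("predicted curve:", np.round(y_pred, 2))
print("true curve:     ", np.round([f(X_te[0], s) for s in S.ravel()], 2))
```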
Faster Game Solving via Hyperparameter Schedules
This work challenges the complexity of state-of-the-art Counterfactual Regret Minimization (CFR) variants by introducing a streamlined framework called Hyperparameter Schedules (HSs). Rather than relying on computationally heavy agent-learned discounting, HSs utilize a simple, training-free dynamic adjustment that aggressively downweights early iterations to accelerate convergence toward a Nash equilibrium. Despite its simplicity—requiring fewer than 15 lines of code—the method demonstrates remarkable generalization across 17 diverse game environments, including large-scale poker, establishing a new state-of-the-art for solving two-player zero-sum games.
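To show the flavor of such a schedule (the paper's exact schedules and benchmark games differ), the sketch below runs weighted regret matching in self-play on rock-paper-scissors, with a linear weight that downweights early iterations; the averaged strategies approach the game's uniform Nash equilibrium.

```python
# Regret matching with an iteration-dependent weight w_t = t, so early (noisy)
# iterations contribute less to accumulated regrets and the average strategy.
import numpy as np

A = np.array([[0.0, -1.0, 1.0],    # row player's payoff matrix (rock-paper-scissors)
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])

def regret_matching(R):
    pos = np.maximum(R, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(len(R), 1.0 / len(R))

n = A.shape[0]
R_row, R_col = np.zeros(n), np.zeros(n)
avg_row, avg_col = np.zeros(n), np.zeros(n)

for t in range(1, 10001):
    w = t                                   # the schedule: linear downweighting
    p, q = regret_matching(R_row), regret_matching(R_col)
    u_row = A @ q                           # row player's per-action utilities
    u_col = -(p @ A)                        # column player's per-action utilities
    R_row += w * (u_row - p @ u_row)
    R_col += w * (u_col - q @ u_col)
    avg_row += w * p
    avg_col += w * q

print("avg row strategy:", np.round(avg_row / avg_row.sum(), 3))
print("avg col strategy:", np.round(avg_col / avg_col.sum(), 3))
```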
LAMDA: Two-Phase HPO via Learning Prior from Low-Fidelity Data
This paper addresses the “cold-start” problem in Hyperparameter Optimization (HPO) by introducing LAMDA, an algorithm-agnostic framework that eliminates the need for external expert knowledge or historical metadata. Instead of relying on pre-existing priors, LAMDA learns a reliable prior directly from low-fidelity (LF) evaluations before transitioning to guide the primary HPO process. The authors provide a rigorous theoretical analysis for its integration with both Bayesian Optimization and bandit-based methods, backed by empirical results where LAMDA achieved the top performance in 51 out of 56 diverse HPO tasks.
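The sketch below illustrates the two-phase idea on a toy objective: a cheap, noisy low-fidelity sweep identifies a promising region, which then seeds the expensive high-fidelity evaluations. The objective, the fidelity/noise model, and the simple "top-k as prior" rule are assumptions for illustration, not LAMDA's actual procedure or theory.

```python
# Two-phase HPO sketch: learn a "prior" from cheap low-fidelity evaluations,
# then spend the high-fidelity budget inside that region.
import numpy as np

rng = np.random.default_rng(0)

def evaluate(cfg, fidelity):
    """Toy objective: low fidelity is a noisier proxy of the true score."""
    true_score = -np.sum((cfg - 0.3) ** 2)
    noise = (1.0 - fidelity) * 0.3 * rng.standard_normal()
    return true_score + noise

dim, n_lf, k = 4, 200, 10

# Phase 1: broad, cheap low-fidelity sweep.
lf_configs = rng.random((n_lf, dim))
lf_scores = np.array([evaluate(c, fidelity=0.1) for c in lf_configs])
top = lf_configs[np.argsort(lf_scores)[-k:]]          # learned prior region

# Phase 2: expensive high-fidelity evaluations sampled around the LF prior
# instead of uniformly at random.
prior_mean, prior_std = top.mean(axis=0), top.std(axis=0) + 1e-3
hf_configs = np.clip(prior_mean + prior_std * rng.standard_normal((20, dim)), 0, 1)
hf_scores = np.array([evaluate(c, fidelity=1.0) for c in hf_configs])

print("best high-fidelity score found:", round(hf_scores.max(), 3))
```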
HAMLET4Fairness: Enhancing Fairness in AI Pipelines Through Human-Centered AutoML and Argumentation
Addressing the critical disconnect between algorithmic fairness and real-world deployment, HAMLET4Fairness introduces an AutoML framework that integrates human-centered logic and argumentation. By grounding the optimization process in the CRISP-DM methodology, the system allows stakeholders to co-design AI pipelines using multi-objective optimization bounded by user-defined constraints. This approach not only supports intersectional fairness—addressing multiple overlapping protected characteristics—but also provides transparency into how specific preprocessing choices influence ethical outcomes, bridging the gap between automated search and human-centric accountability.
Macro-Thinking, Micro-Coding: A Hierarchical Approach to LLM-Based High-performance GPU Kernel Generation
Addressing the intractable search space of GPU kernel optimization, Macro Thinking Micro Coding (MTMC) introduces a hierarchical paradigm that decouples high-level optimization strategies from low-level implementation. By employing reinforcement learning to guide “Macro” strategy selection and utilizing LLMs for incremental “Micro” code generation, the framework mirrors human expert workflows to ensure both hardware efficiency and code correctness. This dual-layered approach yields significant empirical gains, achieving up to a 34× speedup on challenging benchmarks like TritonBench and outperforming expert-optimized PyTorch kernels by 2.2×.
Auto-BenchmarkCard: Automated Synthesis of Benchmark Documentation
Addressing the persistent lack of standardization in AI benchmark documentation, Auto-BenchmarkCard introduces an automated workflow for generating high-fidelity, validated benchmark descriptions. The system utilizes a multi-agent architecture to extract data from heterogeneous sources—such as Hugging Face and academic literature—and synthesizes it using LLMs. Crucially, it incorporates a FactReasoner validation phase that employs atomic entailment scoring to ensure factual accuracy, significantly improving the transparency and reproducibility of benchmark reporting in the community.
Summary and Takeaways
The research at AAAI-26 signals a definitive shift from “black-box” optimization toward interpretable, hierarchical, and resource-efficient automation. A primary takeaway is the move toward hierarchical orchestration, where high-level strategic reasoning is decoupled from low-level implementation to solve complex tasks like GPU kernel engineering. Efficiency is being redefined through advanced meta-learning, utilizing low-fidelity priors and cross-task knowledge to reduce computational overhead by up to 90%. Furthermore, the integration of game-theoretic frameworks has enhanced both robustness against outlier tasks and the interpretability of hyperparameter interactions. We also observed a critical trend in human-centric AutoML, where fairness and documentation are being automated through multi-agent systems and logic-based constraints. Collectively, these advancements transition AutoML from a tuning utility into a comprehensive, trustworthy architect for the next generation of AI systems.
