Paper ID: 2025EDP7099
Multi-objective reinforcement learning (MORL) addresses sequential decision-making problems involving conflicting objectives. While most existing methods assume access to known or explicitly defined utility functions, many real-world tasks feature implicit, nonlinear utilities that are only available as delayed black-box feedback. To tackle this challenge, we propose Adaptive Multi-Objective Actor-Critic (AMOAC), a scalable framework that dynamically aligns policy optimization with implicit utility signals, without requiring prior knowledge of the utility function's form. AMOAC employs a multi-critic architecture to maintain computational efficiency as the number of objectives grows, and introduces a dynamic direction-aligned weighting mechanism to guide policy updates toward utility maximization. Experiments on benchmark environments—including Deep Sea Treasure, Minecart, and Four Room—demonstrate that AMOAC consistently matches or exceeds the performance of baselines with explicit utility access, achieving robust adaptation and convergence under both linear and nonlinear utility scenarios. These results highlight the potential of dynamic weight adjustment in MORL for handling implicit preference structures and limited feedback settings.
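To make the core idea of direction-aligned weighting concrete, the sketch below shows one way a per-objective weight vector could be nudged toward the direction in which a black-box utility signal improves, and how per-critic advantages could then be scalarized for the actor update. The function names, the finite-difference heuristic, and the learning rate are illustrative assumptions, not the paper's actual update rule.

```python
import numpy as np

def update_weights(weights, returns_prev, returns_curr,
                   utility_prev, utility_curr, lr=0.1):
    """Hypothetical direction-aligned weight update (illustrative only).

    If the delayed black-box utility increased, shift weight toward the
    objectives whose returns moved in the same direction; if it decreased,
    shift weight away. This is a simple heuristic stand-in for AMOAC's
    dynamic weighting mechanism, whose exact form is not reproduced here.
    """
    delta_r = returns_curr - returns_prev      # per-objective return change
    delta_u = utility_curr - utility_prev      # scalar utility feedback change
    weights = weights + lr * delta_u * delta_r # align with utility-improving direction
    weights = np.clip(weights, 1e-6, None)     # keep weights strictly positive
    return weights / weights.sum()             # renormalize onto the simplex

def scalarized_advantage(per_objective_advantages, weights):
    """Combine one advantage estimate per critic into a single actor signal."""
    return float(np.dot(weights, per_objective_advantages))

# Toy usage with three objectives and one adaptation step.
w = np.ones(3) / 3
w = update_weights(w,
                   returns_prev=np.array([1.0, 0.5, 0.2]),
                   returns_curr=np.array([1.2, 0.4, 0.3]),
                   utility_prev=0.8, utility_curr=0.9)
adv = scalarized_advantage(np.array([0.3, -0.1, 0.2]), w)
print(w, adv)
```

Because each objective keeps its own critic, this scalarization step is the only place the objectives interact, which is consistent with the abstract's claim that the multi-critic design scales with the number of objectives.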