IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Adaptive Multi-objective Reinforcement Learning for Non-linear and Implicit Utility Functions
Shaojing ZHAO, Songchen FU, Letian BAI, Hong LIANG, Qingwei ZHAO, Ta LI
Free access, advance online publication

Article ID: 2025EDP7099

Abstract

Multi-objective reinforcement learning (MORL) addresses sequential decision-making problems involving conflicting objectives. While most existing methods assume access to known or explicitly defined utility functions, many real-world tasks feature implicit, nonlinear utilities that are only available as delayed black-box feedback. To tackle this challenge, we propose Adaptive Multi-Objective Actor-Critic (AMOAC), a scalable framework that dynamically aligns policy optimization with implicit utility signals, without requiring prior knowledge of the utility function's form. AMOAC employs a multi-critic architecture to maintain computational efficiency as the number of objectives grows, and introduces a dynamic direction-aligned weighting mechanism to guide policy updates toward utility maximization. Experiments on benchmark environments—including Deep Sea Treasure, Minecart, and Four Room—demonstrate that AMOAC consistently matches or exceeds the performance of baselines with explicit utility access, achieving robust adaptation and convergence under both linear and nonlinear utility scenarios. These results highlight the potential of dynamic weight adjustment in MORL for handling implicit preference structures and limited feedback settings.
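The abstract's "dynamic direction-aligned weighting" idea can be illustrated with a minimal sketch: per-objective critics produce one advantage per objective, and a weight vector aligned with the local ascent direction of the (black-box) utility is used to scalarize them for the policy update. The `utility` function, the finite-difference estimator, and all names below are illustrative assumptions, not AMOAC's actual implementation, which handles delayed feedback rather than direct queries.

```python
import numpy as np

def utility(returns):
    # Hypothetical nonlinear utility, unknown to the agent in the real setting.
    return returns[0] - 0.5 * returns[1] ** 2

def direction_aligned_weights(returns, utility_fn, eps=1e-3):
    """Estimate a unit weight vector aligned with the local utility-ascent
    direction via forward finite differences on the vector return.
    (A sketch only; the paper's mechanism may differ.)"""
    grad = np.zeros(len(returns))
    base = utility_fn(returns)
    for k in range(len(returns)):
        bumped = returns.copy()
        bumped[k] += eps
        grad[k] = (utility_fn(bumped) - base) / eps
    norm = np.linalg.norm(grad)
    # Fall back to uniform weights if the utility is locally flat.
    return grad / norm if norm > 0 else np.full(len(returns), 1.0 / len(returns))

# Scalarize per-objective advantages for a single policy-gradient step.
returns = np.array([3.0, 1.0])          # current vector return estimate
advantages = np.array([0.4, -0.2])      # one advantage per objective critic
w = direction_aligned_weights(returns, utility)
scalar_advantage = float(w @ advantages)
```

Because the weights track the utility's local gradient, an objective that currently *hurts* utility (here the second one, penalized quadratically) receives a negative weight, so reducing its return is rewarded in the scalarized update.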

© 2025 The Institute of Electronics, Information and Communication Engineers