The Japanese Society for Artificial Intelligence, Type 2 SIG Technical Reports
Online ISSN : 2436-5556
Mean-Variance Efficient Reinforcement Learning
Kato Masahiro, Nakagawa Kei, Abe Kenshi, Morimura Tetsuro, Baba Kentaro
Research / Technical Report, Free Access

2024, Vol. 2024, No. FIN-033, pp. 177-184

Abstract

This study investigates the mean-variance (MV) trade-off in reinforcement learning (RL), an instance of sequential decision-making under uncertainty. Our objective is to obtain MV-efficient policies, whose means and variances lie on the Pareto-efficient frontier with respect to the MV trade-off; under this condition, any increase in the expected reward necessitates a corresponding increase in variance, and vice versa. To this end, we propose a method that trains a policy to maximize the expected quadratic utility, defined as a weighted sum of the first and second moments of the rewards obtained under the policy. We then show that the maximizer is indeed an MV-efficient policy. Previous studies that address the MV trade-off via constrained optimization face computational challenges. Our approach is more computationally efficient because it eliminates the need to estimate the gradient of the variance, a contributing factor to the double-sampling issue in existing methods. Through experiments, we validate the efficacy of our approach.
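The computational point in the abstract can be illustrated with a minimal sketch (not the paper's actual algorithm; the function names, the utility weight `lam`, and the quadratic-utility form `E[R] - (lam/2) E[R^2]` are illustrative assumptions). Both quantities below are Monte Carlo estimates from per-trajectory returns; the quadratic utility is a plain weighted sum of expectations, so a single return sample per trajectory suffices, whereas the mean-variance objective contains the squared expectation `(E[R])^2` inside `Var(R)`, whose policy gradient requires two independent samples (the double-sampling issue):

```python
import numpy as np

def quadratic_utility_estimate(returns, lam):
    """Estimate E[R] - (lam/2) * E[R^2] from sampled returns.

    Each term is a plain expectation, so one return per trajectory
    is enough; the score-function gradient of this objective admits
    a single-sample estimator.
    """
    r = np.asarray(returns, dtype=float)
    return r.mean() - 0.5 * lam * (r ** 2).mean()

def mean_variance_estimate(returns, lam):
    """Estimate E[R] - lam * Var(R) from sampled returns.

    Var(R) = E[R^2] - (E[R])^2 contains the squared expectation
    (E[R])^2; its policy gradient involves a product of two
    expectations and hence needs two independent trajectory
    samples per update (double sampling).
    """
    r = np.asarray(returns, dtype=float)
    return r.mean() - lam * r.var()
```

For example, `quadratic_utility_estimate([2.0], 1.0)` evaluates to `2.0 - 0.5 * 4.0 = 0.0`. The sketch only shows why the quadratic-utility objective sidesteps double sampling; in an actual policy-gradient loop, the per-sample utility `r - (lam/2) * r**2` would simply replace the raw return in a REINFORCE-style update.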

© 2024 The Authors