JSAI Technical Report, Type 2 SIG
Online ISSN: 2436-5556
Mean-Variance Efficient Reinforcement Learning
Masahiro KATO, Kei NAKAGAWA, Kenshi ABE, Tetsuro MORIMURA, Kentaro BABA

2024, Volume 2024, Issue FIN-033, Pages 177-184

Abstract

This study investigates the mean-variance (MV) trade-off in reinforcement learning (RL), an instance of sequential decision-making under uncertainty. Our objective is to obtain MV-efficient policies, whose means and variances are located on the Pareto efficient frontier with respect to the MV trade-off; under this condition, any increase in the expected reward would necessitate a corresponding increase in variance, and vice versa. To this end, we propose a method that trains our policy to maximize the expected quadratic utility, defined as a weighted sum of the first and second moments of the rewards obtained through our policy. We subsequently demonstrate that the maximizer indeed qualifies as an MV-efficient policy. Previous studies that employ constrained optimization to address the MV trade-off have encountered computational challenges. However, our approach is more computationally efficient as it eliminates the need for gradient estimation of variance, a contributing factor to the double sampling issue observed in existing methodologies. Through experiments, we validate the efficacy of our approach.
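
As a rough illustration of the objective described in the abstract (not the paper's exact algorithm), the sketch below applies a REINFORCE-style score-function gradient to an expected quadratic utility of the form J(theta) = E[R] - (lam/2) E[R^2], one concrete choice of weighted sum of the first and second moments of the return R. The toy two-armed bandit, the risk weight lam, the step size, and the specific utility coefficients are all assumptions introduced here for illustration. Because J is an expectation of a single function of R, each sampled episode yields an unbiased gradient estimate, which reflects the property the abstract credits with avoiding the double sampling required when differentiating the variance directly.

    # Illustrative sketch (assumptions noted above): maximize a quadratic
    # utility of the return with a score-function (likelihood-ratio) gradient.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy two-armed bandit: arm 0 is safe, arm 1 has higher mean but much higher variance.
    def sample_reward(action):
        if action == 0:
            return rng.normal(loc=1.0, scale=0.1)
        return rng.normal(loc=1.5, scale=2.0)

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    theta = np.zeros(2)   # policy logits
    lam = 0.5             # risk-aversion weight on the second moment (assumed value)
    lr = 0.05             # step size (assumed value)

    for _ in range(5000):
        probs = softmax(theta)
        a = rng.choice(2, p=probs)
        r = sample_reward(a)

        # Quadratic utility of a single sampled return: no second independent
        # sample is needed, unlike direct gradient estimation of the variance.
        u = r - 0.5 * lam * r**2

        # Score-function gradient step: grad log pi(a) scaled by the utility.
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0
        theta += lr * u * grad_log_pi

    print("final policy:", softmax(theta))

With the assumed lam, the quadratic utility of the risky arm is lower despite its higher mean, so the learned policy concentrates on the safe arm; lowering lam shifts the optimum toward the higher-mean, higher-variance arm, tracing out different points of the MV trade-off.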

© 2024 Authors