This study investigates the mean-variance (MV) trade-off in reinforcement learning (RL), an instance of sequential decision-making under uncertainty. Our objective is to obtain MV-efficient policies, whose means and variances lie on the Pareto-efficient frontier with respect to the MV trade-off; on this frontier, any increase in the expected reward necessitates a corresponding increase in variance, and vice versa. To this end, we propose a method that trains the policy to maximize the expected quadratic utility, defined as a weighted sum of the first and second moments of the rewards obtained under the policy. We then demonstrate that the maximizer of this objective is indeed an MV-efficient policy. Previous studies that address the MV trade-off via constrained optimization have encountered computational challenges. Our approach is more computationally efficient because it eliminates the gradient estimation of the variance, a contributing factor to the double-sampling issue in existing methods. Through experiments, we validate the efficacy of our approach.
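The abstract does not spell out the objective or the estimator, so the following is only a rough illustration of why a quadratic-utility objective sidesteps double sampling: a minimal, hypothetical REINFORCE-style sketch assuming the utility u(R) = R - (lam/2) R^2 and made-up function and variable names (quadratic_utility, policy_gradient_estimate, lam). Because u(R) depends on a single sampled return, each trajectory yields a valid single-sample gradient term, whereas the gradient of Var(R) couples E[R] with its own gradient and would need two independent samples.

```python
import numpy as np

def quadratic_utility(ret, lam):
    """u(R) = R - (lam / 2) * R^2; its expectation is a weighted sum of the
    first and second moments of the return (illustrative form, not the paper's exact definition)."""
    return ret - 0.5 * lam * ret ** 2

def policy_gradient_estimate(returns, score_grads, lam):
    """REINFORCE-style single-sample estimate: mean over trajectories of u(R_i) * grad log pi(tau_i).

    returns     : array of shape (n,)   -- sampled episodic returns R_i
    score_grads : array of shape (n, d) -- grad_theta log pi(tau_i) for each trajectory
    lam         : risk-aversion weight on the second moment
    """
    weights = quadratic_utility(np.asarray(returns), lam)  # u(R_i) per trajectory
    return (weights[:, None] * np.asarray(score_grads)).mean(axis=0)

# Toy usage with made-up numbers: 4 trajectories, 3 policy parameters.
returns = [1.0, 0.5, 2.0, 1.5]
score_grads = np.random.default_rng(0).normal(size=(4, 3))
print(policy_gradient_estimate(returns, score_grads, lam=0.1))
```

Note that no second, independent batch of trajectories is needed in this sketch; that is the computational advantage the abstract attributes to replacing a variance term with the second moment.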