Host: The Japanese Society for Artificial Intelligence
Name: The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 37
Location: [in Japanese]
Date: June 06, 2023 - June 09, 2023
Multi-Objective Reinforcement Learning (MORL) generalizes standard reinforcement learning to settings with multiple, possibly conflicting, objectives. A common challenge in MORL is to learn policies corresponding to every Pareto-optimal solution, especially when the Pareto front is non-convex. In this paper, we propose a novel method that learns a single policy by directly optimizing the hypervolume metric, which measures the volume of objective space dominated by a set of points. The main idea is to transform the vector of objective values into a hypervolume and apply Watkins' Q-learning to learn a policy that maximizes it. Moreover, the learned policy can be adapted to reach any desired Pareto solution without retraining. We call our method hypervolume maximization Q-learning and present two variants: a tabular version and a deep learning version. We evaluate our method on Deep Sea Treasure, a non-convex MORL benchmark, and show that it effectively learns policies that reach all Pareto-optimal solutions.
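The abstract does not spell out the algorithm, but the two ingredients it names (a hypervolume computation over objective vectors and a Watkins-style Q-update) can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes two maximized objectives, a fixed reference point dominated by all attainable returns, and a tabular, vector-valued Q-function; the names `hypervolume_2d`, `hv_greedy_action`, `hv_q_update`, and all hyperparameters are our own placeholders.

```python
import numpy as np

def hypervolume_2d(points, ref):
    """Area of objective space dominated by `points` w.r.t. reference point `ref`.

    Assumes both objectives are maximized and every point dominates `ref`.
    """
    # Filter to the non-dominated set, sorted by the first objective.
    pareto = []
    for p in sorted(points, key=lambda p: p[0]):
        while pareto and pareto[-1][1] <= p[1]:
            pareto.pop()  # the previous point is dominated by p
        pareto.append(p)
    # Sum the rectangular slices between consecutive Pareto points.
    hv, prev_x = 0.0, ref[0]
    for x, y in pareto:
        hv += (x - prev_x) * (y - ref[1])
        prev_x = x
    return hv

def hv_greedy_action(Q, state, ref):
    """Pick the action whose vector-valued Q estimate spans the largest hypervolume."""
    hvs = [hypervolume_2d([tuple(Q[state, a])], ref) for a in range(Q.shape[1])]
    return int(np.argmax(hvs))

def hv_q_update(Q, s, a, r_vec, s_next, ref, alpha=0.1, gamma=0.95):
    """Watkins-style off-policy update, applied componentwise to the vector Q-value."""
    a_star = hv_greedy_action(Q, s_next, ref)
    Q[s, a] += alpha * (r_vec + gamma * Q[s_next, a_star] - Q[s, a])

# Example: hypervolume of a small 2-D front w.r.t. the origin.
front = [(1.0, 9.0), (3.0, 7.0), (8.0, 2.0)]
print(hypervolume_2d(front, ref=(0.0, 0.0)))  # 1*9 + 2*7 + 5*2 = 33.0

# A vector-valued Q-table of shape (n_states, n_actions, n_objectives).
Q = np.zeros((4, 2, 2))
hv_q_update(Q, s=0, a=1, r_vec=np.array([1.0, -1.0]), s_next=1, ref=(0.0, -20.0))
```

The reference point matters because the hypervolume is only defined relative to it; choosing one dominated by every attainable return keeps the scalarized signal non-negative. A deep variant would presumably replace the Q-table with a network that outputs one vector per action, leaving the hypervolume-based action selection unchanged.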