Proceedings of the Annual Conference of JSAI, 37th Annual Conference (2023)
Online ISSN: 2758-7347
Session ID: 1B5-GS-2-05

Hypervolume Maximization Q-learning
*Takuma SHIBAHARA, Kouki TAKESHITA
Abstract

Multi-Objective Reinforcement Learning (MORL) generalizes standard reinforcement learning to balance multiple, possibly conflicting, objectives. A common challenge in MORL is to learn policies corresponding to every Pareto optimal solution, especially when the Pareto front is non-convex. In this paper, we propose a novel method that learns a single policy directly optimizing the hypervolume metric, which measures the volume dominated by a set of points in the objective space. The main idea is to transform the multiple objective values into hypervolumes and apply Watkins' Q-learning algorithm to learn a policy that maximizes the hypervolume. Moreover, our method can adapt the policy to achieve any desired Pareto solution without retraining. We call our method hypervolume maximization Q-learning and present two variants: a tabular version and a deep learning version. We evaluate our method on the Deep Sea Treasure benchmark, a non-convex MORL problem, and show that it effectively learns policies that achieve all Pareto solutions.
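As a rough illustration of the idea the abstract describes, the sketch below scalarizes a two-objective return through its dominated hypervolume and feeds the per-step hypervolume gain to Watkins' Q-learning. This is a minimal reading of the abstract, not the authors' implementation: the environment interface (env.reset, env.step, env.actions), the reference point ref, and all hyperparameters are assumptions for a Deep-Sea-Treasure-like two-objective task with hashable states.

    import random
    from collections import defaultdict
    import numpy as np

    def hypervolume_2d(points, ref):
        # Area dominated by a set of 2-D points w.r.t. a reference point,
        # both objectives maximized; standard sweep in descending x order.
        pts = sorted((p for p in points if p[0] > ref[0] and p[1] > ref[1]),
                     key=lambda p: p[0], reverse=True)
        hv, prev_y = 0.0, ref[1]
        for x, y in pts:
            if y > prev_y:
                hv += (x - ref[0]) * (y - prev_y)
                prev_y = y
        return hv

    def hv_q_learning(env, ref, episodes=500, alpha=0.1, gamma=1.0, eps=0.1):
        # Tabular Q-learning on a hypervolume scalarization (sketch only).
        # The scalar reward at each step is the increase in hypervolume of
        # the accumulated 2-objective return relative to `ref`.
        Q = defaultdict(float)
        for _ in range(episodes):
            s, done = env.reset(), False
            acc = np.zeros(2)       # accumulated vector return this episode
            hv_prev = 0.0
            while not done:
                # epsilon-greedy over the assumed discrete action set
                if random.random() < eps:
                    a = random.choice(env.actions)
                else:
                    a = max(env.actions, key=lambda a_: Q[(s, a_)])
                s2, r_vec, done = env.step(a)   # r_vec: 2-D reward vector
                acc += r_vec
                hv_now = hypervolume_2d([tuple(acc)], ref)
                r = hv_now - hv_prev            # hypervolume gain as reward
                hv_prev = hv_now
                target = r if done else r + gamma * max(
                    Q[(s2, a_)] for a_ in env.actions)
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s2
        return Q

Because the scalar signal is a hypervolume gain rather than a fixed weighted sum, solutions in non-convex regions of the Pareto front remain reachable, which linear scalarization cannot guarantee.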

© 2023 The Japanese Society for Artificial Intelligence