Host: The Japanese Society for Artificial Intelligence
Name: The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 37
Location: [in Japanese]
Date: June 06, 2023 - June 09, 2023
Multi-Objective Reinforcement Learning (MORL) generalizes standard reinforcement learning to settings with multiple, possibly conflicting, objectives. A common challenge in MORL is to learn policies corresponding to every Pareto-optimal solution, especially when the Pareto front is non-convex. In this paper, we propose a novel method that learns a single policy by directly optimizing the hypervolume metric, which measures the volume of objective space dominated by a set of points. The main idea is to transform the vector of objective values into a hypervolume and apply Watkins' Q-learning to learn a policy that maximizes it. Moreover, the learned policy can be adapted to reach any desired Pareto solution without retraining. We call our method hypervolume maximization Q-learning and present two variants: a tabular version and a deep learning version. We evaluate our method on Deep Sea Treasure, a non-convex MORL benchmark, and show that it effectively learns policies that reach all Pareto-optimal solutions.
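The abstract does not spell out the algorithm, but the two ingredients it names (a hypervolume computation over objective vectors and a Watkins-style Q-update) can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes two maximized objectives, a fixed reference point dominated by all attainable returns, and a tabular, vector-valued Q-function; the names `hypervolume_2d`, `hv_greedy_action`, `hv_q_update`, and all hyperparameters are our own placeholders.

```python
import numpy as np

def hypervolume_2d(points, ref):
    """Area of objective space dominated by `points` w.r.t. reference point `ref`.

    Assumes both objectives are maximized and every point dominates `ref`.
    """
    # Filter to the non-dominated set, sorted by the first objective.
    pareto = []
    for p in sorted(points, key=lambda p: p[0]):
        while pareto and pareto[-1][1] <= p[1]:
            pareto.pop()  # the previous point is dominated by p
        pareto.append(p)
    # Sum the rectangular slices between consecutive Pareto points.
    hv, prev_x = 0.0, ref[0]
    for x, y in pareto:
        hv += (x - prev_x) * (y - ref[1])
        prev_x = x
    return hv

def hv_greedy_action(Q, state, ref):
    """Pick the action whose vector-valued Q estimate spans the largest hypervolume."""
    hvs = [hypervolume_2d([tuple(Q[state, a])], ref) for a in range(Q.shape[1])]
    return int(np.argmax(hvs))

def hv_q_update(Q, s, a, r_vec, s_next, ref, alpha=0.1, gamma=0.95):
    """Watkins-style off-policy update, applied componentwise to the vector Q-value."""
    a_star = hv_greedy_action(Q, s_next, ref)
    Q[s, a] += alpha * (r_vec + gamma * Q[s_next, a_star] - Q[s, a])

# Example: hypervolume of a small 2-D front w.r.t. the origin.
front = [(1.0, 9.0), (3.0, 7.0), (8.0, 2.0)]
print(hypervolume_2d(front, ref=(0.0, 0.0)))  # 1*9 + 2*7 + 5*2 = 33.0

# A vector-valued Q-table of shape (n_states, n_actions, n_objectives).
Q = np.zeros((4, 2, 2))
hv_q_update(Q, s=0, a=1, r_vec=np.array([1.0, -1.0]), s_next=1, ref=(0.0, -20.0))
```

The reference point matters because the hypervolume is only defined relative to it; choosing one dominated by every attainable return keeps the scalarized signal non-negative. A deep variant would presumably replace the Q-table with a network that outputs one vector per action, leaving the hypervolume-based action selection unchanged.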