Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 3Q5-IS-2b-04

Generative Model of Policies: Exploring the Latent Space with Human Feedback
*Raffael Bolla Di LORENZO, Michita IMAI
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Reinforcement learning often relies on training a population of agents with diverse behaviors. Such a population can be used to train a robust agent that can, for instance, cooperate with a human partner, or simply to discover many ways of solving a given task. Generative Models of Policies can discover a wide range of agent policies that succeed at a given task without requiring a separate set of parameters for each policy. Moreover, they can adapt to new tasks or goals simply by optimizing in the learned latent space of policies. In this paper, we focus on understanding and exploring the latent space of policies in order to discover new behaviors. More specifically, we take inspiration from StyleGAN's mapping network to better structure the latent space. We then design an exploration protocol that uses human feedback to discover new behaviors.
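
As a rough illustration of the ideas named in the abstract, the sketch below shows one way a StyleGAN-style mapping network could feed a latent-conditioned policy, and how a feedback score could drive a search in the latent space. It is not the authors' implementation: the layer sizes, the random (untrained) weights, the hill-climbing search, and the `feedback` function standing in for a human rating are all assumptions made for this example.

```python
import math
import random

def make_layer(rng, n_in, n_out):
    """Random dense layer (weights, biases); a stand-in for trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(layer, x):
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def leaky_relu(x, slope=0.2):
    return [v if v > 0 else slope * v for v in x]

def mapping_network(layers, z):
    """Map a raw latent z to an intermediate latent w, in the style of StyleGAN."""
    rms = math.sqrt(sum(v * v for v in z) / len(z)) + 1e-8
    h = [v / rms for v in z]  # normalize z before the MLP
    for layer in layers:
        h = leaky_relu(dense(layer, h))
    return h

def policy(head, w, obs):
    """Latent-conditioned policy: softmax action probabilities from [obs, w]."""
    logits = dense(head, obs + w)
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def explore(layers, head, obs, feedback, rng, z_dim, steps=50, sigma=0.3):
    """Hill-climb in z-space, keeping candidates that the feedback scores higher."""
    z = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]
    best = feedback(policy(head, mapping_network(layers, z), obs))
    for _ in range(steps):
        cand = [v + rng.gauss(0.0, sigma) for v in z]
        score = feedback(policy(head, mapping_network(layers, cand), obs))
        if score > best:
            z, best = cand, score
    return z, best

rng = random.Random(0)
Z_DIM, W_DIM, OBS_DIM, N_ACTIONS = 4, 8, 3, 3  # hypothetical sizes
layers = [make_layer(rng, Z_DIM, W_DIM), make_layer(rng, W_DIM, W_DIM)]
head = make_layer(rng, OBS_DIM + W_DIM, N_ACTIONS)
obs = [0.5, -1.0, 0.25]

# Placeholder for a human rating: prefers behaviors that favor action 2.
def feedback(probs):
    return probs[2]

z_best, score = explore(layers, head, obs, feedback, rng, Z_DIM)
print(round(score, 3))
```

In an interactive setting, `feedback` would be replaced by a rating collected from a person after watching the policy act, and the single observation by full episode rollouts; the search loop itself would stay the same.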

© 2024 The Japanese Society for Artificial Intelligence