Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
32nd (2018)
Session ID : 3Pin1-11

Objective Correction for Policy Improvement under Entropy Regularization
*Ryo IWAKI, Minoru ASADA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Reinforcement learning aims to find a policy that maximizes long-term future reward by interacting with an unknown environment through trial and error. In this study, we propose an objective correction method for entropy-regularized Markov decision processes. After deriving a policy gradient under regularization by entropy and relative entropy, we propose an on-policy objective correction method for off-policy policy improvement under entropy regularization.
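
For orientation, one common way to write an objective regularized by both entropy and relative entropy is sketched below. This is an illustrative form only, not necessarily the one used in the paper: the discount factor \gamma, the weights \tau and \eta, and the reference policy \bar{\pi} (for example, the previous policy iterate) are assumptions introduced here.

```latex
% Illustrative sketch; \gamma, \tau, \eta and \bar{\pi} are assumed symbols, not taken from the paper.
J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}
  \Big( r(s_t, a_t)
        + \tau \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)
        - \eta \, D_{\mathrm{KL}}\big(\pi(\cdot \mid s_t) \,\|\, \bar{\pi}(\cdot \mid s_t)\big)
  \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\sum_{a} \pi(a \mid s) \log \pi(a \mid s).
```

Under this form, setting \eta = 0 leaves the usual maximum-entropy objective, and setting \tau = 0 leaves a purely relative-entropy (trust-region style) penalty toward \bar{\pi}.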

© 2018 The Japanese Society for Artificial Intelligence