Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
36th (2022)
Session ID : 4E1-GS-2-03

Reward-oriented environment inference on reinforcement learning
*Kazuki TAKAHASHI, Tomoki FUKAI, Yutaka SAKAI, Takashi TAKEKAWA
Abstract

The development of deep neural networks has made it possible to exceed human performance on simulated reinforcement-learning problems. For real-world problems, however, issues such as explainability and online learning remain. Because real-world environments include reward-independent observables, the number of apparent observable patterns becomes so large that it is difficult to explain the AI's operating principles. In addition, achieving high performance requires a large amount of training data, which makes online learning difficult. In this study, we therefore attempt online policy learning in an environment that generates a huge number of observable patterns by combining reward-dependent and reward-independent environments. The proposed learning method consists of action decisions that balance exploration and exploitation by sampling, and a reward-oriented environment inference that reduces the many observable patterns to a concise state. As a result, the reward-oriented environment inference model recovers the reward-dependent environment from the large number of observable patterns. Furthermore, combining the proposed model with the sampling-based action decision improved the learning speed of the optimal policy.
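The two ingredients described above can be illustrated with a minimal, hypothetical sketch. The abstract does not specify the authors' algorithm, so this toy uses Thompson-style posterior sampling as a stand-in for the sampling-based action decision, and simply discards a distractor component of the observation as a stand-in for reward-oriented environment inference. All environment sizes, names, and the reward rule are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: each observation is a pair
# (reward-dependent state, reward-independent distractor).
# The full observation space has N_STATES * N_DISTRACTORS patterns,
# but reward depends only on the first component.
N_STATES, N_DISTRACTORS, N_ACTIONS = 4, 50, 2

def step(state, distractor, action):
    # Reward is determined by the reward-dependent component alone.
    return 1.0 if action == state % N_ACTIONS else 0.0

# Beta-posterior counts are kept per *reduced* state (distractor
# ignored) rather than per full observation pattern; this plays the
# role of reward-oriented state reduction in this toy.
wins = np.ones((N_STATES, N_ACTIONS))
losses = np.ones((N_STATES, N_ACTIONS))

def choose_action(state):
    # Sampling from the posterior balances exploration and exploitation:
    # uncertain actions are sometimes sampled high and get explored.
    samples = rng.beta(wins[state], losses[state])
    return int(np.argmax(samples))

for t in range(2000):
    state = rng.integers(N_STATES)
    distractor = rng.integers(N_DISTRACTORS)  # ignored by the agent
    action = choose_action(state)
    reward = step(state, distractor, action)
    wins[state, action] += reward
    losses[state, action] += 1.0 - reward

# The greedy policy over the reduced state recovers the rewarded action.
greedy = [int(np.argmax(wins[s] / (wins[s] + losses[s])))
          for s in range(N_STATES)]
print(greedy)  # → [0, 1, 0, 1]
```

Because the posterior is maintained over only N_STATES reduced states instead of the full N_STATES * N_DISTRACTORS observation patterns, far fewer samples are needed per table entry, which mirrors the abstract's claim that state reduction speeds up online policy learning.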

© 2022 The Japanese Society for Artificial Intelligence