Host : The Japanese Society for Artificial Intelligence
Name : The 36th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 36
Location : [in Japanese]
Date : June 14, 2022 - June 17, 2022
Deep reinforcement learning has made it possible to learn from complex input signals, which was previously difficult, owing to the excellent approximation properties of neural networks. However, learning optimal actions in finite time remains challenging in environments as large and complex as the real world. The difficulty stems from the huge number of samples needed to gather data for function approximation and from the large number of exploratory trials that reinforcement learning requires. We focus on satisficing as a way to reduce the number of explorations while still using function approximation. Satisficing is a human decision-making tendency to explore with the aim of achieving a target level. Linear Risk-sensitive Satisficing (LinRS), which builds on satisficing, has been proposed for the contextual bandit problem. However, in LinRS the approximation of the memory of past trials (reliability) is insensitive to the feature vectors, so the original concept of satisficing cannot be fully realized. In this study, we propose Regional LinRS, which uses episodic memory to approximate the memory in the temporal neighborhood, and show its usefulness.