A Cognitive Satisficing Policy That Locally Approximates Reliability

南 朱音; 甲野 佑; 高橋 達二

doi:10.11517/pjsai.JSAI2022.0_2C4GS201

Abstract

Deep reinforcement learning has made it possible to learn from complex input signals, which was previously difficult, owing to the excellent approximation properties of neural networks. However, learning optimal actions in finite time remains challenging in environments as large and complex as the real world. The difficulty stems from the huge number of samples needed to gather data for function approximation and the number of exploration steps needed for reinforcement learning. We focused on satisficing as a way to combine function approximation with a reduced amount of exploration. Satisficing is a human decision-making tendency in which exploration is carried out with the aim of achieving a target level. Linear Risk-sensitive Satisficing (LinRS), which is based on satisficing, has been proposed for the contextual bandit problem. However, in LinRS the approximation of the memory of past trials (reliability) is insensitive to the feature vectors, so the original concept of satisficing cannot be fully realized. In this study, we proposed Regional LinRS, which uses episodic memory to approximate this memory in the temporal neighborhood, and showed its usefulness.
