Reinforcement learning (RL) is a promising technique for creating agents that can be applied to real-world problems. The most important features of RL are trial-and-error search and delayed reward. Consequently, agents act randomly in the early stages of learning. However, such random actions are impractical for real-world problems.
This paper presents a novel RL agent model. Its distinguishing feature is the integration of the Analytic Hierarchy Process (AHP) into the standard RL agent model, which consists of three modules: state recognition, learning, and action selection. In our model, the AHP module is designed with the "primary knowledge" that humans intrinsically have about the process leading up to a goal state. This integration aims to increase the proportion of promising actions, in place of the completely random actions taken by standard RL algorithms.
Profit Sharing (PS) is adopted as the RL method for our model, since PS is known to be useful even in multi-agent environments. To evaluate our approach in a multi-agent environment, we test a PS method with our agent model on a pursuit problem in a grid world. Computational results show that our approach outperforms standard PS in learning speed in the early stages of learning. We also show that, in the final stages of learning, the performance of our approach is superior to, or at least competitive with, that of the standard method.
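
As an illustration only, the following Python sketch shows one way AHP priorities could bias action selection in a PS learner. It is a sketch under stated assumptions, not the implementation evaluated in this paper. The function names, the initial rule weight of 1.0, the geometric credit decay, and in particular the multiplicative combination of AHP priorities with PS rule weights are hypothetical choices made for this example. AHP priorities are approximated with the row geometric mean method rather than the principal eigenvector.

```python
import math
import random
from collections import defaultdict


def ahp_priorities(pairwise):
    """Approximate the AHP priority vector from a pairwise comparison
    matrix using normalized row geometric means."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def select_action(weights, state, actions, prior, rng=random):
    """Roulette selection over PS rule weights scaled by AHP priorities.
    Assumption: the two are combined by multiplication, so that before
    any reward arrives the choice follows the AHP priorities alone."""
    scores = [weights[(state, a)] * prior[a] for a in actions]
    return rng.choices(actions, weights=scores, k=1)[0]


def profit_sharing_update(weights, episode, reward, decay):
    """Distribute a reward back along the episode: the last rule fired
    receives the full reward, and each earlier rule receives the
    previous credit multiplied by decay (a geometric credit function)."""
    credit = reward
    for state, action in reversed(episode):
        weights[(state, action)] += credit
        credit *= decay


# Hypothetical actions for a pursuit task, with made-up pairwise judgments.
actions = ["approach", "stay", "retreat"]
judgments = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 3.0],
    [1.0 / 5.0, 1.0 / 3.0, 1.0],
]
prior = dict(zip(actions, ahp_priorities(judgments)))

# Every rule starts with weight 1.0 (assumption).
weights = defaultdict(lambda: 1.0)
episode = [("s0", "approach"), ("s1", "approach")]
profit_sharing_update(weights, episode, reward=10.0, decay=0.5)
next_action = select_action(weights, "s0", actions, prior)
```

With this combination, the learner favors actions that the pairwise judgments rank highly while all rule weights are still equal, and the PS weights increasingly dominate as rewards accumulate.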