Reinforcement learning algorithms can be classified into two approaches. One is the “exploitation-oriented” approach, which acquires action rules mainly by reinforcing and relying on good experiences; the other is the “exploration-oriented” approach, which pursues optimal actions that yield the highest rewards by exploring the environment. In this paper, we propose the Q-PSP Learning method, which incorporates the idea of PSP (Profit Sharing Plan), used in Classifier Systems as “exploitation-oriented” reinforcement learning, into Q-Learning, an “exploration-oriented” reinforcement learning method, in order to combine the merits of the two approaches. By applying Q-PSP Learning to several control problems and a robot navigation problem, we show that it can be expected not only to speed up learning but also to be effective for complex problems, and that it can attain an appropriate balance between exploration and exploitation.