人工知能 (Journal of the Japanese Society for Artificial Intelligence)
Online ISSN: 2435-8614
Print ISSN: 2188-2266
Formerly 人工知能学会誌 (1986-2013, Print ISSN: 0912-8085)
実例に基づく強化学習法 (An Instance-Based Reinforcement Learning Method)
畝見 達夫 (Tatsuo Unemi)

1992, Vol. 7, No. 4, pp. 697-707

Abstract

This paper proposes a reinforcement learning method based on an instance-based learning approach. The learning task is assumed as follows: the input on each learning cycle is a vector of real numbers, the output is a symbol selected from an a priori known finite set, and the reinforcement from the environment is +1, 0, or -1, and is usually 0, that is, in the manner of delayed reinforcement. The last assumption makes it difficult to apply any conventional supervised concept learning scheme, because an evaluation of the output is not given at every cycle. The key idea is to propagate reinforcement backward through the memorized experiences in the order of time. The learner tends to select the output that is associated with inputs similar to the current situation and that is likely to lead to high positive reinforcement, scanning all of the past experiences stored verbatim in memory. In addition to this basic mechanism, two extensions are proposed. The first restricts the capacity of memory, replacing the oldest datum with new data in each cycle, to avoid unbounded growth of time and space complexity. The second embeds a feedback mechanism concerning the reliability of each memorized experience: the reliability of the experience employed to decide the output of a recent cycle is increased when the learner receives positive reinforcement, and decreased when it receives negative reinforcement. Experimental results show that these learning algorithms work well in a simulated adaptive-behavior domain and that the extension methods are effective.
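The mechanism described in the abstract can be pictured roughly as follows. This is a loose illustrative sketch in Python, not the paper's algorithm: the class and method names, the Gaussian similarity kernel, the decay and reliability-step constants, and the exploration rate are all assumptions introduced here for illustration; the paper itself (pp. 697-707) defines the actual procedure.

```python
import math
import random
from collections import deque


class InstanceBasedLearner:
    """Sketch of a learner that stores raw experiences verbatim and scores
    the finite output set by similarity- and reliability-weighted value.
    All constants below are illustrative assumptions, not from the paper."""

    def __init__(self, actions, capacity=500, decay=0.9, reliability_step=0.1):
        self.actions = list(actions)          # the a priori known finite output set
        self.memory = deque(maxlen=capacity)  # bounded memory: oldest replaced first
        self.decay = decay                    # attenuation of backward propagation
        self.reliability_step = reliability_step
        self.last_used = None                 # experience that decided the last output

    def _similarity(self, x, y):
        # Gaussian kernel on squared Euclidean distance between input vectors
        # (an assumed similarity measure).
        return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))

    def select(self, x, explore=0.1):
        """Scan all memorized experiences and pick the output associated with
        inputs similar to x that carry high propagated reinforcement."""
        if not self.memory or random.random() < explore:
            self.last_used = None
            return random.choice(self.actions)
        scores = {a: 0.0 for a in self.actions}
        best = {a: None for a in self.actions}
        for exp in self.memory:
            w = self._similarity(x, exp["input"]) * exp["reliability"] * exp["value"]
            scores[exp["action"]] += w
            if best[exp["action"]] is None or w > best[exp["action"]][0]:
                best[exp["action"]] = (w, exp)
        choice = max(self.actions, key=lambda a: scores[a])
        self.last_used = best[choice][1] if best[choice] else None
        return choice

    def store(self, x, action, reinforcement):
        """Record this cycle; on nonzero reinforcement, propagate it backward
        through memory in time order (basic mechanism) and adjust the
        reliability of the deciding experience (second extension)."""
        self.memory.append({"input": list(x), "action": action,
                            "value": 0.0, "reliability": 1.0})
        if reinforcement != 0:
            credit = float(reinforcement)
            for exp in reversed(self.memory):   # newest first, back through time
                exp["value"] += credit
                credit *= self.decay
                if abs(credit) < 1e-3:
                    break
            if self.last_used is not None:      # reliability feedback
                delta = self.reliability_step * (1 if reinforcement > 0 else -1)
                self.last_used["reliability"] = min(1.0, max(0.0,
                    self.last_used["reliability"] + delta))
```

On each cycle a caller would obtain an output with select(x), apply it to the environment, and then pass the resulting reinforcement to store(x, action, r); the bounded deque realizes the first extension by silently discarding the oldest experience when full.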

© 1992 The Japanese Society for Artificial Intelligence