Abstract
A neural network is used as a function approximator of the action-value function in reinforcement learning, in order to cope with a large number of discrete states. The proposed network learns the λ-return following the backward view of Sarsa(λ), which enables on-line learning. The proposed method is applied to acquire a heuristic strategy for the board game Dots-and-Boxes. In computer experiments, the network is trained through matches against a minimax player with a search depth of 1.
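In the backward view of Sarsa(λ), the λ-return is not computed directly. Instead, each network weight keeps an eligibility trace. After every step the Sarsa TD error δ = r + γQ(s′, a′) − Q(s, a) is computed, every trace is updated as e ← γλe + ∇Q(s, a), and the weights move by w ← w + αδe. The minimal sketch below illustrates only this update, under stated assumptions. It is not the paper's implementation. The following are all hypothetical: the one-hidden-layer tanh network (TinyQNet), the one-hot state-action encoding, the helpers run_episode and chain_step, the five-state random-walk task used in place of Dots-and-Boxes, and every hyperparameter.

```python
import math
import random


class TinyQNet:
    """Hypothetical one-hidden-layer tanh network approximating Q(s, a) from features x(s, a)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_in, self.n_hidden = n_in, n_hidden
        # Flat weight vector laid out as [W (n_hidden*n_in), b (n_hidden), v (n_hidden), c].
        n_params = n_hidden * n_in + 2 * n_hidden + 1
        self.p = [rng.uniform(-0.5, 0.5) for _ in range(n_params)]

    def value_and_grad(self, x):
        """Return Q(x) and dQ/dp for every weight p (needed for the eligibility trace)."""
        n_in, n_h, p = self.n_in, self.n_hidden, self.p
        b0 = n_h * n_in  # offset of the hidden biases
        v0 = b0 + n_h    # offset of the output weights
        h = [math.tanh(sum(p[j * n_in + i] * x[i] for i in range(n_in)) + p[b0 + j])
             for j in range(n_h)]
        q = sum(p[v0 + j] * h[j] for j in range(n_h)) + p[-1]
        g = [0.0] * len(p)
        for j in range(n_h):
            d = p[v0 + j] * (1.0 - h[j] * h[j])  # dQ/d(pre-activation of hidden unit j)
            for i in range(n_in):
                g[j * n_in + i] = d * x[i]
            g[b0 + j] = d
            g[v0 + j] = h[j]
        g[-1] = 1.0
        return q, g


def features(s, a, n_states, n_actions):
    """One-hot state concatenated with one-hot action (an assumed encoding)."""
    x = [0.0] * (n_states + n_actions)
    x[s] = 1.0
    x[n_states + a] = 1.0
    return x


def run_episode(net, step, start, n_states, n_actions,
                alpha, gamma, lam, eps, rng, max_steps=1000):
    """One on-line episode of backward-view Sarsa(lambda); returns the number of steps."""
    def q(s, a):
        return net.value_and_grad(features(s, a, n_states, n_actions))

    def policy(s):  # epsilon-greedy with respect to the current network
        if rng.random() < eps:
            return rng.randrange(n_actions)
        return max(range(n_actions), key=lambda b: q(s, b)[0])

    e = [0.0] * len(net.p)  # eligibility trace: one entry per weight
    s, a = start, policy(start)
    for t in range(1, max_steps + 1):
        q_sa, g = q(s, a)
        e = [gamma * lam * ei + gi for ei, gi in zip(e, g)]  # accumulating trace
        r, s2, done = step(s, a)
        if done:
            delta = r - q_sa  # terminal transition: no bootstrap term
        else:
            a2 = policy(s2)
            delta = r + gamma * q(s2, a2)[0] - q_sa  # Sarsa TD error
        # Every weight moves in proportion to its trace, so earlier (s, a) pairs
        # are credited on-line instead of waiting for the full lambda-return.
        net.p = [pi + alpha * delta * ei for pi, ei in zip(net.p, e)]
        if done:
            return t
        s, a = s2, a2
    return max_steps


def chain_step(s, a):
    """Toy stand-in task: states 0..4, both ends terminal, reward 1 only on reaching 4."""
    s2 = s + 1 if a == 1 else s - 1
    if s2 == 4:
        return 1.0, s2, True
    return 0.0, s2, s2 == 0


if __name__ == "__main__":
    rng = random.Random(1)
    net = TinyQNet(n_in=5 + 2, n_hidden=8, seed=0)
    for _ in range(500):
        run_episode(net, chain_step, 2, 5, 2, alpha=0.05, gamma=0.9, lam=0.8, eps=0.1, rng=rng)
    for s in (1, 2, 3):
        print(s, [round(net.value_and_grad(features(s, a, 5, 2))[0], 3) for a in (0, 1)])
```

Keeping one trace per weight is what allows the update to happen after every move rather than only at the end of a game. Applying this to Dots-and-Boxes would require a board encoding and an opponent, such as a depth-1 minimax player, in place of chain_step. This sketch does not attempt that.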