Abstract
'Reinforcement learning' is a computational framework in which an autonomous agent acquires novel behaviors based on a 'reward' signal that reports the goodness of its performance. Despite recent progress in reinforcement learning theory, no artificial agent yet implements reinforcement learning as efficiently and robustly as our brain does. In this talk, I present a series of theoretical hypotheses about how reinforcement learning is realized in the brain: 1) Within the circuit of cortico-basal ganglia loops, the striatum learns to represent the 'values' of actions, and its downstream circuit realizes stochastic action selection. 2) The cerebellum, the basal ganglia, and the cerebral cortex are specialized for supervised, reinforcement, and unsupervised learning paradigms, respectively, and their combination enables model-based reinforcement learning using contextual information. 3) The ascending neuromodulators broadcast global signals for learning: dopamine for the error of reward prediction, serotonin for the time scale of prediction, noradrenaline for the sharpness of response tuning, and acetylcholine for the speed of memory update. 4) The parallel organization of the cortico-basal ganglia loops enables flexible selection among different representations, algorithms, and time scales of learning and control. We report our simulation, brain imaging, and neural recording approaches to testing these hypotheses. [Jpn J Physiol 54 Suppl:S51 (2004)]
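Hypothesis 3) maps each neuromodulator onto a global metaparameter of a standard temporal-difference learning algorithm. The sketch below illustrates that correspondence in tabular Q-learning with softmax action selection; the variable names and the mapping in the comments are illustrative assumptions for exposition, not the authors' implementation:

```python
import math
import random

# Hypothesized correspondence (illustrative, not a model of the actual circuits):
#   dopamine      -> delta : temporal-difference (reward prediction) error
#   serotonin     -> gamma : discount factor (time scale of reward prediction)
#   noradrenaline -> beta  : inverse temperature (sharpness of action selection)
#   acetylcholine -> alpha : learning rate (speed of memory update)

def softmax_action(q_values, beta):
    """Stochastic action selection: larger beta gives sharper (greedier) choices."""
    exps = [math.exp(beta * q) for q in q_values]
    total = sum(exps)
    r, cum = random.random(), 0.0
    for action, e in enumerate(exps):
        cum += e / total
        if r <= cum:
            return action
    return len(exps) - 1

def td_update(q, state, action, reward, next_state, alpha, gamma):
    """Update the action-value table; delta plays the role of the dopamine signal."""
    delta = reward + gamma * max(q[next_state]) - q[state][action]
    q[state][action] += alpha * delta
    return delta
```

In this reading, the striatal 'values' of hypothesis 1) correspond to the table `q`, and the downstream stochastic selection to `softmax_action`; tuning `alpha`, `beta`, and `gamma` changes how fast memories update, how exploratory the agent is, and how far into the future rewards are predicted.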