Mechanisms by which the dopamine system controls prediction and decision-making

榎本 一紀; 松本 直幸; 木村 實

doi:10.11249/jsbpjjpp.24.2_89

Abstract

To survive in an ever-changing natural environment, it is essential to assign long-term reward value to actions. Although midbrain dopamine neurons are known to signal reward value and its prediction error, it has not been examined experimentally whether and how dopamine neurons encode the long-term value of multiple future rewards (TD error), as suggested by reinforcement learning theories. We address this issue by studying the activity of 185 dopamine neurons recorded from three monkeys that performed a multistep choice task for three rewards. In the task, the monkeys searched for a reward among three alternatives and then exploited this knowledge to receive two additional rewards by repeating the same choice in subsequent trials. Dopamine responses to the start cues represented expectations of multiple future rewards: the sum of immediate and discounted future rewards. In accordance with this result, responses to the reinforcer beeps reflected the errors in the prediction of multiple future rewards. These responses were quantitatively predicted by theoretical descriptions of the value function with time discounting in reinforcement learning. Moreover, we confirmed that these responses were established through learning of the multistep choice paradigm for rewards. These findings demonstrate that dopamine neurons “learn” to encode the long-term value of multiple future rewards, with distant rewards discounted.
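The two quantities named in the abstract, the discounted sum of future rewards and its prediction error, can be illustrated with a minimal sketch. It uses the textbook reinforcement learning definitions, not the authors' fitted model. The discount factor `gamma`, the unit reward sizes, and the function names are assumptions chosen for illustration only.

```python
# Illustrative sketch only: textbook definitions, hypothetical parameter values.

def discounted_value(rewards, gamma):
    """Sum of immediate and discounted future rewards: sum over k of gamma**k * r_k."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def td_error(reward, value_now, value_next, gamma):
    """Standard TD error: delta = r + gamma * V(next) - V(now)."""
    return reward + gamma * value_next - value_now


gamma = 0.5  # hypothetical discount factor, not taken from the paper

# Value at a start cue when three unit rewards are still expected.
value_three_left = discounted_value([1.0, 1.0, 1.0], gamma)

# Value after the first reward has been obtained (two rewards remain).
value_two_left = discounted_value([1.0, 1.0], gamma)

# A fully predicted reward produces no error.
delta_rewarded = td_error(1.0, value_three_left, value_two_left, gamma)

# An omitted reward (r = 0) produces a negative error.
delta_omitted = td_error(0.0, value_three_left, value_two_left, gamma)

print(value_three_left, value_two_left, delta_rewarded, delta_omitted)
```

With these assumed values, a reward that matches the prediction yields a TD error of zero, and an omitted reward yields a negative error, which is the qualitative pattern the value function with time discounting implies.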
