Batch Reinforcement Learning for Linearly Solvable Markov Decision Processes

西 智樹; 大滝 啓介; 吉村 貴克

doi:10.11517/pjsai.JSAI2018.0_3Pin105

Abstract

The linearly solvable Markov decision process (L-MDP) is an important subclass of MDPs in which a good policy can be found efficiently. We develop a novel batch reinforcement learning algorithm for L-MDPs with a discretized action space. Using pre-collected data, the algorithm simultaneously learns a state value function and a predictor of the state value at the next step. We evaluate the method on a traffic signal control task at a single intersection, using the traffic simulator SUMO. The experiment shows that the method finds a policy for this task efficiently.
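As background on why L-MDPs admit efficient solutions, the following is a minimal sketch of the standard tabular first-exit L-MDP, not the batch algorithm proposed in this paper. With desirability z(s) = exp(-v(s)), the Bellman equation becomes linear, z(s) = exp(-q(s)) Σ p(s'|s) z(s'), and the optimal controlled transition is proportional to p(s'|s) z(s'). The five-state chain, the state costs `q`, the passive dynamics `P`, and the function name `solve_lmdp` are all illustrative assumptions and do not come from the paper.

```python
import numpy as np

def solve_lmdp(P, q, terminal, iters=10_000, tol=1e-12):
    """Solve a tabular first-exit L-MDP by iterating the linear Bellman equation.

    P        : (n, n) passive transition matrix, rows sum to 1 (assumed known here)
    q        : (n,) state costs
    terminal : (n,) boolean mask of absorbing goal states
    """
    z = np.ones(len(q))  # desirability z = exp(-v)
    for _ in range(iters):
        z_new = np.exp(-q) * (P @ z)           # linear Bellman backup
        z_new[terminal] = np.exp(-q[terminal])  # boundary condition at goal states
        converged = np.max(np.abs(z_new - z)) < tol
        z = z_new
        if converged:
            break
    v = -np.log(z)                        # state value function
    U = P * z[None, :]                    # optimal transitions: p(s'|s) z(s') ...
    U /= U.sum(axis=1, keepdims=True)     # ... normalized over next states s'
    return v, U

# Hypothetical toy problem: random walk on a chain, goal at the right end.
n = 5
P = np.zeros((n, n))
for s in range(n - 1):
    P[s, max(s - 1, 0)] += 0.5  # step left (stay put at the left wall)
    P[s, s + 1] += 0.5          # step right
P[n - 1, n - 1] = 1.0           # goal state is absorbing
q = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
terminal = np.array([False, False, False, False, True])

v, U = solve_lmdp(P, q, terminal)
print("values:", np.round(v, 3))
print("P(move right) under optimal policy:", np.round(U[np.arange(n - 1), np.arange(1, n)], 3))
```

In this sketch the passive dynamics are known and the state space is tiny, so the value function follows from a simple fixed-point iteration. The batch setting described in the abstract instead works from pre-collected data, which is why a learned predictor of next-step state values is needed there.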
