Host : The Japanese Society for Artificial Intelligence
Name : The 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 2018
Number : 32
Location : [in Japanese]
Date : June 05, 2018 - June 08, 2018
Linearly solvable Markov decision processes (L-MDPs) are an important subclass of MDPs in which a good policy can be found efficiently. We develop a novel batch reinforcement learning algorithm for L-MDPs with a discretized action space. Using pre-collected data, the algorithm simultaneously learns a state value function and a predictor of the state value at the next step. We evaluate the method on a traffic signal control task at a single intersection, using the traffic simulator SUMO. The experiment shows that the method efficiently finds a policy for this task.
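For readers unfamiliar with the term, the following is a minimal sketch of why an L-MDP is "linearly solvable". It is not the batch algorithm of this paper. It assumes the standard first-exit formulation, in which the desirability z(s) = exp(-v(s)) satisfies the linear equation z(s) = exp(-q(s)) Σ p(s'|s) z(s'), where q is the state cost and p the passive dynamics, and the optimal controlled transition is proportional to p(s'|s) z(s'). The five-state chain, the cost values, and the function names `solve_lmdp` and `optimal_transitions` are hypothetical and chosen only for illustration.

```python
import math

def solve_lmdp(passive, cost, terminal, iters=1000, tol=1e-12):
    """Fixed-point iteration on z = exp(-q) * P z for a first-exit L-MDP.

    passive[s] maps next states to passive transition probabilities,
    cost[s] is the state cost q(s), terminal is the set of goal states.
    """
    n = len(cost)
    z = [1.0] * n
    for s in terminal:
        z[s] = math.exp(-cost[s])  # boundary condition at terminal states
    for _ in range(iters):
        new_z = list(z)
        for s in range(n):
            if s in terminal:
                continue
            expected = sum(p * z[t] for t, p in passive[s].items())
            new_z[s] = math.exp(-cost[s]) * expected
        delta = max(abs(a - b) for a, b in zip(new_z, z))
        z = new_z
        if delta < tol:
            break
    return z

def optimal_transitions(passive, z, s):
    """Optimal controlled transition: u*(s'|s) proportional to p(s'|s) z(s')."""
    weights = {t: p * z[t] for t, p in passive[s].items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

# Hypothetical toy problem: a 5-state chain with the goal at state 4.
passive = {
    0: {0: 0.5, 1: 0.5},
    1: {0: 0.5, 2: 0.5},
    2: {1: 0.5, 3: 0.5},
    3: {2: 0.5, 4: 0.5},
    4: {4: 1.0},
}
cost = [1.0, 1.0, 1.0, 1.0, 0.0]

z = solve_lmdp(passive, cost, terminal={4})
values = [-math.log(x) for x in z]  # state values v(s) = -log z(s)
policy_at_2 = optimal_transitions(passive, z, 2)
```

In this sketch the value function is obtained without any maximization over actions, which is the property that makes the subclass attractive. A batch method such as the one described in the abstract would have to estimate the expectation over next states from pre-collected data rather than from a known passive transition model as assumed here.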