2013 Volume 65 Issue 3 Pages 315-318
A key difficulty in model-design problems on partially observable Markov decision processes (POMDPs), such as apprenticeship learning, is the high computational cost of solving for the optimal policies of many POMDP instances. In this paper, we propose two techniques that reduce this cost in such settings: transfer learning and subgradient calculation. We show that both techniques can be implemented efficiently on a policy-iteration POMDP solver.