2009 Volume 2 Issue 4 Pages 213-221
In this paper, we present a modified dynamic programming (DP) method. The method is essentially the same as the value iteration method (VI), a representative DP method, except that it preprocesses the system's state transition model to reduce its complexity; we call it dynamic programming on reduced models (DPRM). The reduction is achieved by hypothetically attributing the probabilistic behavior of the system to a set of causes and then cutting off those causes whose probabilities of occurrence are low. In computational illustrations, VI, DPRM, and the real-time Q-learning method (RTQ) are applied to elevator operation problems, which can be modeled as Markov decision processes. The results show that DPRM computes quasi-optimal value functions that yield more effective elevator allocations than the value functions obtained by RTQ, and that it does so in less computation time than VI. This characteristic is especially notable when the traffic pattern is complicated.
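The idea of running value iteration on a reduced transition model can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that pruning individual low-probability transition outcomes and renormalizing the remainder is an acceptable stand-in for cutting off low-probability causes, and that the problem is posed as discounted reward maximization. The names `reduce_model`, `value_iteration`, `threshold`, and the three-state `toy` model are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# One outcome of taking an action: (probability, next_state, reward).
Outcome = Tuple[float, int, float]
# model[state][action] -> list of possible outcomes.
Model = Dict[int, Dict[int, List[Outcome]]]


def reduce_model(model: Model, threshold: float) -> Model:
    """Return a copy of `model` without outcomes rarer than `threshold`.

    Assumption: dropping rare outcomes and renormalizing the rest stands in
    for the paper's cutting of low-probability causes.
    """
    reduced: Model = {}
    for s, actions in model.items():
        reduced[s] = {}
        for a, outcomes in actions.items():
            kept = [o for o in outcomes if o[0] >= threshold]
            if not kept:
                # Keep the most likely outcome so the action stays defined.
                kept = [max(outcomes, key=lambda o: o[0])]
            total = sum(p for p, _, _ in kept)
            reduced[s][a] = [(p / total, s2, r) for p, s2, r in kept]
    return reduced


def value_iteration(model: Model, gamma: float = 0.9,
                    tol: float = 1e-8, max_iter: int = 10_000) -> Dict[int, float]:
    """Value iteration with in-place (Gauss-Seidel style) updates."""
    v = {s: 0.0 for s in model}
    for _ in range(max_iter):
        delta = 0.0
        for s, actions in model.items():
            # Bellman optimality backup: best expected return over actions.
            best = max(
                sum(p * (r + gamma * v[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            break
    return v


# Hypothetical three-state model; state 2 is absorbing with zero reward.
toy: Model = {
    0: {0: [(0.90, 1, 1.0), (0.08, 0, 0.0), (0.02, 2, 5.0)],
        1: [(1.0, 0, 0.2)]},
    1: {0: [(0.97, 2, 0.0), (0.03, 0, 0.0)]},
    2: {0: [(1.0, 2, 0.0)]},
}

v_full = value_iteration(toy)
v_reduced = value_iteration(reduce_model(toy, threshold=0.05))
print("full model:   ", v_full)
print("reduced model:", v_reduced)
```

In this sketch each Bellman backup on the reduced model sums over fewer outcomes, which is where the saving in computation would come from; the price is that the resulting value function is only an approximation of the one computed on the full model.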