Journal of Japan Industrial Management Association
Online ISSN : 2187-9079
Print ISSN : 1342-2618
ISSN-L : 1342-2618
Neuro-dynamic programming algorithms for computing optimal control of production lines
Katsuhisa OHNO, Kenji YASHIMA, Takahiro ITO

2003, Volume 54, Issue 5, Pages 316-325

Abstract
This paper discusses an optimal control problem for a failure-prone multi-stage production line that minimizes the expected total cost per unit time. The problem is formulated as an undiscounted Markov decision process (UMDP), but it is difficult to solve exactly because of the curse of dimensionality. In recent years, however, several algorithms in the field of reinforcement learning, or neuro-dynamic programming (NDP), have been devised to overcome the curse of dimensionality. In this paper, the simulation-based modified policy iteration method (SBMPIM) is proposed as a new NDP algorithm. The SBMPIM carries out by simulation the value-approximation routine of the modified policy iteration method (MPIM), an algorithm for UMDPs that is applicable to relatively large-scale problems. The SBMPIM and existing NDP algorithms such as SMART, RELAXED-SMART, and SBPI are numerically compared with an optimal control policy for a single-stage production line computed by the MPIM. Moreover, the performance of JIT production lines with optimal numbers of kanbans is numerically compared with that of optimal or near-optimal control policies computed by the SBMPIM. It is shown that all NDP algorithms except the SBMPIM fail to converge, and that the expected total cost per unit time of JIT production lines with optimal numbers of kanbans is more than 7 percent higher than the expected total cost per unit time computed by the SBMPIM.
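To make the abstract's algorithmic idea concrete, the following Python sketch illustrates simulation-based modified policy iteration on a toy average-cost MDP. It is not the paper's SBMPIM: the single-stage, failure-prone line, all parameter values, and the use of tabular average-cost TD(0) for the partial-evaluation step are illustrative assumptions, with greedy improvement done by one-step lookahead on a known transition model.

```python
import random

# Hypothetical toy model of a single-stage, failure-prone production line.
# State: (inventory level i, machine status m), m = 1 up / 0 down.
# Action a = 1 means "produce one unit this period" (only when the machine is up).
# All names and parameter values are illustrative assumptions, not from the paper.
I_MIN, I_MAX = -5, 5                      # backlog / inventory bounds
P_DEMAND, P_FAIL, P_REPAIR = 0.4, 0.1, 0.5
H_COST, B_COST = 1.0, 5.0                 # holding and backlog cost per period

def states():
    return [(i, m) for i in range(I_MIN, I_MAX + 1) for m in (0, 1)]

def actions(s):
    return (0, 1) if s[1] == 1 else (0,)

def cost(s):
    i, _ = s
    return H_COST * max(i, 0) + B_COST * max(-i, 0)

def transitions(s, a):
    """One-step model: list of (probability, next_state) pairs."""
    i, m = s
    out = []
    for dem, p_d in ((1, P_DEMAND), (0, 1 - P_DEMAND)):
        m2_opts = ((0, P_FAIL), (1, 1 - P_FAIL)) if m == 1 else ((1, P_REPAIR), (0, 1 - P_REPAIR))
        for m2, p_m in m2_opts:
            i2 = min(max(i + a - dem, I_MIN), I_MAX)
            out.append((p_d * p_m, (i2, m2)))
    return out

def simulate_step(s, a):
    """Sample a successor state from the one-step model."""
    r, acc = random.random(), 0.0
    for p, s2 in transitions(s, a):
        acc += p
        if r <= acc:
            return s2
    return s2  # guard against floating-point rounding

def sbmpim_sketch(n_iters=30, eval_steps=20000, alpha=0.05, beta=0.01, seed=0):
    """Simulation-based modified policy iteration (illustrative sketch).

    The exact partial-evaluation sweep of MPIM is replaced by simulated
    TD(0) updates of the differential values h(s) and of the average-cost
    estimate rho under the current policy; improvement is a one-step
    greedy lookahead using the known model.
    """
    random.seed(seed)
    policy = {s: actions(s)[-1] for s in states()}   # start: produce whenever up
    h = {s: 0.0 for s in states()}                   # differential (relative) values
    rho = 0.0                                        # average cost per period
    for _ in range(n_iters):
        # Partial policy evaluation by simulation (average-cost TD(0)).
        s = (0, 1)
        for _ in range(eval_steps):
            c, s2 = cost(s), simulate_step(s, policy[s])
            h[s] += alpha * (c - rho + h[s2] - h[s])
            rho += beta * (c - rho)
            s = s2
        # Greedy policy improvement via one-step lookahead on the model.
        for s in states():
            policy[s] = min(actions(s),
                            key=lambda a: cost(s) + sum(p * h[s2] for p, s2 in transitions(s, a)))
    return policy, rho

if __name__ == "__main__":
    policy, avg_cost = sbmpim_sketch()
    print("estimated average cost per period:", round(avg_cost, 3))
```

In this sketch the simulated evaluation phase plays the role that the abstract attributes to simulation inside the MPIM's value-approximation routine; for multi-stage lines the tabular value dictionary would have to be replaced by a parametric approximator, which is precisely where the curse of dimensionality enters.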
© 2003 Japan Industrial Management Association