2024 Volume 2024 Pages 10-19
The stochastic shortest path problem (SSP) is a standard model for sequential decision making in uncertain environments. However, the standard model cannot handle cases where a catastrophic event occurs in the middle of an episode, or where a transition to a terminal state may never occur. We therefore define an SSP that accounts for dead-ends and undesired terminal states. The optimal policy for this problem is a stochastic semi-Markov policy. Because this SSP is difficult to solve directly, we propose an approximate problem. The optimal policy for the approximate problem is expressed as a probability distribution over a set of at most three deterministic policies. The deterministic policies are derived by considering a Bayesian-Adaptive MDP (BAMDP) for three Markov decision processes (MDPs), which correspond to the objective function and to the constraints on dead-ends and undesired terminal states. The probability distribution over the set of deterministic policies is obtained by solving a two-person zero-sum game between the deterministic policies and the three MDPs.
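As a rough illustration of the final step, the sketch below approximates a mixed strategy over a few candidate deterministic policies in a small zero-sum matrix game against three MDPs. It is not the authors' algorithm: it swaps in a generic multiplicative-weights (Hedge) scheme for the MDP player, with the policy player best-responding in each round. The payoff matrix, the function name `approx_maximin_mixture`, and the convention that the policy player maximizes a score scaled to [0, 1] are all assumptions made for the example. In the setting described above, each best response would presumably come from solving a BAMDP rather than from a table lookup.

```python
import math


def approx_maximin_mixture(payoff, rounds=10000):
    """Approximate a maximin mixture over rows of a zero-sum matrix game.

    payoff[i][j] is a hypothetical score in [0, 1] of deterministic
    policy i on MDP j (assumption: the policy player maximizes).
    Returns (mix, guaranteed): the empirical mixture over policies and
    its worst-case expected score across the MDPs.
    """
    n_pol, n_mdp = len(payoff), len(payoff[0])
    # Standard Hedge step size for losses in [0, 1].
    eta = math.sqrt(8.0 * math.log(n_mdp) / rounds)
    cum_loss = [0.0] * n_mdp  # total score conceded on each MDP so far
    counts = [0] * n_pol      # how often each policy was the best response

    for _ in range(rounds):
        # MDP player: exponential weights favouring MDPs with low conceded score.
        base = min(cum_loss)  # shift for numerical stability
        weights = [math.exp(-eta * (c - base)) for c in cum_loss]
        total = sum(weights)
        y = [w / total for w in weights]

        # Policy player: best response to the current mixture over MDPs.
        scores = [sum(payoff[i][j] * y[j] for j in range(n_mdp))
                  for i in range(n_pol)]
        best = max(range(n_pol), key=scores.__getitem__)
        counts[best] += 1

        for j in range(n_mdp):
            cum_loss[j] += payoff[best][j]

    mix = [c / rounds for c in counts]
    guaranteed = min(sum(mix[i] * payoff[i][j] for i in range(n_pol))
                     for j in range(n_mdp))
    return mix, guaranteed


if __name__ == "__main__":
    # Hypothetical scores: rows are candidate deterministic policies; columns
    # are the objective MDP, the dead-end MDP, and the undesired-terminal MDP.
    example = [
        [0.9, 0.2, 0.4],
        [0.3, 0.8, 0.5],
        [0.5, 0.4, 0.9],
    ]
    mix, guaranteed = approx_maximin_mixture(example)
    print("mixture over policies:", mix)
    print("worst-case score:", guaranteed)
```

The returned mixture is only approximately optimal, with an error that shrinks on the order of the inverse square root of the number of rounds, and its support is not guaranteed to be minimal. An exact linear-programming formulation of the matrix game would be the more natural choice when the set of candidate policies is small and known in advance.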