2023 Volume 2023 Pages 46-55
First, we define a stochastic shortest path problem (SSP) on a Borel space that takes the distribution of the total cost into account. Next, for this SSP, we define policy classes according to the number of steps needed to reach the terminal state for the first time, and we design a policy for each class. A policy with a small expected value of the total cost (expected total cost) is not guaranteed to have a small probability of incurring a large total cost. Moreover, a policy that minimizes the expected total cost or the Conditional Value at Risk (CVaR) may fail to reach the terminal state. The policies proposed in this paper, obtained either by reduction to a δ-perturbed problem (δ-PP) or by combining policies obtained from discounted problems, have a low probability of a large total cost and are guaranteed to reach the terminal state. Finally, we compare the proposed methods with policies that minimize the expected total cost or CVaR.
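The gap between a small expected total cost and a small probability of a large total cost can be illustrated with a minimal sketch. The two cost distributions below, the threshold 50, and the level α = 0.9 are hypothetical values chosen for illustration and are not taken from the paper; the helper names are likewise assumptions. CVaR is computed here with the Rockafellar-Uryasev formula CVaR_α(X) = min_t { t + E[(X − t)⁺] / (1 − α) }, whose minimum over t is attained at a support point when the distribution is finite.

```python
# Illustrative sketch only: the distributions, threshold, and alpha below are
# hypothetical and do not come from the paper.
# A distribution is a list of (total_cost, probability) pairs.

def expected_cost(dist):
    """Expected total cost of a finite discrete distribution."""
    return sum(p * c for c, p in dist)


def tail_probability(dist, threshold):
    """Probability that the total cost is at least `threshold`."""
    return sum(p for c, p in dist if c >= threshold)


def cvar(dist, alpha):
    """CVaR at level alpha via the Rockafellar-Uryasev formula.

    For a finite distribution the minimizing t is a support point,
    so it is enough to evaluate the objective at each cost value.
    """
    def objective(t):
        excess = sum(p * max(c - t, 0.0) for c, p in dist)
        return t + excess / (1.0 - alpha)

    return min(objective(c) for c, _ in dist)


# Hypothetical policy A: usually cheap, occasionally very expensive.
policy_a = [(1.0, 0.95), (100.0, 0.05)]
# Hypothetical policy B: always a moderate total cost.
policy_b = [(7.0, 1.0)]

for name, dist in (("A", policy_a), ("B", policy_b)):
    print(
        name,
        "mean:", expected_cost(dist),
        "P(cost >= 50):", tail_probability(dist, 50.0),
        "CVaR_0.9:", cvar(dist, 0.9),
    )
```

In this toy setting, policy A has the smaller expected total cost but the larger probability of a total cost of at least 50, which is the kind of behavior that motivates considering the cost distribution rather than its mean alone. The sketch does not model the second issue raised above, namely that a minimizing policy may never reach the terminal state.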