Abstract
We propose Chain Form Reinforcement Learning for a reinforcement learning agent. In the real world, learning is difficult because the number of states and actions is effectively infinite, which demands a large amount of memory and many learning iterations. To address this problem, estimated values are categorized as "GOOD" or "NO GOOD" during the reinforcement learning process. Additionally, the alignment sequence of the estimated values is changed, because the sequence itself is regarded as important. We conducted simulations and observed the influence of our methods.