2025 Volume 37 Issue 4 Pages 680-694
This study proposes a reinforcement learning method that incorporates relative vector-based rules to improve deadlock avoidance and task efficiency in object transportation problems. The proposed method enables each agent to learn optimal actions independently, without sharing rewards among agents. By generating relative vectors from current and past positions, agents achieve accurate environmental perception and efficient learning even under partially observable conditions. Experimental results demonstrated that the proposed approach mitigated mutual interference among agents, promoted the acquisition of temporary stopping behaviors, and improved overall task performance, as measured by the number of delivered items. In addition, some agents exhibited altruistic behavior, such as yielding to others, despite the absence of any explicitly encoded cooperation mechanism; these behaviors emerged solely from each agent optimizing its individual reward during learning. The findings indicate that reinforcement learning without shared rewards can still lead to the autonomous emergence of cooperative behavior, offering a practical and efficient learning framework for dynamic and complex environments.
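To make the two key ideas concrete, the sketch below shows one plausible way an agent could combine independent tabular Q-learning (no reward sharing) with a relative-vector observation built from current and past positions. This is a minimal illustration under assumed details, not the paper's implementation: the action set (including an explicit "stay" action, echoing the temporary stopping behavior noted above), the grid-world encoding, and names such as `IndependentAgent` and `observe` are all hypothetical.

```python
import numpy as np
from collections import defaultdict


class IndependentAgent:
    """One transport agent; learns from its own reward only (no sharing)."""

    # Hypothetical action set: four moves plus an explicit "stay" action,
    # which gives learning room for temporary stopping behavior to emerge.
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(lambda: np.zeros(len(self.ACTIONS)))
        self.prev_pos = None  # remembered for the past-position component

    def observe(self, pos, goal_pos, neighbor_positions):
        """Build a relative-vector observation from current and past positions.

        Relative vectors to the goal and to visible neighbors, plus the
        agent's own displacement since the previous step, stand in for full
        state under partial observability (an assumed encoding).
        """
        pos = np.asarray(pos)
        rel_goal = tuple(np.asarray(goal_pos) - pos)
        rel_neighbors = tuple(sorted(
            tuple(np.asarray(n) - pos) for n in neighbor_positions))
        motion = ((0, 0) if self.prev_pos is None
                  else tuple(pos - np.asarray(self.prev_pos)))
        self.prev_pos = tuple(pos)
        return (rel_goal, rel_neighbors, motion)

    def act(self, obs):
        """Epsilon-greedy action selection over the agent's own Q-table."""
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(self.ACTIONS))
        return int(np.argmax(self.q[obs]))

    def update(self, obs, action, reward, next_obs):
        """Standard Q-learning update using only this agent's own reward."""
        target = reward + self.gamma * np.max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])
```

Because each agent updates its Q-table from its individual reward alone, any yielding or stopping that appears in joint behavior is not programmed in; under this reading, it can only arise because waiting sometimes maximizes an agent's own expected return, which is consistent with the emergence result the abstract reports.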