Host: Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT)
Name: 41st Fuzzy System Symposium
Number: 41
Location: [in Japanese]
Date: September 3, 2025 to September 5, 2025
In off-policy evaluation, methods have been proposed that use reward estimation models learned from logged data to predict rewards in unobserved domains. However, the logged data depend on the action-selection probabilities of the policy that collected them, so selection bias can degrade the model's prediction accuracy. This happens because variables that influence the policy's action selection also influence the outcomes (rewards). These confounders induce spurious correlations, which the prediction model then absorbs. This study therefore aims to construct a reward estimation model based on causal relationships rather than on correlation-based prediction. As a first step, we constructed a causal graph from real data using the Peter-Clark (PC) algorithm, a causal discovery method. We then analyzed the constructed graph and explored how it could be applied to reward estimation models.
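For readers unfamiliar with the PC algorithm, the following is a minimal sketch of its two core phases: skeleton discovery by conditional-independence tests of growing order, followed by orientation of v-structures (colliders). It assumes continuous, roughly Gaussian data, so that Fisher's z test on partial correlations can serve as the independence test, and it omits Meek's orientation-propagation rules. The function names, the significance level, and the synthetic collider data are hypothetical illustrations, not the authors' implementation or their real data.

```python
import itertools
import math

import numpy as np


def fisher_z_pvalue(corr, n, i, j, cond):
    """Two-sided p-value of Fisher's z test for partial corr(i, j | cond) = 0.

    Assumes (approximately) Gaussian data, where zero partial correlation
    is equivalent to conditional independence.
    """
    idx = [i, j, *cond]
    # Partial correlation from the inverse of the correlation submatrix.
    prec = np.linalg.pinv(corr[np.ix_(idx, idx)])
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def pc_skeleton(data, alpha=0.01):
    """Skeleton phase: start from a complete graph and delete edge i - j
    whenever i and j test independent given some subset of i's other
    neighbours, growing the subset size one level at a time."""
    n, d = data.shape
    corr = np.corrcoef(data, rowvar=False)
    adj = {i: set(range(d)) - {i} for i in range(d)}
    sepset = {}  # separating set recorded for every removed edge
    level = 0
    while any(len(adj[i]) - 1 >= level for i in range(d)):
        for i in range(d):
            for j in list(adj[i]):
                others = sorted(adj[i] - {j})
                for cond in itertools.combinations(others, level):
                    if fisher_z_pvalue(corr, n, i, j, cond) > alpha:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[frozenset((i, j))] = set(cond)
                        break
        level += 1
    return adj, sepset


def orient_v_structures(adj, sepset):
    """Orient i -> k <- j for every unshielded triple i - k - j
    (i and j non-adjacent) whose separating set does not contain k."""
    directed = set()
    for k in adj:
        for i, j in itertools.combinations(sorted(adj[k]), 2):
            if j not in adj[i] and k not in sepset.get(frozenset((i, j)), set()):
                directed.update({(i, k), (j, k)})
    return directed


# Hypothetical collider data x -> z <- y. x and y are made exactly
# uncorrelated in the sample so the order-0 test removes the edge x - y.
rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)
y = rng.standard_normal(n)
x -= x.mean()
y -= y.mean()
y -= (x @ y) / (x @ x) * x
z = x + y + 0.5 * rng.standard_normal(n)
data = np.column_stack([x, y, z])

adj, sepset = pc_skeleton(data)
directed = orient_v_structures(adj, sepset)
print(adj)
print(directed)
```

On this synthetic collider the sketch should keep only the edges between columns 0 and 2 and between 1 and 2, and orient both into column 2. By contrast, a confounded triangle such as context → action, context → reward, action → reward contains no conditional independence and no unshielded triple, so PC alone leaves its edges unoriented.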