Constructing an Appropriate Causal Graph to Improve Reward Estimation Models

杉村 真理子; 小林 一郎

doi:10.14864/fss.41.0_118

Abstract

In the field of off-policy evaluation, methods have been proposed that predict rewards in unobserved domains using reward estimation models learned from data. However, the training data depend on the action selection probabilities of the policy that collected them, so the model's prediction accuracy may deteriorate because of selection bias. This occurs because variables that influence the policy's action selection also influence the outcomes: these confounders induce spurious correlations, which the prediction model then absorbs. This study therefore aims to construct a reward estimation model based on causal relationships rather than on correlations. As a first step, we constructed a causal graph from real data using the Peter-Clark (PC) algorithm, a causal discovery method. We also analyzed the resulting causal graph and explored ways of applying it to reward estimation models.
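To make the PC step concrete, here is a minimal sketch of its adjacency (skeleton) phase followed by v-structure detection. It assumes linear-Gaussian data and uses Fisher-z tests on partial correlations; the abstract does not state which conditional independence test or settings the authors used, so this is an illustration rather than their implementation, and the remaining Meek orientation rules are omitted. The variables are hypothetical: `X` is a context feature that drives both the logged action score `A` and the reward `R` (a confounder), and `N` is unrelated. Marginally `A` and `R` are correlated, but PC should remove the `A`–`R` edge once it conditions on `X`, which is exactly the spurious correlation the abstract warns about.

```python
import itertools
import math
import random


def correlation_matrix(data):
    """Pearson correlation matrix of a list of equal-length columns."""
    n = len(data[0])
    means = [sum(col) / n for col in data]
    centered = [[x - m for x in col] for col, m in zip(data, means)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in centered]
    p = len(data)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given the variables in cond (recursive formula)."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def fisher_z_pvalue(r, n, cond_size):
    """Two-sided p-value for H0: partial correlation == 0 (Fisher z-test)."""
    r = max(min(r, 0.999999), -0.999999)  # guard against log(0)
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - cond_size - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def pc_skeleton(data, alpha=0.01):
    """Adjacency phase of the PC algorithm (PC-stable variant).

    Starts from a complete undirected graph and deletes i - j whenever some
    subset of i's neighbours (of the current size `level`) renders i and j
    conditionally independent. Returns the edges and the separating sets."""
    p, n = len(data), len(data[0])
    corr = correlation_matrix(data)
    adj = {i: set(range(p)) - {i} for i in range(p)}
    sepsets = {}
    level = 0
    while any(len(adj[i]) - 1 >= level for i in range(p)):
        snapshot = {i: set(adj[i]) for i in range(p)}  # freeze adjacencies for this level
        for i in range(p):
            for j in sorted(snapshot[i]):
                if j not in adj[i]:
                    continue  # edge already removed earlier in this level
                for cond in itertools.combinations(sorted(snapshot[i] - {j}), level):
                    if fisher_z_pvalue(partial_corr(corr, i, j, cond), n, level) > alpha:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepsets[frozenset((i, j))] = set(cond)
                        break
        level += 1
    edges = {frozenset((i, j)) for i in range(p) for j in adj[i]}
    return edges, sepsets


def v_structures(edges, sepsets, p):
    """Unshielded triples i - k - j with k not in sepset(i, j) become i -> k <- j."""
    found = set()
    for k in range(p):
        nbrs = sorted(v for v in range(p) if frozenset((v, k)) in edges)
        for i, j in itertools.combinations(nbrs, 2):
            pair = frozenset((i, j))
            if pair not in edges and k not in sepsets.get(pair, set()):
                found.add((i, k, j))
    return found


# Hypothetical synthetic log: X confounds the action score A and the reward R.
rng = random.Random(0)
n = 2000
X = [rng.gauss(0, 1) for _ in range(n)]   # context / confounder
A = [x + rng.gauss(0, 1) for x in X]      # action score driven by X
R = [x + rng.gauss(0, 1) for x in X]      # reward driven by X only (no A -> R effect)
N = [rng.gauss(0, 1) for _ in range(n)]   # unrelated variable
names = ["X", "A", "R", "N"]

edges, sepsets = pc_skeleton([X, A, R, N], alpha=0.001)
print(sorted(tuple(sorted(names[v] for v in e)) for e in edges))
print(v_structures(edges, sepsets, len(names)))
```

With this data the skeleton should keep only `X`–`A` and `X`–`R`, with `{X}` recorded as the separating set of `A` and `R`, even though `A` and `R` have a marginal correlation of roughly 0.5. The strict `alpha=0.001` is an illustrative choice to keep the toy example stable; on real logged data a maintained implementation, such as the one in the causal-learn package, with a test suited to the data type would be the more typical route.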
