2020, Volume 24, Issue 1, pp. 142-155
Open data are becoming increasingly available in various domains, and many organizations rely on data to make decisions. Such decision making requires care to distinguish correlation from causation. Among data analysis tasks, causal analysis is especially difficult because of unobserved confounders. For example, to correctly analyze the causal relationship between two variables, the possible confounding effect of a third variable must be considered. In an open-data environment, however, it is difficult to enumerate all possible confounders in advance. In this paper, we propose a framework for exploratory causal analysis of open data, in which candidate confounding variables are collected from a large volume of open data and tested incrementally. To the best of the authors’ knowledge, no previous framework incorporates data for possible confounders into the causal analysis process. This paper presents an original way to expand causal structures and generate reasonable causal relationships. The proposed framework accounts for possible confounding in four steps. First, a crowdsourcing platform is used to collect explanations of the correlation between variables. Second, keywords are extracted from these explanations using natural language processing methods. Third, the framework searches for related open data using the extracted keywords. Finally, the collected explanations are tested using several automated causal analysis methods. We conducted experiments using open data from the World Bank and the Japanese government. The experimental results confirmed that the proposed framework enables causal analysis that takes the effects of possible confounders into account.
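To make the idea of incrementally testing candidate confounders concrete, the following is a minimal sketch in Python. It is not the authors' method: the abstract does not name the automated causal analysis methods used, so partial correlation is substituted here as a simple stand-in. The function names `partial_correlation` and `rank_candidate_confounders` and all data are hypothetical and purely illustrative. The sketch measures how much the correlation between two variables weakens once each candidate third variable is linearly controlled for.

```python
import numpy as np


def partial_correlation(x, y, z):
    """Correlation between x and y after linearly regressing z out of both.

    Assumption: linear relationships only; this is a stand-in test,
    not the causal analysis methods used in the paper.
    """
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    # Design matrix with an intercept column and the candidate confounder.
    design = np.column_stack([np.ones_like(z), z])
    # Residuals of x and y after removing the part explained by z.
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


def rank_candidate_confounders(x, y, candidates):
    """Test each candidate in turn and rank by how much it weakens corr(x, y).

    `candidates` maps a (hypothetical) variable name to its observations.
    Returns the raw correlation and a list of
    (name, partial_correlation, drop_in_absolute_correlation),
    sorted with the largest drop first.
    """
    raw = float(np.corrcoef(x, y)[0, 1])
    rows = []
    for name, z in candidates.items():
        partial = partial_correlation(x, y, z)
        rows.append((name, partial, abs(raw) - abs(partial)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return raw, rows


if __name__ == "__main__":
    # Synthetic data (illustrative only): x and y are both driven by z_true.
    rng = np.random.default_rng(0)
    z_true = rng.normal(size=2000)
    x = z_true + 0.5 * rng.normal(size=2000)
    y = z_true + 0.5 * rng.normal(size=2000)
    candidates = {"z_true": z_true, "unrelated": rng.normal(size=2000)}

    raw, ranked = rank_candidate_confounders(x, y, candidates)
    print(f"raw correlation: {raw:.3f}")
    for name, partial, drop in ranked:
        print(f"{name}: partial={partial:.3f}, drop={drop:.3f}")
```

In this synthetic setting, a candidate that truly drives both variables should sharply reduce the correlation when controlled for, while an unrelated candidate should leave it roughly unchanged. A near-zero partial correlation is only suggestive of confounding under the linearity assumption and does not by itself establish a causal structure.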