Many empirical sciences, including the social sciences and life sciences, aim to study causal relationships. Researchers in these fields need computational methods for analyzing observed data and identifying causal structures among a set of variables. Such computational methods enable researchers to draw conclusions on the basis of both their assumptions and the observed data. Moreover, these methods are useful for developing hypotheses on causal relations, designing future observational studies, and planning future experimental studies that can potentially provide stronger evidence of estimated causal relations. The objective of this special issue is to present an up-to-date overview of causal discovery methods, which have witnessed rapid advancements in recent years. The chief editor and guest editors invited the following three survey papers on various hot topics related to causal discovery:
Graphical models provide a principled way to take advantage of independence constraints for probabilistic and causal modeling, while giving an intuitive graphical description of “qualitative features” useful for these tasks. A popular graphical model, known as a Bayesian network, represents joint distributions by means of a directed acyclic graph (DAG). DAGs provide a natural representation of conditional independence constraints, and also have a simple causal interpretation. When all variables are observed, the associated statistical models have many attractive properties. However, in many practical data analyses unobserved variables may be present. In general, the set of marginal distributions obtained from a DAG model with hidden variables is a much more complicated statistical model: the likelihood of the marginal is often intractable, and the model may contain singularities. There is also an infinite number of such models to consider. It is possible to avoid these difficulties by modeling the observed marginal directly. One strategy is to define a model by means of the conditional independence constraints induced on the observed marginal by the hidden variable DAG; we call this the ordinary Markov model. This model is a supermodel that contains the set of marginal distributions obtained from the original DAG. Richardson and Spirtes (2002) and Evans and Richardson (2013a) gave parametrizations of this model in the Gaussian and discrete cases, respectively. However, it has long been known that hidden variable DAG models also imply nonparametric constraints which generalize conditional independences; these are sometimes called “Verma constraints”. In this paper we describe a natural extension of the ordinary Markov approach, whereby both conditional independences and these generalized constraints are used to define a nested Markov model.
The binary nested Markov model may be parametrized via a simple extension of the binary parametrization of the ordinary Markov model of Evans and Richardson (2013a). We also give evidence for a characterization of nested Markov equivalence for models with four observed variables. A consequence of this characterization is that, in some instances, most structural features of hidden variable DAGs can be recovered exactly when a single generalized independence constraint holds under the distribution of the observed variables.
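As an illustration of what a generalized constraint looks like, consider the standard textbook example of a Verma constraint (this example and its notation are assumptions added for illustration and are not taken from the abstract above). Take four observed variables with DAG $X_1 \to X_2 \to X_3 \to X_4$ and a hidden variable $H$ with edges $H \to X_2$ and $H \to X_4$. No conditional independence holds between $X_1$ and $X_4$ in the observed marginal, yet the following constraint does:

```latex
% Verma constraint for X_1 -> X_2 -> X_3 -> X_4 with hidden H -> X_2, H -> X_4:
% the reweighted sum below is a function of (x_3, x_4) only.
\[
  \sum_{x_2} p(x_4 \mid x_1, x_2, x_3)\, p(x_2 \mid x_1)
  \;=\; f(x_3, x_4)
  \qquad \text{for all } x_1 .
\]
```

The left-hand side can be read as the distribution of $X_4$ after an intervention fixing $X_3$, which is why it resembles an independence statement without being one in the observed distribution.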
While randomized controlled experiments are often considered the gold standard for predicting causal relationships between variables, they are expensive when the goal is to understand the complete set of causal relationships governing a large set of variables, and it may not be possible to manipulate certain variables due to ethical or practical constraints. To address these scenarios, procedures have been developed that use conditional independence relationships among passively observed variables to predict which variables may or may not be causally related to one another. Until recently, most of these procedures assumed that the data consisted of a single i.i.d. dataset of observations. In practice, however, researchers often have access to multiple similar datasets, e.g. from multiple labs studying the same problem, which measure slightly different variable sets and in which recording conventions and procedures may vary. This paper discusses recent state-of-the-art approaches for predicting causal relationships using multiple observational and experimental datasets in these contexts.
In many empirical sciences, the causal mechanisms underlying various phenomena need to be studied. Structural equation modeling is a general framework for multivariate analysis and provides a powerful method for studying causal mechanisms. However, in many cases, classical structural equation modeling cannot estimate the causal directions of variables. This is because it explicitly or implicitly assumes Gaussianity of the data and typically utilizes only their covariance structure. In many applications, however, the data are non-Gaussian, which means that the data distribution may contain more information than the covariance matrix alone. Thus, many new methods have recently been proposed that utilize the non-Gaussian structure of the data to estimate the causal directions of variables. In this paper, we provide an overview of such recent developments in causal inference, focusing in particular on the non-Gaussian methods known as LiNGAM.
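The idea that non-Gaussianity carries directional information can be illustrated with a minimal two-variable sketch. It is not the LiNGAM estimation algorithm itself: it assumes a linear model with uniform noise, and the helper names (`residual`, `dependence`, `infer_direction`) and the moment-based dependence score are hypothetical choices made for this illustration. The sketch regresses each variable on the other and prefers the direction in which the residual looks more independent of the regressor.

```python
import random
from math import sqrt


def center(values):
    """Return the values with their sample mean subtracted."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def correlation(u, v):
    """Pearson correlation of two equally long sequences."""
    u, v = center(u), center(v)
    cov = sum(a * b for a, b in zip(u, v))
    return cov / sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def residual(target, regressor):
    """Residual of the least-squares regression of target on regressor."""
    t, r = center(target), center(regressor)
    slope = sum(a * b for a, b in zip(t, r)) / sum(b * b for b in r)
    return [a - slope * b for a, b in zip(t, r)]


def dependence(u, v):
    """Crude higher-order dependence score (an illustrative assumption).

    It is close to zero when u and v are independent, and generally
    nonzero when they are merely uncorrelated but non-Gaussian.
    """
    u, v = center(u), center(v)
    return (abs(correlation([a ** 3 for a in u], v))
            + abs(correlation(u, [b ** 3 for b in v])))


def infer_direction(x, y):
    """Prefer the direction whose residual is less dependent on the cause."""
    forward = dependence(x, residual(y, x))   # candidate model x -> y
    backward = dependence(y, residual(x, y))  # candidate model y -> x
    return "x -> y" if forward < backward else "y -> x"


# Simulated data from x -> y with uniform (non-Gaussian) noise.
rng = random.Random(0)
n = 50_000
x = [rng.uniform(-1, 1) for _ in range(n)]
y = [0.8 * xi + rng.uniform(-1, 1) for xi in x]

direction = infer_direction(x, y)
reverse_direction = infer_direction(y, x)  # same data, arguments swapped
print(direction, "|", reverse_direction)
```

With Gaussian noise both residuals would be independent of their regressors, so the two scores would be indistinguishable; this is the sense in which the covariance structure alone cannot determine the direction.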
The main purpose of this study is to investigate the influence of nonresponse in the “Interview Survey for Stratification and Social Psychology in 2010” (SSP-I2010 Survey). Social stratification is one of the main research themes in the study of Japanese society, and the SSP-I2010 Survey provides basic data for studying social stratification and people’s views on economic inequality in Japan. Of a target sample of 3,500, approximately half (1,737) did not respond to the survey, so nonresponse bias is a serious concern. From a survey-methodological viewpoint, few studies have applied methods for dealing with nonresponse to Japanese surveys. Therefore, many empirical studies with nonresponse bias adjustment are needed to understand the influence of nonresponse in Japanese surveys. In an attempt to reduce the nonresponse bias in the SSP-I2010 Survey, we used two bias adjustment methods that take information on both survey locations and individuals as auxiliary variables. The effectiveness of the bias adjustment methods was evaluated by a simulation and by several items of the SSP-I2010 Survey for which the population proportions are known. In this study, stratum identification was relatively insensitive to bias adjustment. On the other hand, the estimated proportion of people who accept economic inequality increased after bias adjustment.
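To make the notion of bias adjustment with auxiliary variables concrete, the following is a minimal sketch of one generic technique, weighting-class adjustment, in which respondents are weighted by the inverse of the response rate in their class. The abstract does not name the two methods that were used, so this is not presented as either of them; the function name, the class labels, and the toy numbers are all hypothetical.

```python
from collections import Counter


def weighting_class_estimate(sample, outcome):
    """Estimate a proportion with inverse-response-rate class weights.

    sample:  list of (unit_id, class_label, responded) for every sampled unit
    outcome: dict mapping respondent unit_id to 0 or 1
    """
    sampled = Counter(label for _, label, _ in sample)
    responded = Counter(label for _, label, r in sample if r)
    # Weight = 1 / (class response rate) = sampled count / respondent count.
    weight = {label: sampled[label] / responded[label] for label in responded}
    numerator = sum(weight[label] * outcome[uid]
                    for uid, label, r in sample if r)
    denominator = sum(weight[label] for _, label, r in sample if r)
    return numerator / denominator


# Toy data (hypothetical): class A has 100 sampled units and 80 respondents,
# class B has 100 sampled units and 40 respondents.
sample = ([(i, "A", i < 80) for i in range(100)]
          + [(100 + i, "B", i < 40) for i in range(100)])
# 20 of the 80 respondents in A and 20 of the 40 respondents in B answer "yes".
outcome = {i: int(i < 20) for i in range(80)}
outcome.update({100 + i: int(i < 20) for i in range(40)})

unadjusted = sum(outcome.values()) / len(outcome)
adjusted = weighting_class_estimate(sample, outcome)
print(round(unadjusted, 4), round(adjusted, 4))
```

In this toy example the class with the lower response rate has the higher proportion of "yes" answers, so the unadjusted respondent mean (40/120) understates the adjusted estimate (75/200). The adjustment removes bias only to the extent that response is unrelated to the outcome within each class.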
We propose a novel approach to finding an optimal subspace of multi-dimensional variables for identifying a cluster structure of objects. When some variables are irrelevant to the cluster structure and are correlated among themselves, they are likely to have an adverse effect on the clustering of objects. In such situations, the proposed method aims to obtain an optimal subspace for partitioning objects by eliminating the effects of these irrelevant variables. The proposed method can be considered an extension of reduced k-means analysis and factorial k-means analysis to settings in which irrelevant variables are correlated. The proposed method is applied to artificial and real data to investigate how it performs compared with existing methods.
A method is given for estimating ability using pseudocounts in dichotomous item response models when the associated item parameters are known or are estimated from a separate calibration sample of examinees whose size is of an appropriate order. The pseudocount minimizing the asymptotic mean square error is obtained algebraically. Although the pseudocount depends on the unknown ability, a fixed lower bound for the pseudocount is derived under the logistic model with equivalent items. The lower bound is shown numerically to be reasonable under the 3-parameter logistic model with and without model misspecification.