Combining Linear Non-Gaussian Acyclic Model with Logistic Regression Model for Estimating Causal Structure from Mixed Continuous and Discrete Data [article]

Chao Li, Shohei Shimizu
2018 arXiv   pre-print
Estimating causal models from observational data is a crucial task in data analysis. For continuous-valued data, Shimizu et al. have proposed a linear acyclic non-Gaussian model to understand the data generating process, and have shown that their model is identifiable when the number of data is sufficiently large. However, situations in which continuous and discrete variables coexist in the same problem are common in practice. Most existing causal discovery methods either ignore the discrete
more » ... a and apply a continuous-valued algorithm or discretize all the continuous data and then apply a discrete Bayesian network approach. These methods possibly loss important information when we ignore discrete data or introduce the approximation error due to discretization. In this paper, we define a novel hybrid causal model which consists of both continuous and discrete variables. The model assumes: (1) the value of a continuous variable is a linear function of its parent variables plus a non-Gaussian noise, and (2) each discrete variable is a logistic variable whose distribution parameters depend on the values of its parent variables. In addition, we derive the BIC scoring function for model selection. The new discovery algorithm can learn causal structures from mixed continuous and discrete data without discretization. We empirically demonstrate the power of our method through thorough simulations.
arXiv:1802.05889v1 fatcat:4vjrnhnxrffsxovej5pt7adeaq