A Causal Perspective on OSIM2 Data Generation, with Implications for Simulation Study Design and Interpretation
Journal of Causal Inference
Abstract
Research by the Observational Medical Outcomes Partnership (OMOP) has focused on developing and evaluating strategies to exploit observational electronic data to improve post-market prescription drug surveillance. A data simulator known as OSIM2 developed by the OMOP statistical methods group has been used as a testbed for evaluating and comparing different estimation procedures for detecting adverse drug-related events from data similar to that found in electronic insurance claims data. The simulation scheme produces a longitudinal dataset with millions of observations designed to closely match marginal distributions of important covariates in a known dataset. In this paper we provide a non-parametric structural equation model for the data generating process and construct the associated directed acyclic graph (DAG) depicting the causal structure. These representations reveal key differences between simulated and real-world data, including a departure from longitudinal causal relationships, absence of (presumed) sources of bias and time ordering of covariates that conflicts with reality. The DAG also reveals the presence of unmeasured baseline confounding of the causal effect of a drug on a subsequent medical condition. Conclusions naively drawn from this simulation study could mislead an investigator trying to gain insight into estimator performance on real data. Applying causal inference tools allows us to draw more informed conclusions and suggests modifications to the simulation scheme that would more closely align simulated and real-world data.