Big data problems on discovering and analyzing causal relationships in epidemiological data

Yiheng Liang, Armin R. Mikler
<span title="">2014</span> <i title="IEEE"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/faqqmambavbalpofpx3p6nntua" style="color: black;">2014 IEEE International Conference on Big Data (Big Data)</a> </i> &nbsp;
Publicly available datasets in health science are often large and observational, in contrast to experimental datasets, where a small amount of data is collected in controlled experiments. The causal relationships among variables in an observational dataset are yet to be determined. However, there is significant interest in health science in discovering and analyzing causal relationships from health data, since identified causal relationships would greatly help medical professionals prevent diseases or mitigate their negative effects. Recent advances in computer science, particularly in Bayesian networks, have renewed interest in causality research. Causal relationships can potentially be discovered by learning network structures from data. However, the number of candidate graphs grows faster than exponentially with the number of variables, so exact learning of the optimal structure is computationally infeasible in practice. As a result, heuristic approaches are needed to reduce the computational burden. This research provides effective and efficient learning tools for local causal discovery and novel methods of learning causal structures in combination with background knowledge. Specifically, in the direction of constraint-based structure learning, polynomial-time algorithms for constructing causal structures are designed using first-order conditional independence. Algorithms for efficiently discovering noncausal factors are developed and proved. In addition, when background knowledge is partially known, graph decomposition methods are provided to reduce the number of conditioning variables. Experiments on both synthetic data and real epidemiological data indicate that the provided methods are applicable to large-scale datasets and scalable for causal analysis of health data. Following the research methods and experiments, this dissertation discusses the reliability of causal discoveries in computational health science research, their complexity, and their implications for health science research. Copyright 2015 by Yiheng Liang
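To make the abstract's remark about the growth of candidate graphs concrete, the following is a minimal sketch that counts labeled directed acyclic graphs on n variables using Robinson's recurrence. It is an illustration only and is not taken from the paper; the function name `count_dags` is a hypothetical choice made for this example.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes (Robinson's recurrence).

    a(0) = 1
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k*(n-k)) * a(n-k)

    `count_dags` is an illustrative name, not an identifier from the source.
    """
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    # Print the count for small n to show how quickly the search space grows.
    for n in range(1, 11):
        print(n, count_dags(n))
```

Already at five variables the count is 29281, which suggests why exhaustive scoring of every candidate structure becomes impractical and why heuristic or constraint-based approaches are attractive.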
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/bigdata.2014.7004421">doi:10.1109/bigdata.2014.7004421</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/bigdataconf/LiangM14.html">dblp:conf/bigdataconf/LiangM14</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/zr4b3oxlurhf3b7tgiw3yf63ji">fatcat:zr4b3oxlurhf3b7tgiw3yf63ji</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200323072156/https://digital.library.unt.edu/ark:/67531/metadc804966/m2/1/high_res_d/dissertation.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/9a/7e/9a7e3a0d0501cecd91e178f1f7aac4b66cee0fd6.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/bigdata.2014.7004421"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>