Exploratory Data Mining for Subgroup Cohort Discoveries and Prioritization

Danlu Liu, William Baskett, David Beversdorf, Chi-Ren Shyu
2019 IEEE Journal of Biomedical and Health Informatics
Finding small, homogeneous subgroup cohorts in large, heterogeneous populations is a critical process for hypothesis development in biomedical research. Current computational approaches still lack robust answers to the question "what hypotheses are likely to be novel and to produce clinically relevant results with well thought-out study designs?" We have developed a novel subgroup discovery method that employs a deep exploratory mining process to slice and dice thousands of subpopulations and to prioritize potential cohorts based on their explainable contrast patterns, which may provide insights that can guide interventions. We conducted computational experiments on synthesized data and on a clinical autism data set to assess performance quantitatively (coverage of pre-defined cohorts) and qualitatively (novel knowledge discovery), respectively. We also conducted a scaling analysis in a distributed computing environment to estimate the computational resources needed as the number of subpopulations increases. This work will provide a robust data-driven framework to automatically tailor potential interventions for precision health.
doi:10.1109/jbhi.2019.2939149 pmid:31494566 pmcid:PMC9341221 fatcat:n7evwo55u5ebjintvtojfr57yu