Diversity-Promoting and Large-Scale Machine Learning for Healthcare

Pengtao Xie
In healthcare, a tsunami of medical data has emerged, including electronic health records, images, literature, etc. These data are heterogeneous and noisy, which renders clinical decision-making time-consuming, error-prone, and suboptimal. In this thesis, we develop machine learning (ML) models and systems for distilling high-value patterns from unstructured clinical data and making informed, real-time medical predictions and recommendations, to aid physicians in improving the efficiency of ... low and the quality of patient care.

When developing these models, we encounter several challenges: (1) how to better capture infrequent clinical patterns, such as rare subtypes of diseases; (2) how to make the models generalize well to unseen patients; (3) how to promote the interpretability of decisions; (4) how to improve the timeliness of decision-making without sacrificing its quality; and (5) how to efficiently discover a massive number of clinical patterns from large-scale data.

To address challenges (1) to (4), we systematically study diversity-promoting learning, which encourages the components in ML models (1) to spread out diversely, giving infrequent patterns broader coverage; (2) to be subject to structured constraints, for better generalization performance; (3) to be mutually complementary, for a more compact representation of information; and (4) to be less redundant, for better interpretability. The study is performed in both frequentist and Bayesian statistics. In the former, we develop diversity-promoting regularizers that are empirically effective, theoretically analyzable, and computationally efficient, and we propose a rich set of optimization algorithms to solve the regularized problems. In the latter, we propose Bayesian priors that can effectively encode an inductive bias of "diversity" among a finite or infinite number of components, and we develop efficient posterior inference algorithms. We provide theoretical analysis on why promoting diversity can better capture [...]
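The abstract does not say which diversity-promoting regularizers the thesis uses, so the following is only a minimal sketch of the general idea, not the author's method. It assumes that each model component is a vector (for example, one row of a weight matrix) and uses one simple, commonly seen choice of diversity measure: the mean absolute pairwise cosine similarity between components. The names `diversity_penalty`, `regularized_loss`, and the weight `lam` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("components must be non-zero vectors")
    return dot / (norm_u * norm_v)


def diversity_penalty(components: List[Sequence[float]]) -> float:
    """Hypothetical regularizer: mean |cosine| over all component pairs.

    Returns 0.0 when every pair of components is orthogonal (maximally
    diverse) and 1.0 when every pair is collinear (fully redundant).
    """
    pairs = list(combinations(components, 2))
    if not pairs:
        return 0.0  # a single component has nothing to be redundant with
    return sum(abs(cosine(u, v)) for u, v in pairs) / len(pairs)


def regularized_loss(data_loss: float,
                     components: List[Sequence[float]],
                     lam: float = 0.1) -> float:
    """Data-fitting loss plus a weighted diversity penalty.

    Minimizing this pushes components apart in angle, which is one way
    to discourage redundant components.
    """
    return data_loss + lam * diversity_penalty(components)


if __name__ == "__main__":
    orthogonal = [[1.0, 0.0], [0.0, 1.0]]
    redundant = [[1.0, 0.0], [2.0, 0.0]]
    print(diversity_penalty(orthogonal))  # 0.0
    print(diversity_penalty(redundant))   # 1.0
```

Cosine similarity is used here because it ignores vector length and measures only the angle between components, so the penalty cannot be lowered simply by rescaling them.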
doi:10.1184/r1/7553468