Updating mortality risk estimation in intensive care units from high-dimensional electronic health records with incomplete data
Abstract
Context: Intensive care units (ICUs) have high mortality rates, currently addressed through scores (SAPS II, SOFA, APACHE II) that assess the risk of in-hospital mortality from admission data. These scores perform satisfactorily in predicting death when complications occur early after admission; however, they may become irrelevant during long hospital stays.
Methods: Using the MIMIC-III database, we developed predictive models of short-term mortality in ICU from longitudinal data collected throughout patients' stays of at least 48 hours. Several statistical learning approaches were compared, including deep neural networks and penalized regression. Missing data were handled using either complete case analysis or multiple imputation. Model performance was evaluated via repeated 5-fold cross-validation.
Results: Predictions relying on longitudinal data were more accurate than those relying solely on admission data. Complete case analyses based on 19 predictors showed good discrimination (area under the ROC curve [AUC] > 0.77 for several statistical learning approaches) in predicting death occurring 12 to 24 hours later, but retained only 25% of patients in the sample. Multiple imputation made it possible to include 70 predictors and retain 95% of patients with similar performance, hence allowing predictions in patients with incomplete data. Calibration was satisfactory for all models.
Discussion: This proof of concept suggests that automated analysis of electronic health records can be of great interest throughout patients' stays, as a surveillance tool likely to detect lethal complications in ICU soon enough to take corrective measures. Though this framework relies on a large set of predictors, it is robust to data imputation and may be effective early after admission, when data are still scarce.
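As an illustration of the evaluation scheme named in Methods, the following minimal Python sketch runs repeated 5-fold cross-validation of an L2-penalized logistic regression and averages the AUC over folds. It is not the authors' pipeline: the data are synthetic (no MIMIC-III access is assumed), the two predictors, the penalty strength `lam`, the learning rate, and all function names are hypothetical, and missing-data handling and neural networks are left out.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_kfold(n, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) for `repeats` independently shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, folds[i]


def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, epochs=200):
    """Fit an L2-penalized logistic regression by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(p):
                gw[j] += err * xi[j]
        # The penalty shrinks the weights; the intercept is left unpenalized.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


# Synthetic cohort (assumption): 200 stays, 2 standardized predictors,
# death risk driven by the first predictor only.
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    x = [rng.gauss(0, 1), rng.gauss(0, 1)]
    X.append(x)
    y.append(1 if rng.random() < sigmoid(1.5 * x[0] - 1.0) else 0)

fold_aucs = []
for train, test in repeated_kfold(len(X), k=5, repeats=3, seed=1):
    w, b = fit_ridge_logistic([X[i] for i in train], [y[i] for i in train])
    scores = [b + sum(wj * xj for wj, xj in zip(w, X[i])) for i in test]
    fold_aucs.append(auc([y[i] for i in test], scores))

mean_auc = sum(fold_aucs) / len(fold_aucs)
print(f"mean AUC over {len(fold_aucs)} folds: {mean_auc:.3f}")
```

Averaging over repeated splits reduces the dependence of the estimate on any single partition of the patients. In a real analysis with multiple imputation, the imputation model would typically be fitted within each training fold so that no information from the test fold leaks into the predictions.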