Protocol for an observational study evaluating new approaches to modelling diagnostic information from large administrative hospital datasets
The Charlson and Elixhauser indices define sets of conditions used to adjust for patients' comorbidities in administrative hospital data. A strength of these indices is the parsimony that results from including only 19 and 30 conditions, respectively, but the conditions included may not be the ones most relevant to a specific outcome and population. Our objectives are to: (1) test an approach to developing parsimonious indices for the specific outcome and populations being studied, while comparing performance to the Charlson and Elixhauser indices; and (2) evaluate several approaches that use models with more diagnosis-related terms and aim to improve prediction performance by capturing more of the information in large datasets.
Methods and analysis: This is a modelling study using a linked national dataset of administrative hospital records and death records. The study populations are patients admitted to hospital in England between 1 January 2015 and 31 December 2017 for acute myocardial infarction, hip fracture, or major surgery for colorectal cancer. The outcome is death within 365 days of the date of admission (acute myocardial infarction and hip fracture) or of the procedure (colorectal surgery). In the 'First analysis', prognostic indices will be developed based on the presence or absence of individual ICD-10 codes in patients' medical histories. Logistic regression will be used to estimate associations between the outcome and a full set of sociodemographic and diagnostic predictors, from which reduced models (with fewer diagnostic predictors) will be produced using a step-down approach. In the 'Second analysis', models will also account for the time at which each ICD-10 code was last recorded, and will allow for non-linear relationships and interactions between conditions and the timing of records. Validation will include an overall measure of performance (scaled Brier score), a measure of discrimination (c-statistic) and measures of calibration (such as the Integrated Calibration Index), assessed in bootstrap or cross-validation samples. Sensitivity analyses will include varying the length of medical history analysed, using a comparator that combines the Charlson and Elixhauser sets of conditions, and aggregating ICD-10 codes into clinical groups.
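The diagnostic predictors described for the two analyses can be derived from the same history records. The sketch below is illustrative only: the record layout `(patient_id, icd10_code, record_date)`, the function names `history_features` and `presence_matrix`, the five-year default look-back, and the rule that only records strictly before the index date count as history are all assumptions, not the protocol's specification. Presence or absence of a code (First analysis) is whether it appears for a patient; the days since it was last recorded is the kind of timing input the Second analysis could model.

```python
from datetime import date

# Hypothetical record layout: (patient_id, icd10_code, record_date).
# Assumption: only records strictly before the index date (admission or
# procedure) and within the look-back window count as medical history.


def history_features(records, index_dates, lookback_days=365 * 5):
    """Return {patient_id: {code: days_since_last_record}}.

    A code present in a patient's inner dict gives a 1 in the presence/absence
    design; the stored value is the gap (in days) to its most recent record.
    """
    features = {pid: {} for pid in index_dates}
    for pid, code, rec_date in records:
        index_date = index_dates.get(pid)
        if index_date is None:
            continue  # record for a patient outside the study population
        days_before = (index_date - rec_date).days
        if days_before <= 0 or days_before > lookback_days:
            continue  # outside the assumed medical-history window
        previous = features[pid].get(code)
        if previous is None or days_before < previous:
            features[pid][code] = days_before  # keep the most recent record
    return features


def presence_matrix(features, codes):
    """0/1 indicator row per patient, in a fixed code order."""
    return {pid: [1 if c in feats else 0 for c in codes]
            for pid, feats in features.items()}


records = [
    ("p1", "I21", date(2016, 5, 1)),
    ("p1", "E11", date(2014, 1, 10)),
    ("p1", "E11", date(2016, 1, 1)),
    ("p2", "C18", date(2016, 3, 1)),  # same day as index: excluded
]
index_dates = {"p1": date(2016, 6, 1), "p2": date(2016, 3, 1)}
feats = history_features(records, index_dates)
print(feats)  # {'p1': {'I21': 31, 'E11': 152}, 'p2': {}}
print(presence_matrix(feats, ["E11", "I21", "C18"]))
```

Varying `lookback_days` is one way the sensitivity analysis on the length of medical history could be expressed.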
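The protocol does not specify the step-down criterion, so the following is only a generic greedy backward elimination under assumed rules. `score` is a hypothetical callable returning a higher-is-better performance value for a predictor subset (for example, the c-statistic of a refitted model), and `min_retained` is an assumed stopping rule: stop when any further removal would leave less than that fraction of the full-model score.

```python
def step_down(predictors, score, min_retained=0.95):
    """Greedy backward step-down (illustrative; criterion is assumed).

    predictors: list of predictor names, e.g. ICD-10 codes.
    score: hypothetical callable, list of predictors -> performance (higher is better).
    Returns (retained predictors, predictors in the order they were removed).
    """
    current = list(predictors)
    full = score(current)
    removed = []
    while len(current) > 1:
        # Evaluate dropping each remaining predictor; keep the least damaging drop.
        candidates = [(score([p for p in current if p != drop]), drop)
                      for drop in current]
        best_score, best_drop = max(candidates)
        if best_score < min_retained * full:
            break  # any further removal loses too much performance
        current.remove(best_drop)
        removed.append(best_drop)
    return current, removed


# Toy additive score standing in for a refitted model's performance.
weights = {"I50": 0.40, "C78": 0.30, "N18": 0.20, "E11": 0.06, "J45": 0.04}
toy_score = lambda ps: sum(weights[p] for p in ps)
kept, dropped = step_down(list(weights), toy_score, min_retained=0.85)
print(kept, dropped)  # ['I50', 'C78', 'N18'] ['J45', 'E11']
```

Each pass refits once per remaining predictor, so with thousands of ICD-10 codes this exact greedy search is costly; approximate step-down methods that avoid refitting at every step are a common alternative.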
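The overall performance and discrimination measures named above can be written out directly. This is a minimal sketch, assuming binary outcomes `y` (1 = died within 365 days) and predicted probabilities `p`. The scaled Brier score here is 1 minus the Brier score divided by the Brier score of a non-informative model that predicts the prevalence for everyone, and the c-statistic is the proportion of event/non-event pairs ranked correctly, with ties counted as one half.

```python
def brier(y, p):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def scaled_brier(y, p):
    """1 - Brier / Brier_max, where Brier_max = prevalence * (1 - prevalence)."""
    prevalence = sum(y) / len(y)
    brier_max = prevalence * (1 - prevalence)  # non-informative model
    return 1 - brier(y, p) / brier_max


def c_statistic(y, p):
    """Concordance over all event/non-event pairs (O(n^2), for illustration)."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5  # tied predictions count as half
    return concordant / (len(events) * len(non_events))


y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(round(scaled_brier(y, p), 4))  # 0.3675
print(c_statistic(y, p))             # 0.75
```

The pairwise loop is quadratic in the number of patients; for national-scale data a rank-based computation of the same quantity would be used instead.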
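The Integrated Calibration Index is the mean absolute difference between predicted risks and a smoothed estimate of observed risk at those predictions. The usual definition smooths with loess; to stay dependency-free, the sketch below swaps in a Gaussian-kernel (Nadaraya-Watson) smoother, and the `bandwidth` default is an arbitrary assumption. Values near 0 indicate good calibration.

```python
import math


def kernel_calibration(y, p, bandwidth=0.1):
    """Smoothed observed risk at each predicted probability.

    Gaussian-kernel (Nadaraya-Watson) smoother used here in place of loess.
    """
    smoothed = []
    for p0 in p:
        weights = [math.exp(-0.5 * ((pi - p0) / bandwidth) ** 2) for pi in p]
        smoothed.append(sum(w * yi for w, yi in zip(weights, y)) / sum(weights))
    return smoothed


def integrated_calibration_index(y, p, bandwidth=0.1):
    """Mean |predicted - smoothed observed| over all patients."""
    observed = kernel_calibration(y, p, bandwidth)
    return sum(abs(pi - oi) for pi, oi in zip(p, observed)) / len(p)


# Everyone predicted at 0.9, but only 1 in 4 dies: badly over-predicted risk.
print(round(integrated_calibration_index([0, 0, 0, 1], [0.9] * 4), 4))  # 0.65
```

The bandwidth controls how local the calibration curve is: too small and the curve chases noise, too large and it flattens towards the overall event rate.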
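Bootstrap validation is commonly done by estimating optimism: refit the model in each bootstrap sample, and take the difference between its performance in that sample and in the original data. The sketch below assumes hypothetical callables `fit(data) -> model` and `evaluate(model, data) -> performance` (higher is better) and an arbitrary default of 200 resamples; it is one standard form of bootstrap internal validation, not necessarily the exact scheme the study will use.

```python
import random


def optimism_corrected(data, fit, evaluate, n_boot=200, seed=1):
    """Apparent performance minus mean bootstrap optimism.

    data: list of observations; fit and evaluate are hypothetical callables.
    """
    rng = random.Random(seed)
    apparent = evaluate(fit(data), data)
    optimism = 0.0
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        model = fit(sample)  # repeat the whole model-building process
        optimism += evaluate(model, sample) - evaluate(model, data)
    return apparent - optimism / n_boot


# Toy example: a "model" that is just the mean outcome, scored by negative MSE.
outcomes = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
fit_mean = lambda d: sum(d) / len(d)
neg_mse = lambda m, d: -sum((yi - m) ** 2 for yi in d) / len(d)
print(optimism_corrected(outcomes, fit_mean, neg_mse))
```

If variable selection such as a step-down is part of model development, it should be repeated inside `fit` for each bootstrap sample so that the optimism estimate reflects it.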