Early detection of pancreatic ductal adenocarcinomas with an ensemble learning model based on a panel of protein serum biomarkers [article]

Nuno Rocha Nene, Alexander Ney, Tatiana Nazarenko, Oleg Blyuss, Harvey Johnston, Harry Whitwell, Eva Sedlak, Aleksandra Gentry-Maharaj, Eithne Costello, William Greenhalf, Ian Jacobs, Usha Menon (+4 others)
2021 medRxiv   pre-print
Earlier detection of pancreatic ductal adenocarcinoma (PDAC) is key to improving patient outcomes, as it is mostly detected at advanced stages which are associated with poor survival. Developing non-invasive blood tests for early detection would be an important breakthrough. The primary objective of the work presented here was to use a unique dataset, that is both large and prospectively collected, to quantify a set of 96 cancer-associated proteins and construct multi-marker models with the
more » ... city to accurately predict PDAC years before diagnosis. The data is part of a nested case control study within UK Collaborative Trial of Ovarian Cancer Screening and is comprised of 219 samples, collected from a total of 143 post-menopausal women who were diagnosed with pancreatic cancer within 70 months after sample collection, and 248 matched non-cancer controls. We developed a stacked ensemble modelling technique to achieve robustness in predictions and, therefore, improve performance in newly collected datasets. With a pool of 10 base-learners and a Bayesian averaging meta-learner, we can predict PDAC status with an AUC of 0.91 (95% CI 0.75 - 1.0), sensitivity of 92% (95% CI 0.54 - 1.0) at 90% specificity, up to 1 year to diagnosis, and at an AUC of 0.85 (95% CI 0.74 - 0.93) up to 2 years to diagnosis (sensitivity of 61%, 95 % CI 0.17 - 0.83, at 90% specificity). These models also use clinical covariates such as hormone replacement therapy use (at randomization), oral contraceptive pill use (ever) and diabetes and outperform biomarker combinations cited in the literature.
doi:10.1101/2021.12.02.21267187 fatcat:cbyf3wkasnbxrk5ybq75k7c56y