Developing a Diagnostic Decision Support Tool With Machine Learning Classification Algorithms to Improve Breast Cancer Screening: A Cross-Sectional Study on Iranian Women
Background: Improper patient navigation and follow-up measures hamper breast cancer screening programs. To augment existing programs, we aimed to develop a decision support system for early breast cancer detection by training and validating machine learning classification algorithms on routinely available patient data.
Methods: Data were collected prospectively from eligible consenting women who visited a single university-affiliated center in Tehran, Iran, during a two-year period. We selected 17 features from patient demographics, history, clinical examination and screening imaging. Breast cancer diagnosis was assessed one year after initial data collection. Positive outcomes were confirmed with tissue biopsy. Six supervised machine learning classification algorithms (including two artificial neural networks) were trained on 743 cases. Odds ratios were calculated using logistic regression.
Results: Of the participants, 34% were diagnosed with breast cancer. The highest adjusted odds ratios (95% CI) belonged to ultrasound, 24.8 (12.4, 52.0), and mammography, 21.7 (8.8, 58.5). When evaluated on all patients, the random forest model had the highest AUC (95% CI), 0.98 (0.97, 0.99). The results of 10-fold stratified cross-validation supported model stability. Based on the mean of the ten validation iterations, random forest provided the highest accuracy (93.3%), sensitivity (91.9%) and NPV (96.2%). The k-nearest-neighbors model provided the highest specificity (95.9%) and PPV (91.9%).
Conclusions: Machine learning models trained on basic demographics, history, clinical examination and breast screening imaging can predict breast cancer accurately. Such decision support tools, when added to existing programs, can boost the effectiveness of screening measures. Implementation ultimately depends on future work focusing on external validation, interface development and cost-effectiveness analysis.
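The validation metrics reported above (accuracy, sensitivity, specificity, PPV and NPV under 10-fold stratified cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: the function names `stratified_folds` and `screening_metrics` are hypothetical, the labels are synthetic toy data with about 34% positives, and no classifier is trained. It only shows how stratified folds keep the class ratio similar in each fold and how the five metrics follow from a confusion matrix.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with a similar class ratio in each."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def screening_metrics(y_true, y_pred):
    """Compute binary classification metrics (1 = cancer, 0 = no cancer)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


# Synthetic toy labels (an assumption for illustration, not study data).
toy_labels = [1] * 34 + [0] * 66
folds = stratified_folds(toy_labels, k=10)
for number, fold in enumerate(folds, start=1):
    positives = sum(toy_labels[i] for i in fold)
    print(f"fold {number}: {len(fold)} samples, {positives} positive")
```

In a full pipeline, each fold would serve once as the validation set while a model is fit on the remaining nine, and the per-fold outputs of `screening_metrics` would be averaged, which matches the "mean of the ten validation iterations" described in the Results.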