Data mining and classification of polycystic ovaries in pelvic ultrasound reports [article]

Jay Jojo Cheng, Shruthi Mahalingaiah
2018 bioRxiv pre-print
Objectives: To develop and evaluate a rules-based classifier and a gradient boosted tree model for automatic feature extraction and classification of polycystic ovary morphology (PCOM) in pelvic ultrasound reports.
Methods: Pelvic ultrasound reports from patients at Boston Medical Center between October 1, 2003 and December 12, 2016 were included for analysis, yielding 39,093 ultrasound reports from 25,535 unique women. Following the 2003 Rotterdam Consensus Criteria for polycystic ovary syndrome, 2000 randomly selected ultrasounds were manually labeled for PCOM status as present, absent, or unidentifiable. Half of the labeled data was used as a training set, and the other half as a test set.
Results: On the test set of 1000 randomly selected ultrasound reports, accuracy was 97.6% (95% CI: 96.5%, 98.5%) for the rules-based classifier (RBC) and 96.1% (95% CI: 94.7%, 97.2%) for the gradient boosted tree model (GBT). Both models were more adept at identifying non-PCOM ultrasounds than either unidentifiable or PCOM ultrasounds. The two classifiers estimated the composition of our population's ultrasounds to be about 44% non-PCOM, 32% unidentifiable, and 24% PCOM.
Conclusions: Although the accuracy measured on the test set and the inter-rater agreement between the two classifiers (Cohen's Kappa = 0.988) were high, a major limitation of our approach is that it uses the ultrasound report text as a proxy and does not directly count follicles from the ultrasound images themselves.
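The two summary statistics reported above, accuracy with a 95% confidence interval and Cohen's Kappa between the two classifiers, can be illustrated with a minimal Python sketch. The abstract does not say which interval method was used, so the Wilson score interval here is an assumption and its bounds may differ slightly from the published ones. The function names and the toy label lists are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# 976 correct out of 1000 corresponds to the reported 97.6% RBC accuracy.
low, high = wilson_interval(976, 1000)

# Hypothetical toy labels, for illustration only.
rbc = ["pcom", "pcom", "non-pcom", "non-pcom"]
gbt = ["pcom", "non-pcom", "non-pcom", "non-pcom"]
kappa = cohen_kappa(rbc, gbt)
```

On the toy labels, the two lists agree on 3 of 4 items (observed agreement 0.75) while chance agreement is 0.5, so Kappa is 0.5. A value near 1, such as the 0.988 reported above, indicates that the two classifiers almost always assign the same label.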
doi:10.1101/254870 fatcat:ekffcpo7cnfxjf4joqpx4zzfhe