Use of ontology structure and Bayesian models to aid the crowdsourcing of ICD-11 sanctioning rules

Yun Lou, Samson W. Tu, Csongor Nyulas, Tania Tudorache, Robert J.G. Chalmers, Mark A. Musen
2017 Journal of Biomedical Informatics  
The International Classification of Diseases (ICD) is the de facto standard international classification for mortality reporting and for many epidemiological, clinical, and financial use cases. The next version of ICD, ICD-11, will be submitted for approval by the World Health Assembly in 2018. Unlike previous versions of ICD, where coders mostly select single codes from pre-enumerated disease and disorder codes, ICD-11 coding will allow extensive use of multiple codes to give more detailed disease descriptions. For example, "severe malignant neoplasms of left breast" may be coded using the combination of a "stem code" (e.g., code for malignant neoplasms of breast) with a variety of "extension codes" (e.g., codes for laterality and severity). The use of multiple codes (a process called post-coordination) avoids the pitfall of having to pre-enumerate a vast number of possible disease and qualifier combinations, but it risks the creation of meaningless expressions that combine stem codes with inappropriate qualifiers. To prevent that from happening, "sanctioning rules" that define legal combinations are necessary. In this work, we developed a crowdsourcing method for obtaining sanctioning rules for the post-coordination of concepts in ICD-11. Our method utilized the hierarchical structures in the domain to improve the accuracy of the sanctioning rules and to lower the crowdsourcing cost. We used Bayesian networks to model crowd workers' skills, the accuracy of their responses, and our confidence in the acquired sanctioning rules. We applied reinforcement learning to develop an agent that constantly adjusted the confidence cutoffs during the crowdsourcing process to maximize the overall quality of sanctioning rules under a fixed budget. Finally, we performed formative evaluations using a skin-disease branch of the draft ICD-11 and demonstrated that the crowdsourced sanctioning rules replicated those defined by an expert dermatologist with high precision and recall. This work demonstrated that a crowdsourcing approach could offer a reasonably efficient method for generating a first draft of sanctioning rules that subject matter experts could verify and edit, thus relieving them of the tedium and cost of formulating the initial set of rules.

Highlights
• We defined crowdsourcing microtasks to obtain ICD-11 sanctioning rules.
• We used hierarchical structures to improve the efficiency of crowdsourcing.
• We used Bayesian networks to model our confidence in the acquired sanctioning rules.
• We developed a method to maximize the quality of the rules within a fixed budget.
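
The abstract does not spell out the Bayesian network itself, so the following is only a minimal sketch of the general idea of turning crowd answers into a confidence value for one candidate sanctioning rule. It assumes independent workers with known accuracies strictly between 0 and 1 and uses a plain Bayes update rather than the authors' model. The names `rule_posterior` and `decide`, the prior, and the cutoff values are all hypothetical.

```python
from math import prod


def rule_posterior(prior, votes):
    """Posterior probability that a candidate sanctioning rule is valid.

    prior: assumed prior probability that the rule is valid.
    votes: list of (answer, accuracy) pairs. answer is True when the worker
           judged the stem code / qualifier combination sensible; accuracy is
           the assumed probability that this worker answers correctly
           (strictly between 0 and 1).
    """
    # Likelihood of the observed answers if the rule is valid.
    like_valid = prod(acc if ans else 1.0 - acc for ans, acc in votes)
    # Likelihood of the same answers if the rule is invalid.
    like_invalid = prod(1.0 - acc if ans else acc for ans, acc in votes)
    numerator = prior * like_valid
    return numerator / (numerator + (1.0 - prior) * like_invalid)


def decide(posterior, accept_cutoff=0.95, reject_cutoff=0.05):
    """Map a posterior to an action using fixed, hypothetical cutoffs."""
    if posterior >= accept_cutoff:
        return "accept"
    if posterior <= reject_cutoff:
        return "reject"
    return "ask_more"  # confidence is not yet high enough either way


# Two fairly reliable workers say "sensible", one weaker worker disagrees.
votes = [(True, 0.8), (True, 0.8), (False, 0.6)]
p = rule_posterior(prior=0.5, votes=votes)
print(round(p, 3), decide(p))
```

In this sketch the cutoffs are fixed constants. The abstract describes an agent that adjusts such cutoffs during crowdsourcing to stay within a budget, which is not modeled here.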
doi:10.1016/j.jbi.2017.02.004 pmid:28192233 pmcid:PMC5428551 fatcat:mysixlvfune6llzox6cnigux7u