Using text mining of FDA reports to inform early signal detection of cardiovascular lead recalls [article]

Lisa Garnsey Ensign
The Food and Drug Administration (FDA)'s Manufacturer and User Facility Device Experience (MAUDE) spontaneous reporting systems (SRS) database provides information about product-related adverse events and problems. In the last decade, MAUDE reports and recalls of medical devices that pose a significant risk of serious injury or death have increased dramatically. With over 70,000 text-based MAUDE reports submitted per month, new computational approaches are needed to aid in the earlier [...]ation of possible product problems. This retrospective cohort study utilized a novel, sequential combination of text and data mining methodologies to evaluate the primary hypothesis that information in MAUDE adverse event text descriptions provides an early signal of product problems in advance of Class I implantable cardioverter-defibrillator (ICD) lead recalls. Secondary hypotheses explored the model's predictive ability for: 1) recalled medical devices that are increasingly differentiated from ICD leads; 2) forecasting less serious ICD lead recalls. Sensitivity analyses examined the influence of reports for particular ICD lead brands and the effect of using cumulative cohorts over time to build the text classification models. Early recognition of ICD lead problems could have a profound impact on patients who experience inappropriate shocks, loss of pacing, failure of defibrillation, or death due to ICD lead failure. Strong signals of the Riata and Riata ST lead recalls were observed 18 months in advance of their recall and 6 months before their removal from distribution. Modest signals were detected 2.5 years prior to the Sprint Fidelis and 4 years in advance of the Riata/Riata ST recalls. Signals were also observed 2.5 years before an FDA mandate placing two other non-recalled leads under additional surveillance. The developed model was able to differentiate between more and less serious ICD lead recalls and showed good performance across a range of different devices. Sensitivity analyses proved useful in identifying lead-specific patterns and may suggest the model's responsiveness to different ICD lead failure modes. This work is likely the first research to use disproportionality analysis techniques to systematically assess temporal trends in SRS predictions made from a text classification algorithm. The results from this study suggest expansion to other products.

The form and content of this abstract are approved. I recommend its publication.
Approved: Kevin Bretonnel Cohen

DEDICATIONS

This work would not have been possible without the guidance, help, love and support of a number of individuals, most of whom will never know just how important they were to the ultimate success of this effort.

I will be eternally grateful to my parents, Rosalee and Stephen Garnsey, for instilling in me the importance of education and the joy of life-long learning. I might never have considered my first foray into graduate studies, nor the pursuit of this degree at a much later time in my life, if I had not been raised in a household with a mother who worked as a teacher until I came on the scene, and a father who has had a long and storied career as a citrus virologist after earning his doctorate. Dad, your insights about my academic strengths not only introduced me to the application of analytics to medical data, but were instrumental in helping me think about how to effectively lead readers through this current work. I have grown to appreciate what I was too young to recognize as a child and young adult: the importance of a parent who was the first to sympathize with my struggles, to celebrate my successes, and who was always among my loudest and proudest cheerleaders. Mom, you are that person, and I have cherished your positivity throughout this journey.

My dissertation topic intersected many different disciplines, and I was extremely fortunate to find a committee whose collective breadth of expertise facilitated my understanding of a number of areas critical to this work. My chair, Heather Haugen, not only directs the Clinical Science Health Information Technology track, but was, throughout my time in the program, a role model of someone successfully straddling academic and private sector pursuits. Heather helped me to navigate the sometimes murky waters of program and university requirements, committee selection, and team dynamics. Her support throughout this endeavor will always be appreciated.
Larry Hunter, Michael Kahn, and Paul Varosy are recognized experts in their fields, and I was humbled and delighted that they agreed to serve on my committee. Larry's insights into the mechanics and nuances of machine learning were unfailing. Not only could I always rely on Michael to take the time to thoroughly review my work, but his questions and intuitions led me to flesh out a number of critical areas. Paul's experience using and studying ICD leads provided the critical clinical perspective necessary to put the findings of this work in context. His enthusiasm for the importance of this work and appreciation of my desire to tackle new horizons will never be forgotten.

Finally, I was indeed fortunate to have Kevin Cohen as my advisor and mentor. Kevin not only guided my first deep dive into natural language processing by way of a summer independent study, but his support as I embarked upon my comprehensive exams, and then through the detailed analytical work of my research, was paramount to my success. I will always be thankful for the story that Kevin shared from his own dissertation experience that helped me to systematically think my way through and overcome a last-minute obstacle in my own research. Perhaps most importantly, as I work on future papers and presentations, Kevin's guidance on how to promote my main message will always be in the back of my mind.

To Steve Ross: thank you for introducing me to the world of health information technology, to the subject of natural language processing, and for being someone I could confide in when I was desperately in need of advice from outside the group directly involved in this work.

To my brother, Mike: I treasure our relationship and your counsel and humor. WWKD? Leap into the unknown and ROAR!!!!!!

Finally, I dedicate this dissertation to the individual who is, and has been, my love and soul mate for over a quarter of a century.
Phil: thank you for your enthusiasm in my decision to return to school; for your insights into how to tackle and present some of the most difficult concepts in this work; and most importantly, for your constant love and unwavering belief that I was up to this challenge. Without you by my side, I would not have made it to the finish line ... with you, I am so proud of what I (rather, we) were able to accomplish.

LIST OF DEFINITIONS

Class I Medical Device: a low-risk medical device, such as a tongue depressor, stethoscope or adhesive bandage.
Class II Medical Device: a medium-risk medical device, such as an intravenous catheter, powered wheelchair, contact lens, condom or home pregnancy test.
Class III Medical Device: a high-risk medical device typically defined as a product that sustains or supports life, is implanted, or that presents an unreasonable risk of illness or injury if it were to fail. Examples include pacemakers/ICDs, breast implants, heart valves, and heart stents.
Class I Recall: a product withdrawal due to a reasonable probability of serious adverse health consequences or death with use or exposure to the product.
Class II Recall: a product withdrawal due to the risk of temporary or medically reversible or remotely occurring serious adverse health consequences with use of or exposure to a product.
Class III Recall: a product released in violation of FDA regulations, but posing no immediate or perceived danger of adverse health consequences.
Data Mining: statistical and computational techniques used to discover patterns in large datasets.
Machine Learning: the application of computer algorithms that iteratively learn from data.
Natural Language Processing (NLP): using computational techniques for analyzing natural texts for the purpose of achieving human-like language processing.
Ontology: a taxonomy representing the concepts and relationships of objects or expressions.
RStudio: open source statistical computing environment.
SAS: a suite of software programs used for data management and analysis.
Signal: a relationship between a product and adverse event that is strong enough to warrant further evaluation.
Sudden Cardiac Death: unexpected death due to malfunction of the electrical system to the heart becoming suddenly irregular, causing loss of heart function.
Text Mining: extracting information and patterns from unstructured (free form) text documents.
Weka: open source machine learning software written in Java and developed at the
doi:10.25677/myd5-jw86 fatcat:n7bk5awzxncenetu6ba7a37eye