Information Retrieval Systems [book]

Gerald J. Kowalski
1997 The Information Retrieval Series  
The Steiermärkische Krankenanstalten Ges.m.b.H. (KAGes) rolled out an electronic patient record (EPR) system in 2004. This system contains a growing number of unstructured clinical text documents in German. To support physicians in patient-related medical decision-making, this diploma thesis analyses and implements methods for retrieving relevant medical information from these documents and for automatically classifying clinical text documents into medical field categories. In the theoretical part of the thesis, techniques for indexing textual documents and retrieving relevant information from them are presented. Additionally, a machine-learning approach to creating metadata through automated multi-label document classification is investigated. In the practical part, a design for a medical information retrieval system (MIRS) is developed, and selected components of the model are implemented as a first prototype. The model is based on J2EE technologies and several open source frameworks such as Apache Lucene and WEKA. The prototype was evaluated on an extracted sample of 18,000 clinical text documents from the KAGes EPR system. Multi-label document classification into medical field categories achieved an F1-measure of 0.886. The results are comparable to those of published studies and were accepted for poster presentation at the Medinfo 2007 congress. The created metadata was used to make patient-related information in the unstructured clinical text documents easier to find. Finally, a sample application of the prototype is illustrated to demonstrate its functionality.
doi:10.1007/b102478 dblp:series/irs/Kowalski97 fatcat:bv4a2etodnfqtpxurhyvopq76a