PRIS at 2012 TREC Medical Track: Query Expansion, Retrieval and Ranking
Text Retrieval Conference
The official datasets are in XML format, so we have to parse them before indexing. We chose Lucene as our tool for indexing and searching, and we selected Jakarta Commons Digester (hereafter referred to as Digester) to parse the XML documents. Digester maps each XML document to a Java object, from which we read the fields we need. In addition, we process the "report_text" tag in the XML documents to extract the age and sex information, which are very important fields for the search task.

Negation Detection

Medical diagnosis reports often contain phrases such as "did not have head pain" or "there is no pain in your leg". These phrases cause trouble in medical text retrieval. For example, when we want to find patients who have a headache, we may retrieve a report like this:

This patient is a**AGE[in 50s]-year-old male with a past medical history of multiple transplants including small bowel, liver, and pancreas in 1998 and status post kidney transplant in 2006, presents with fever. The patient states he woke this morning and thought to have fevers and chills. He also has had some vomiting and diarrhea. Denies any belly pain. He states he feels a little short of breath. He denies any chest pain. No sore throat. No headache.....

In fact, the headache in this report is negated; the patient presents with fevers and chills. To solve this problem, we use the well-known NegEx algorithm, which is widely used by text mining researchers to find terms used in a negated sense. A Java class implementing Wendy Chapman's NegEx algorithm is available; its author is Junebae Kye. On the basis of this class, we wrote a program to perform negation detection, and the results show that this method improves performance.

2 Indexing Model

The main component is a search engine based on Apache Lucene. Lucene is a powerful Java library that makes it easy to add document retrieval to an application. In recent years Lucene has become exceptionally popular and is now the most widely used information retrieval library. We used Lucene for indexing. Documents and fields are Lucene's fundamental units of indexing and searching. A document is Lucene's atomic unit of indexing and searching: a container that holds one or more fields, which in turn contain the real content.
Each field has a name to identify it, a text or binary value, and a series of detailed options that describe what Lucene should do with the field value. We use "age", "sex", "icd9 code", ... as the fields to build the index. This step is straightforward.
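Before fields such as "age" and "sex" can be added to an index document, their values have to be pulled out of the report text. The following is a minimal sketch of that step, not the code used in our system. It assumes the de-identified age marker always has the form `**AGE[...]`, as in the sample report, and that sex is stated with the word "male" or "female". The class name `ReportFields` and the method `extract` are hypothetical, and the resulting map stands in for the name/value pairs that would then be wrapped in Lucene fields.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: derives index field values from the free-text report.
final class ReportFields {
    // Assumed form of the de-identified age marker, e.g. "**AGE[in 50s]".
    private static final Pattern AGE = Pattern.compile("\\*\\*AGE\\[([^\\]]+)\\]");
    // Whole-word match, so the "male" inside "female" is not picked up on its own.
    private static final Pattern SEX =
            Pattern.compile("\\b(female|male)\\b", Pattern.CASE_INSENSITIVE);

    /** Returns field name -> value pairs found in the text; missing fields are omitted. */
    static Map<String, String> extract(String reportText) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher age = AGE.matcher(reportText);
        if (age.find()) {
            fields.put("age", age.group(1).trim());
        }
        Matcher sex = SEX.matcher(reportText);
        if (sex.find()) {
            fields.put("sex", sex.group(1).toLowerCase(Locale.ROOT));
        }
        return fields;
    }

    public static void main(String[] args) {
        String text = "This patient is a**AGE[in 50s]-year-old male with a past medical history ...";
        System.out.println(extract(text)); // prints {age=in 50s, sex=male}
    }
}
```

Taking only the first match keeps the sketch short. A report that mentions relatives ("his mother, a female in her 80s") would need a more careful rule.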
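To make the negation detection step from the preprocessing stage more concrete, the sketch below shows the general idea behind NegEx-style filtering: a term counts as negated when a trigger word appears shortly before it in the same sentence. This is a deliberately simplified illustration. It is neither Junebae Kye's class nor the full NegEx algorithm, which also handles post-negation triggers, pseudo-negation phrases, and termination terms. The class name `SimpleNegation`, the short trigger list, and the five-token window are all assumptions chosen for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified negation check in the spirit of NegEx.
final class SimpleNegation {
    // Small illustrative subset of pre-negation triggers (assumption).
    private static final List<String> TRIGGERS =
            Arrays.asList("no", "not", "denies", "without");
    // Assumed scope: a trigger negates terms starting within this many tokens after it.
    private static final int WINDOW = 5;

    /** True if the term occurs at least once outside the scope of every trigger. */
    static boolean isAffirmed(String report, String term) {
        String[] termTokens = tokenize(term);
        for (String sentence : report.split("[.!?]+")) {
            String[] tokens = tokenize(sentence);
            for (int j = 0; j + termTokens.length <= tokens.length; j++) {
                if (matchesAt(tokens, j, termTokens) && !inNegationScope(tokens, j)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** True if a trigger appears among the WINDOW tokens before position pos. */
    private static boolean inNegationScope(String[] tokens, int pos) {
        for (int i = Math.max(0, pos - WINDOW); i < pos; i++) {
            if (TRIGGERS.contains(tokens[i])) {
                return true;
            }
        }
        return false;
    }

    /** True if termTokens equals the tokens starting at index start. */
    private static boolean matchesAt(String[] tokens, int start, String[] termTokens) {
        for (int k = 0; k < termTokens.length; k++) {
            if (!tokens[start + k].equals(termTokens[k])) {
                return false;
            }
        }
        return true;
    }

    /** Lower-cases the text and splits it on runs of non-alphanumeric characters. */
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("[^a-z0-9]+");
    }

    public static void main(String[] args) {
        String report = "The patient presents with fever. He denies any chest pain. No headache.";
        System.out.println(isAffirmed(report, "headache")); // prints false
        System.out.println(isAffirmed(report, "fever"));    // prints true
    }
}
```

With a check of this kind, a report that only says "No headache" is not treated as evidence that the patient has a headache, so it can be excluded or down-weighted for a headache query.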