
Drexel at TREC 2007: Question Answering

Protima Banerjee, Hyoil Han
2007 Text Retrieval Conference  
The TREC Question Answering Track presented several distinct challenges to participants in 2007. Participants were asked to create a system which discovers the answers to factoid and list questions about people, entities, organizations and events, given both blog and newswire text data sources. In addition, participants were asked to expose interesting information nuggets which exist in the data collection, which were not uncovered by the factoid or list questions. This year is the first time the Intelligent Information Processing group at Drexel has participated in the TREC Question Answering Track. As such, our goal was the development of a Question Answering system framework to which future enhancements could be made, and the construction of simple components to populate the framework. The results of our system this year were not significant; our primary accomplishment was the establishment of a baseline system which can be improved upon in 2008 and going forward.
dblp:conf/trec/BanerjeeH07 fatcat:43poz7rvpndznobqti6vbuv2n4

A survey on ontology mapping

Namyoun Choi, Il-Yeol Song, Hyoil Han
2006 SIGMOD record  
Ontology is increasingly seen as a key factor for enabling interoperability across heterogeneous systems and semantic web applications. Ontology mapping is required for combining distributed and heterogeneous ontologies. Developing such ontology mapping has been a core issue of recent ontology research. This paper presents ontology mapping categories, describes the characteristics of each category, compares these characteristics, and surveys tools, systems, and related work based on each category of ontology mapping. We believe this paper provides readers with a comprehensive understanding of ontology mapping and points to various research topics about the specific roles of ontology mapping.
doi:10.1145/1168092.1168097 fatcat:we52fyqq2jbgtkfwzo3ybkaoeu


Lawrence Reeve, Hyoil Han, Ari D. Brooks
2006 Proceedings of the 2006 ACM symposium on Applied computing - SAC '06  
Lexical chaining is a technique for identifying semantically-related terms in text. We propose concept chaining to link semantically-related concepts within biomedical text together. The resulting concept chains are then used to identify candidate sentences useful for extraction. The extracted sentences are used to produce a summary of the biomedical text. The concept chaining process is adapted from existing lexical chaining approaches, which focus on chaining semantically-related terms, rather than semantically-related concepts. The Unified Medical Language System (UMLS) Metathesaurus and Semantic Network are used as semantic resources. The UMLS MetaMap Transfer tool is used to perform text-to-concept mapping. The goal is to propose concept chaining and develop a novel concept chaining system for the biomedical domain using UMLS lexicon and the ideas of lexical chaining. The resulting concept chains from the full-text are evaluated against the concepts of a human summary (the paper's abstract). Precision is measured at 0.90 and recall at 0.92. The resulting concept chains are used to summarize the text. We also evaluate generated summaries using existing summarization systems using sentence matching, and confirm the generated summaries are useful to a domain expert. Our results show that the proposed concept chaining is a promising methodology for biomedical text summarization.
doi:10.1145/1141277.1141317 dblp:conf/sac/ReeveHB06 fatcat:e4hreo4firfo5e5erqljp4io4i
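
The precision and recall figures quoted in this abstract compare the concepts found in the chains against the concepts of the human-written abstract. As a rough illustration only, the sketch below computes set-based precision and recall; the function name `precision_recall` and the concept identifiers are hypothetical, and the paper's actual matching procedure is not described in this listing.

```python
from typing import Set, Tuple


def precision_recall(system_concepts: Set[str],
                     reference_concepts: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall of system concepts against a reference set."""
    overlap = system_concepts & reference_concepts
    # Precision: share of system concepts that also appear in the reference.
    precision = len(overlap) / len(system_concepts) if system_concepts else 0.0
    # Recall: share of reference concepts that the system recovered.
    recall = len(overlap) / len(reference_concepts) if reference_concepts else 0.0
    return precision, recall


# Hypothetical concept identifiers, for illustration only.
chain_concepts = {"c1", "c2", "c3", "c4"}
abstract_concepts = {"c2", "c3", "c4", "c5", "c6"}
p, r = precision_recall(chain_concepts, abstract_concepts)
```

Here three of the four chain concepts appear in the reference, and three of the five reference concepts are recovered.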

Automating Computer Science Ontology Extension with Classification Techniques

Natasha C. Santosa, Jun Miyazaki, Hyoil Han
2021 IEEE Access  
In information technology, an ontology is a knowledge structure consisting of the definitions and relations of information within one or even multiple domains. This semantically represented information is helpful for tasks such as document classification and item recommendation in recommender systems. However, as big data prevails, manually extending existing ontologies with up-to-date terminologies becomes challenging due to the tedious and time-consuming process and the expensive cost of expert manual labor. This study aims to achieve a fully automatic ontology extension. We propose a novel "Direct" approach for extending an existing Computer Science Ontology (CSO). This approach consists of two steps: initially extending the CSO with new topics and using this extended graph to obtain the new topic's node embeddings as inputs for training classifiers. However, this initial extension still contains many noisy links; therefore, the classifier later acts as a filter and a link predictor. We experiment with various traditional machine learning and recent deep learning models and then compare them using our Direct approach. We also propose two evaluation procedures to decide the best-performing model and approach: the novel Wikipedia-based F1w score and the total number of resulting links. Furthermore, manual evaluation by four human experts is conducted to conclude the reliability of our proposed approach and evaluation procedure. This study concludes that the Direct approach's Gaussian Naive Bayes model produces the most valid and reliable links, and we, therefore, use it to further extend the CSO with hundreds of new CS topics and links.
doi:10.1109/access.2021.3131627 fatcat:cxgkerhggrafvmd7rc4vj4tzgq

Biomedical question answering: A survey

Sofia J. Athenikos, Hyoil Han
2010 Computer Methods and Programs in Biomedicine  
Objectives: In this survey, we reviewed the current state of the art in biomedical QA (Question Answering), within a broader framework of semantic knowledge-based QA approaches, and projected directions for the future research development in this critical area of intersection between Artificial Intelligence, Information Retrieval, and Biomedical Informatics. Materials and methods: We devised a conceptual framework within which to categorize current QA approaches. In particular, we used "semantic knowledge-based QA" as a category under which to subsume QA techniques and approaches, both corpus-based and knowledge base (KB)-based, that utilize semantic knowledge-informed techniques in the QA process, and we further classified those approaches into three subcategories: (1) semantics-based, (2) inference-based, and (3) logic-based. Based on the framework, we first conducted a survey of open-domain or non-biomedical-domain QA approaches that belong to each of the three subcategories. We then conducted an in-depth review of biomedical QA, by first noting the characteristics of, and resources available for, biomedical QA and then reviewing medical QA approaches and biological QA approaches, in turn. The research articles reviewed in this paper were found and selected through online searches. Results: Our review suggested the following tasks ahead for the future research development in this area: (1) Construction of domain-specific typology and taxonomy of questions (biological QA), (2) Development of more sophisticated techniques for natural language (NL) question analysis and classification, (3) Development of effective methods for answer generation from potentially conflicting evidences, (4) More extensive and integrated utilization of semantic knowledge throughout the QA process, and (5) Incorporation of logic and reasoning mechanisms for answer inference. Conclusion: Corresponding to the growth of biomedical information, there is a growing need for QA systems that can help users better utilize the ever-accumulating information. Continued research toward development of more sophisticated techniques for processing NL text, for utilizing semantic knowledge, and for incorporating logic and reasoning mechanisms, will lead to more useful QA systems.
 ...  cise answers to their questions, by employing Information Extraction (IE) and Natural Language Processing (NLP) techniques, instead of providing a large number of documents that are potentially relevant for the questions posed by the inquirers. As such, QA is regarded as involving the most  ... 
doi:10.1016/j.cmpb.2009.10.003 pmid:19913938 fatcat:tv4sel4llbcnjppnn7g2tygqze

Survey of semantic annotation platforms

Lawrence Reeve, Hyoil Han
2005 Proceedings of the 2005 ACM symposium on Applied computing - SAC '05  
The realization of the Semantic Web requires the widespread availability of semantic annotations for existing and new documents on the Web. Semantic annotations are to tag ontology class instance data and map it into ontology classes. The fully automatic creation of semantic annotations is an unsolved problem. Instead, current systems focus on the semi-automatic creation of annotations. The Semantic Web also requires facilities for the storage of annotations and ontologies, user interfaces, access APIs, and other features to fully support annotation usage. This paper examines current Semantic Web annotation platforms that provide annotation and related services, and reviews their architecture, approaches and performance.
doi:10.1145/1066677.1067049 dblp:conf/sac/ReeveH05 fatcat:e7bwfayhljbavpd4afpnqbfwke

Semantically enhanced user modeling

Palakorn Achananuparp, Hyoil Han, Olfa Nasraoui, Roberta Johnson
2007 Proceedings of the 2007 ACM symposium on Applied computing - SAC '07  
Content-based implicit user modeling techniques usually employ traditional term vector as a representation of the user's interest. However, due to the problem of dimensionality in vector space model, a simple term vector is not a sufficient representation of the user model as it ignores the semantic relations between terms. In this paper, we present a novel method to enhance a traditional term-based user model with the WordNet-based semantic similarity techniques. To achieve this, we utilize word definitions and relationship hierarchies in WordNet to perform word sense disambiguation and employ domain-specific concepts as category labels for the derived user models. We tested our method on Windows to the Universe, a public educational website covering subjects in the Earth and Space Science and performed an evaluation of our semantically enhanced user models against human judgment. Our approach is distinguishable from existing work because we automatically narrow down the set of domain specific concepts from an initial domain concepts obtained from Wikipedia and because we automatically create semantically enhanced user model.
doi:10.1145/1244002.1244291 dblp:conf/sac/AchananuparpHNJ07 fatcat:64p7u2hmuffkrokf2pdwksvelm

Relation-Based Document Retrieval for Biomedical Literature Databases [chapter]

Xiaohua Zhou, Xiaohua Hu, Xia Lin, Hyoil Han, Xiaodan Zhang
2006 Lecture Notes in Computer Science  
In this paper, we explore the use of term relations in information retrieval for precision-focused biomedical literature search. A relation is defined as a pair of two terms which are semantically and syntactically related to each other. Unlike the traditional "bag-of-word" model for documents, our model represents a document by a set of sense-disambiguated terms and their binary relations. Since document level co-occurrence of two terms, in many cases, does not mean this document addresses their relationships, the direct use of relation may improve the precision of very specific search, e.g. searching documents that mention genes regulated by Smad4. For this purpose, we develop a generic ontology-based approach to extract terms and their relations; a prototyped IR system supporting relation-based search is then built for Medline abstract search. We then use this novel IR system to improve the retrieval result of all official runs in TREC-2004 Genomics Track. The experiment shows promising performance of relation-based IR. The mean of P@100 (the precision of top 100 documents) for all 50 topics is raised from 26.37% (the P@100 of the best run is 42.10%) to 53.69% while the recall is kept at an acceptable level of 44.31%. The experiment also shows the expressiveness of relations for the representation of information needs, especially in the area of biomedical literature full of various biological relations.
Rule for entity-attribute relation: term1 preposition term2
Example: Obesity is an independent risk factor (term1) for periodontal disease (term2).
doi:10.1007/11733836_48 fatcat:lnj42gn3ajdbhgpbkpbcwr7ema
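
The abstract above reports the mean of P@100 over 50 topics. As a minimal sketch of how such a figure is typically computed, the code below defines precision at a cutoff k and its mean over topics; the function names, topic identifiers, and document identifiers are hypothetical and not taken from the paper.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    # Divide by k rather than by the list length, so that a run returning
    # fewer than k documents is not rewarded for being short.
    return hits / k


def mean_precision_at_k(runs: Dict[str, Sequence[str]],
                        qrels: Dict[str, Set[str]], k: int) -> float:
    """Average P@k over all topics in a run."""
    scores = [precision_at_k(ranked, qrels.get(topic, set()), k)
              for topic, ranked in runs.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical run and relevance judgments, for illustration only.
runs = {"t1": ["d1", "d2", "d3", "d4"], "t2": ["d5", "d6", "d7", "d8"]}
qrels = {"t1": {"d1", "d3"}, "t2": {"d5"}}
mean_p4 = mean_precision_at_k(runs, qrels, k=4)
```

With a cutoff of 4, topic t1 scores 0.5 and topic t2 scores 0.25, so the mean is 0.375.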

XML-OLAP: A Multidimensional Analysis Framework for XML Warehouses [chapter]

Byung-Kwon Park, Hyoil Han, Il-Yeol Song
2005 Lecture Notes in Computer Science  
Recently, a large number of XML documents are available on the Internet. This trend motivated many researchers to analyze them multi-dimensionally in the same way as relational data. In this paper, we propose a new framework for multidimensional analysis of XML documents, which we call XML-OLAP. We base XML-OLAP on XML warehouses where every fact data as well as dimension data are stored as XML documents. We build XML cubes from XML warehouses. We propose a new multidimensional expression language for XML cubes, which we call XML-MDX. XML-MDX statements target XML cubes and use XQuery expressions to designate the measure data. They specify text mining operators for aggregating text constituting the measure data. We evaluate XML-OLAP by applying it to a U.S. patent XML warehouse. We use XML-MDX queries, which demonstrate that XML-OLAP is effective for multi-dimensionally analyzing the U.S. patents.
doi:10.1007/11546849_4 fatcat:hwwgbbrjwzhuhh3yns3fburfoe

Converting Semi-structured Clinical Medical Records into Information and Knowledge

Xiaohua Zhou, Hyoil Han, I. Chankai, A.A. Prestrud, A.D. Brooks
2005 21st International Conference on Data Engineering Workshops (ICDEW'05)  
Clinical medical records contain a wealth of information, largely in free-textual form. Thus, means to extract structured information from free-text records becomes an important research endeavor. In this paper, we propose and implement an information extraction system that extracts three types of information (numeric values, medical terms and categorical value) from semi-structured patient records. Three approaches are proposed to solve the problems posed by each of the three types of values, respectively, and very good performance (precision and recall) is achieved. A novel link-grammar based approach was invented to associate feature and number in a sentence, and extremely high accuracy was achieved. A simple but efficient approach, using POS-based pattern and domain ontology, was adopted to extract medical terms of interest. Finally, an NLP-based feature extraction method coupled with an ID3-based decision tree is used to classify and extract categorical cases. This preliminary approach to categorical fields has, so far, proven to be quite effective.
doi:10.1109/icde.2005.207 dblp:conf/icde/ZhouHCPB05 fatcat:4e4sprbwo5abtouswyglnfpaae

Concept frequency distribution in biomedical text summarization

Lawrence H. Reeve, Hyoil Han, Saya V. Nagori, Jonathan C. Yang, Tamara A. Schwimmer, Ari D. Brooks
2006 Proceedings of the 15th ACM international conference on Information and knowledge management - CIKM '06  
Text summarization is a data reduction process. The use of text summarization enables users to reduce the amount of text that must be read while still assimilating the core information. The data reduction offered by text summarization is particularly useful in the biomedical domain, where physicians must continuously find clinical trial study information to incorporate into their patient treatment efforts. Such efforts are often hampered by the high-volume of publications. Our contribution is two-fold: 1) to propose the frequency of domain concepts as a method to identify important sentences within a full-text; and 2) propose a novel frequency distribution model and algorithm for identifying important sentences based on term or concept frequency distribution. An evaluation of several existing summarization systems using biomedical texts is presented in order to determine a performance baseline. For domain concept comparison, a recent high-performing frequency-based algorithm using terms is adapted to use concepts and evaluated using both terms and concepts. It is shown that the use of concepts performs closely with the use of terms for sentence selection. Our proposed frequency distribution model and algorithm outperforms a state-of-the-art approach.
doi:10.1145/1183614.1183701 dblp:conf/cikm/ReeveHNYSB06 fatcat:756fs5inq5hzve6x4viugnzfhq
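
The abstract above describes selecting important sentences by the frequency of terms or domain concepts. The sketch below illustrates only the general idea of frequency-based sentence selection: each sentence is scored by the average document-wide frequency of its units. It is not the authors' frequency distribution model, and the function name `rank_sentences` and the sample units are hypothetical stand-ins for terms or mapped concepts.

```python
from collections import Counter
from typing import List


def rank_sentences(sentences: List[List[str]], top_n: int) -> List[int]:
    """Return indices (in document order) of the top_n sentences by average unit frequency."""
    # Count how often each unit (term or concept) occurs in the whole document.
    freq = Counter(unit for sent in sentences for unit in sent)

    def score(sent: List[str]) -> float:
        # Average frequency, so long sentences are not favored by length alone.
        return sum(freq[unit] for unit in sent) / len(sent) if sent else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return sorted(ranked[:top_n])


# Hypothetical pre-tokenized sentences, for illustration only.
doc = [["tumor", "therapy"], ["tumor", "trial", "therapy"], ["weather"]]
selected = rank_sentences(doc, top_n=2)
```

Averaging rather than summing is one possible design choice; it keeps a long sentence from outranking a short one purely because it contains more units.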

A Comparative Study on Optimization, Obfuscation, and Deobfuscation tools in Android

Geunha You, Gyoosik Kim, Seong-je Cho, Hyoil Han
2021 Journal of Internet Services and Information Security  
the characteristics of the four tools and compare their performance by performing experiments.  ...  That is, its outputs are incomplete .jar files.  ...  R8 is a tool for both optimization and obfuscation.  ... 
doi:10.22667/jisis.2021.02.28.002 dblp:journals/jisis/YouKCH21 fatcat:wfnipslpgvgwvhmyxkulvxcq4i

A Generic Framework: From Clinical Notes to Electronic Medical Records

Hyoil Han, Yoori Choi, Yoo Myung Choi, Xiaohua Zhou, A.D. Brooks
2006 19th IEEE Symposium on Computer-Based Medical Systems (CBMS'06)  
Electronic Medical Records are important to manage health data and save lives to improve the quality of service in hospitals. Clinical medical records contain a wealth of information, largely in free-text form. This paper proposes a generic framework to semi-automatically extract and mine data from clinical notes, automatically learn patterns for each physician's clinical notes, and automatically populate EMR databases for multi users. In this paper, we also develop a web-based system with a relational database to automatically store data from MEDical Information Extraction (MedIE) system that extracts and mines a variety of patient information with breast complaints from semi-structured clinical records.
doi:10.1109/cbms.2006.13 dblp:conf/cbms/HanCCZB06 fatcat:3v5ecclo6ve27myk57dlqwcg2i

A framework of a logic-based question-answering system for the medical domain (LOQAS-Med)

Sofia J. Athenikos, Hyoil Han, Ari D. Brooks
2009 Proceedings of the 2009 ACM symposium on Applied Computing - SAC '09  
Question-answering systems that provide precise answers to questions, by combining techniques for information retrieval, information extraction, and natural language processing, are seen as the next-generation search engines. Due to the growth and real-world impact of biomedical information, the need for question-answering systems that can aid medical researchers and health care professionals in their information search is acutely felt. In order to provide users with accurate answers, such systems need to go beyond lexico-syntactic analysis to semantic analysis and processing of texts and knowledge resources. Moreover, question-answering systems equipped with reasoning capabilities can derive more adequate answers by using inference. Research on question answering in the medical and health care domain is still in its inception stage. While several recent approaches to medical question answering have explored use of semantic knowledge, few approaches have exploited the utility of logic formalisms and of inference mechanisms. In this paper, we present a framework for a logic-based question-answering system for the medical domain, which uses Description Logic as the formalism for knowledge representation and reasoning. As a first step toward building the proposed system, we present semantic analysis and classification of medical questions.
doi:10.1145/1529282.1529462 dblp:conf/sac/AthenikosHB09 fatcat:6w7tkssx3rh6dmzm5bzopnzxk4

Answer credibility

Protima Banerjee, Hyoil Han
2009 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers on - NAACL '09   unpublished
Answer Validation is a topic of significant interest within the Question Answering community. In this paper, we propose the use of language modeling methodologies for Answer Validation, using corpus-based methods that do not require the use of external sources. Specifically, we propose a model for Answer Credibility which quantifies the reliability of a source document that contains a candidate answer and the Question's Context Model.
doi:10.3115/1620853.1620897 fatcat:gdhkevlbhvbx3gzevbbkjhfrga