A synergistic strategy for combining thesaurus-based and corpus-based approaches in building ontology for multilingual search engines

Leyla Zhuhadar
2016 Figshare  
In this article we illustrate a methodology for building a cross-language search engine. We propose a synergistic approach that combines a thesaurus-based approach with a corpus-based approach. First, a bilingual ontology thesaurus is designed for two languages, English and Spanish, in which a simple bilingual listing of terms, phrases, concepts, and subconcepts is built. Second, term vector translation is used: a statistical multilingual text retrieval technique that maps statistical information about term use between languages (ontology co-learning). These techniques map sets of tf-idf term weights from one language to another. We also applied a query translation method to retrieve multilingual documents, with an expansion technique for phrasal translation. Finally, we present our findings.
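
The two translation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the toy `THESAURUS`, the function names, the even split of a weight among alternative translations, and the longest-phrase-first matching rule are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy English -> Spanish thesaurus (not taken from the article).
THESAURUS = {
    "search engine": ["motor de búsqueda"],
    "search": ["búsqueda"],
    "engine": ["motor"],
    "language": ["idioma", "lengua"],
}

def tfidf(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors

def translate_vector(vector, thesaurus):
    """Map term weights onto target-language terms.

    Assumption: a weight is split evenly among a term's translations,
    and terms missing from the thesaurus are dropped.
    """
    out = defaultdict(float)
    for term, weight in vector.items():
        targets = thesaurus.get(term, [])
        for target in targets:
            out[target] += weight / len(targets)
    return dict(out)

def translate_query(query, thesaurus):
    """Translate a query, trying the longest phrase match first."""
    words = query.lower().split()
    result = []
    i = 0
    while i < len(words):
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in thesaurus:
                result.extend(thesaurus[phrase])  # expand with all translations
                i = j
                break
        else:
            result.append(words[i])  # keep words that have no translation
            i += 1
    return result
```

Under these assumptions, `translate_query("search engine language", THESAURUS)` translates "search engine" as one phrase rather than word by word, and expands "language" into both of its listed translations.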
doi:10.6084/m9.figshare.3423686.v1 fatcat:h7un4wlxyfg67avecgk7ub36qu