Exploiting Parallel Sentences and Cosine Similarity for Identifying Target Language Translation

Vijay Kumar Sharma, Namita Mittal
2016 Procedia Computer Science  
In recent times, the Internet has become a huge information resource containing information in multiple languages. Users are not acquainted with all of these languages, so this language diversity becomes a major barrier to worldwide communication. Cross-Language Information Retrieval (CLIR) addresses this barrier by letting a user search for the required information in his or her regional language. In this paper, a CLIR system based on a Parallel Corpus (PC) is proposed. A set of parallel sentences is extracted from the PC based on the query words. A term frequency matrix and the cosine similarity measure are used to identify the target language translation. The proposed Term Frequency Method (TFM) approach is compared with the Probabilistic Lexicon Method (PLM) approach, and the result analysis shows that the proposed TFM approach performs better than the PLM approach.
doi:10.1016/j.procs.2016.06.092 fatcat:qmhsfxlh3rbytdrss4fx5rpema
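
The abstract names the ingredients of the TFM approach (parallel sentences selected by query word, a term frequency matrix, cosine similarity) but not its exact formulation. The following Python sketch shows one plausible reading and should not be taken as the authors' implementation. It assumes whitespace tokenization, candidate translations drawn from the target side of the selected sentence pairs, and frequency vectors computed over every sentence pair in the corpus. The names `TOY_CORPUS`, `tf_vector`, `cosine`, and `best_translation` are hypothetical, and the English-Spanish data is invented purely for illustration.

```python
import math

# Invented toy parallel corpus (source sentence, target sentence).
# Assumption: the paper's actual corpus and language pair are not shown here.
TOY_CORPUS = [
    ("the house is big", "la casa es grande"),
    ("the house is red", "la casa es roja"),
    ("the car is big", "el coche es grande"),
    ("a red car", "un coche rojo"),
    ("the table is big", "la mesa es grande"),
]


def tf_vector(word, sentences):
    """Return the frequency of `word` in each tokenized sentence."""
    return [tokens.count(word) for tokens in sentences]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_translation(query_word, corpus):
    """Pick the target word whose sentence-level frequency profile is
    closest to that of the query word. Returns (word, score) or None."""
    source = [src.split() for src, _ in corpus]
    target = [tgt.split() for _, tgt in corpus]

    # Step 1: select the sentence pairs whose source side has the query word.
    selected = [i for i, tokens in enumerate(source) if query_word in tokens]
    if not selected:
        return None

    # Step 2: candidates are the target words seen in the selected pairs.
    candidates = sorted({word for i in selected for word in target[i]})

    # Step 3: compare frequency vectors (one entry per sentence pair).
    query_vec = tf_vector(query_word, source)
    scored = [(cosine(query_vec, tf_vector(c, target)), c) for c in candidates]
    score, word = max(scored)
    return word, score


if __name__ == "__main__":
    for query in ("house", "car"):
        print(query, best_translation(query, TOY_CORPUS))
```

On this toy data the query word "house" is matched to "casa" and "car" to "coche", because only those target words occur in exactly the same sentence pairs as the query word. Frequent function words such as "la" or "es" also co-occur with the query word but appear in other sentences too, which lowers their cosine score.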