
Extracting an English-Persian Parallel Corpus from Comparable Corpora [article]

Akbar Karimi, Ebrahim Ansari, Bahram Sadeghi Bigham
2019 arXiv   pre-print
In this paper, a bidirectional method is proposed to extract parallel sentences from English and Persian document aligned Wikipedia.  ...  Two machine translation systems are employed to translate from Persian to English and the reverse after which an IR system is used to measure the similarity of the translated sentences.  ...  result in extracting better equivalents from the comparable corpus.  ... 
arXiv:1711.00681v3 fatcat:535ajkwoffb53byuzd6wekuahu

Using English as Pivot to Extract Persian-Italian Parallel Sentences from Non-Parallel Corpora [article]

Ebrahim Ansari, M.H. Sadreddini, Mostafa Sheikhalishahi, Richard Wallace, Fatemeh Alimardani
2017 arXiv   pre-print
In this paper, a novel approach is presented to extract bilingual Persian-Italian parallel sentences from a non-parallel (comparable) corpus.  ...  For low-resource language pairs there are not enough parallel corpora to build an accurate SMT.  ... 
arXiv:1701.08339v1 fatcat:fb34owbdx5asvamf2iq5k53cg4

Extracting Bilingual Persian Italian Lexicon from Comparable Corpora Using Different Types of Seed Dictionaries [article]

Ebrahim Ansari, M.H. Sadreddini, Lucio Grandinetti, Mahsa Radinmehr, Ziba Khosravan, Mehdi Sheikhalishahi
2019 arXiv   pre-print
In recent years, methods have been proposed for extracting new bilingual lexicons from non-parallel (comparable) corpora.  ...  In this paper, we discuss the use of different types of dictionaries as the initial starting list for creating a bilingual Persian-Italian lexicon from a comparable corpus.  ...  Extracting Persian-English parallel sentences from document level aligned comparable corpus using bi-directional translation. ACSIJ Advances in Computer Science: an International Journal 3(11):59-65.  ... 
arXiv:1701.08340v2 fatcat:3hg6tz3vwrbn3fky6ynibjlepu

Constructing a Large-Scale English-Persian Parallel Corpus

Tayebeh Mosavi Miangah
2009 Meta : Journal des traducteurs  
However, there are many problems in extracting parallel corpora in English and Persian from the Web, such as the following.  ...  It is parallel corpora from which empirical data are extracted for study.  ... 
doi:10.7202/029804ar fatcat:xggtax6oevd45kcap7q47tpuji

A hierarchical phrase-based model for English-Persian statistical machine translation

Mahsa Mohaghegh, Abdolhossein Sarrafzadeh
2012 2012 International Conference on Innovations in Information Technology (IIT)  
In this paper we show that a hierarchical phrase-based translation system will outperform a classical (non-hierarchical) phrase-based system in the English-to-Persian translation direction, yet for the Persian-to-English  ...  We seek to explain why this is so, and detail a series of translation experiments with our SMT system using various bilingual corpora each with both toolkits Moses (non-hierarchical) and Joshua (hierarchical  ...  aligned parallel corpus without the need to extract an SCFG prior to decoding.  ... 
doi:10.1109/innovations.2012.6207733 fatcat:tdketlhnd5afhmnf6iez4fbmem

A Method for Cross-Language Retrieval of Chunks Using Monolingual and Bilingual Corpora

Tayebeh Mosavi Miangah, Amin Nezarat
2010 International Journal of Computer Applications  
equivalent of this Persian chunk in English using the English-Persian bilingual parallel corpus.  ...  Bilingual Parallel Corpus The English-Persian parallel corpus has been compiled as a bilingual textual database consisting of aligned original English texts and their translations into Persian, and of  ... 
doi:10.5120/1518-1902 fatcat:qf4mdwxduvfcpl2mjpytoj4jye

Creating a Persian-English Comparable Corpus [chapter]

Homa Baradaran Hashemi, Azadeh Shakery, Heshaam Faili
2010 Lecture Notes in Computer Science  
In this study, we build a Persian-English comparable corpus from two independent news collections: BBC News in English and Hamshahri news in Persian.  ...  Evaluation results show the high quality of the aligned documents, and using the Persian-English comparable corpus for extracting translation knowledge seems promising.  ...  Currently available Persian-English corpora are Miangah's English-Persian parallel corpus [13] consisting of 4,860,000 words, Tehran English-Persian parallel corpus composed of 612,086 bilingual sentences  ... 
doi:10.1007/978-3-642-15998-5_5 fatcat:dpbwztz4z5axtimn77bqtktghy

A Novel Method for Cross-Language Retrieval of Chunks Using Monolingual and Bilingual Corpora [chapter]

Tayebeh Mosavi Miangah, Amin Nezarat
2010 Communications in Computer and Information Science  
equivalent of this Persian chunk in English using the English-Persian bilingual parallel corpus.  ...  Bilingual Parallel Corpus The English-Persian parallel corpus has been compiled as a bilingual textual database consisting of aligned original English texts and their translations into Persian, and of  ... 
doi:10.1007/978-3-642-15766-0_45 fatcat:j2e7aoc6obbt3hkjvfvw6dhftq

Learning to Exploit Different Translation Resources for Cross Language Information Retrieval [article]

Hosein Azarbonyad, Azadeh Shakery, Heshaam Faili
2014 arXiv   pre-print
To evaluate the proposed method we do English-Persian CLIR, in which we employ the translation ranking model to find translations of English queries and employ the translations to retrieve Persian documents  ...  We use the contextual information contained in translation resources for extracting context-based features. The proposed method uses LTR to construct a translation ranking model based on defined features  ...  We use two parallel corpora: TEP [4] and 20M, UTPECC comparable corpus version 2.0 [5] and a bilingual English-Persian dictionary as the resources of extracting features.  ... 
arXiv:1405.5447v1 fatcat:ry5ytkil35dxjoo53nhs6gpbri

Presenting an Optimal Method for Constructing an English-Persian Comparable Corpus

Seyede Roya Mohammadi
2016 International Journal of Intelligent Information Systems  
We built a Persian-English comparable corpus from two independent news collections: BBC news in English and Hamshahri news in Persian.  ...  One of these corpora is the comparable corpus.  ...  However, parallel corpora are small in volume and are strictly limited to the language and scope. In addition, there are few Persian-English parallel corpora.  ... 
doi:10.11648/j.ijiis.20160503.12 fatcat:wigtznqatjhabetj2idtf4xbjq

MIZAN: A Large Persian-English Parallel Corpus [article]

Omid Kashefi
2020 arXiv   pre-print
Through this paper, we introduce the biggest Persian-English parallel corpus with more than one million sentence pairs collected from masterpieces of literature.  ...  One of the most major and essential tasks in natural language processing is machine translation that is now highly dependent upon multilingual parallel corpora.  ...  Acknowledgments This work was supported by a grant from Supreme Council of Information and Communication Technology (SCICT) to the School of Computer Engineering at the Iran University of Science and Technology  ... 
arXiv:1801.02107v3 fatcat:g555w6ikeng3xkx3uguj347qrm

TEP: Tehran English-Persian Parallel Corpus [chapter]

Mohammad Taher Pilevar, Heshaam Faili, Abdol Hamid Pilevar
2011 Lecture Notes in Computer Science  
In this paper, the construction process of Tehran English-Persian parallel corpus (TEP) using movie subtitles, together with some of the difficulties we experienced during data extraction and sentence  ...  To the best of our knowledge, TEP has been the first freely released large-scale (on the order of a million words) English-Persian parallel corpus.  ...  We hope that our work would bring about more efforts to develop large-scale parallel corpora for the Persian language.  ... 
doi:10.1007/978-3-642-19437-5_6 fatcat:3bhjoq3vgrc6xep57qkpbxsfpy

An overview of the challenges and progress in PeEn-SMT: First large scale Persian-English SMT system

Mahsa Mohaghegh, Abdolhossein Sarrafzadeh
2011 2011 International Conference on Innovations in Information Technology  
We explain how recent tests using much larger corpora helped to evaluate problems in parallel corpus alignment, corpus content, and how matching the domains of PeEn-SMT's components affect translation  ...  We show how one combination of corpora gave us a metric score outperforming Google Translate for the English-to-Persian translation.  ...  Experiments To develop a translation model, an English-Persian parallel corpus was built as explained in Section B Data Development.  ... 
doi:10.1109/innovations.2011.5893841 fatcat:7hr7o5frbfe2ljh3icoqrb7qqi

Employing Pivot Language Technique Through Statistical and Neural Machine Translation Frameworks : The Case of Under-Resourced Persian-Spanish Language Pair

Benyamin Ahmadnia, Javier Serrano
2017 International Journal on Natural Language Computing  
During our experiments on the Persian-Spanish pair, taken as an under-resourced translation task, we discovered that the aforementioned method, in both frameworks, significantly improves the translation quality  ...  quality of Neural Machine Translation (NMT) systems, like Statistical Machine Translation (SMT) systems, heavily depends on the size of the training data set, while for some pairs of languages, high-quality parallel  ...  We have benefited from her erudition and thoughtful comments which truly enriched this article.  ... 
doi:10.5121/ijnlc.2017.6503 fatcat:eczs5qx6tjexndic3chn7ctvga

EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRANSLATION FRAMEWORKS: THE CASE OF UNDER-RESOURCED PERSIAN-SPANISH LANGUAGE PAIR

Benyamin Ahmadnia
2019 Zenodo  
During our experiments on the Persian-Spanish pair, taken as an under-resourced translation task, we discovered that the aforementioned method, in both frameworks, significantly improves the translation quality  ...  quality of Neural Machine Translation (NMT) systems, like Statistical Machine Translation (SMT) systems, heavily depends on the size of the training data set, while for some pairs of languages, high-quality parallel  ...  We have benefited from her erudition and thoughtful comments which truly enriched this article.  ... 
doi:10.5281/zenodo.3578007 fatcat:nc4jku2gljbpzn2n3xo5bjjbfy