A Hybrid Model for Extracting Transliteration Equivalents from Parallel Corpora [chapter]

Jong-Hoon Oh, Key-Sun Choi, Hitoshi Isahara
2006 Lecture Notes in Computer Science  
Moreover, there is little concern for validating acquired transliteration pairs using up-to-date corpora, such as web documents.  ...  Several models for transliteration pair acquisition have been proposed to overcome the out-of-vocabulary problem caused by transliterations.  ...  "Transliteration equivalent" refers to a set of transliteration pairs that originate from the same foreign word. Note that most Korean transliterations generally originate from English words.  ... 
doi:10.1007/11846406_15 fatcat:zprcxxb45vfh5bvxmfap5os3wa

Machine transliteration and transliterated text retrieval: a survey

Dinesh Kumar Prabhakar, Sukomal Pal
2018 Sadhana (Bangalore)  
With the advent of Web 2.0, user-generated content is increasing on the Web at a very rapid rate. A substantial proportion of this content is transliterated data.  ...  According to Internet live stats there are more than 3 billion Internet users worldwide today and the number of non-English native speakers is quite high there.  ...  Lee et al [96] proposed a model for English-Chinese transliteration pair extraction from parallel corpora. The model can extract bilingual name and transliteration pairs.  ... 
doi:10.1007/s12046-018-0828-8 fatcat:dg3gwugmqrfevnzu3deuk5w67i

Incorporating Pronunciation Variation into Extraction of Transliterated-term Pairs from Web Corpora

Jin-Shea Kuo, Ying-Kuei Yang
2005 International Journal of Asian Language Processing  
A novel approach to automatically extracting transliterated-term pairs from Web corpora is proposed in this paper.  ...  Extracting transliterated-term pairs is a fundamental yet important task in natural language processing to collect large enough paired cognates for further studies on transliteration.  ...  Constructing an English-Chinese transliteration lexicon automatically from Web corpora is the most important goal of this paper.  ... 
dblp:journals/jclc/KuoY05 fatcat:3fkuvlgk2vgs5knisf7ioqkd6m

A phonetic similarity model for automatic extraction of transliteration pairs

Jin-Shea Kuo, Haizhou Li, Ying-Kuei Yang
2007 ACM Transactions on Asian Language Information Processing  
This article proposes an approach for the automatic extraction of transliteration pairs from Chinese Web corpora.  ...  The unsupervised learning approach works almost as well as the supervised one, thus allowing us to deploy automatic extraction of transliteration pairs in the Web space.  ...  page corpora; and Wern-Jun Wang at Chung-Hwa Telecommunication Laboratories for providing speech data.  ... 
doi:10.1145/1282080.1282081 fatcat:cabttqaf6vd6la4xfh46pxtbcu

Splitting Noun Compounds via Monolingual and Bilingual Paraphrasing: A Study on Japanese Katakana Words

Nobuhiro Kaji, Masaru Kitsuregawa
2011 Conference on Empirical Methods in Natural Language Processing  
Experiments demonstrated that splitting accuracy is substantially improved by extracting such paraphrases from unlabeled textual data, the Web in our case, and then using that information for constructing  ...  ., transliterated foreign words) are particularly difficult to split, because katakana words are highly productive and are often out-of-vocabulary.  ...  Acknowledgement This work was supported by the Multimedia Web Analysis Framework towards Development of Social Analysis Software program of the Ministry of Education, Culture, Sports, Science and Technology  ... 
dblp:conf/emnlp/KajiK11 fatcat:3c7g6ccn4ber7jwpbeaqxvvtau

Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction

Jae-Hoon Kim, Hong-Seok Kwon, Hyeong-Won Seo
2015 Computational Intelligence and Neuroscience  
A pivot-based approach for bilingual lexicon extraction is based on the similarity of context vectors represented by words in a pivot language like English.  ...  Empirical results on two language pairs (e.g., Korean-Spanish and Korean-French) have shown that the pivot-based approach is very promising for resource-poor languages and this approach observes its validity  ...  This approach directly extracts them from MRDs or Web-based dictionaries like Wiktionary and Wikipedia  ... 
doi:10.1155/2015/434153 pmid:25983745 pmcid:PMC4423015 fatcat:7cg2qrle7zcnlaq2ygzlzw3gki

A machine transliteration model based on correspondence between graphemes and phonemes

Jong-Hoon Oh, Key-Sun Choi, Hitoshi Isahara
2006 ACM Transactions on Asian Language Information Processing  
With this model, we have achieved better performance (improvements of about 15 to 41% in English-to-Korean transliteration and about 16 to 44% in English-to-Japanese transliteration) than has been reported  ...  Three types of machine transliteration models (grapheme-based, phoneme-based, and hybrid) have been proposed.  ...  In candidate extraction, an algorithm tries to find transliteration pair candidates in bilingual corpora.  ... 
doi:10.1145/1194936.1194938 fatcat:vnaz3ca2wbbtzokxzelmj32bme

Creating multilingual translation lexicons with regional variations using web corpora

Pu-Jen Cheng, Yi-Cheng Pan, Wen-Hsiang Lu, Lee-Feng Chien
2004 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics - ACL '04  
of geographic information obtained from Web search engines.  ...  We propose a transitive translation approach to determine translation variations across languages that have insufficient corpora for translation via the mining of bilingual search-result pages and clues  ...  These methods are feasible but only certain pairs of languages and subject domains can extract sufficient parallel texts as corpora.  ... 
doi:10.3115/1218955.1219023 dblp:conf/acl/ChengLTC04 fatcat:fhyi4bmy3jhgfd7yq2ausw7ysm

Study on Unknown Term Translation Mining from Google Snippets

Bin Li, Jianmin Yao
2019 Information  
Afterwards, valid candidates were extracted from small-sized, noisy bilingual corpora using an improved frequency change measurement that combines adjacent information.  ...  This study focused on an effective solution for obtaining relevant web pages, extracting translations with correct lexical boundaries, and ranking the translation candidates.  ...  Their approach depends on bilingual pairs for lexico-syntactic templates that are previously extracted from parallel corpora.  ... 
doi:10.3390/info10090267 fatcat:fpxv7nsnazbe7exxcsqyij6yiy

Weakly supervised named entity transliteration and discovery from multilingual comparable corpora

Alexandre Klementiev, Dan Roth
2006 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL - ACL '06  
Seeded with a small number of transliteration pairs, our algorithm discovers multi-word NEs, and takes advantage of a dictionary (if one exists) to account for translated or partially translated NEs.  ...  NEs have similar time distributions across such corpora, and often some of the tokens in a multi-word NE are transliterated. We develop an algorithm that exploits both observations iteratively.  ...  ., 2000) train English-Arabic and English-Korean generative transliteration models, respectively.  ... 
doi:10.3115/1220175.1220278 dblp:conf/acl/KlementievR06 fatcat:47q5yumvejacxecv3tefdi4ja4

Japanese term extraction using dictionary hierarchy and machine translation system

2001 Terminology  
Finally the terms recognized from the Korean documents are translated into terms in the foreign language. By using our method, one can extract terms for languages, which one does not know.  ...  In our method, we translate documents in foreign languages into documents in Korean and extract terms in the translated Korean documents.  ...  For extracting terms from given Japanese documents, we use a machine translation system. First we translate Japanese into Korean. Then, terms are extracted from Korean text.  ... 
doi:10.1075/term.6.2.09oh fatcat:xkr3fnu3g5ai7pmog2cthsemxy

A Systematic Literature Review on Extraction of Parallel Corpora from Comparable Corpora

Dilshad Kaur, Satwinder Singh
2021 Journal of Computer Science  
Because parallel corpora are not readily accessible for many different language pairs, comparable corpora that are widely accessible can be used to extract parallel corpora.  ...  A proposed architecture and a mind map are also showcased in this review article to provide better clarity regarding parallel data extraction using comparable corpora.  ...  The proposed solution was to extract the dictionary for a low-resource language pair of English and Kannada using Comparable Corpora (CC) collected from Wikipedia dumps and corpus collected from the Indian  ... 
doi:10.3844/jcssp.2021.924.952 fatcat:irlfbohfhzgpjo7qgoxzptlh6e

Survey on Machine Transliteration and Machine Learning Models

Dhore M L, Dhore R M, Rathod P H
2015 International Journal on Natural Language Computing  
This paper provides the thorough survey on machine transliteration models and machine learning approaches used for machine transliteration over the period of more than two decades for internationally used  ...  Support of local languages can be given in all internet based applications by means of Machine Transliteration and Machine Translation.  ...  In the transliteration approach foreign words and English words were extracted and then English words were transliterated into Korean phonetic equivalents.  ... 
doi:10.5121/ijnlc.2015.4202 fatcat:kegqa5k4abahvbno2setnkxwtq

Named Entity Transliteration and Discovery in Multilingual Corpora [chapter]

Alexandre Klementiev, Dan Roth
2008 Learning Machine Translation  
Seeded with a small number of transliteration pairs, our algorithm discovers multi-word NEs, and takes advantage of a dictionary (if one exists) to account for translated or partially translated NEs.  ...  NEs have similar time distributions across such corpora, and often some of the tokens in a multi-word NE are transliterated. We develop an algorithm that exploits both observations iteratively.  ...  Negative examples here and during the rest of the training were pairs of non-NE English and Russian words selected uniformly randomly from the respective corpora.  ... 
doi:10.7551/mitpress/9780262072977.003.0004 fatcat:ulimg3m2jjh6vlf2d2dgmkqvsy

Experiments on Cross-Language Information Retrieval Using Comparable Corpora of Chinese, Japanese, and Korean Languages [chapter]

Kazuaki Kishida, Kuang-hua Chen
2020 Evaluating Information Retrieval and Access Tasks  
Such comparable corpora are helpful for comparing the performance of CLIR between pairs of CJK and English. This comparison leads to deeper insights into CLIR techniques.  ...  Specifically, CLIR tasks from NTCIR-3 to NTCIR-6 utilized multilingual corpora consisting of newspaper articles that were published in Taiwan, Japan, and Korea during the same time periods.  ...  For solving this problem, some groups attempted to extract translations from web pages for the unknown term. 4.  ... 
doi:10.1007/978-981-15-5554-1_2 fatcat:x7e2tnp7gjekzlqzgkzaddu2mi