324 Hits in 6.0 sec

A Statistical Model for Automatic Extraction of Korean Transliterated Foreign Words

2003 International Journal of Computer Processing Of Languages  
In this paper, we will describe a Korean transliterated foreign word extraction algorithm.  ...  Syllable sequences of Korean strings are modelled by a Hidden Markov Model whose states each represent a character with a binary marking to indicate whether the syllable is part of a transliterated foreign word  ...  And this work was partially supported by the Ministry of Science and Technology through the "Knowledge base prototype construction and its application for human knowledge processing modelling" (M1-0107  ... 
doi:10.1142/s021942790300084x fatcat:5gnyrkzrrjcd3hfjxh3ufji7oe

Two approaches for the resolution of word mismatch problem caused by English words and foreign words in Korean information retrieval

Byung-Ju Kang, Key-Sun Choi
2000 Proceedings of the fifth international workshop on Information retrieval with Asian languages - IRAL '00  
To make matters worse, the Korean transliterations of an English word can vary widely.  ...  The mixed use of English words and their various transliterations may cause a severe word mismatch problem in Korean information retrieval.  ...  We developed a new effective method of foreign word extraction through word segmentation [4].  ... 
doi:10.1145/355214.355234 dblp:conf/iral/KangC00 fatcat:tbixddfeona6ndjypkuvboskai

Japanese term extraction using dictionary hierarchy and machine translation system

2001 Terminology  
There have been many studies of automatic term recognition (ATR) and they have achieved good results. However, they focus on a mono-lingual term extraction method.  ...  This article describes an automatic term extraction method from documents in foreign languages using a machine translation system.  ...  The main assumption for extracting a foreign word is that the composition of foreign words is different from that of pure Korean words, since the Korean phonetic system is different from that of the foreign  ... 
doi:10.1075/term.6.2.09oh fatcat:xkr3fnu3g5ai7pmog2cthsemxy

Survey on Machine Transliteration and Machine Learning Models

Dhore M L, Dhore R M, Rathod P H
2015 International Journal on Natural Language Computing  
This paper provides a thorough survey of machine transliteration models and machine learning approaches used for machine transliteration over the period of more than two decades for internationally used  ...  The survey shows that the linguistic approach provides better results for closely related languages and probability-based statistical approaches are good when one of the languages is phonetic and the other is  ...  In the transliteration approach, foreign words and English words were extracted and then English words were transliterated into Korean phonetic equivalents.  ... 
doi:10.5121/ijnlc.2015.4202 fatcat:kegqa5k4abahvbno2setnkxwtq

Cross-Language IR at University of Tsukuba: Automatic Transliteration for Japanese, English, and Korean

Atsushi Fujii, Tetsuya Ishikawa
2004 NTCIR Conference on Evaluation of Information Access Technologies  
We apply our method, which was originally proposed for Japanese Katakana words, to Korean Hangul words and realize JEK transliteration in a single framework.  ...  We produced a transliteration dictionary for Japanese and English letters via the Roman representation. To produce a new dictionary, we use the Unicode system to romanize Korean words.  ...  [10] proposed a statistical method to detect foreign words in Korean. However, their method requires a training corpus in which conventional and foreign words are annotated.  ... 
dblp:conf/ntcir/FujiiI04 fatcat:6dv2nr7lnjh2pdasqdi7l5fnbq

Term recognition using technical dictionary hierarchy

Jong-Hoon Oh, KyungSoon Lee, Key-Sun Choi
2000 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics - ACL '00  
For example, domain dictionaries can improve the performance in ATR. This paper focuses on a method for extracting terms using a dictionary hierarchy.  ...  In recent years, statistical approaches on ATR (Automatic Term Recognition) have achieved good results. However, there are scopes to improve the performance in extracting terms still further.  ...  Many fundamental researches are supported by the fund of Ministry of Science and Technology under a project of plan STEP2000.  ... 
doi:10.3115/1075218.1075281 dblp:conf/acl/OhLC00 fatcat:shq4yatqsvganayoazgzditl6i

Effective foreign word extraction for Korean information retrieval

Byung-Ju Kang, Key-Sun Choi
2002 Information Processing & Management  
In Korean text, foreign words, which are mostly transliterations of English words, are frequently used.  ...  So accurate foreign word extraction is crucial for high performance of information retrieval.  ...  We also showed that the impact of accurate foreign word extraction on Korean information retrieval performance is great. Fig. 1. The HMM model for Korean eojeol.  ... 
doi:10.1016/s0306-4573(00)00065-0 fatcat:3jvkcbxizna5bod6n6utmxnwtm

Machine transliteration and transliterated text retrieval: a survey

Dinesh Kumar Prabhakar, Sukomal Pal
2018 Sadhana (Bangalore)  
With the advent of Web 2.0, user-generated content is increasing on the Web at a very rapid rate. A substantial proportion of this content is transliterated data.  ...  To leverage this huge information repository, there is a matching effort to process transliterated text. In this article, we survey the recent body of work in the field of transliteration.  ...  Overall, the recall rate is low for the foreign words of Korean and Japanese origins.  ... 
doi:10.1007/s12046-018-0828-8 fatcat:dg3gwugmqrfevnzu3deuk5w67i

Selection of Korean Proper Translation Words Using Bi-Gram-Based Histograms

Hanmin Jung, Hee-Kwan Koo, Won-Kyung Sung, Dong-In Park
2007 Data Science Journal  
This paper describes a proper translation-selecting and translation-clustering algorithm for Korean translation of words automatically extracted from newspapers.  ...  As about 80% of the English words in Korean newspapers appear in abbreviated form, it is necessary to make clusters of translation words to construct easily bilingual knowledge bases such as dictionaries  ...  Unfortunately, there is no study of the issues for the Korean newspaper corpus. No one has previously tried to extract a set of Korean translations for an English word in a real newspaper.  ... 
doi:10.2481/dsj.6.s125 fatcat:3sznxs4ccva4xn3wzayylymw64

Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia

Sungchul Kim, Kristina Toutanova, Hwanjo Yu
2012 Annual Meeting of the Association for Computational Linguistics  
The combination is achieved using a novel semi-CRF model for foreign sentence tagging in the context of a parallel English sentence.  ...  In this paper we propose a method to automatically label multi-lingual data with named entity tags.  ...  The transliteration model can return an n-best list of transliterations of a foreign string, together with scores.  ... 
dblp:conf/acl/KimTY12 fatcat:wkzndohprzbqxksgfynboyu74u

A Weighted Finite-State Transducer Implementation of Phoneme Rewrite Rules for English to Korean Pronunciation Conversion

Hahn Koo
2011 Procedia - Social and Behavioral Sciences  
This paper describes a method for developing a finite-state model that predicts how English words and named entities are pronounced in Korean.  ...  A formal model that properly captures this change has theoretical implications in phonology and practical applications in speech processing and machine transliteration.  ...  Foreign words form a major class of out-of-vocabulary words and pose problems for text-to-speech synthesis and automatic speech recognition.  ... 
doi:10.1016/j.sbspro.2011.10.599 fatcat:7e3lnfo5ive6zoeje7qfc6n6jq

A phonetic similarity model for automatic extraction of transliteration pairs

Jin-Shea Kuo, Haizhou Li, Ying-Kuei Yang
2007 ACM Transactions on Asian Language Information Processing  
transliteration in the k-neighborhood of a recognized English word.  ...  This article proposes an approach for the automatic extraction of transliteration pairs from Chinese Web corpora.  ...  We also thank Yu Chen at the Institute for Infocomm Research, Singapore, for her efforts in improving the manuscript; Wen-Hsiang Lu at the National Cheng-Kung University for providing hyperlink and Web  ... 
doi:10.1145/1282080.1282081 fatcat:cabttqaf6vd6la4xfh46pxtbcu

Transliteration Generation and Mining with Limited Training Resources

Sittichai Jiampojamarn, Kenneth Dwyer, Shane Bergsma, Aditya Bhargava, Qing Dou, Mi-Young Kim, Grzegorz Kondrak
2010 Named Entity Workshop  
We also explore a number of diverse resource-free and language-independent approaches to transliteration mining, which range from simple to sophisticated.  ...  We present DIRECTL+: an online discriminative sequence prediction model based on many-to-many alignments, which is further augmented by the incorporation of joint n-gram features.  ...  Acknowledgments This research was supported by the Alberta Ingenuity Fund, Informatics Circle of Research Excellence (iCORE), and the Natural Sciences and Engineering Research Council of Canada (NSERC)  ... 
dblp:conf/aclnews/JiampojamarnDBB10 fatcat:g4mnuqlukve2vklcpo4wigbtym

How to Translate Dialects: A Segmentation-Centric Pivot Translation Approach

Michael Paul, Andrew Finch, Eiichiro Sumita
2013 Journal of Natural Language Processing  
This paper proposes a new method to translate a dialect language into a foreign language by integrating transliteration approaches based on Bayesian alignment (BA) models with pivot-based SMT approaches  ...  and a standard language automatically, (2) it avoids segmentation mismatches between the input and the translation model by mapping the character sequences of the dialect language to the word segmentation  ...  using transliteration pairs, i.e., the most likely sequence of source characters and target words according to a joint language model built from the alignment of Bayesian model.  ... 
doi:10.5715/jnlp.20.563 fatcat:vmy5rgifxnajxnxtytqbezrugq

Transliteration of proper names in cross-lingual information retrieval

Paola Virga, Sanjeev Khudanpur
2003 Proceedings of the ACL 2003 workshop on Multilingual and mixed-language named entity recognition -  
We demonstrate the application of statistical machine translation techniques to "translate" the phonemic representation of an English name, obtained by using an automatic text-to-speech system, to a sequence  ...  is estimated from a paired corpus of foreign-language sentences and their English translations, and the language model is trained from English text.  ...  Since we seek Chinese names which are transliterations of a given English name, the notion of words in a sentence in the IBM model above is replaced with phonemes in a word.  ... 
doi:10.3115/1119384.1119392 dblp:conf/acl/VirgaK03 fatcat:jar523futrauvl6jlgg7ulkfte