Asian language processing: current state-of-the-art

Chu-Ren Huang, Takenobu Tokunaga, Sophia Yat Mei Lee
2007 Language Resources and Evaluation  
Academia Sinica

Background: The Challenge of Asian Language Processing

Asian language processing presents formidable challenges to achieving multilingualism and multiculturalism in our society. One of the first and most obvious challenges is the multitude and diversity of languages: more than 2,000 languages are listed as languages in Asia by Ethnologue (Gordon, 2005), representing four major language families: Austronesian, Trans-New Guinea, Indo-European, and Sino-Tibetan.¹ The challenge
is made more formidable by the fact that, as a whole, Asian languages range from the language with the most speakers in the world (Mandarin Chinese, close to 900 million native speakers) to the more than 70 nearly extinct languages (e.g., Pazeh in Taiwan, one speaker). As a result, there are vast differences in the level of language processing capability and the number of sharable resources available for individual languages. Major Asian languages such as Mandarin Chinese, Hindi, Japanese, Korean, and Thai have benefited from several years of intense language processing research, and fast-developing languages (e.g., Filipino, Urdu, and Vietnamese) are gaining ground. However, for many near-extinct languages, research and resources are scarce, and computerization represents the last resort for preservation after extinction. A comprehensive overview of the current state of Asian language processing must necessarily address the range of issues that arise due to the diversity of Asian languages and must reflect the vastly different state of the art for specific languages. Therefore, we have divided the special issues on Asian language technology into two parts. The first is a double issue entitled Asian Language Processing: State of the Art Resources and Processing, which focuses on state-of-the-art research issues given the diversity of Asian languages. Although the majority of papers in this double issue deal with

¹ These four language families, plus the Niger-Congo family in Africa, each include more than 400 languages. Other large language families in Asia include Austro-Asiatic (169), Tai-Kadai (76), Dravidian (73), and Altaic (66).
doi:10.1007/s10579-007-9041-9 fatcat:ryrtqspk5nggdgzp7o75knch3m