An Improved Statistical Machine Translation Method for United Chinese-Japanese Word Segmentation

Xiaowei Wang, Jinke Wang
2016, Proceedings of the 2016 4th International Conference on Electrical & Electronics Engineering and Computer Science (ICEEECS 2016)
Because Chinese and Japanese word segmentation use different tagging systems and semantic behavior, the granularity of the segmentation results should be readjusted to improve the performance of Statistical Machine Translation (SMT). This paper proposes an approach that adjusts word segmentation granularity by combining a Hanzi-Kanji comparison table with a Japanese-Chinese dictionary. Experimental results show that the proposed method can adjust the granularity between Chinese and Japanese effectively and improve the performance of SMT.

Character Chart Construction and Processing Dictionary

Construction of Chinese and Japanese Chart. The correspondence between Japanese characters and Chinese characters is very complex. Chu et al. [8] use open-source resources to construct a table of Kanji, Traditional Chinese characters, and Simplified Chinese characters.

1) Character pattern changes dictionary. This paper uses the variant forms of Japanese kanji: if two characters are linked through variants, the two characters can be transformed into each other.
2) Chinese-Japanese kanji dictionary. This paper uses the Chinese and Japanese kanji conversion table in Kanconvit 2 as a Chinese-Japanese kanji dictionary. The dictionary contains variant information for a total of 1,159 entries.
3) Traditional and simplified
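The variant-link rule in item 1) can be read as connectivity in a graph whose edges are variant pairs. The following is a minimal sketch under that reading, not the authors' implementation: the function name `make_convertible` and the three sample pairs are hypothetical illustrations and are not taken from the paper's tables. It uses a union-find structure so that two characters count as mutually convertible when a chain of variant links joins them.

```python
def make_convertible(variant_pairs):
    """Build a checker from (char_a, char_b) variant links (hypothetical helper)."""
    parent = {}

    def find(ch):
        # Unknown characters become their own class on first lookup.
        parent.setdefault(ch, ch)
        while parent[ch] != ch:
            parent[ch] = parent[parent[ch]]  # path halving
            ch = parent[ch]
        return ch

    # Merge the classes of every linked pair.
    for a, b in variant_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    def convertible(a, b):
        # True when a chain of variant links joins the two characters.
        return find(a) == find(b)

    return convertible


# Illustrative links only (assumed sample data, not the paper's dictionary).
sample_pairs = [("広", "廣"), ("廣", "广"), ("国", "國")]
convertible = make_convertible(sample_pairs)
print(convertible("広", "广"))  # linked through 廣
print(convertible("広", "国"))  # no chain between them
```

Treating the links as an equivalence relation is an assumption. If variant links in the real tables are directional or context dependent, a directed graph search would be the closer model.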
doi:10.2991/iceeecs-16.2016.1 fatcat:gzyijj5l3vbdne534f3n7bhkwa