Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features

Lirong Qiu
2015 International Journal of Database Theory and Application  
Sentence-level alignment of bilingual parallel corpora plays a significant and indispensable role in machine translation, translation knowledge acquisition, and bilingual lexicography, and is fundamental work for natural language processing. Although a great deal of work has been done on sentence alignment, and a variety of methods have been developed for bilingual terminology extraction, these methods are impractical for the newly emerging field of Tibetan information processing because they require a large number of manually constructed sentences as a training corpus when extracting inter-translatable word pairs. This paper proposes a multi-strategy Tibetan-Chinese sentence alignment method based on sentence length, syntactic rules, and a bilingual dictionary. We test our approach on a bilingual corpus crawled from a bilingual website and perform manual evaluation on bilingual sentence pairs extracted from Tibetan-Chinese corpora.

In practice, we have a set of Tibetan sentences collected from the web, most of which were translated from a Chinese website. Our task is to find the bilingual sentence pairs. We have found that, in most cases, a Tibetan sentence has a corresponding Chinese sentence. Conversely, given a Chinese sentence, the associated Tibetan sentence often cannot be found. Under these conditions, this paper proposes a multi-strategy Tibetan-Chinese sentence alignment method based on sentence length, syntactic rules, and a bilingual dictionary. First, for a given Chinese sentence, the method uses a sentence-length-based algorithm to select all Tibetan sentences that could possibly align with it. Second, it checks the candidates against syntactic rules, such as whether the Tibetan sentence shows the syntactic features that correspond to a Chinese interrogative or exclamatory sentence. Third, it performs accurate Tibetan-Chinese sentence alignment with a bilingual dictionary.

The remainder of the paper is organized as follows. Section 2 introduces the preliminaries of our work. Section 3 presents our multi-feature Tibetan-Chinese sentence alignment approach. The empirical analysis and the results are presented in Section 4. In Section 5, we provide an overview of related work on sentence-level bilingual alignment, followed by the conclusions, discussion, and future work in Section 6.
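The three-stage procedure described above (length screening, a syntactic-rule check, then dictionary matching) might be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the names `Candidate`, `length_screen`, `dictionary_score`, and `align`, the `ratio`, `tolerance`, and `min_score` parameters, and the idea of reducing the syntactic rules to a pre-computed sentence-type label are all hypothetical. Sentences are assumed to be already segmented into tokens.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Candidate:
    tokens: list[str]  # pre-segmented Tibetan tokens (segmentation assumed done elsewhere)
    kind: str          # assumed label: "question", "exclamation", or "statement"


def length_screen(zh_len, candidates, ratio, tolerance):
    """Stage 1: keep (index, candidate) pairs whose token count is plausible.

    `ratio` is an assumed expected Tibetan/Chinese length ratio and
    `tolerance` an assumed relative deviation allowed around it.
    """
    expected = ratio * zh_len
    return [(i, c) for i, c in enumerate(candidates)
            if abs(len(c.tokens) - expected) <= tolerance * max(expected, 1.0)]


def dictionary_score(zh_tokens, bo_tokens, lexicon):
    """Stage 3 score: share of Chinese tokens with a listed translation present."""
    if not zh_tokens:
        return 0.0
    bo_set = set(bo_tokens)
    hits = sum(1 for t in zh_tokens if lexicon.get(t, set()) & bo_set)
    return hits / len(zh_tokens)


def align(zh_tokens, zh_kind, candidates, lexicon,
          ratio=1.0, tolerance=0.5, min_score=0.5):
    """Return the index of the best Tibetan candidate, or None if none qualifies."""
    # Stage 1: length-based screening.
    survivors = length_screen(len(zh_tokens), candidates, ratio, tolerance)
    # Stage 2: stand-in for the syntactic rules, requiring matching sentence types.
    survivors = [(i, c) for i, c in survivors if c.kind == zh_kind]
    # Stage 3: pick the candidate with the highest dictionary overlap.
    best_index, best_score = None, 0.0
    for i, c in survivors:
        score = dictionary_score(zh_tokens, c.tokens, lexicon)
        if score >= min_score and (best_index is None or score > best_score):
            best_index, best_score = i, score
    return best_index
```

Returning `None` when no candidate passes all three stages reflects the observation above that, for a given Chinese sentence, the Tibetan counterpart is often missing. Each stage only narrows the candidate set left by the previous one, so the more expensive dictionary lookup runs on few sentences.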
doi:10.14257/ijdta.2015.8.4.27 fatcat:3x3crntv6zfp3oge44rczfflru