Integrating CAT and MT in AnglaBharti-II architecture
European Association for Machine Translation Conferences/Workshops
Machine translation (MT) is a complex and difficult task. With the present state of technology, it is not possible to achieve performance comparable to that of human translators. Automating the translation of natural languages requires a number of knowledge sources and their appropriate invocation in the translation engine. A practical machine translation system with limited resources cannot embody all the knowledge sources that human beings use. However, the performance of an MT system can be considerably improved if the automated translation system is integrated with supporting modules that work in synergy to arrive at correct translations. Computer-assisted translation (CAT) tools that identify the limitations of the MT system and provide clues for coping with them constitute an important module for enhancing MT system performance. This paper presents details of the AnglaBharti-II system architecture, highlighting the role of CAT in the system. AnglaBharti-II is a system for translating English to Indian languages. AnglaBharti is primarily a rule-based machine translation (RBMT) system. The input English sentence is transformed into a pseudo-interlingual structure called PLIL (Pseudo Lingua for Indian Languages) using a CFG-like, pattern-directed rule base. RBMT has limitations of its own in dealing with real-life texts. We have tried to overcome some of these limitations in the AnglaBharti-II architecture by integrating additional modules. These modules are essentially CAT tools incorporating translation memory, raw and generalized example-bases, interactive and automated pre-editing, paraphrasing, failure analysis, and a number of heuristics that attempt to deal with a variety of constructs frequently encountered in real-life English text.
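The layering described above, with CAT components wrapped around a rule-based engine, can be sketched roughly as follows. This is a minimal illustration and not the AnglaBharti-II implementation: the class `HybridTranslator`, its method names, and the whitespace and lowercase normalization standing in for automated pre-editing are all assumptions introduced here, and the rule-based engine is passed in as an opaque function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class HybridTranslator:
    """Hypothetical CAT + RBMT pipeline (illustrative names only)."""

    # Assumed interface: returns a translation, or None when no rule applies.
    rule_engine: Callable[[str], Optional[str]]
    translation_memory: Dict[str, str] = field(default_factory=dict)
    failures: List[str] = field(default_factory=list)

    @staticmethod
    def normalize(sentence: str) -> str:
        # Toy stand-in for automated pre-editing:
        # collapse runs of whitespace and lowercase the text.
        return " ".join(sentence.split()).lower()

    def remember(self, source: str, target: str) -> None:
        # Store a (presumably human-verified) translation for later reuse.
        self.translation_memory[self.normalize(source)] = target

    def translate(self, sentence: str) -> Optional[str]:
        key = self.normalize(sentence)
        # 1. Translation memory: reuse a stored translation if one exists.
        if key in self.translation_memory:
            return self.translation_memory[key]
        # 2. Otherwise fall back to the rule-based engine.
        result = self.rule_engine(key)
        # 3. Record inputs the engine could not handle for failure analysis.
        if result is None:
            self.failures.append(sentence)
        return result
```

In this sketch the translation memory is consulted before the rule engine, and unhandled sentences are only logged; the ordering of modules and the handling of failures in the actual system may differ.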