The Utilization of Parallel Corpora for the Extension of Machine Translation Lexicons

Jeanne Pienaar, G.D. Oosthuizen
2012 Lexikos  
There has recently been an increasing awareness of the importance of large collections of texts (corpora) used as resources in machine translation research. The process of creating or extending machine translation lexicons is time-consuming, difficult and costly in terms of human involvement. The contribution that corpora can make towards the reduction in cost, time and complexity has been explored by several research groups. This article describes a system that has been developed to identify
word-pairs, utilizing an aligned bilingual (English-Afrikaans) corpus in order to extend a bilingual lexicon with the words and their translations that are not present in the lexicon. New translations for existing entries can be added and the system also applies grammar rules for the identification of the grammatical category of each word-pair. This system limits the involvement of the human translator and has a positive impact on the time, cost and effort needed to extend a bilingual lexicon.

Keywords: MACHINE TRANSLATION, MONOLINGUAL CORPORA, PARALLEL CORPORA

Opsomming: Die benutting van parallelle korpusse vir die uitbreiding van masjienvertalingsleksikons. Onlangs was daar 'n toenemende bewustheid van die belangrikheid van groot versamelings tekste (korpusse) wat as bronne in die navorsing van masjienvertaling gebruik word. Die proses om masjienvertalingsleksikons te skep of uit te brei is tydrowend, kompleks en duur in terme van menslike betrokkenheid. Die bydrae wat korpusse kan maak tot die vermindering van koste, tyd en kompleksiteit is deur verskeie navorsingsgroepe ondersoek. Hierdie artikel beskryf die ontwikkeling van 'n stelsel wat gebruik maak van 'n afgepaarde tweetalige (Engels-Afrikaanse) korpus vir die identifisering van woordpare met die doel om 'n bestaande tweetalige leksikon uit te brei met hierdie woorde en hul vertalings wat nie in die leksikon voorkom nie of om nuwe vertalings vir bestaande inskrywings by te voeg. Die stelsel pas ook grammatikareëls toe vir die identifisering van die grammatikale kategorie van elke woordpaar. Die stelsel beperk die betrokkenheid van die menslike vertaler en het 'n positiewe impak op die vermindering van tyd, koste en moeite in die uitbreiding van 'n tweetalige leksikon.
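
The abstract does not say how word-pairs are identified, so the following is only an illustrative sketch of the general idea: propose candidate translations from sentence-aligned text and keep those missing from an existing lexicon. It assumes a Dice co-occurrence score, whitespace tokenization, and hypothetical names (`candidate_pairs`, `min_dice`, `min_count`) and toy data, none of which come from the article. The grammatical-category step mentioned in the abstract is not modelled.

```python
from collections import Counter
from itertools import product


def candidate_pairs(aligned, lexicon, min_dice=0.8, min_count=2):
    """Return (source, target, dice) candidates that the lexicon lacks.

    aligned: list of (source_sentence, target_sentence) tuples.
    lexicon: dict mapping a source word to a set of known translations.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_sent, tgt_sent in aligned:
        # Count each word at most once per sentence pair.
        src_words = set(src_sent.lower().split())
        tgt_words = set(tgt_sent.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))

    results = []
    for (s, t), n in pair_freq.items():
        if n < min_count:
            continue  # too rare to trust
        # Dice coefficient: 2 * joint count / (sum of individual counts).
        dice = 2 * n / (src_freq[s] + tgt_freq[t])
        if dice >= min_dice and t not in lexicon.get(s, set()):
            results.append((s, t, dice))
    # Highest score first, then alphabetical for a stable order.
    return sorted(results, key=lambda r: (-r[2], r[0], r[1]))


# Toy English-Afrikaans data, invented for illustration only.
aligned = [
    ("the dog sleeps", "die hond slaap"),
    ("the cat sleeps", "die kat slaap"),
    ("the dog eats", "die hond eet"),
    ("the cat eats", "die kat eet"),
]
lexicon = {"the": {"die"}, "dog": {"hond"}}

new_pairs = candidate_pairs(aligned, lexicon)
for source, target, score in new_pairs:
    print(f"{source} -> {target} ({score:.2f})")
```

Because the lexicon lookup is per translation rather than per headword, the same filter would also surface a new translation for an entry that already exists. In a workflow like the one the abstract describes, a human translator would still review the proposed pairs before they are added.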
doi:10.5788/7-1-975 fatcat:2ziy66itqna4pa3y3cgdsf34di