Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task

Casimiro Pio Carrino, Bardia Rafieian, Marta R. Costa-jussà, José A. R. Fonollosa
2019 Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)  
In this work, we describe the TALP-UPC systems submitted to the WMT19 Biomedical Translation Task. Our proposed strategy is independent of the NMT model and relies on a single ingredient: a biomedical terminology list. We first extracted such a terminology list by labelling biomedical words in our training dataset using the BabelNet API. Then, we designed a data preparation strategy to insert the term information at the token level. Finally, we trained the Transformer model (Vaswani et al., 2017) on this terms-informed data. Our best submitted system ranked 2nd and 3rd for the Spanish-English and English-Spanish translation directions, respectively.
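To make "inserting term information at the token level" concrete, here is a minimal sketch. It assumes the terminology list has already been extracted (one term per line) and that each term is a single whitespace-separated token. The `|T`/`|O` feature suffix, the `|` separator, and the function names `load_terms` and `add_domain_feature` are hypothetical illustrations, not the authors' exact scheme, and the BabelNet labelling step is not reproduced.

```python
from typing import Iterable, List, Set


def load_terms(lines: Iterable[str]) -> Set[str]:
    """Build a lowercase terminology set from one-term-per-line input.

    Assumption: a plain-text list with one single-token term per line.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def add_domain_feature(sentence: str, terms: Set[str], sep: str = "|") -> List[str]:
    """Attach a hypothetical token-level feature to each token.

    'T' marks a token found in the terminology list, 'O' marks any other token.
    Matching is case-insensitive and multi-word terms are not handled.
    """
    tagged = []
    for token in sentence.split():
        label = "T" if token.lower() in terms else "O"
        tagged.append(f"{token}{sep}{label}")
    return tagged


if __name__ == "__main__":
    terms = load_terms(["insulin", "diabetes"])
    print(" ".join(add_domain_feature("Patients with diabetes received insulin", terms)))
```

Run on the example sentence, this yields `Patients|O with|O diabetes|T received|O insulin|T`. Keeping the term signal as a per-token suffix leaves the sentence itself unchanged, which is one way a data-side strategy can stay independent of the NMT architecture.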
doi:10.18653/v1/w19-5418 dblp:conf/wmt/CarrinoRCF19 fatcat:rh37dboj5jawtgt3llrwhlb7nm