T4T Solution: WMT21 Similar Language Task for the Spanish-Catalan and Spanish-Portuguese Language Pair

Miguel Canals, Marc Raventós Tato
2021 Conference on Machine Translation  
This system description presents our participation, with our T4T solution, in the Similar Language Translation shared task of the EMNLP 2021 Sixth Conference on Machine Translation (WMT21) for the language pairs SPA<>CAT and PTG<>SPA. The main objective has been to show that good data combined with a good standard NMT toolkit, such as OpenNMT, can provide good results. We have focused on corpus cleaning (both from the physical and from the statistical side), tried to find some alternatives to subword segmentation (syllabic and ... pair-encoding), and finally used OpenNMT as an out-of-the-box system with a transformer model. The results have been pretty close to the best ones, if not the best.
dblp:conf/wmt/CanalsT21 fatcat:rg7pqsi2qzdnraou6uffer657q