Examining Scaling and Transfer of Language Model Architectures for Machine Translation
[article]
Biao Zhang, Behrooz Ghorbani, Ankur Bapna, Yong Cheng, Xavier Garcia, Jonathan Shen, Orhan Firat
2022
arXiv pre-print
In machine translation, EncDec has long been the favoured approach, but few studies have investigated the performance of LMs. ...
Our results show that: (i) different LMs have different scaling properties, where architectural differences often have a significant impact on model performance at small scales, but the performance gap ...
... tasks (Brown et al., 2020; Raffel et al., ...). However, in neural machine translation (NMT), EncDec has been the dominant paradigm across all translation tasks (e.g. high/low-resource, multilingual and ...)
arXiv:2202.00528v3
fatcat:jlzm5kxssvamzmh2a3g43oknya