Limitations of Autoregressive Models and Their Alternatives

Chu-Cheng Lin and Aaron Jaech and Xin Li and Matthew R. Gormley and Jason Eisner
2021, arXiv preprint
Standard autoregressive language models perform only polynomial-time computation to compute the probability of the next symbol. While this is attractive, it means they cannot model distributions whose next-symbol probability is hard to compute. Indeed, they cannot even model them well enough to solve associated easy decision problems for which an engineer might want to consult a language model. These limitations apply no matter how much computation and data are used to train the model, unless the model is given access to oracle parameters that grow superpolynomially in sequence length. Thus, simply training larger autoregressive language models is not a panacea for NLP. Alternatives include energy-based models (which give up efficient sampling) and latent-variable autoregressive models (which give up efficient scoring of a given string). Both are powerful enough to escape the above limitations.
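The tradeoff the abstract describes can be made concrete with a toy sketch (the bigram table and energy function below are invented for illustration, not taken from the paper): an autoregressive model scores a string in time linear in its length by multiplying next-symbol probabilities via the chain rule, whereas an energy-based model assigns only an unnormalized score, and turning that into a probability requires a normalizer that sums over all strings.

```python
import math

# Toy autoregressive model over alphabet {a, b} with end marker "$".
# "^" marks the start of the string. The conditional table is hypothetical.
COND = {
    "^": {"a": 0.5, "b": 0.5},
    "a": {"a": 0.1, "b": 0.6, "$": 0.3},
    "b": {"a": 0.6, "b": 0.1, "$": 0.3},
}

def autoregressive_log_prob(s: str) -> float:
    """Score s in O(len(s)): each next-symbol probability is cheap to
    compute, and the chain rule multiplies them together."""
    logp, prev = 0.0, "^"
    for sym in s + "$":
        logp += math.log(COND[prev][sym])
        prev = sym
    return logp

def energy(s: str) -> float:
    """Unnormalized score for an energy-based model: p(s) is proportional
    to exp(-energy(s)). Scoring any one string is easy, but the normalizer
    Z = sum over *all* strings of exp(-energy(...)) is what makes exact
    probabilities and sampling expensive in general."""
    return float(s.count("ab"))  # arbitrary illustrative energy

# p("ab") = p(a|^) * p(b|a) * p($|b) = 0.5 * 0.6 * 0.3 = 0.09
p_ab = math.exp(autoregressive_log_prob("ab"))
```

The sketch only illustrates the cost asymmetry; the paper's point is that this cheap per-symbol scoring is exactly what bounds the distributions autoregressive models can represent.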
arXiv:2010.11939v3