Tabula Nearly Rasa: Probing the Linguistic Knowledge of Character-level Neural Language Models Trained on Unsegmented Text

Michael Hahn, Marco Baroni
2019, Transactions of the Association for Computational Linguistics
We present a multi-lingual study of the linguistic knowledge encoded in RNNs trained as character-level language models, on input data with word boundaries removed. ... The results show that our "near tabula rasa" RNNs are mostly able to solve morphological, syntactic and semantic tasks that intuitively presuppose word-level knowledge, and indeed they learned, to some ... Acknowledgments: We would like to thank Piotr Bojanowski, Alex Cristia, Kristina Gulordava, Urvashi Khandelwal, Germán Kruszewski, Sebastian Riedel, Hinrich Schütze, and the anonymous reviewers for feedback ...
doi:10.1162/tacl_a_00283