Quasi Bidirectional Encoder Representations from Transformers for Word Sense Disambiguation

Michele Bevilacqua and Roberto Navigli, Department of Computer Science, Sapienza University of Rome, Italy
2019 Proceedings: Natural Language Processing in a Deep Learning World
While contextualized embeddings have produced performance breakthroughs in many Natural Language Processing (NLP) tasks, Word Sense Disambiguation (WSD) has not benefited from them yet. In this paper, we introduce QBERT, a Transformer-based architecture for contextualized embeddings which makes use of a co-attentive layer to produce more deeply bidirectional representations, better fitting for the WSD task. As a result, we are able to train a WSD system that beats the state of the art on the concatenation of all evaluation datasets by over 3 points, also outperforming a comparable model using ELMo.
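The abstract does not specify how the co-attentive layer fuses the two directions. As a rough illustration only, the sketch below shows one plausible way to combine forward and backward unidirectional Transformer states with cross-attention; the module name `CoAttentiveFusion`, the sizes, and the concatenate-then-project fusion are all assumptions, not the authors' implementation.

```python
# A minimal sketch (not the QBERT code): fuse the hidden states of a
# forward and a backward unidirectional encoder with cross-attention,
# so each token representation mixes context from both directions.
import torch
import torch.nn as nn


class CoAttentiveFusion(nn.Module):
    """Hypothetical co-attentive fusion of two directional state streams."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        # Each direction attends over the states of the other direction.
        self.fwd_over_bwd = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.bwd_over_fwd = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Concatenated co-attended views are projected back to d_model.
        self.out = nn.Linear(2 * d_model, d_model)

    def forward(self, h_fwd: torch.Tensor, h_bwd: torch.Tensor) -> torch.Tensor:
        # h_fwd, h_bwd: (batch, seq_len, d_model) from the two directions.
        f, _ = self.fwd_over_bwd(h_fwd, h_bwd, h_bwd)  # queries: forward states
        b, _ = self.bwd_over_fwd(h_bwd, h_fwd, h_fwd)  # queries: backward states
        return self.out(torch.cat([f, b], dim=-1))


# Usage: fuse dummy directional states into bidirectional representations.
fusion = CoAttentiveFusion(d_model=512)
h_fwd = torch.randn(2, 10, 512)
h_bwd = torch.randn(2, 10, 512)
bidir = fusion(h_fwd, h_bwd)  # shape: (2, 10, 512)
```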
doi:10.26615/978-954-452-056-4_015 dblp:conf/ranlp/BevilacquaN19