PMI-Masking: Principled masking of correlated spans [article]

Yoav Levine, Barak Lenz, Opher Lieber, Omri Abend, Kevin Leyton-Brown, Moshe Tennenholtz, Yoav Shoham
2020 arXiv   pre-print
To address this flaw, we propose PMI-Masking, a principled masking strategy based on the concept of Pointwise Mutual Information (PMI), which jointly masks a token n-gram if it exhibits high collocation  ...  , and random-span masking.  ...  PMI: FROM BIGRAMS TO n-GRAMS Our aim is to define a masking strategy that targets correlated sequences of tokens in a principled way.  ... 
arXiv:2010.01825v1 fatcat:wo6hobt64bfr7gjn3wu5ghttjq

Data Efficient Masked Language Modeling for Vision and Language

Yonatan Bitton, Michael Elhadad, Gabriel Stanovsky, Roy Schwartz
2021 Findings of the Association for Computational Linguistics: EMNLP 2021   unpublished
PMI-masking: Principled masking of correlated spans. In Proc. of ICLR.  ...  pages 4171–4186, Minneapolis, Minnesota. Associ-  ...  The tendency of VLP models is to predict something that is correlated with the text, or common answers.  ... 
doi:10.18653/v1/2021.findings-emnlp.259 fatcat:skhhfoittjg33b26oo23olx37a

Which transformer architecture fits my data? A vocabulary bottleneck in self-attention [article]

Noam Wies, Yoav Levine, Daniel Jannai, Amnon Shashua
2021 arXiv   pre-print
We theoretically predict the existence of an embedding rank bottleneck that limits the contribution of self-attention width to the Transformer expressivity.  ...  We empirically demonstrate the existence of this bottleneck and its implications on the depth-to-width interplay of Transformer architectures, linking the architecture variability across domains to the  ...  Yoav Levine was supported by the Israel Academy of Sciences Adams fellowship.  ... 
arXiv:2105.03928v2 fatcat:bah7r5jzwzdsbifa6iffcz27xe