821 Hits in 4.5 sec

Sentence Bottleneck Autoencoders from Transformer Language Models [article]

Ivan Montero, Nikolaos Pappas, Noah A. Smith
2021 arXiv   pre-print
We therefore explore the construction of a sentence-level autoencoder from a pretrained, frozen transformer language model.  ...  We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.  ...  To fill in this gap, we introduce AUTOBOT, a new autoencoder model for learning sentence "bottleneck" (i.e., fixed-size) representations from pretrained transformers that is useful for similarity, generation  ... 
arXiv:2109.00055v2 fatcat:mzozllfuunhsxjo46dvj3ugynm
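
The AUTOBOT entry above describes compressing a frozen pretrained transformer's token states into a fixed-size sentence bottleneck that a single decoder layer reconstructs from under a denoising, MLM-style objective. Below is a minimal PyTorch sketch of that general architecture; the mean pooling, layer sizes, placeholder vocabulary size, and module names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SentenceBottleneckAE(nn.Module):
    """Toy sentence-bottleneck autoencoder: frozen encoder, fixed-size
    bottleneck, single trainable decoder layer (illustrative only)."""

    def __init__(self, frozen_encoder: nn.Module, hidden: int = 768, bottleneck: int = 256):
        super().__init__()
        self.encoder = frozen_encoder
        for p in self.encoder.parameters():          # keep the pretrained encoder frozen
            p.requires_grad = False
        self.to_bottleneck = nn.Linear(hidden, bottleneck)    # trainable compression
        self.from_bottleneck = nn.Linear(bottleneck, hidden)
        self.decoder = nn.TransformerDecoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.lm_head = nn.Linear(hidden, 30000)      # placeholder vocabulary size

    def forward(self, src_embeddings, tgt_embeddings):
        # src/tgt embeddings: (batch, seq, hidden) float tensors
        states = self.encoder(src_embeddings)
        z = torch.tanh(self.to_bottleneck(states.mean(dim=1)))     # fixed-size sentence bottleneck
        memory = self.from_bottleneck(z).unsqueeze(1)              # expose z to the decoder as memory
        return self.lm_head(self.decoder(tgt_embeddings, memory))  # logits for a denoising loss

# Any module mapping (batch, seq, hidden) -> (batch, seq, hidden) can stand in for the encoder.
frozen = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True), num_layers=2)
model = SentenceBottleneckAE(frozen)
logits = model(torch.rand(2, 16, 768), torch.rand(2, 16, 768))
```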

Sentence Bottleneck Autoencoders from Transformer Language Models

Ivan Montero, Nikolaos Pappas, Noah A. Smith
2021 Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing   unpublished
We therefore explore the construction of a sentence-level autoencoder from a pretrained, frozen transformer language model.  ...  We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.  ...  To fill in this gap, we introduce AUTOBOT, a new autoencoder model for learning sentence "bottleneck" (i.e., fixed-size) representations from pretrained transformers that is useful for similarity, generation  ... 
doi:10.18653/v1/2021.emnlp-main.137 fatcat:fsoavrdj5fdj3m5qffczxylg4a

Squeezing bottlenecks: Exploring the limits of autoencoder semantic representation capabilities

Parth Gupta, Rafael E. Banchs, Paolo Rosso
2016 Neurocomputing  
We present a comprehensive study on the use of autoencoders for modelling text data, in which (differently from previous studies) we focus our attention on the following issues: i) we explore the suitability  ...  of two different models, bDA and rsDA, for constructing deep autoencoders for text data at the sentence level; ii) we propose and evaluate two novel metrics for better assessing the text-reconstruction  ...  We trained the autoencoder while shrinking the bottleneck layer from 100 units down to 10 in steps of 10.  ...
doi:10.1016/j.neucom.2015.06.091 fatcat:ezj3jaf6yvbcddjugxri4vh2mm
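
The last fragment of the abstract above describes sweeping the bottleneck width from 100 down to 10 units in steps of 10 while tracking reconstruction quality. The sketch below reproduces that experimental loop in outline only, using synthetic feature vectors and a plain MSE autoencoder rather than the paper's bDA/rsDA models.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.rand(512, 300)          # stand-in document feature vectors (synthetic)

def reconstruction_error(bottleneck_dim: int, epochs: int = 50) -> float:
    """Train a tiny autoencoder with the given bottleneck size and return its MSE."""
    model = nn.Sequential(
        nn.Linear(300, bottleneck_dim), nn.Sigmoid(),   # encoder
        nn.Linear(bottleneck_dim, 300), nn.Sigmoid(),   # decoder
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), X)
        loss.backward()
        opt.step()
    return loss.item()

# Shrink the bottleneck from 100 to 10 in steps of 10, as in the study above.
for dim in range(100, 0, -10):
    print(dim, reconstruction_error(dim))
```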

Discrete Autoencoders for Sequence Models [article]

Łukasz Kaiser, Samy Bengio
2018 arXiv   pre-print
For instance, even though language has a clear hierarchical structure going from characters through words to sentences, it is not apparent in current language models.  ...  Nevertheless, it remains challenging to extract good representations from these models.  ...  For example, a language model would just condition on s_<i while a neural machine translation model would condition on the input sentence (in the other language) and s_<i.  ...
arXiv:1801.09797v1 fatcat:r4e6zowc3zejtpkctcmj77hwfy
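
The last snippet above points at the conditioning structure: an autoregressive model predicts each symbol from s_<i, optionally together with a latent code produced by the discrete autoencoder. The toy PyTorch decoder below illustrates that factorization; the GRU, the embedding sizes, and feeding the code in as the initial hidden state are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LatentConditionedLM(nn.Module):
    """Toy autoregressive decoder that conditions each step on the previous
    tokens s_<i and on a discrete latent code c (illustrative sketch)."""

    def __init__(self, vocab: int = 1000, codes: int = 64, hidden: int = 128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, hidden)
        self.code_emb = nn.Embedding(codes, hidden)   # discrete code from the autoencoder
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tokens, code):
        # tokens: (batch, seq) token ids; code: (batch,) latent code ids
        h0 = self.code_emb(code).unsqueeze(0)         # the latent sets the initial state
        hidden, _ = self.rnn(self.tok_emb(tokens), h0)
        return self.out(hidden)                       # logits for p(s_i | s_<i, c)

model = LatentConditionedLM()
logits = model(torch.randint(0, 1000, (4, 12)), torch.randint(0, 64, (4,)))
```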

BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle

Peter West, Ari Holtzman, Jan Buys, Yejin Choi
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
In this paper, we propose a novel approach to unsupervised sentence summarization by mapping the Information Bottleneck principle to a conditional language modelling objective: given a sentence, our approach  ...  Using only pretrained language models with no direct supervision, our approach can efficiently perform extractive sentence summarization over a large corpus.  ...  In short, a large corpus of unsupervised summaries is generated with BottleSumEx using a strong language model; the same language model is then tuned to produce summaries from source sentences on that dataset  ...
doi:10.18653/v1/d19-1389 dblp:conf/emnlp/WestHBC19 fatcat:a7rllnns5nhpbk7d2wtpgdp7ua
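
The BottleSum abstract casts summarization as an Information Bottleneck trade-off scored entirely by a pretrained language model: a candidate summary should stay probable on its own (compression) while keeping the following sentence predictable from it (relevance). The sketch below shows that search loop in spirit only; `lm_logprob` is a placeholder the reader would back with a real pretrained LM, and the pruning rule is a simplification, not the published algorithm.

```python
def lm_logprob(text: str, context: str = "") -> float:
    """Stand-in scorer: a real implementation would return the log-probability
    of `text` (optionally given `context`) under a pretrained language model."""
    return -len(text.split())          # placeholder only; NOT a real LM score

def extractive_bottleneck_summary(sentence: str, next_sentence: str, max_deletions: int = 2) -> str:
    """Rough sketch of iterative word deletion guided by two pressures:
    keep the candidate probable (compression) and keep the following
    sentence predictable from it (relevance)."""
    candidates = {sentence}
    best, best_score = sentence, lm_logprob(next_sentence, context=sentence)
    for _ in range(max_deletions):
        new_candidates = set()
        for cand in candidates:
            words = cand.split()
            for i in range(len(words)):
                shorter = " ".join(words[:i] + words[i + 1:])
                # compression pressure: keep only candidates the LM still finds plausible
                if lm_logprob(shorter) >= lm_logprob(cand):
                    new_candidates.add(shorter)
                    # relevance pressure: the next sentence should stay predictable
                    score = lm_logprob(next_sentence, context=shorter)
                    if score > best_score:
                        best, best_score = shorter, score
        candidates = new_candidates or candidates
    return best

print(extractive_bottleneck_summary(
    "the quick brown fox jumped over the lazy dog near the river",
    "it then swam across to the other bank"))
```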

BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle [article]

Peter West, Ari Holtzman, Jan Buys, Yejin Choi
2019 arXiv   pre-print
In this paper, we propose a novel approach to unsupervised sentence summarization by mapping the Information Bottleneck principle to a conditional language modelling objective: given a sentence, our approach  ...  Building on our unsupervised extractive summarization (BottleSumEx), we then present a new approach to self-supervised abstractive summarization (BottleSumSelf), where a transformer-based language model  ...  In short, a large corpus of unsupervised summaries is generated with BottleSumEx using a strong language model; the same language model is then tuned to produce summaries from source sentences on that dataset  ...
arXiv:1909.07405v2 fatcat:ph3oriu4k5emlo5v6j2zoj6tse

Squeezing bottlenecks: exploring the limits of autoencoder semantic representation capabilities [article]

Parth Gupta, Rafael E. Banchs, Paolo Rosso
2014 arXiv   pre-print
capabilities of autoencoders; and iii) we propose an automatic method to find the critical bottleneck dimensionality for text language representations (below which structural information is lost).  ...  We present a comprehensive study on the use of autoencoders for modelling text data, in which (differently from previous studies) we focus our attention on the following issues: i) we explore the suitability  ...  We trained the autoencoder while shrinking the bottleneck layer from 100 units down to 10 in steps of 10.  ...
arXiv:1402.3070v1 fatcat:ffbphiul4vftjm5j55svaxv2re

Plug and Play Autoencoders for Conditional Text Generation [article]

Florian Mai
2020 arXiv   pre-print
Text autoencoders are commonly used for conditional generation tasks such as style transfer.  ...  We propose methods which are plug and play, where any pretrained autoencoder can be used, and only require learning a mapping within the autoencoder's embedding space, training embedding-to-embedding (  ...  Left: we pretrain an autoencoder on (unannotated) text, which transforms an input sentence x into an embedding z_x and uses it to predict a reconstruction x̂ of the input sentence.
arXiv:2010.02983v2 fatcat:t27lepghkbb6ddycs5ozshqtf4
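
The plug-and-play idea in the entry above is to leave a pretrained autoencoder untouched and learn only a mapping inside its embedding space (embedding-to-embedding). The sketch below uses frozen stand-in encoder/decoder layers and a supervised pairing purely for illustration; the paper's actual training signal and models differ.

```python
import torch
import torch.nn as nn

# Stand-ins for a pretrained, frozen text autoencoder so the sketch runs end to end.
encode = nn.Linear(300, 128).requires_grad_(False)   # placeholder frozen encoder
decode = nn.Linear(128, 300).requires_grad_(False)   # placeholder frozen decoder

# The only trainable part: a mapping from source embeddings to target embeddings.
mapping = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))
opt = torch.optim.Adam(mapping.parameters(), lr=1e-3)

x = torch.rand(32, 300)            # source-style sentences as feature vectors (synthetic)
y = torch.rand(32, 300)            # paired target-style sentences (synthetic)

for _ in range(100):
    z_x, z_y = encode(x), encode(y)                     # embeddings from the frozen autoencoder
    loss = nn.functional.mse_loss(mapping(z_x), z_y)    # embedding-to-embedding objective
    opt.zero_grad()
    loss.backward()
    opt.step()

generated = decode(mapping(encode(x)))   # decode the mapped embedding back out
```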

Bilingual is At Least Monolingual (BALM): A Novel Translation Algorithm that Encodes Monolingual Priors [article]

Jeffrey Cheng, Chris Callison-Burch
2019 arXiv   pre-print
language.  ...  State-of-the-art machine translation (MT) models do not use knowledge of any single language's structure; this is the equivalent of asking someone to translate from English to German while knowing neither  ...  generalize to other German-to-English translation tasks outside of image  ...  Appendix, Links to Research Materials: Github repository: https://github.com/jeffreyscheng/senior-thesis-translation; pre-trained BALM models  ...
arXiv:1909.01146v1 fatcat:q6h2ixriyjcktnm42nonxw4iui

Binary Autoencoder for Text Modeling [chapter]

Ruslan Baynazarov, Irina Piontkovskaya
2019 Communications in Computer and Information Science  
Experiments reported in this paper show the binary autoencoder to have the main features of a VAE: semantic consistency and good latent space coverage, while not suffering from mode collapse and being a  ...  In this paper, an autoencoder with a binary latent space trained using the straight-through estimator is shown to have advantages over the VAE on a text modeling task.  ...  Latent variables obtained from different autoencoders were used as input features to a feedforward classifier. Table 1: Language modeling results.  ...
doi:10.1007/978-3-030-34518-1_10 fatcat:xsci4zzlqncljd2ldhwhaxmvbu
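
The entry above trains a binary latent space with the straight-through estimator: the forward pass binarizes, while the backward pass pretends the binarization was the identity so gradients still reach the encoder. A minimal PyTorch sketch of that estimator and a toy binary-latent autoencoder follows; the sizes and the MSE objective are placeholders, not the paper's text-modeling setup.

```python
import torch
import torch.nn as nn

class StraightThroughBinarize(torch.autograd.Function):
    """Binarize activations in the forward pass; pass gradients through
    unchanged in the backward pass (the straight-through estimator)."""

    @staticmethod
    def forward(ctx, x):
        return (x > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output          # identity gradient: "straight through"

class BinaryLatentAE(nn.Module):
    """Toy autoencoder with a binary latent space (illustrative sketch)."""

    def __init__(self, dim: int = 300, latent_bits: int = 64):
        super().__init__()
        self.enc = nn.Linear(dim, latent_bits)
        self.dec = nn.Linear(latent_bits, dim)

    def forward(self, x):
        code = StraightThroughBinarize.apply(self.enc(x))   # 0/1 latent code
        return self.dec(code), code

model = BinaryLatentAE()
x = torch.rand(8, 300)
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x)
loss.backward()    # gradients reach the encoder thanks to the straight-through estimator
```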

Bag-of-Vectors Autoencoders for Unsupervised Conditional Text Generation [article]

Florian Mai, James Henderson
2021 arXiv   pre-print
models.  ...  Our experimental evaluations on unsupervised sentiment transfer and sentence summarization show that our method performs substantially better than a standard autoencoder.  ...  In this paper, we extend Emb2Emb from single-vector bottleneck AEs to Bag-of-Vectors Autoencoders (BoV-AEs), which encode text into a variable-size representation where the number of vectors grows with  ...
arXiv:2110.07002v1 fatcat:b5bfzyxzjve4nhwnzegbew72bq

A Survey on Self-supervised Pre-training for Sequential Transfer Learning in Neural Networks [article]

Huanru Henry Mao
2020 arXiv   pre-print
It involves first pre-training a model on a large amount of unlabeled data, then adapting the model to target tasks of interest.  ...  Deep neural networks are typically trained under a supervised learning framework where a model learns a single task using labeled data.  ...  In the BERT (Devlin et al., 2018) implementation, the authors proposed to jointly perform masked language modeling and next sentence prediction.  ...
arXiv:2007.00800v1 fatcat:jgjl2l7wqfaq5do4vre5fryuoe
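
The last fragment above refers to BERT-style pre-training, which combines masked language modeling with next sentence prediction. The small sketch below shows only the masking side of that recipe (the common 80/10/10 replacement scheme); the token list, toy vocabulary, and 15% rate are the usual defaults, assumed here rather than taken from the survey.

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "a", "cat", "sat", "mat", "dog", "ran"]   # toy vocabulary

def make_mlm_example(tokens, mask_prob=0.15):
    """BERT-style masking sketch: ~15% of positions become prediction targets;
    of those, 80% get [MASK], 10% a random token, 10% stay unchanged."""
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            labels.append(tok)                     # model must recover the original token
            r = random.random()
            if r < 0.8:
                inputs.append(MASK)
            elif r < 0.9:
                inputs.append(random.choice(VOCAB))
            else:
                inputs.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)                    # not a prediction target
    return inputs, labels

print(make_mlm_example("the cat sat on the mat".split()))
```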

Semantic Sentence Matching with Densely-Connected Recurrent and Co-Attentive Information

Seonhoon Kim, Inho Kang, Nojun Kwak
2019 Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-19)  
Sentence matching is widely used in various natural language tasks such as natural language inference, paraphrase identification, and question answering.  ...  It enables preserving the original and the co-attentive feature information from the bottommost word embedding layer to the uppermost recurrent layer.  ...  And by combining DRCN with ELMo, one of the contextualized embeddings from language models, our model outperforms the LM-Transformer (85M parameters) while using only 61M parameters.  ...
doi:10.1609/aaai.v33i01.33016586 fatcat:mniquud2fbdfpmisfnyxqgzdi4
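
The DRCN abstract above hinges on dense connectivity: each recurrent layer's input keeps the original embeddings plus all earlier recurrent and co-attentive features, so information from the bottom layer is preserved all the way up. The sketch below is a stripped-down PyTorch rendering of that pattern; the dot-product co-attention, GRU cells, and dimensions are simplifying assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

def co_attention(a, b):
    """Soft alignment: each position in `a` attends over all positions in `b`."""
    scores = torch.bmm(a, b.transpose(1, 2))            # (batch, len_a, len_b)
    return torch.bmm(torch.softmax(scores, dim=-1), b)  # (batch, len_a, dim)

class DenselyConnectedCoAttentive(nn.Module):
    """Sketch of densely connected recurrent layers with co-attention: every
    layer receives the concatenation of its own input, its recurrent output,
    and the co-attended features, so lower-layer information is kept."""

    def __init__(self, emb: int = 100, hidden: int = 100, layers: int = 3):
        super().__init__()
        dims, rnns = emb, nn.ModuleList()
        for _ in range(layers):
            rnns.append(nn.GRU(dims, hidden, batch_first=True))
            dims = dims + 2 * hidden        # dense connection: input ++ rnn ++ attention
        self.rnns = rnns

    def forward(self, p, q):
        # p, q: (batch, seq, emb) word embeddings of the two sentences
        for rnn in self.rnns:
            hp, _ = rnn(p)
            hq, _ = rnn(q)
            p = torch.cat([p, hp, co_attention(hp, hq)], dim=-1)
            q = torch.cat([q, hq, co_attention(hq, hp)], dim=-1)
        return p, q

model = DenselyConnectedCoAttentive()
p_out, q_out = model(torch.rand(2, 7, 100), torch.rand(2, 9, 100))
```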

Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information [article]

Seonhoon Kim, Inho Kang, Nojun Kwak
2018 arXiv   pre-print
Sentence matching is widely used in various natural language tasks such as natural language inference, paraphrase identification, and question answering.  ...  It enables preserving the original and the co-attentive feature information from the bottommost word embedding layer to the uppermost recurrent layer.  ...  And by combining DRCN with ELMo, one of the contextualized embeddings from language models, our model outperforms the LM-Transformer (85M parameters) while using only 61M parameters.  ...
arXiv:1805.11360v2 fatcat:q7gksggltzhoxovdea6nmocfdq

Language Model-Based Paired Variational Autoencoders for Robotic Language Learning [article]

Ozan Özdemir, Matthias Kerzel, Cornelius Weber, Jae Hee Lee, Stefan Wermter
2022 arXiv   pre-print
Next, we introduce PVAE-BERT, which equips the model with a pretrained large-scale language model, i.e., Bidirectional Encoder Representations from Transformers (BERT), enabling the model to go beyond  ...  Our experiments suggest that using a pretrained language model as the language encoder allows our approach to scale up for real-world scenarios with instructions from human users.  ...  ACKNOWLEDGMENT The authors gratefully acknowledge support from the German Research Foundation DFG, project CML (TRR 169).  ... 
arXiv:2201.06317v1 fatcat:yklxc5c5mjbm5byx5ktmyl7eoy
Showing results 1 — 15 out of 821 results