92 Hits in 6.6 sec

Psycholinguistics Meets Continual Learning: Measuring Catastrophic Forgetting in Visual Question Answering

Claudio Greco, Barbara Plank, Raquel Fernández, Raffaella Bernardi
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
Acknowledgements We kindly acknowledge the support of NVIDIA Corporation with the donation of the GPUs used in our research to the University of Trento and IT University of Copenhagen.  ...  R. Fernández was funded by the Netherlands Organisation for Scientific Research (NWO) under VIDI grant nr. 276-89-008, Asymmetry in Conversation.  ...  Contributions: Our study contributes to the literature on CL in NLP.  ... 
doi:10.18653/v1/p19-1350 dblp:conf/acl/GrecoPFB19 fatcat:a7oj4wzw75agdn4i3nxapcdl2e

Mitigating Catastrophic Forgetting in Scheduled Sampling with Elastic Weight Consolidation in Neural Machine Translation [article]

Michalis Korakakis, Andreas Vlachos
2021 arXiv   pre-print
We also observe that as a side-effect, it worsens performance when the model-generated prefix is correct, a form of catastrophic forgetting.  ...  Scheduled sampling is a simple and often empirically successful approach which addresses this issue by incorporating model-generated prefixes into the training process.  ...  Does an LSTM forget more than a CNN? an empirical study of catastrophic forgetting in NLP.  ... 
arXiv:2109.06308v1 fatcat:55ij4ulbufbpxlyykwal2eopte
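
The abstract above applies Elastic Weight Consolidation (EWC) to scheduled sampling. As a rough illustration of the general EWC idea (a minimal sketch, not necessarily the paper's exact objective; the helper names are hypothetical), the penalty anchors parameters to their values from an earlier training phase, weighted by a diagonal Fisher estimate:

    import torch

    def ewc_penalty(model, ref_params, fisher_diag, lam=1.0):
        """Quadratic EWC-style penalty: pull each parameter toward its reference value,
        scaled by an importance estimate (diagonal Fisher). ref_params and fisher_diag
        are dicts of tensors keyed by parameter name (hypothetical helpers)."""
        penalty = 0.0
        for name, p in model.named_parameters():
            if name in ref_params:
                penalty = penalty + (fisher_diag[name] * (p - ref_params[name]) ** 2).sum()
        return 0.5 * lam * penalty

    # During later training: loss = task_loss + ewc_penalty(model, ref_params, fisher_diag)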

Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents [article]

Aditya Siddhant, Anuj Goyal, Angeliki Metallinou
2018 arXiv   pre-print
The improvements are more pronounced in low-resource settings, and when using only 1000 labeled in-domain samples, our techniques match the performance of training from scratch on 10-15x more labeled in-domain  ...  Our findings suggest unsupervised pre-training on a large corpus of unlabeled utterances leads to significantly better SLU performance compared to training from scratch, and it can even outperform conventional  ...  An advantage of this unsupervised pre-training is that the CNN-BIG-LSTM weights do not experience catastrophic forgetting; therefore, the SLU architecture can be trained without losing the knowledge gained  ... 
arXiv:1811.05370v1 fatcat:ss646d5c5vfmvdt3tjd2rquvke

Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents

Aditya Siddhant, Anuj Goyal, Angeliki Metallinou
2019 Proceedings of the AAAI Conference on Artificial Intelligence  
The improvements are more pronounced in low-resource settings, and when using only 1000 labeled in-domain samples, our techniques match the performance of training from scratch on 10-15x more labeled in-domain  ...  Our findings suggest unsupervised pre-training on a large corpus of unlabeled utterances leads to significantly better SLU performance compared to training from scratch, and it can even outperform conventional  ...  An advantage of this unsupervised pre-training is that the CNN-BIG-LSTM weights do not experience catastrophic forgetting; therefore, the SLU architecture can be trained without losing the knowledge gained  ... 
doi:10.1609/aaai.v33i01.33014959 fatcat:ttcydusdfngengs72ppbtyalle
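
Both records above note that the pretrained CNN-BIG-LSTM weights are kept from being overwritten, so catastrophic forgetting of the pretrained representation does not occur. A minimal PyTorch sketch of this freezing pattern (illustrative only, not the authors' code; the module names and sizes are placeholders):

    import torch
    from torch import nn

    # Placeholder modules standing in for the pretrained encoder and the SLU classifier.
    pretrained_encoder = nn.LSTM(input_size=300, hidden_size=512, batch_first=True)
    slu_head = nn.Linear(512, 20)  # e.g. 20 intent classes (hypothetical)

    # Freeze the pretrained weights so fine-tuning cannot overwrite them.
    for param in pretrained_encoder.parameters():
        param.requires_grad = False

    # Only the task-specific head receives gradient updates.
    optimizer = torch.optim.Adam(slu_head.parameters(), lr=1e-3)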

Universal Language Model Fine-tuning for Text Classification

Jeremy Howard, Sebastian Ruder
2018 Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language  ...  Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100× more data. We open-source our pretrained models and code.  ...  A hypercolumn at a pixel in CV is the vector of activations of all CNN units above that pixel.  ... 
doi:10.18653/v1/p18-1031 dblp:conf/acl/RuderH18 fatcat:rqco3rcberdi5ob4pqdwuzbycu

Universal Language Model Fine-tuning for Text Classification [article]

Jeremy Howard, Sebastian Ruder
2018 arXiv   pre-print
We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language  ...  Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100x more data. We open-source our pretrained models and code.  ...  hypercolumn at a pixel in CV is the vector of activations of all CNN units above that pixel.  ... 
arXiv:1801.06146v5 fatcat:fl2vdrb37vauvkrqclmvak7uty
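
The ULMFiT snippets refer to "techniques that are key for fine-tuning a language model"; one of them, discriminative fine-tuning, assigns smaller learning rates to earlier layers (the paper divides the rate by 2.6 per layer). A minimal sketch under that assumption, using a hypothetical layer stack:

    import torch
    from torch import nn

    # Hypothetical stack of layers, ordered from earliest to last.
    layers = [nn.Linear(400, 400) for _ in range(3)]

    base_lr = 1e-3
    decay = 2.6  # ULMFiT's suggested per-layer learning-rate divisor

    # Discriminative fine-tuning: earlier layers get smaller learning rates.
    param_groups = [
        {"params": layer.parameters(), "lr": base_lr / (decay ** (len(layers) - 1 - i))}
        for i, layer in enumerate(layers)
    ]
    optimizer = torch.optim.SGD(param_groups, lr=base_lr)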

Review of deep learning: concepts, CNN architectures, challenges, applications, future directions

Laith Alzubaidi, Jinglan Zhang, Amjad J. Humaidi, Ayad Al-Dujaili, Ye Duan, Omran Al-Shamma, J. Santamaría, Mohammed A. Fadhel, Muthana Al-Amidie, Laith Farhan
2021 Journal of Big Data  
Therefore, in this contribution, we propose using a more holistic approach in order to provide a more suitable starting point from which to develop a full understanding of DL.  ...  Specifically, this review attempts to provide a more comprehensive survey of the most important aspects of DL, including those enhancements recently added to the field.  ...  Acknowledgements We would like to thank the professors from the Queensland University of Technology and the University of Information Technology and Communications who gave their feedback on the paper.  ... 
doi:10.1186/s40537-021-00444-8 pmid:33816053 pmcid:PMC8010506 fatcat:x2h5qs5c2jbntipu7oi5hfnb6u

Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning [article]

Liangqiong Qu, Yuyin Zhou, Paul Pu Liang, Yingda Xia, Feifei Wang, Ehsan Adeli, Li Fei-Fei, Daniel Rubin
2022 arXiv   pre-print
Our experiments show that simply replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices, accelerate convergence, and reach a better global model  ...  Concretely, we conduct the first rigorous empirical investigation of different neural architectures across a range of federated algorithms, real-world benchmarks, and heterogeneous data splits.  ...  Acknowledgments This work was supported in part by a grant from the NCI, U01CA242879.  ... 
arXiv:2106.06047v2 fatcat:nlbpw53xxnek5ilys7ga7cwfdy
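
The study above compares architectures "across a range of federated algorithms"; assuming FedAvg is among them (an assumption, the snippet does not name it), the aggregation step is a sample-size-weighted average of client parameters. A minimal sketch:

    import torch

    def fedavg_aggregate(client_state_dicts, client_sizes):
        """FedAvg-style aggregation: weighted average of client parameters.
        client_state_dicts: list of state_dicts with identical keys.
        client_sizes: number of training samples held by each client."""
        total = float(sum(client_sizes))
        global_state = {}
        for key in client_state_dicts[0]:
            global_state[key] = sum(
                sd[key].float() * (n / total)
                for sd, n in zip(client_state_dicts, client_sizes)
            )
        return global_state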

How to Fine-Tune BERT for Text Classification? [article]

Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang
2020 arXiv   pre-print
As a state-of-the-art pre-trained language model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing results in many language understanding tasks.  ...  In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on the text classification task and provide a general solution for BERT fine-tuning.  ...  BERT takes an input of a sequence of no more than 512 tokens and outputs the representation of the sequence.  ... 
arXiv:1905.05583v3 fatcat:6f7ozgdzc5ecpdhh3khd7ejfy4
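
The last snippet notes that BERT accepts at most 512 tokens, so longer texts must be truncated or otherwise reduced before fine-tuning for classification. An illustrative sketch using the Hugging Face transformers library (not the paper's code; the checkpoint and label count are placeholders):

    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    text = "a very long document ..."
    # Enforce BERT's 512-token limit by truncating the tokenized input.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    outputs = model(**inputs)
    logits = outputs.logits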

Total Recall: a Customized Continual Learning Method for Neural Semantic Parsers [article]

Zhuang Li, Lizhen Qu, Gholamreza Haffari
2021 arXiv   pre-print
balances distributions of parse actions in a memory; ii) a two-stage training method that significantly improves generalization capability of the parsers across tasks.  ...  We conduct extensive experiments to study the research problems involved in continual semantic parsing and demonstrate that a neural semantic parser trained with TotalRecall achieves superior performance  ...  The computational resources of this work are supported by the Multimodal Australian Science Imaging and Visualisation Environment (MASSIVE).  ... 
arXiv:2109.05186v2 fatcat:6ymn564ys5h2zdspwjv5jqohae
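
The first contribution listed above is a sampling method that balances the distribution of parse actions in the replay memory. As a loose illustration only (a generic per-category fill, not the paper's actual sampler; names are hypothetical):

    import random
    from collections import defaultdict

    def balanced_memory(examples, key_fn, capacity):
        """Fill a replay memory so categories (e.g. parse actions) are evenly represented.
        examples: iterable of training examples; key_fn maps an example to its category;
        capacity: total memory size."""
        buckets = defaultdict(list)
        for ex in examples:
            buckets[key_fn(ex)].append(ex)
        per_bucket = max(1, capacity // max(1, len(buckets)))
        memory = []
        for items in buckets.values():
            random.shuffle(items)
            memory.extend(items[:per_bucket])
        return memory[:capacity]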

Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling

Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen (+4 others)
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
In a more positive trend, we see modest gains from multitask training, suggesting the development of more sophisticated multitask and transfer learning techniques as an avenue for further research.  ...  In addition, fine-tuning BERT on an intermediate task often negatively impacts downstream transfer.  ...  Some of our experiments resemble those of Yogatama et al. (2019) , who also empirically investigate transfer performance with limited amounts of data and find similar evidence of catastrophic forgetting  ... 
doi:10.18653/v1/p19-1439 dblp:conf/acl/WangHXPMPKTHYJC19 fatcat:rqhxn6jtgjcepclspzad3odir4

Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis [article]

Juhua Liu, Qihuang Zhong, Liang Ding, Hua Jin, Bo Du, Dacheng Tao
2021 arXiv   pre-print
In practice, we formulate the model pretrained on the sampled instances into a knowledge guidance model and a learner model, respectively.  ...  Thereby, the learner model can maintain more domain-invariant knowledge when learning new knowledge from the target dataset.  ...  In this way, the knowledge guidance model can not only learn the target knowledge, but also avoid the problem of catastrophic forgetting. 2) Analysis of β: The β in Eq. 6 is an important parameter that  ... 
arXiv:2110.13398v1 fatcat:54e3fpncmvcubdw5iuopyxdmwm
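
The snippet mentions a knowledge guidance model, a learner model, and a weighting parameter β in Eq. 6, which is not shown here. A purely hypothetical sketch of a β-weighted objective of this general kind (not the paper's Eq. 6):

    import torch.nn.functional as F

    def combined_loss(learner_logits, guidance_logits, labels, beta=0.5):
        """Hypothetical beta-weighted objective: supervised task loss plus an
        alignment term toward the guidance model's predictions."""
        task_loss = F.cross_entropy(learner_logits, labels)
        align_loss = F.kl_div(
            F.log_softmax(learner_logits, dim=-1),
            F.softmax(guidance_logits, dim=-1),
            reduction="batchmean",
        )
        return beta * task_loss + (1.0 - beta) * align_loss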

Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics [article]

Vinay V. Ramasesh, Ethan Dyer, Maithra Raghu
2020 arXiv   pre-print
A central challenge in developing versatile machine learning systems is catastrophic forgetting: a model trained on tasks in sequence will suffer significant performance drops on earlier tasks.  ...  These insights enable the development of an analytic argument and empirical picture relating the degree of forgetting to representational similarity between tasks.  ...  Additionally, we thank the authors of the image classification library at https://github.com/hysts/pytorch_image_classification, on top of which we built much of our codebase.  ... 
arXiv:2007.07400v1 fatcat:ppumynt6jjfjtewtpk4fxkecey
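
The abstract relates the degree of forgetting to representational similarity between tasks. One standard way to quantify similarity between layer activations is linear CKA; whether this matches the paper's exact metric is an assumption. A minimal NumPy sketch:

    import numpy as np

    def linear_cka(X, Y):
        """Linear centered kernel alignment between two activation matrices
        of shape (num_examples, num_features)."""
        X = X - X.mean(axis=0, keepdims=True)
        Y = Y - Y.mean(axis=0, keepdims=True)
        hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
        norm_x = np.linalg.norm(X.T @ X, "fro")
        norm_y = np.linalg.norm(Y.T @ Y, "fro")
        return hsic / (norm_x * norm_y)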

Cross-lingual Transfer Learning and Multitask Learning for Capturing Multiword Expressions

Shiva Taslimipoor, Omid Rohanian, Le An Ha
2019 Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)  
Recent developments in deep learning have prompted a surge of interest in the application of multitask and transfer learning to NLP problems.  ...  In this study, we explore, for the first time, the application of transfer learning (TRL) and multitask learning (MTL) to the identification of Multiword Expressions (MWEs).  ...  Using an LSTM-based model, Bingel and Søgaard (2017) performed a study to find beneficial tasks for the purpose of MTL in a sequence labelling scenario.  ... 
doi:10.18653/v1/w19-5119 dblp:conf/mwe/TaslimipoorRH19 fatcat:w36xm7rmajhkzjkldujs4lw7dy

From Characters to Understanding Natural Language (C2NLU): Robust End-to-End Deep Learning for NLP (Dagstuhl Seminar 17042)

Phil Blunsom, Kyunghyun Cho, Chris Dyer, Hinrich Schütze, Marc Herbstritt
2017 Dagstuhl Reports  
In most of the discussions, the need for a more detailed model analysis was pointed out.  ...  Therefore, benefits and challenges of transfer learning were an important topic of the working groups as well as of the panel discussion and the final plenary discussion.  ...  Recently, more and more studies use characters as input to (hierarchical) neural networks. This talk provides an overview of character-based NLP/NLU systems.  ... 
doi:10.4230/dagrep.7.1.129 dblp:journals/dagstuhl-reports/BlunsomCDS17 fatcat:lyp7srzsg5cgngiklccjox4abm
Showing results 1 — 15 out of 92 results