Psycholinguistics Meets Continual Learning: Measuring Catastrophic Forgetting in Visual Question Answering
2019
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Acknowledgements: We kindly acknowledge the support of NVIDIA Corporation with the donation of the GPUs used in our research to the University of Trento and IT University of Copenhagen. R. ...
Fernández was funded by the Netherlands Organisation for Scientific Research (NWO) under VIDI grant nr. 276-89-008, Asymmetry in Conversation. ...
Contributions: Our study contributes to the literature on CL in NLP. ...
doi:10.18653/v1/p19-1350
dblp:conf/acl/GrecoPFB19
fatcat:a7oj4wzw75agdn4i3nxapcdl2e
Mitigating Catastrophic Forgetting in Scheduled Sampling with Elastic Weight Consolidation in Neural Machine Translation
[article]
2021
arXiv
pre-print
We also observe that as a side-effect, it worsens performance when the model-generated prefix is correct, a form of catastrophic forgetting. ...
Scheduled sampling is a simple and often empirically successful approach which addresses this issue by incorporating model-generated prefixes into the training process. ...
Does an LSTM forget more than a CNN? An empirical study of catastrophic forgetting in NLP. ...
arXiv:2109.06308v1
fatcat:55ij4ulbufbpxlyykwal2eopte
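The snippet above describes scheduled sampling: during training the decoder sometimes consumes its own predictions instead of the gold prefix, which is what later makes errors on correct prefixes (the forgetting the paper studies) possible. Below is a minimal sketch of that mixing step in PyTorch; the toy GRU decoder, the <bos> id, and the fixed sampling probability are illustrative assumptions, not the paper's NMT setup.

```python
# Toy scheduled-sampling decoding loop (illustrative; not the paper's NMT system).
import torch
import torch.nn as nn

vocab_size, emb_dim, hid_dim, seq_len, batch = 100, 32, 64, 10, 8

embed = nn.Embedding(vocab_size, emb_dim)
decoder = nn.GRUCell(emb_dim, hid_dim)
out_proj = nn.Linear(hid_dim, vocab_size)
criterion = nn.CrossEntropyLoss()

targets = torch.randint(0, vocab_size, (batch, seq_len))  # gold target tokens
h = torch.zeros(batch, hid_dim)                           # decoder state (would come from an encoder)
prev_tok = torch.zeros(batch, dtype=torch.long)           # assumed <bos> id = 0
gold_prob = 0.75                                          # in practice decayed over training

loss = 0.0
for t in range(seq_len):
    h = decoder(embed(prev_tok), h)                       # one decoding step
    logits = out_proj(h)
    loss = loss + criterion(logits, targets[:, t])
    # Scheduled sampling: feed the gold token with probability gold_prob,
    # otherwise feed the model's own greedy prediction as the next prefix token.
    use_gold = torch.rand(batch) < gold_prob
    prev_tok = torch.where(use_gold, targets[:, t], logits.argmax(dim=-1))

(loss / seq_len).backward()
```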
Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents
[article]
2018
arXiv
pre-print
The improvements are more pronounced in low-resource settings; when using only 1000 labeled in-domain samples, our techniques match the performance of training from scratch on 10-15x more labeled in-domain ...
Our findings suggest that unsupervised pre-training on a large corpus of unlabeled utterances leads to significantly better SLU performance than training from scratch, and it can even outperform conventional ...
An advantage of this unsupervised pre-training is that the CNN-BIG-LSTM weights do not experience catastrophic forgetting; therefore, the SLU architecture can be trained without losing the knowledge gained ...
arXiv:1811.05370v1
fatcat:ss646d5c5vfmvdt3tjd2rquvke
Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents
2019
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
The improvements are more pronounced in low-resource settings; when using only 1000 labeled in-domain samples, our techniques match the performance of training from scratch on 10-15x more labeled in-domain ...
Our findings suggest that unsupervised pre-training on a large corpus of unlabeled utterances leads to significantly better SLU performance than training from scratch, and it can even outperform conventional ...
An advantage of this unsupervised pre-training is that the CNN-BIG-LSTM weights do not experience catastrophic forgetting; therefore, the SLU architecture can be trained without losing the knowledge gained ...
doi:10.1609/aaai.v33i01.33014959
fatcat:ttcydusdfngengs72ppbtyalle
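Both SLU entries above rest on the same mechanism: the pre-trained encoder weights are kept fixed, so later task training cannot overwrite them (no catastrophic forgetting) while a new SLU head is learned. A minimal sketch with a toy LSTM encoder follows; it is not the paper's CNN-BIG-LSTM, and every size and name here is a placeholder.

```python
# Freeze a "pre-trained" encoder and train only a new SLU head (toy sketch).
import torch
import torch.nn as nn

encoder = nn.LSTM(input_size=50, hidden_size=128, batch_first=True)  # stands in for pre-trained weights
slu_head = nn.Linear(128, 10)                                        # new intent classifier

for p in encoder.parameters():            # frozen: these weights can no longer be forgotten
    p.requires_grad_(False)

optimizer = torch.optim.Adam(slu_head.parameters(), lr=1e-3)         # only the head is updated

x = torch.randn(4, 20, 50)                # batch of 4 utterances, 20 timesteps, 50-dim features
labels = torch.randint(0, 10, (4,))

outputs, _ = encoder(x)
logits = slu_head(outputs[:, -1])         # last hidden state -> intent logits
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
```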
Universal Language Model Fine-tuning for Text Classification
2018
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language ...
Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100× more data. We open-source our pretrained models and code. ...
A hypercolumn at a pixel in CV is the vector of activations of all CNN units above that pixel. ...
doi:10.18653/v1/p18-1031
dblp:conf/acl/RuderH18
fatcat:rqco3rcberdi5ob4pqdwuzbycu
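The ULMFiT entry above borrows the computer-vision notion of a hypercolumn: the vector of activations of every CNN unit sitting above one pixel. One minimal way to materialise it, assuming a toy three-block CNN rather than any model from the paper, is to upsample each block's feature map to the input resolution and concatenate along the channel axis.

```python
# Hypercolumn sketch: stack upsampled activations from every layer at one pixel.
import torch
import torch.nn as nn
import torch.nn.functional as F

blocks = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
    nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
    nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()),
])

x = torch.randn(1, 3, 64, 64)              # one RGB image
maps, h = [], x
for block in blocks:
    h = block(h)
    # Bring every feature map back to the input's spatial size.
    maps.append(F.interpolate(h, size=x.shape[-2:], mode="bilinear", align_corners=False))

hyper = torch.cat(maps, dim=1)             # (1, 8 + 16 + 32, 64, 64)
pixel_vec = hyper[0, :, 10, 20]            # hypercolumn at pixel (row 10, col 20)
print(pixel_vec.shape)                     # torch.Size([56])
```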
Universal Language Model Fine-tuning for Text Classification
[article]
2018
arXiv
pre-print
We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language ...
Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100x more data. We open-source our pretrained models and code. ...
A hypercolumn at a pixel in CV is the vector of activations of all CNN units above that pixel. ...
arXiv:1801.06146v5
fatcat:fl2vdrb37vauvkrqclmvak7uty
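The truncated sentence above refers to ULMFiT's fine-tuning techniques: discriminative fine-tuning (lower learning rates for earlier layers, divided by 2.6 per layer), slanted triangular learning rates, and gradual unfreezing (one layer group unfrozen per epoch, last group first). The sketch below shows the first and third of these on a toy three-group model; the groups and the elided training loop are placeholders, not the authors' code.

```python
# Discriminative learning rates + gradual unfreezing (ULMFiT-style, toy model).
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Embedding(1000, 32),    # group 0: embeddings
    nn.Linear(32, 32),         # group 1: middle layer
    nn.Linear(32, 2),          # group 2: classifier head
)
groups = list(model.children())

base_lr, decay = 1e-3, 2.6     # ULMFiT divides the learning rate by 2.6 per lower layer
optimizer = torch.optim.Adam(
    [{"params": g.parameters(), "lr": base_lr / decay ** (len(groups) - 1 - i)}
     for i, g in enumerate(groups)]
)

# Gradual unfreezing: start with only the last group trainable,
# then thaw one more group at the start of each epoch.
for g in groups[:-1]:
    for p in g.parameters():
        p.requires_grad_(False)

for epoch in range(len(groups)):
    if epoch > 0:
        for p in groups[-1 - epoch].parameters():
            p.requires_grad_(True)
    # ... one epoch of fine-tuning on the target task would run here ...
```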
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions
2021
Journal of Big Data
Therefore, in this contribution, we propose using a more holistic approach in order to provide a more suitable starting point from which to develop a full understanding of DL. ...
Specifically, this review attempts to provide a more comprehensive survey of the most important aspects of DL, including those enhancements recently added to the field. ...
Acknowledgements: We would like to thank the professors from the Queensland University of Technology and the University of Information Technology and Communications who gave their feedback on the paper. ...
doi:10.1186/s40537-021-00444-8
pmid:33816053
pmcid:PMC8010506
fatcat:x2h5qs5c2jbntipu7oi5hfnb6u
Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning
[article]
2022
arXiv
pre-print
Our experiments show that simply replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices, accelerate convergence, and reach a better global model ...
Concretely, we conduct the first rigorous empirical investigation of different neural architectures across a range of federated algorithms, real-world benchmarks, and heterogeneous data splits. ...
Acknowledgments: This work was supported in part by a grant from the NCI, U01CA242879. ...
arXiv:2106.06047v2
fatcat:nlbpw53xxnek5ilys7ga7cwfdy
How to Fine-Tune BERT for Text Classification?
[article]
2020
arXiv
pre-print
As a state-of-the-art pre-trained language model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing results in many language understanding tasks. ...
In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on text classification task and provide a general solution for BERT fine-tuning. ...
BERT takes an input of a sequence of no more than 512 tokens and outputs the representation of the sequence. ...
arXiv:1905.05583v3
fatcat:6f7ozgdzc5ecpdhh3khd7ejfy4
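The last line of the entry above states BERT's input contract: at most 512 tokens in, a sequence representation out. The sketch below illustrates that contract with the Hugging Face `transformers` library, which is purely an illustrative assumption; the paper's own code and its strategies for texts longer than 512 tokens are not reproduced here.

```python
# Encode a (possibly long) text with BERT, truncated to the 512-token limit.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

text = "A long document that may exceed the 512-token limit ..."
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

cls_repr = outputs.last_hidden_state[:, 0]   # [CLS] vector, shape (1, 768), used for classification
print(cls_repr.shape)
```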
Total Recall: a Customized Continual Learning Method for Neural Semantic Parsers
[article]
2021
arXiv
pre-print
balances distributions of parse actions in a memory; ii) a two-stage training method that significantly improves the generalization capability of the parsers across tasks. ...
We conduct extensive experiments to study the research problems involved in continual semantic parsing and demonstrate that a neural semantic parser trained with TotalRecall achieves superior performance ...
The computational resources of this work are supported by the Multimodal Australian Science Imaging and Visualisation Environment (MASSIVE). ...
arXiv:2109.05186v2
fatcat:6ymn564ys5h2zdspwjv5jqohae
Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling
2019
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
In a more positive trend, we see modest gains from multitask training, suggesting the development of more sophisticated multitask and transfer learning techniques as an avenue for further research. ...
In addition, fine-tuning BERT on an intermediate task often negatively impacts downstream transfer. ...
Some of our experiments resemble those of Yogatama et al. (2019), who also empirically investigate transfer performance with limited amounts of data and find similar evidence of catastrophic forgetting ...
doi:10.18653/v1/p19-1439
dblp:conf/acl/WangHXPMPKTHYJC19
fatcat:rqhxn6jtgjcepclspzad3odir4
Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis
[article]
2021
arXiv
pre-print
In practice, we formulate the model pretrained on the sampled instances into a knowledge guidance model and a learner model, respectively. ...
Thereby, the learner model can maintain more domain-invariant knowledge when learning new knowledge from the target dataset. ...
In this way, the knowledge guidance model can not only learn the target knowledge, but also avoid the problem of catastrophic forgetting. 2) Analysis of β: The β in Eq. 6 is an important parameter that ...
arXiv:2110.13398v1
fatcat:54e3fpncmvcubdw5iuopyxdmwm
Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics
[article]
2020
arXiv
pre-print
A central challenge in developing versatile machine learning systems is catastrophic forgetting: a model trained on tasks in sequence will suffer significant performance drops on earlier tasks. ...
These insights enable the development of an analytic argument and empirical picture relating the degree of forgetting to representational similarity between tasks. ...
Additionally, we thank the authors of the image classification library at https://github.com/hysts/pytorch_image_classification, on top of which we built much of our codebase. ...
arXiv:2007.07400v1
fatcat:ppumynt6jjfjtewtpk4fxkecey
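The entry above defines catastrophic forgetting as the performance drop on earlier tasks after sequential training. A toy way to quantify it, with synthetic tasks and a small MLP rather than anything from the paper's setup, is to record task-A accuracy before and after training on task B.

```python
# Measure forgetting: accuracy on task A before vs. after training on task B.
import torch
import torch.nn as nn

def make_task(seed):
    g = torch.Generator().manual_seed(seed)
    X = torch.randn(512, 20, generator=g)
    w = torch.randn(20, generator=g)
    return X, (X @ w > 0).long()          # a random linear decision rule per task

def accuracy(model, X, y):
    with torch.no_grad():
        return (model(X).argmax(dim=1) == y).float().mean().item()

def train(model, X, y, steps=200):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
(XA, yA), (XB, yB) = make_task(0), make_task(1)

train(model, XA, yA)
acc_before = accuracy(model, XA, yA)       # task-A accuracy right after learning it
train(model, XB, yB)                       # sequential training on task B
acc_after = accuracy(model, XA, yA)        # task-A accuracy after the switch
print(f"forgetting on task A: {acc_before - acc_after:.3f}")
```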
Cross-lingual Transfer Learning and Multitask Learning for Capturing Multiword Expressions
2019
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)
Recent developments in deep learning have prompted a surge of interest in the application of multitask and transfer learning to NLP problems. ...
In this study, we explore, for the first time, the application of transfer learning (TRL) and multitask learning (MTL) to the identification of Multiword Expressions (MWEs). ...
Using an LSTM-based model, Bingel and Søgaard (2017) performed a study to find beneficial tasks for the purpose of MTL in a sequence labelling scenario. ...
doi:10.18653/v1/w19-5119
dblp:conf/mwe/TaslimipoorRH19
fatcat:w36xm7rmajhkzjkldujs4lw7dy
From Characters to Understanding Natural Language (C2NLU): Robust End-to-End Deep Learning for NLP (Dagstuhl Seminar 17042)
2017
Dagstuhl Reports
In most of the discussions, the need for a more detailed model analysis was pointed out. ...
Therefore, benefits and challenges of transfer learning were an important topic of the working groups as well as of the panel discussion and the final plenary discussion. ...
Recently, more and more studies use characters as input to (hierarchical) neural networks. This talk provides an overview of character-based NLP/NLU systems. ...
doi:10.4230/dagrep.7.1.129
dblp:journals/dagstuhl-reports/BlunsomCDS17
fatcat:lyp7srzsg5cgngiklccjox4abm