
Improving language models fine-tuning with representation consistency targets [article]

Anastasia Razdaibiedina, Vivek Madan, Zohar Karnin, Ashish Khetan, Vishaal Kapoor
2022 arXiv   pre-print
Fine-tuning contextualized representations learned by pre-trained language models has become a standard practice in NLP.  ...  In this paper, we propose a novel fine-tuning method that avoids representation collapse during fine-tuning by discouraging undesirable changes in the representations.  ...  Our method, CAPCORT-X (Representation Consistency Targets): In this section, we first discuss representation collapse and how it is connected to the generalization performance of the fine-tuned model.  ... 
arXiv:2205.11603v1 fatcat:rniglb3q5res5pl6ep3px2c5aq

Can Monolingual Pre-trained Encoder-Decoder Improve NMT for Distant Language Pairs?

Hwichan Kim, Mamoru Komachi
2021 Pacific Asia Conference on Language, Information and Computation  
To this end, we analyze BART fine-tuned with languages exhibiting different syntactic proximities to the source language in terms of the translation accuracy and network representations.  ...  after fine-tuning.  ...  The fine-tuned BART models achieve consistent improvements for all language pairs and directions.  ... 
dblp:conf/paclic/KimK21 fatcat:dqrvrcqjbbacjn2mv2ctmnlzzi

Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [article]

Zehui Lin, Xiao Pan, Mingxuan Wang, Xipeng Qiu, Jiangtao Feng, Hao Zhou, Lei Li
2021 arXiv   pre-print
We pre-train an mRASP model on 32 language pairs jointly with only public datasets. The model is then fine-tuned on downstream language pairs to obtain specialized MT models.  ...  Our key idea in mRASP is its novel technique of random aligned substitution, which brings words and phrases with similar meanings across multiple languages closer in the representation space.  ...  We would also like to thank Liwei Wu, Huadong Chen, Qianqian Dong, Zewei Sun, and Weiying Ma for their useful suggestions and help with experiments.  ... 
arXiv:2010.03142v3 fatcat:v5zcixzrurecnnmlaewkmxzg6u
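The random aligned substitution technique described in the mRASP abstract above can be sketched as a simple token-replacement step on the source side of a parallel pair. The toy dictionary, function name, and substitution probability below are illustrative assumptions, not the authors' implementation:

```python
import random

# Toy bilingual synonym dictionary (hypothetical entries, for illustration only).
ALIGNED_DICT = {"hello": "bonjour", "world": "monde"}

def random_aligned_substitution(tokens, dictionary, prob=0.3, seed=0):
    """With probability `prob`, replace a source token that has a
    dictionary-aligned translation with that translation, so that
    synonyms across languages come to share training contexts."""
    rng = random.Random(seed)
    return [
        dictionary[tok] if tok in dictionary and rng.random() < prob else tok
        for tok in tokens
    ]

# With prob=1.0 every dictionary word is substituted.
print(random_aligned_substitution(["hello", "world", "!"], ALIGNED_DICT, prob=1.0))
# -> ['bonjour', 'monde', '!']
```

In the actual pre-training setup, pairs like these would feed a shared encoder so that "hello" and "bonjour" are pushed toward nearby representations.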

Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation

Raj Dabre, Atsushi Fujita, Chenhui Chu
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
(2) mixed pre-training or fine-tuning on a mixture of the external and low-resource (18k) target parallel corpora, and (3) pure fine-tuning on the target parallel corpora.  ...  Even when the helping target language is not one of the target languages of our concern, our multistage fine-tuning can give 3-9 BLEU score gains over a simple one-to-one model.  ...  Irrespective of the external parallel corpus for En-XX, the three-stage fine-tuned model (#7) achieved the highest BLEU scores for all seven target languages, consistently outperforming the other models  ... 
doi:10.18653/v1/d19-1146 dblp:conf/emnlp/DabreFC19 fatcat:xwdq2gdw7fhyfm4ze7medtzftm

Investigating Learning Dynamics of BERT Fine-Tuning

Yaru Hao, Li Dong, Furu Wei, Ke Xu
2020 International Joint Conference on Natural Language Processing  
The recently introduced pre-trained language model BERT advances the state of the art on many NLP tasks through the fine-tuning approach, but few studies investigate how the fine-tuning process improves  ...  In this paper, we inspect the learning dynamics of BERT fine-tuning with two indicators.  ...  Figure 3: SVCCA distance of the last layer between the original fine-tuned model and the fine-tuned model with the parameters of a target layer replaced with their pre-trained values.  ... 
dblp:conf/ijcnlp/HaoDWX20 fatcat:6p4b3o2ivrhk3dbsnnsx6ts2ya
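The SVCCA distance referenced in this entry's Figure 3 is commonly computed by SVD-reducing each representation matrix and then taking canonical correlations between the reduced subspaces. The NumPy sketch below is a minimal version of that standard recipe, not the authors' code, and the shapes and `k` are assumed for illustration:

```python
import numpy as np

def svcca_distance(X, Y, k=10):
    """SVCCA distance between two representation matrices of shape
    (n_examples, n_features): SVD-reduce each to its top-k directions,
    then average the canonical correlations between the subspaces."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Ux, _, _ = np.linalg.svd(X, full_matrices=False)
    Uy, _, _ = np.linalg.svd(Y, full_matrices=False)
    # Columns of Ux/Uy are orthonormal, so the singular values of
    # Ux[:, :k].T @ Uy[:, :k] are the canonical correlations.
    rho = np.linalg.svd(Ux[:, :k].T @ Uy[:, :k], compute_uv=False)
    return 1.0 - float(rho.mean())

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 32))
print(svcca_distance(X, X, k=8))  # identical representations -> ~0.0
```

A layer whose representations drift during fine-tuning would show this distance growing between the pre-trained and fine-tuned checkpoints.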

Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning [article]

Benjamin Minixhofer, Milan Gritta, Ignacio Iacobacci
2021 arXiv   pre-print
For small Natural Language Inference (NLI) datasets, language modelling is typically followed by pretraining on a large (labelled) NLI dataset before fine-tuning with each NLI subtask.  ...  The FreeGBDT shows a consistent improvement over the MLP classification head.  ...  Pretraining is followed by fine-tuning the model on the target task.  ... 
arXiv:2105.03791v2 fatcat:dyaaxfas75chflbybctqsvuoyy

Consistency Regularization for Cross-Lingual Fine-Tuning [article]

Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei
2021 arXiv   pre-print
In this work, we propose to improve cross-lingual fine-tuning with consistency regularization.  ...  Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to the others.  ...  For instance, in part-of-speech  ...  XTUNE: Cross-Lingual Fine-Tuning with Consistency Regularization: We propose to improve cross-lingual fine-tuning with two consistency regularization methods, so that  ... 
arXiv:2106.08226v1 fatcat:mgllkqwvrzgyppsdaqt5ey7yp4
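Consistency regularization of the kind this abstract describes is typically implemented as a symmetric KL term between the model's predictions on an example and on a perturbed version of it (e.g. a translation or a differently augmented input). A minimal sketch, with the function names and epsilon smoothing as my own assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(logits_orig, logits_perturbed, eps=1e-12):
    """Symmetric KL divergence between the two predictive distributions;
    added to the task loss to keep predictions stable across languages."""
    p = softmax(logits_orig) + eps
    q = softmax(logits_perturbed) + eps
    kl_pq = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    kl_qp = np.sum(q * (np.log(q) - np.log(p)), axis=-1)
    return float(0.5 * (kl_pq + kl_qp).mean())

logits = np.array([[2.0, 0.5, -1.0]])
print(consistency_loss(logits, logits))  # identical predictions -> 0.0
```

The regularizer is zero when the two views agree and grows as their predicted distributions diverge, which is what penalizes language-inconsistent behavior.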

On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning [article]

Marc Tanti and Lonneke van der Plas and Claudia Borg and Albert Gatt
2021 arXiv   pre-print
This paper analyses the relationship between them, in the context of fine-tuning on two tasks -- POS tagging and natural language inference -- which require the model to bring to bear different degrees  ...  The results presented here suggest that the process of fine-tuning causes a reorganisation of the model's limited representational capacity, enhancing language-independent representations at the expense  ...  For both target tasks, performance improves after fine-tuning, as expected.  ... 
arXiv:2109.06935v2 fatcat:73u3rdzb6nckzdzqw6pwevvnga

Model Selection for Cross-Lingual Transfer [article]

Yang Chen, Alan Ritter
2021 arXiv   pre-print
In the zero-shot transfer setting, only English training data is used, and the fine-tuned model is evaluated on another target language.  ...  We propose a machine learning approach to model selection that uses the fine-tuned model's own internal representations to predict its cross-lingual capabilities.  ...  Evaluation on multilingual fine-tuned models An interesting question is whether fine-tuning on available development data in the auxiliary languages can improve performance.  ... 
arXiv:2010.06127v2 fatcat:ua4z67ka2fczrbwod44i4vmopm

Uppsala NLP at SemEval-2021 Task 2: Multilingual Language Models for Fine-tuning and Feature Extraction in Word-in-Context Disambiguation [article]

Huiling You, Xingran Zhu, Sara Stymne
2021 arXiv   pre-print
...  poorly with fine-tuning but gives similar results to the other models when used as a feature extractor.  ...  We submitted our two best systems, fine-tuned with XLMR and mBERT.  ...  As our main method, we fine-tune the language models with a span classification head.  ... 
arXiv:2104.03767v2 fatcat:gq7yr3gzovgvbkpdxjramk254u

Improving Zero-Shot Cross-Lingual Hate Speech Detection with Pseudo-Label Fine-Tuning of Transformer Language Models

Haris Bin Zia, Ignacio Castro, Arkaitz Zubiaga, Gareth Tyson
2022 International Conference on Web and Social Media  
This work presents a novel zero-shot, cross-lingual transfer learning pipeline based on pseudo-label fine-tuning of Transformer Language Models for automatic hate speech detection.  ...  Our pipeline achieves an average improvement of 7.6% (in terms of macro-F1) over previous zero-shot, cross-lingual models.  ...  Our work is different in that we exploit pseudo-labeled in-domain data in the target language along with gold-labeled data in English to fine-tune transformer language models that overcome the limitation of  ... 
dblp:conf/icwsm/ZiaCZT22 fatcat:fzvukvo6g5eclef6tagct6hwd4
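The pseudo-label step in a pipeline like this amounts to keeping only the model's high-confidence predictions on unlabeled target-language text and treating them as training labels. In the sketch below the classifier is a stand-in lambda and the confidence threshold is an assumed hyperparameter, not values from the paper:

```python
def pseudo_label(predict_proba, unlabeled_texts, threshold=0.9):
    """Return (text, label) pairs where the model's top-class
    probability clears `threshold`; these pairs are then used to
    fine-tune the model as if they were gold labels."""
    kept = []
    for text in unlabeled_texts:
        probs = predict_proba(text)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            kept.append((text, label))
    return kept

# Stand-in classifier: keyword match -> class 1 with high confidence.
toy_model = lambda t: [0.05, 0.95] if "hate" in t else [0.6, 0.4]
print(pseudo_label(toy_model, ["I hate this", "a fine day"], threshold=0.9))
# -> [('I hate this', 1)]
```

Low-confidence examples are simply dropped, which is what keeps label noise from the zero-shot model in check.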

Twice Fine-tuning Deep Neural Networks for Paraphrase Identification

Bowon Ko, Ho-Jin Choi
2020 Electronics Letters  
Unlike conventional BERT, which fine-tunes pre-trained BERT directly on the target task such as PI, twice fine-tuning deep neural networks first fine-tunes on each task (e.g. General Language Understanding Evaluation  ...  As a result, the multi-fine-tuned BERT model outperformed the model fine-tuned only with the Microsoft Research Paraphrase Corpus (MRPC), which is paraphrase data, except for one case of the Stanford Sentiment  ...  According to the experimental results, finding a source fine-tuning task that helps the target fine-tuning task improves the performance of the target task.  ... 
doi:10.1049/el.2019.4183 fatcat:3soxqzqy25f2le6u4wkvcdrlv4

The Impact of Cross-Lingual Adjustment of Contextual Word Representations on Zero-Shot Transfer [article]

Pavel Efimov and Leonid Boytsov and Elena Arslanova and Pavel Braslavski
2022 arXiv   pre-print
When we fine-tune a cross-lingually adjusted mBERT for a specific task (e.g., NLI), the cross-lingual adjustment of mBERT may still improve the separation between related and unrelated words, but this works  ...  In this study, we experiment with zero-shot transfer of English models to four typologically different languages (Spanish, Russian, Vietnamese, and Hindi) and three NLP tasks (QA, NLI, and NER).  ...  Finally, we apply the trained models to the test data in four target languages in a zero-shot fashion (i.e., without fine-tuning in the target language).  ... 
arXiv:2204.06457v1 fatcat:g6suifilqjehfai6bgd3pgj7oq
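Cross-lingual adjustment of contextual representations is often approximated in the literature with an orthogonal Procrustes map fit on paired word vectors. The sketch below illustrates that general idea on synthetic data; the function name and setup are my own assumptions, not this paper's procedure:

```python
import numpy as np

def procrustes_align(src, tgt):
    """Orthogonal map W minimizing ||src @ W - tgt||_F, fit on rows that
    are embeddings of translation pairs (src language vs. tgt language)."""
    U, _, Vt = np.linalg.svd(src.T @ tgt)
    return U @ Vt

rng = np.random.default_rng(1)
tgt = rng.standard_normal((50, 16))
# Hypothetical "other language" space: a rotated copy of the target space.
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))
src = tgt @ Q.T
W = procrustes_align(src, tgt)
print(np.allclose(src @ W, tgt))  # recovers the rotation exactly -> True
```

After such an adjustment, translation pairs land near each other, which is the precondition for the zero-shot transfer the abstract evaluates.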

Transductive Learning of Neural Language Models for Syntactic and Semantic Analysis

Hiroki Ouchi, Jun Suzuki, Kentaro Inui
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
Specifically, we fine-tune language models (LMs) on an unlabeled test set to obtain test-set-specific word representations.  ...  Through extensive experiments, we demonstrate that despite its simplicity, transductive LM fine-tuning consistently improves state-of-the-art neural models in both in-domain and out-of-domain settings.  ...  Specifically, inspired by recent findings that language model (LM)-based word representations yield large performance improvements (Devlin et al., 2019), we fine-tune Embeddings from Language Models (  ... 
doi:10.18653/v1/d19-1379 dblp:conf/emnlp/OuchiSI19 fatcat:rmefqromzrahtb5mrez3duoaza

hULMonA: The Universal Language Model in Arabic

Obeida ElJundi, Wissam Antoun, Nour El Droubi, Hazem Hajj, Wassim El-Hajj, Khaled Shaban
2019 Proceedings of the Fourth Arabic Natural Language Processing Workshop  
TL models are pre-trained on large corpora, and then fine-tuned on task-specific datasets.  ...  Arabic is a complex language with limited resources, which makes it challenging to produce accurate text classification for tasks such as sentiment analysis.  ...  Sentence-level Language Models for English: In contrast to word-level representation, sentence-level representation develops a language model which can then be fine-tuned for a supervised downstream task  ... 
doi:10.18653/v1/w19-4608 dblp:conf/wanlp/ElJundiADHES19 fatcat:oe6climpnrhyfnusyyt6ngbn3y
Showing results 1 — 15 out of 61,913 results