
Selecting Informative Contexts Improves Language Model Finetuning [article]

Richard Antonello, Nicole Beckage, Javier Turek, Alexander Huth
2021 arXiv   pre-print
During fine-tuning, this learner selects informative examples and skips uninformative ones.  ...  Here we present a general fine-tuning method that we call information gain filtration for improving the overall training efficiency and final performance of language model fine-tuning.  ...  Introduction Language modeling is the task of generating language from context.  ... 
arXiv:2005.00175v2 fatcat:qmrcr5qeircf7bc6sdsze6jzae

Go Forth and Prosper: Language Modeling with Ancient Textual History [article]

Rik Koncel-Kedziorski, Noah A. Smith
2021 arXiv   pre-print
We introduce a technique for improving document-level language models (LM) by leveraging "ancient history": text that is outside the LM's current context window.  ...  The selected text spans are then copied directly into the LM's context window, replacing less predictive spans.  ...  On Line 3 of Table 1 we see the impact of finetuning. Finetuning to the in-domain data strongly improves model perplexity compared to the off-the-shelf GPT2 small model.  ... 
arXiv:2104.08742v1 fatcat:lj5cdfgkovcn7arvuh4grsen2q

Quantifying the Contextualization of Word Representations with Semantic Class Probing [article]

Mengjie Zhao, Philipp Dufter, Yadollah Yaghoobzadeh, Hinrich Schütze
2020 arXiv   pre-print
Quantifying contextualization helps in understanding and utilizing pretrained language models.  ...  Pretrained language models have achieved a new state of the art on many NLP tasks, but there are still many open questions about how and why they work so well.  ...  In addition, since pretrained language models in practice need to be finetuned on downstream tasks (Devlin et al., 2019), we further investigate the interactions between finetuning and contextualization  ... 
arXiv:2004.12198v2 fatcat:crnslqh4jjb5bhizczlpdmtgay

Transfer language space with similar domain adaptation: a case study with hepatocellular carcinoma

Amara Tariq, Omar Kallas, Patricia Balthazar, Scott Jeffery Lee, Terry Desser, Daniel Rubin, Judy Wawira Gichoya, Imon Banerjee
2022 Journal of Biomedical Semantics  
Method We present a concept of similar domain adaptation where we transfer inter-institutional language models (context-dependent and context-independent) between two different modalities (ultrasound and  ...  However, transferring language models requires special attention since cross-domain vocabularies (e.g. between two different modalities MR and US) do not always overlap as the pixel intensity range overlaps  ...  . 5 Word2Vec Language Spaces; (a): US Language Model, (b): US-finetuned Language Model, (c): New words in US-finetuned Language Model, (d): MR Language Model, (e): MR-finetuned Language Model, (f): New  ... 
doi:10.1186/s13326-022-00262-8 pmid:35197110 pmcid:PMC8867666 fatcat:zsh5chrib5am5i6xnrrxoludye

Copenhagen at

Yova Kementchedjhieva, Johannes Bjerva, Isabelle Augenstein
2018 Proceedings of the  
We approach this with an encoder-decoder architecture over character sequences with three core innovations, all contributing to an improvement in performance: (1) a wide context window; (2) a multi-task  ...  learning approach with the auxiliary task of MSD prediction; (3) training models in a multilingual fashion.  ...  Finally, monolingual finetuning improves accuracy across the board, as one would expect, by 2.72% on average.  ... 
doi:10.18653/v1/k18-3011 dblp:conf/conll/Kementchedjhieva18a fatcat:odw2yw2ppzhqpciusn42m7xbqa

Can Unsupervised Knowledge Transfer from Social Discussions Help Argument Mining? [article]

Subhabrata Dutta, Jeevesh Juneja, Dipankar Das, Tanmoy Chakraborty
2022 arXiv   pre-print
language modeling task.  ...  Furthermore, we introduce a novel prompt-based strategy for inter-component relation prediction that complements our proposed finetuning method while leveraging the discourse context.  ...  We trained our models for a total of 10 epochs on the sMLM task, saving checkpoints after each epoch. We used the Adam optimizer with a learning rate of 10^-6.  ... 
arXiv:2203.12881v1 fatcat:wdx5zxqbjffd3nb6bnjscpdtai

Transformer Based Language Models for Similar Text Retrieval and Ranking [article]

Javed Qadrud-Din, Ashraf Bah Rabiou, Ryan Walker, Ravi Soni, Martin Gajek, Gabriel Pack, Akhil Rangaraj
2020 arXiv   pre-print
Recent applications of transformer-based neural language models to text retrieval and ranking problems have been very promising, but still involve a two-step process in which result candidates are first  ...  Most approaches for similar text retrieval and ranking with long natural language queries rely at some level on queries and responses having words in common with each other.  ...  Prior to pretrained transformers, Neural Information Retrieval (NIR) models were shown to produce improvements only over weak baselines [22].  ... 
arXiv:2005.04588v2 fatcat:agt2rv3zbrgh3nkkc7uzsv3ouq

Cross-Thought for Sentence Encoder Pre-training [article]

Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jing Jiang, Jingjing Liu
2020 arXiv   pre-print
Our proposed approach also achieves new state of the art on HotpotQA (full-wiki setting) by improving intermediate information retrieval performance.  ...  Instead of using the original signals of full sentences, we train a Transformer-based sequence encoder over a large set of short sequences, which allows the model to automatically select the most useful  ...  datasets: • Language Model (LM) (Radford et al., 2018): The task is to predict the probability of the next word based on given context.  ... 
arXiv:2010.03652v1 fatcat:fb5f7fx4ufgbfce6tj23m6ct3u

Masking as an Efficient Alternative to Finetuning for Pretrained Language Models [article]

Mengjie Zhao, Tao Lin, Fei Mi, Martin Jaggi, Hinrich Schütze
2020 arXiv   pre-print
We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning.  ...  Through intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks.  ...  We use this estimator to train selective masks for pretrained language model parameters.  ... 
arXiv:2004.12406v2 fatcat:n4ao5uyodvgfbedoqxtm5khhly

Copenhagen at CoNLL--SIGMORPHON 2018: Multilingual Inflection in Context with Explicit Morphosyntactic Decoding [article]

Yova Kementchedjhieva, Johannes Bjerva, Isabelle Augenstein
2018 arXiv   pre-print
We approach this with an encoder-decoder architecture over character sequences with three core innovations, all contributing to an improvement in performance: (1) a wide context window; (2) a multi-task  ...  learning approach with the auxiliary task of MSD prediction; (3) training models in a multilingual fashion.  ...  Finally, monolingual finetuning improves accuracy across the board, as one would expect, by 2.72% on average.  ... 
arXiv:1809.01541v1 fatcat:7jwunrdi7jfhndsvl2u6of7koe

DEEP: DEnoising Entity Pre-training for Neural Machine Translation [article]

Junjie Hu, Hiroaki Hayashi, Kyunghyun Cho, Graham Neubig
2021 arXiv   pre-print
Besides, we investigate a multi-task learning strategy that finetunes a pre-trained neural machine translation model on both entity-augmented monolingual data and parallel data to further improve entity  ...  Earlier named entity translation methods mainly focus on phonetic transliteration, which ignores the sentence context for translation and is limited in domain and language coverage.  ...  In this paper, without changing model architectures, we focus on data augmentation methods to improve named entity translation within context.  ... 
arXiv:2111.07393v1 fatcat:a4iworhlrzcnfoplmch5jx6g2m

Training Question Answering Models From Synthetic Data [article]

Raul Puri, Ryan Spring, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro
2020 arXiv   pre-print
This work aims to narrow this gap by taking advantage of large language models and explores several factors such as model size, quality of pretrained models, scale of data synthesized, and algorithmic  ...  Question and answer generation is a data augmentation method that aims to improve question answering (QA) models given the limited amount of human labeled data.  ...  Additionally, modeling how humans ask questions can be used to improve search and information retrieval in query-conditional semantic information retrieval.  ... 
arXiv:2002.09599v1 fatcat:ag6llpnu75dfro76ynoexyt3li

Transfer Learning for Sequence Generation: from Single-source to Multi-source [article]

Xuancheng Huang, Jingfang Xu, Maosong Sun, Yang Liu
2021 arXiv   pre-print
conjecture that the direct finetuning method leads to catastrophic forgetting and solely relying on pretrained self-attention layers to capture cross-source information is not sufficient.  ...  Therefore, we propose a two-stage finetuning method to alleviate the pretrain-finetune discrepancy and introduce a novel MSG model with a fine encoder to learn better representations in MSG tasks.  ...  Recently, as pretraining language models that take advantage of massive unlabeled data have proven to improve natural language understanding (NLU) and generation tasks substantially (Devlin et al., 2019  ... 
arXiv:2105.14809v1 fatcat:cvf4ughuuzhbhh2dajkfkin254

Evaluation of related news recommendations using document similarity methods

Marko Pranjić, Vid Podpečan, Marko Robnik-Šikonja, Senja Pollak
2020 Zenodo  
Such news articles contain more context and background information and provide a richer experience to the reader.  ...  Our results show that the tf-idf weighting applied to bag-of-words document representation offers better matching with links manually selected by journalists than more sophisticated approaches, such as  ...  The results of this publication reflect only the author's view and the Commission is not responsible for any use that may be made of the information it contains. References  ... 
doi:10.5281/zenodo.4059710 fatcat:nhauxf25ezfs3fahvhjf2xeuty

SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task [article]

Zuchao Li, Hai Zhao, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita
2020 arXiv   pre-print
Based on different conditions of language pairs, we have experimented with diverse neural machine translation (NMT) techniques: document-enhanced NMT, XLM pre-trained language model enhanced NMT, bidirectional  ...  We also used the TF-IDF algorithm to filter the training set to obtain a subset closer in domain to the test set for finetuning.  ...  To further expose the model to the direction difference and improve the effect of unidirectional translation, we further finetune the bidirectional pre-trained model on the bilingual data.  ... 
arXiv:2010.05122v1 fatcat:rlgy4zy7pnawfhly2pn6gi74ce