
Continual Domain-Tuning for Pretrained Language Models [article]

Subendhu Rongali, Abhyuday Jagannatha, Bhanu Pratap Singh Rawat, Hong Yu
2021 arXiv   pre-print
However, during the pretraining phase on the target domain, the LMs may catastrophically forget the patterns learned from their source domain.  ...  Pre-trained language models (LMs) such as BERT, DistilBERT, and RoBERTa can be tuned for different domains (domain-tuning) by continuing the pre-training phase on a new target domain corpus.  ...  Related Work Domain-Specific Pretraining: The task of adapting pretrained language models for a specific domain is popular in the literature.  ... 
arXiv:2004.02288v2 fatcat:vjtkcl7u4fhmhg6mpgr7wcipo4
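The abstract describes domain-tuning (continuing pre-training on a target-domain corpus) and the risk of catastrophic forgetting. Purely as an illustration, the sketch below continues masked-language-model pretraining while adding a simple L2 penalty toward the source checkpoint's weights, one generic way to discourage forgetting; the checkpoint name, placeholder corpus, penalty form, and hyperparameters are assumptions, not the paper's method.

```python
# Illustrative sketch only: continued MLM pretraining on a target-domain corpus
# with an L2 penalty toward the source weights to discourage forgetting.
# This is a generic mitigation, not necessarily the regularizer studied in the paper.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling

model_name = "bert-base-uncased"          # source-domain checkpoint (assumption)
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
source_params = {n: p.detach().clone() for n, p in model.named_parameters()}

collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm=True, mlm_probability=0.15)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
lam = 0.01                                 # strength of the anti-forgetting penalty (assumed)

target_domain_texts = ["Patient presents with acute dyspnea.",
                       "Echocardiogram shows preserved ejection fraction."]  # placeholder corpus
features = [tok(t, truncation=True, max_length=128) for t in target_domain_texts]
batch = collator(features)                 # pads and applies random masking with labels

outputs = model(**batch)                   # MLM loss on the target-domain batch
l2 = sum(((p - source_params[n]) ** 2).sum() for n, p in model.named_parameters())
loss = outputs.loss + lam * l2
loss.backward()
optimizer.step()
```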

Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization [chapter]

Bruno Taillé, Vincent Guigue, Patrick Gallinari
2020 Lecture Notes in Computer Science  
Contextualized embeddings use unsupervised language model pretraining to compute word representations depending on their context.  ...  For models trained on CoNLL03, language model contextualization leads to a +1.2% maximal relative micro-F1 score increase in-domain against +13% out-of-domain on the WNUT dataset (The code is available  ...  Recent improvements mainly stem from using new types of representations: learned character-level word embeddings [9] and contextualized embeddings derived from a language model (LM) [1, 6, 14].  ... 
doi:10.1007/978-3-030-45442-5_48 fatcat:vwx6n7qcdfhspi3knpnwcvhrq4
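The snippet describes computing context-dependent word representations with a pretrained LM and feeding them to an NER model. A minimal sketch, assuming a BERT encoder from the transformers library: extract per-wordpiece contextual embeddings and score them with a toy tagging head; the checkpoint, label count, and head are placeholders rather than the study's setup.

```python
# Minimal sketch: per-token contextual embeddings that an NER tagger
# (e.g. a linear layer or BiLSTM-CRF) would consume. Names are illustrative.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-cased")
encoder = AutoModel.from_pretrained("bert-base-cased")

sentence = "EU rejects German call to boycott British lamb ."
enc = tok(sentence.split(), is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**enc).last_hidden_state      # (1, num_wordpieces, 768)

# A toy tagging head; in practice this is trained on CoNLL03-style labels.
num_labels = 9                                     # e.g. BIO tags for 4 entity types + O
tagger = torch.nn.Linear(encoder.config.hidden_size, num_labels)
logits = tagger(hidden)                            # per-wordpiece label scores
print(logits.shape)
```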

Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling [article]

Xiaochuang Han, Jacob Eisenstein
2019 arXiv   pre-print
To address this scenario, we propose domain-adaptive fine-tuning, in which the contextualized embeddings are adapted by masked language modeling on text from the target domain.  ...  Contextualized word embeddings such as ELMo and BERT provide a foundation for strong performance across a wide range of natural language processing tasks by pretraining on large corpora of unlabeled text  ...  Thanks to the anonymous reviewers and to Ross Girshick, Omer Levy, Michael Lewis, Yuval Pinter, Luke Zettlemoyer, and the Georgia Tech Computational Linguistics Lab for helpful discussions of this work  ... 
arXiv:1904.02817v2 fatcat:p7uxula2lnhdteztqfzzw4dysy
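The recipe described in the abstract is to adapt the contextualized embeddings by masked language modeling on unlabeled target-domain text before supervised fine-tuning. Below is a sketch of that first stage using the Hugging Face Trainer; the corpus contents, checkpoint, and hyperparameters are placeholders.

```python
# Sketch of domain-adaptive fine-tuning as described in the abstract: continue
# masked language modeling on unlabeled target-domain text, then reuse the
# adapted encoder for the labeled sequence-labeling task.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

target_corpus = Dataset.from_dict({"text": ["unlabeled target-domain sentence one",
                                            "unlabeled target-domain sentence two"]})
tokenized = target_corpus.map(lambda ex: tok(ex["text"], truncation=True, max_length=128),
                              batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm-domain-adapted", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tok, mlm_probability=0.15),
)
trainer.train()
model.save_pretrained("mlm-domain-adapted")   # later reloaded with a token-classification head
```

The saved checkpoint would then be loaded for the downstream sequence-labeling task, e.g. via a token-classification model initialized from the adapted weights.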

Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling

Xiaochuang Han, Jacob Eisenstein
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
To address this scenario, we propose domain-adaptive finetuning, in which the contextualized embeddings are adapted by masked language modeling on text from the target domain.  ...  Contextualized word embeddings such as ELMo and BERT provide a foundation for strong performance across a wide range of natural language processing tasks by pretraining on large corpora of unlabeled text  ...  Thanks to the anonymous reviewers and to Ross Girshick, Omer Levy, Michael Lewis, Yuval Pinter, Luke Zettlemoyer, and the Georgia Tech Computational Linguistics Lab for helpful discussions of this work  ... 
doi:10.18653/v1/d19-1433 dblp:conf/emnlp/HanE19 fatcat:ph6bwyoz3zab7bawpnfexhvob4

Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization [article]

Bruno Taillé, Vincent Guigue, Patrick Gallinari
2020 arXiv   pre-print
Contextualized embeddings use unsupervised language model pretraining to compute word representations depending on their context.  ...  For models trained on CoNLL03, language model contextualization leads to a +1.2% maximal relative micro-F1 score increase in-domain against +13% out-of-domain on the WNUT dataset  ...  Recent improvements mainly stem from using new types of representations: learned character-level word embeddings [9] and contextualized embeddings derived from a language model (LM) [1, 6, 14] .  ... 
arXiv:2001.08053v1 fatcat:p7lg5nk35jgmhpkzik7mz72xoi

Efficient Domain Adaptation of Language Models via Adaptive Tokenization [article]

Vin Sachidananda and Jason S. Kessler and Yi-an Lai
2021 arXiv   pre-print
We propose an alternative approach for transferring pretrained language models to new domains by adapting their tokenizers.  ...  further pretraining the language model on domain-specific corpora on 8 TPUs.  ...  Acknowledgements We thank Yi Zhang, William Headden, Max Harper, Chandni Singh, Anuj Ahluwalia, Sushant Sagar, Jay Patel, Sachin Hulyalkar, and the anonymous reviewers for their valuable feedback.  ... 
arXiv:2109.07460v1 fatcat:orlfo4fyhvhgrlo4b3gm7icrzy
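The abstract proposes transferring a pretrained LM to a new domain by adapting its tokenizer rather than continuing pretraining. A hedged sketch of that general idea follows: add domain terms to the vocabulary and initialize each new embedding from the mean of its original subword embeddings. The term list, checkpoint, and initialization scheme are assumptions and may differ from the paper's token selection and initialization.

```python
# Hedged sketch of tokenizer-level domain adaptation: extend the vocabulary
# with domain terms and initialize each new embedding from the mean of the
# term's original subword embeddings.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

domain_terms = ["pharmacokinetics", "troponin", "echocardiogram"]  # placeholder terms
subword_ids = [tok(t, add_special_tokens=False)["input_ids"] for t in domain_terms]

tok.add_tokens(domain_terms)                  # grow the vocabulary
model.resize_token_embeddings(len(tok))       # grow the embedding matrix to match

emb = model.get_input_embeddings().weight
with torch.no_grad():
    for term, ids in zip(domain_terms, subword_ids):
        new_id = tok.convert_tokens_to_ids(term)
        emb[new_id] = emb[ids].mean(dim=0)    # mean of the old subword vectors
```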

Contextual Adapters for Personalized Speech Recognition in Neural Transducers [article]

Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Feng-Ju Chang, Jing Liu, Jinru Su, Grant P. Strimel, Athanasios Mouchtaris, Siegfried Kunzmann
2022 arXiv   pre-print
Using an in-house dataset, we demonstrate that contextual adapters can be applied to any general purpose pretrained ASR model to improve personalization.  ...  In this paper, we propose training neural contextual adapters for personalization in neural transducer based ASR models.  ...  In order for the contextual adapters to distinguish between different catalog types and bias toward a specific one, we introduce a learnable 'type embedding'.  ... 
arXiv:2205.13660v1 fatcat:hcjsnqdfrrhkxnl3fyiy5m5y2m
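The snippet mentions contextual adapters with a learnable 'type embedding' used to bias a neural-transducer ASR model toward entries from a particular catalog. The module below is a rough sketch of that biasing idea (cross-attention from decoder states to catalog-entry embeddings augmented with a type embedding); the dimensions, attention layout, and residual wiring are assumptions, not the paper's architecture.

```python
# Rough sketch: decoder states attend over catalog-entry embeddings, with a
# learnable per-catalog 'type embedding' added so the adapter can bias toward
# a specific catalog type. All shapes and wiring are assumptions.
import torch
import torch.nn as nn

class ContextualAdapter(nn.Module):
    def __init__(self, dim=256, num_catalog_types=3):
        super().__init__()
        self.type_emb = nn.Embedding(num_catalog_types, dim)   # learnable type embedding
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, decoder_state, catalog_embs, catalog_types):
        # decoder_state: (B, T, dim); catalog_embs: (B, N, dim); catalog_types: (B, N)
        keys = catalog_embs + self.type_emb(catalog_types)
        biased, _ = self.attn(decoder_state, keys, keys)
        return decoder_state + self.out(biased)                # residual biasing signal

adapter = ContextualAdapter()
out = adapter(torch.randn(2, 10, 256), torch.randn(2, 5, 256),
              torch.randint(0, 3, (2, 5)))
print(out.shape)   # torch.Size([2, 10, 256])
```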

Emerging Cross-lingual Structure in Pretrained Language Models [article]

Shijie Wu and Alexis Conneau and Haoran Li and Luke Zettlemoyer and Veselin Stoyanov
2020 arXiv   pre-print
for non-contextual word embeddings, there are universal latent symmetries in the learned embedding spaces.  ...  why these models are so effective for cross-lingual transfer.  ...  See Tab. 1 for full results.  ... 
arXiv:1911.01464v3 fatcat:zcutmu7pq5hyxmpirkf243v6jy
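The snippet's claim that independently learned, non-contextual embedding spaces share universal latent symmetries is commonly illustrated by aligning two spaces with an orthogonal map. Purely as a didactic sketch on synthetic data (not the paper's experiments), the code below recovers a rotation between two embedding spaces via orthogonal Procrustes over a seed dictionary.

```python
# Didactic sketch: two embedding spaces that differ only by a rotation can be
# aligned with the orthogonal Procrustes solution over a seed dictionary.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))          # embeddings of seed words in language 1
true_Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))
Y = X @ true_Q                          # the same words in a rotated language-2 space

U, _, Vt = np.linalg.svd(Y.T @ X)       # orthogonal Procrustes solution
W = U @ Vt                              # maps language-2 vectors onto language-1
print(np.allclose(Y @ W, X, atol=1e-6)) # True: the spaces differ only by a rotation
```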

Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems [article]

Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury
2022 arXiv   pre-print
with BERT-based contextual embeddings.  ...  Our model improves further when fine-tuned with additional regularization using SpecAugment, especially when speech is noisy, giving an absolute improvement as high as 8% over previous results.  ...  Seq2seq pretraining is an ASR-based pretraining, the simplest form of which is to train an ASR model on a large out-of-domain dataset and to fine-tune the speech encoder from the trained ASR model for downstream  ... 
arXiv:2204.05188v2 fatcat:cjoxfzbjcjb7piaayp5qnzl6kq
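The title's tokenwise contrastive pretraining aligns speech-encoder token representations with BERT token embeddings. A hedged sketch of one such objective, an InfoNCE-style loss with in-batch negatives, is below; the shapes, temperature, and exact formulation are assumptions and may differ from the paper's loss.

```python
# Hedged sketch of a tokenwise contrastive objective: pull each speech token
# representation toward its corresponding BERT token embedding, pushing it away
# from the other tokens in the batch (InfoNCE with in-batch negatives).
import torch
import torch.nn.functional as F

def tokenwise_contrastive_loss(speech_tokens, bert_tokens, temperature=0.1):
    # speech_tokens, bert_tokens: (N, dim); row i of each refers to the same token
    s = F.normalize(speech_tokens, dim=-1)
    b = F.normalize(bert_tokens, dim=-1)
    logits = s @ b.t() / temperature          # (N, N) cosine-similarity matrix
    targets = torch.arange(s.size(0))         # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = tokenwise_contrastive_loss(torch.randn(32, 768), torch.randn(32, 768))
print(loss.item())
```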

MentalBERT: Publicly Available Pretrained Language Models for Mental Healthcare [article]

Shaoxiong Ji, Tianlin Zhang, Luna Ansari, Jie Fu, Prayag Tiwari, Erik Cambria
2021 arXiv   pre-print
Recent advances in pretrained contextualized language representations have promoted the development of several domain-specific pretrained models and facilitated several downstream applications.  ...  pretrained in the target domain improve the performance of mental health detection tasks.  ...  The authors wish to acknowledge CSC -IT Center for Science, Finland, for computational resources.  ... 
arXiv:2110.15621v1 fatcat:e5liorqyerc23hddvnngnzxt54
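The intended downstream use of such a domain-pretrained checkpoint is mental health detection, typically framed as text classification. A minimal sketch of that fine-tuning setup follows; the hub identifier is a generic placeholder rather than the released MentalBERT checkpoint, and the example text and label are illustrative.

```python
# Minimal sketch: fine-tune a domain-pretrained checkpoint for binary
# mental-health detection. Checkpoint name, text, and label are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ckpt = "bert-base-uncased"   # substitute the released domain-pretrained checkpoint
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=2)

batch = tok(["I have been feeling hopeless lately."], return_tensors="pt",
            truncation=True, padding=True)
labels = torch.tensor([1])                  # toy positive label
loss = model(**batch, labels=labels).loss   # standard cross-entropy fine-tuning loss
loss.backward()
```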

Publicly Available Clinical BERT Embeddings [article]

Emily Alsentzer, John R. Murphy, Willie Boag, Wei-Hung Weng, Di Jin, Tristan Naumann, Matthew B. A. McDermott
2019 arXiv   pre-print
We demonstrate that using a domain-specific model yields performance improvements on three common clinical NLP tasks as compared to nonspecific embeddings.  ...  Contextual word embedding models such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018) have dramatically improved performance for many natural language processing (NLP) tasks in recent months  ...  Overall, we feel our results demonstrate the utility of using domain-specific contextual embeddings for non de-ID clinical NLP tasks.  ... 
arXiv:1904.03323v3 fatcat:2sbf755lgresfiq7hosmw6nd2e

Does the Magic of BERT Apply to Medical Code Assignment? A Quantitative Study [article]

Shaoxiong Ji and Matti Hölttä and Pekka Marttinen
2021 arXiv   pre-print
This paper conducts a comprehensive quantitative analysis of various contextualized language models' performance, pretrained in different domains, for medical code assignment from clinical notes.  ...  However, it is not clear if pretrained models are useful for medical code prediction without further architecture engineering.  ...  The authors wish to acknowledge CSC -IT Center for Science, Finland, for computational resources.  ... 
arXiv:2103.06511v2 fatcat:yrcbq4n4fng65m73344jk5sepe
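Medical code assignment is usually framed as multi-label classification over a fixed code set, with a pretrained encoder feeding a sigmoid/BCE head. The sketch below shows that framing; the code-set size, checkpoint, and gold labels are toy placeholders, and the models in the study may use different heads (e.g. label attention).

```python
# Sketch of medical code assignment as multi-label classification over a fixed
# ICD code set, using a pretrained encoder with a sigmoid/BCE head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

num_codes = 50   # e.g. a top-50 ICD code subset (placeholder)
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=num_codes,
    problem_type="multi_label_classification")

note = "Discharge summary: patient admitted with congestive heart failure ..."
batch = tok(note, truncation=True, max_length=512, return_tensors="pt")
labels = torch.zeros(1, num_codes)
labels[0, [3, 17]] = 1.0                      # the note's gold codes (toy example)
out = model(**batch, labels=labels)           # BCE-with-logits loss under the hood
out.loss.backward()
```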

Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction

Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, Degui Zhi
2021 npj Digital Medicine  
Med-BERT is a contextualized embedding model pretrained on a structured EHR dataset of 28,490,650 patients.  ...  The pretraining of BERT on a very large training corpus generates contextualized embeddings that can boost the performance of models trained on smaller datasets.  ...  Specifically, we would like to acknowledge the use of the Cerner Health Facts® and IBM Truven MarketScan™ datasets as well as the assistance provided by the UTHealth SBMI Data Service team to extract  ... 
doi:10.1038/s41746-021-00455-y pmid:34017034 fatcat:nehiu6kytvfcdfh6vfbnzpqmje
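Med-BERT pretrains on structured EHR data, i.e. sequences of diagnosis codes rather than natural-language text. A very rough sketch of that idea, masked-code prediction over a visit-ordered code sequence with a small transformer encoder, is below; the code vocabulary, model sizes, masking, and absence of auxiliary objectives are toy assumptions, not the paper's configuration.

```python
# Very rough sketch: treat a patient's visit history as a sequence of diagnosis
# codes and pretrain a transformer encoder with masked-code prediction.
import torch
import torch.nn as nn

code_vocab = {"[PAD]": 0, "[MASK]": 1, "E11.9": 2, "I10": 3, "J45.909": 4}
emb = nn.Embedding(len(code_vocab), 128)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(128, len(code_vocab))        # predicts the identity of masked codes

patient = torch.tensor([[2, 1, 3, 4]])        # visit-ordered codes; index 1 is [MASK]
hidden = encoder(emb(patient))                # (1, 4, 128) contextualized code vectors
logits = head(hidden[0, 1])                   # predict the masked code
loss = nn.functional.cross_entropy(logits.unsqueeze(0), torch.tensor([4]))
loss.backward()
```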

Extended Study on Using Pretrained Language Models and YiSi-1 for Machine Translation Evaluation

Chi-kiu Lo
2020 Conference on Machine Translation  
the full strength of using pretrained language models for machine translation evaluation.  ...  We present an extended study on using pretrained language models and YiSi-1 for machine translation evaluation.  ...  Although fine-tuning the pretrained language models for specific downstream tasks shows improvements in many cases, using the pretrained language models without fine-tuning makes the MT evaluation metrics  ... 
dblp:conf/wmt/Lo20 fatcat:hcp736bstfft3bpihdq66xuql4
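YiSi-1 scores MT output by lexical semantic similarity computed from embeddings. A simplified sketch of that core idea, greedy cosine matching of contextual token embeddings aggregated into a precision/recall/F score, is below; it omits YiSi-1's idf weighting and phrase-level aggregation, and the multilingual checkpoint used here is an assumption.

```python
# Simplified sketch of an embedding-similarity MT metric: greedy cosine matching
# of contextual token embeddings between hypothesis and reference.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
enc = AutoModel.from_pretrained("bert-base-multilingual-cased")

def token_embeddings(text):
    with torch.no_grad():
        out = enc(**tok(text, return_tensors="pt")).last_hidden_state[0]
    return F.normalize(out, dim=-1)

hyp = token_embeddings("The cat sat on the mat.")
ref = token_embeddings("A cat is on the mat.")
sim = hyp @ ref.t()                                  # pairwise cosine similarities
precision, recall = sim.max(dim=1).values.mean(), sim.max(dim=0).values.mean()
f_score = 2 * precision * recall / (precision + recall)
print(float(f_score))
```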

FinBERT: A Pretrained Language Model for Financial Communications [article]

Yi Yang, Mark Christopher Siy UY, Allen Huang
2020 arXiv   pre-print
In this work, we address the need by pretraining a financial domain-specific BERT model, FinBERT, using large-scale financial communication corpora.  ...  also accumulates a large amount of financial communication text. However, there are no pretrained finance-specific language models available.  ...  To this end, several domain-specific BERT models have been trained and released. BioBERT pretrains a biomedical domain-specific language representation model using large-scale biomedical corpora.  ... 
arXiv:2006.08097v2 fatcat:eiijfp6xorbghibcyerjdnbofm
Showing results 1 — 15 out of 9,339 results