9,686 Hits in 4.2 sec

Universal Text Representation from BERT: An Empirical Study [article]

Xiaofei Ma, Zhiguo Wang, Patrick Ng, Ramesh Nallapati, Bing Xiang
2019 arXiv   pre-print
We present a systematic investigation of layer-wise BERT activations for general-purpose text representations to understand what linguistic information they capture and how transferable they are across  ...  and answer pairs are critical for those hard tasks.  ...  In this paper, we conducted an empirical study of layer-wise activations of BERT as general-purpose text embeddings.  ... 
arXiv:1910.07973v2 fatcat:4tidsvmzsrbjtjfnvxovolpilu

What BERT Based Language Models Learn in Spoken Transcripts: An Empirical Study [article]

Ayush Kumar, Mukuntha Narayanan Sundararaman, Jithendra Vepa
2021 arXiv   pre-print
We probe BERT based language models (BERT, RoBERTa) trained on spoken transcripts to investigate their ability to understand multifarious properties in the absence of any speech cues.  ...  Empirical results indicate that the LM is surprisingly good at capturing conversational properties such as pause prediction and overtalk detection from lexical tokens.  ...  To evaluate the efficacy of the probing tasks and study the transferability of the LM to unseen data, we also evaluate the LM fine-tuned with a multi-task learning framework on two external datasets: Switchboard  ... 
arXiv:2109.09105v2 fatcat:h3kmq32fcfbgxbseddrrxyljfe

Fine-Tuning Bidirectional Encoder Representations From Transformers (BERT)–Based Models on Large-Scale Electronic Health Record Notes: An Empirical Study

Fei Li, Yonghao Jin, Weisong Liu, Bhanu Pratap Singh Rawat, Pengshan Cai, Hong Yu
2019 JMIR Medical Informatics  
However, little prior work has explored using this model for an important task in the biomedical and clinical domains, namely entity normalization.  ...  EhrBERT achieved 40.95% F1 in the MADE 1.0 corpus for mapping named entities to the Medical Dictionary for Regulatory Activities and the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT)  ...  Acknowledgments This work was supported by two grants from the National Institutes of Health (grant numbers: 5R01HL125089 and 5R01HL135219) and an Investigator-Initiated Research grant from the Health  ... 
doi:10.2196/14830 pmid:31516126 pmcid:PMC6746103 fatcat:fc4kixcjdbhazdi77rjchmlmnq

Active Learning for BERT: An Empirical Study

Liat Ein-Dor, Alon Halfon, Ariel Gera, Eyal Shnarch, Lena Dankin, Leshem Choshen, Marina Danilevsky, Ranit Aharonov, Yoav Katz, Noam Slonim
2020 Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)   unpublished
Here, we present a large-scale empirical study on active learning techniques for BERT-based classification, addressing a diverse set of AL strategies and datasets.  ...  Active Learning (AL) is a ubiquitous paradigm to cope with data scarcity.  ...  not study AL for BERT.  ... 
doi:10.18653/v1/2020.emnlp-main.638 fatcat:cmz6hjtcxra63msz4le6d25hi4
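The uncertainty-based strategies surveyed in AL studies like this one can be illustrated with a minimal least-confidence sampler. This is a generic sketch, not any paper's exact implementation; the toy probability table stands in for a trained BERT classifier's `predict_proba`:

```python
def least_confidence_sample(pool, predict_proba, k=2):
    """Pick the k pool items whose top predicted class probability
    is lowest, i.e. where the model is least confident."""
    scored = [(max(predict_proba(x)), x) for x in pool]
    scored.sort(key=lambda t: t[0])  # least confident first
    return [x for _, x in scored[:k]]

# Hypothetical class probabilities standing in for a BERT classifier.
probs = {"a": [0.9, 0.1], "b": [0.55, 0.45], "c": [0.6, 0.4]}
picked = least_confidence_sample(["a", "b", "c"], probs.__getitem__, k=2)
# "b" (top prob 0.55) and "c" (0.6) are the least confident items
```

The selected items would then be sent for human annotation and added to the training set in the next AL iteration.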

Specializing Multilingual Language Models: An Empirical Study [article]

Ethan C. Chau, Noah A. Smith
2021 arXiv   pre-print
In this work, we study the performance, extensibility, and interaction of two such adaptations: vocabulary augmentation and script transliteration.  ...  We also thank Benjamin Muller for insightful discussions and providing details about transliteration methods and baselines. Finally, we thank the anonymous reviewers for their helpful remarks.  ...  Acknowledgments We thank Jungo Kasai, Phoebe Mulcaire, and members of UW NLP for their helpful comments on preliminary versions of this paper.  ... 
arXiv:2106.09063v3 fatcat:h375kz7iwfcahnzqli6tl6q5h4

An Empirical Study on Transfer Learning for Privilege Review [article]

Haozhen Zhao, Shi Ye, Jingchao Yang
2021 arXiv   pre-print
In this paper, we study both traditional machine learning models and deep learning models based on BERT for privilege document classification tasks in legal document review, and we examine the effectiveness  ...  Our results show that the BERT model outperforms the industry-standard logistic regression algorithm, and transfer learning models can achieve decent performance on datasets in the same or close domains.  ...  In this paper, we empirically study three kinds of machine learning models in privilege prediction and investigate the transferability of models trained on different datasets using different machine learning  ... 
arXiv:2112.08606v1 fatcat:tdujs3aflzacfakulrrpa5duwe

An empirical analysis of excess interbank liquidity: a case study of Pakistan

Muhammad Omer, Jakob De Haan, Bert Scholtens
2015 Applied Economics  
Empirical studies frequently use the Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) unit root tests.  ...  For empirical studies of unit root tests with structural breaks, we refer to Banerjee et al. (1992), Christiano (1992), De Haan and Zelhorst (1994), Perron (2005), Glyn et al. (2007), and Carrion-Silvestre  ... 
doi:10.1080/00036846.2015.1034842 fatcat:nb4nmlj4snczbpbgzsb2tysz3i

An Empirical Study of Factors Affecting Language-Independent Models [article]

Xiaotong Liu, Yingbei Tong, Anbang Xu, Rama Akkiraju
2019 arXiv   pre-print
In this work, we empirically investigate the factors affecting language-independent models built with multilingual representations, including task type, language set, and data resource.  ...  We experiment with language-independent models across many different languages and show that they are more suitable for typologically similar languages.  ...  Related Works Multilingual representation learning has been an active area of research, starting from word embeddings alignment that uses small dictionaries to align word representations from different  ... 
arXiv:1912.13106v1 fatcat:licpf7z2s5b5xmwd2bxu2ne6ta

LTP: A New Active Learning Strategy for CRF-Based Named Entity Recognition [article]

Mingyi Liu, Zhiying Tu, Tong Zhang, Tonghua Su, Zhongjie Wang
2020 arXiv   pre-print
Previous studies have demonstrated that active learning could elaborately reduce the cost of data annotation, but there is still plenty of room for improvement.  ...  Then we propose an uncertainty-based active learning strategy called Lowest Token Probability (LTP), which combines the input and output of the CRF to select informative instances.  ...  Active learning Active learning strategies have been well studied [11, 12, 13].  ... 
arXiv:2001.02524v2 fatcat:imy6jxj4dzd65pzny7qwunoomi
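The idea behind an LTP-style criterion can be sketched compactly: score each unlabeled sequence by the probability of its least-confident token under the model's predicted labeling, and query the sequences with the lowest such score. This is a simplified sketch under assumed per-token marginals, not the paper's exact CRF formulation:

```python
def lowest_token_probability(sequences, token_probs, k=1):
    """Rank sequences by the probability of their least-confident
    token and return the k most uncertain ones for annotation."""
    ranked = sorted(sequences, key=lambda s: min(token_probs[s]))
    return ranked[:k]

# Hypothetical per-token probabilities for three unlabeled sentences.
token_probs = {
    "sent1": [0.99, 0.97, 0.98],
    "sent2": [0.95, 0.40, 0.90],  # one very uncertain token
    "sent3": [0.80, 0.85, 0.75],
}
chosen = lowest_token_probability(list(token_probs), token_probs, k=1)
# sent2's weakest token (0.40) makes it the most informative candidate
```

Scoring by the single weakest token, rather than the whole-sequence probability, avoids penalizing long sentences that are mostly easy.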

Curriculum Learning Strategies for IR: An Empirical Study on Conversation Response Ranking [article]

Gustavo Penha, Claudia Hauff
2019 arXiv   pre-print
To address both challenges and determine whether curriculum learning is beneficial for neural ranking models, we need large-scale datasets and a retrieval task that allows us to conduct a wide range of  ...  For this purpose, we resort to the task of conversation response ranking: ranking responses given the conversation history.  ...  For BERT loss we consider the loss of the model to be an indicator of the difficulty of an instance.  ... 
arXiv:1912.08555v1 fatcat:kz4g5kfgdbbenatw2ukw5zqhzi
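The loss-as-difficulty idea mentioned in the snippet above can be sketched as a minimal curriculum scheduler: rank training instances by a model's loss on them and start training on the easiest fraction. This is a generic sketch (the pacing scheme and loss table are hypothetical, not the paper's exact setup):

```python
def loss_based_curriculum(instances, loss_fn, pace=0.5):
    """Order instances from easy to hard using model loss as the
    difficulty score, then keep the easiest `pace` fraction."""
    ranked = sorted(instances, key=loss_fn)  # low loss = easy, first
    cutoff = max(1, int(len(ranked) * pace))
    return ranked[:cutoff]

# Hypothetical per-instance losses from a pretrained ranking model.
losses = {"x1": 0.2, "x2": 1.5, "x3": 0.7, "x4": 0.1}
batch = loss_based_curriculum(list(losses), losses.__getitem__, pace=0.5)
# easiest half: x4 (loss 0.1) and x1 (loss 0.2)
```

In a full curriculum, `pace` would grow over training steps until the whole dataset is included.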

Utterance-level Dialogue Understanding: An Empirical Study [article]

Deepanway Ghosal, Navonil Majumder, Rada Mihalcea, Soujanya Poria
2020 arXiv   pre-print
Specifically, we employ various perturbations to distort the context of a given utterance and study its impact on the different tasks and baselines.  ...  The recent abundance of conversational data on the Web and elsewhere calls for effective NLP systems for dialog understanding.  ...  We are thankful to Rishabh Bhardwaj for his time and insightful comments toward this work.  ... 
arXiv:2009.13902v5 fatcat:ucbe32erofekloa5eqqapzhdla

I-BERT: Integer-only BERT Quantization [article]

Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
2021 arXiv   pre-print
Based on lightweight integer-only approximation methods for nonlinear operations, e.g., GELU, Softmax, and Layer Normalization, I-BERT performs an end-to-end integer-only BERT inference without any floating  ...  We show that for both cases, I-BERT achieves similar (and slightly higher) accuracy as compared to the full-precision baseline.  ...  Ablation Studies Here, we perform an ablation study to show the benefit of i-GELU as compared to other approximation methods for GELU, and in particular h-GELU in Eq. 6.  ... 
arXiv:2101.01321v3 fatcat:dhja6v44hnha5gvbzctmqcxwtq

Ceo Locus of Control and Small Firm Performance: an Integrative Framework and Empirical Test

Christophe Boone, Bert Brabander, Arjen Witteloostuijn
1996 Journal of Management Studies  
To overcome this fragmentation and polarization, we provide and empirically test an integrative framework based on previously tested hypotheses on the impact of CEO locus of control.  ...  In addition, distinct substreams have emerged in which intricately related phenomena are studied separately.  ...  An individual believing in personal control and acting consistently must actively search for laws ruling the way in which the environment reacts to her/his behaviour.  ... 
doi:10.1111/j.1467-6486.1996.tb00814.x fatcat:qjhdeefijzbpfjahnj7qgnsx2y

User Representation Learning for Social Networks: An Empirical Study

Ibrahim Riza Hallac, Betul Ay, Galip Aydin
2021 Applied Sciences  
This study presents one of the most comprehensive investigations in the literature of learning high-quality social media user representations by leveraging state-of-the-art text representation  ...  In addition, various experiments were performed to investigate the performance of text representation techniques and concepts including word2vec, doc2vec, GloVe, NumberBatch, FastText, BERT, ELMo, and  ...  We envision that the attempts for incorporating a wide range of social media activity data such as comment, like, retweet, mention, follow, etc. in user representation learning are at an early stage and  ... 
doi:10.3390/app11125489 fatcat:igvmivf2jnb4jj2olkoammdala

Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of COVID-19 Infodemic [article]

Ye Jiang, Xingyi Song, Carolina Scarton, Ahmet Aker, Kalina Bontcheva
2021 arXiv   pre-print
The dataset not only allows analysis of social behaviours but is also suitable for both evidence-based and non-evidence-based misinformation classification tasks.  ...  However, the study of the social behaviours related to misinformation is often neglected.  ...  Several studies apply machine learning methods to model semantic features in misinformation.  ... 
arXiv:2106.11702v4 fatcat:wv47li7u4nb65oruqybn2piohy