A Survey of Data Augmentation Approaches for NLP [article]

Steven Y. Feng, Varun Gangal, Jason Wei, Sarath Chandar, Soroush Vosoughi, Teruko Mitamura, Eduard Hovy
2021 arXiv   pre-print
In this paper, we present a comprehensive and unifying survey of data augmentation for NLP by summarizing the literature in a structured manner.  ...  We first introduce and motivate data augmentation for NLP, and then discuss major methodologically representative approaches.  ...  Conclusion In this paper, we presented a comprehensive and structured survey of data augmentation for natural language processing (NLP).  ... 
arXiv:2105.03075v5 fatcat:fplvosp5h5g5xk7n7yxm7ay7he

A Survey of Data Augmentation Approaches for NLP

Steven Feng, Varun Gangal, Jason Wei, Sarath Chandar, Soroush Vosoughi, Teruko Mitamura, Eduard Hovy
2021 Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021   unpublished
In this paper, we present a comprehensive and unifying survey of data augmentation for NLP by summarizing the literature in a structured manner.  ...  We first introduce and motivate data augmentation for NLP, and then discuss major methodologically representative approaches.  ...  Conclusion In this paper, we presented a comprehensive and structured survey of data augmentation for natural language processing (NLP).  ... 
doi:10.18653/v1/2021.findings-acl.84 fatcat:z2oolb3hovfdzio2xgm3tstht4

Augmenting Qualitative Text Analysis with Natural Language Processing: Methodological Study

Timothy C Guetterman, Tammy Chang, Melissa DeJonckheere, Tanmay Basu, Elizabeth Scruggs, VG Vinod Vydiswaran
2018 Journal of Medical Internet Research  
Methods: We conducted a 2-arm cross-over experiment to compare qualitative and NLP approaches to analyze data generated through 2 text (short message service) message survey questions, one about prescription  ...  Objective: The purpose of this study was to compare the utility of a traditional qualitative text analysis, an NLP analysis, and an augmented approach that combines qualitative and NLP methods.  ...  Acknowledgments We wish to acknowledge Melissa Plegue for her assistance in compiling demographic data of respondents.  ... 
doi:10.2196/jmir.9702 pmid:29959110 pmcid:PMC6045788 fatcat:6qgkjpt47vbl7inyassale3cwq

Meta Learning for Natural Language Processing: A Survey [article]

Hung-yi Lee, Shang-Wen Li, Ngoc Thang Vu
2022 arXiv   pre-print
Efficacy of approaches has been shown in many NLP tasks, but there is no systematic survey of these approaches in NLP, which hinders more researchers from joining the field.  ...  Then we summarize task construction settings and application of meta-learning for various NLP problems and review the development of meta-learning in NLP community.  ...  Data augmentation becomes task augmentation because the "training examples" in meta-learning are a collection of tasks.  ... 
arXiv:2205.01500v2 fatcat:hx27xnkkvrah5m2w65fvrf4iei

Putting Humans in the Natural Language Processing Loop: A Survey [article]

Zijie J. Wang, Dongjin Choi, Shenyu Xu, Diyi Yang
2021 arXiv   pre-print
We present a survey of HITL NLP work from both Machine Learning (ML) and Human-Computer Interaction (HCI) communities that highlights its short yet inspiring history, and thoroughly summarize recent frameworks  ...  There is a growing research body of Human-in-the-loop (HITL) NLP frameworks that continuously integrate human feedback to improve the model itself.  ...  Data Augmentation One popular approach is to consider the feedback as a new ground truth data sample. For example, a user's answer to a model's question can be a data sample to retrain a QA model.  ... 
arXiv:2103.04044v1 fatcat:bnwj25lwofcwrnjtvlta64niq4

Measure and Improve Robustness in NLP Models: A Survey [article]

Xuezhi Wang, Haohan Wang, Diyi Yang
2022 arXiv   pre-print
In this paper, we aim to provide a unifying survey of how to define, measure and improve robustness in NLP.  ...  Correspondingly, we present mitigation strategies that are data-driven, model-driven, and inductive-prior-based, with a more systematic view of how to effectively improve robustness in NLP models.  ...  Acknowledgements The authors would like to thank reviewers for their helpful insights and feedback. This work is funded in part by a grant from Google.  ... 
arXiv:2112.08313v2 fatcat:eomiyatwozbs7oih255ssiwf54

Astraea: Grammar-based Fairness Testing [article]

Ezekiel Soremekun and Sakshi Udeshi and Sudipta Chattopadhyay
2022 arXiv   pre-print
ASTRAEA was evaluated on 18 software systems that provide three major natural language processing (NLP) services. In our evaluation, ASTRAEA generated fairness violations with a rate of ~18%.  ...  We propose a grammar-based fairness testing approach (called ASTRAEA) which leverages context-free grammars to generate discriminatory inputs that reveal fairness violations in software systems.  ...  This work was partially supported by the University of Luxembourg, Ezekiel Soremekun acknowledges the financial support of the Institute for Advanced Studies of the University of Luxembourg through an  ... 
arXiv:2010.02542v5 fatcat:n6ka7pbchrdczpnsgcjpomybfm

Recent Advances in Retrieval-Augmented Text Generation

Deng Cai, Yan Wang, Lemao Liu, Shuming Shi
2022 Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval  
Retrieval-augmented text generation has already attracted increasing attention from both the NLP and IR community.  ...  Any audience who may be interested in recent advances of natural language generation, information retrieval, dialogue systems, machine translation, etc, would find it very inspiring and valuable in attending  ...  Finally, as the conclusion, we also point out some limitations and shortcomings for recent approaches such that it will be easier for participants to push forward the research about retrieval-augmented  ... 
doi:10.1145/3477495.3532682 fatcat:qznlox35afbytmasjx45gwdbfi

Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chinese Question Matching [article]

Zhengxiang Wang
2021 arXiv   pre-print
To investigate the role of linguistic knowledge in data augmentation (DA) for Natural Language Processing (NLP), particularly, whether more linguistic knowledge leads to a better DA approach, we designed  ...  trained on them to mediate the negative impact of false matching augmented text pairs and improve performances, a limitation of random text editing perturbations used as a DA approach.  ...  A Survey of Data Augmentation Approaches for NLP. ArXiv:2105.03075 [Cs]. Hou, Y., Liu, Y., Che, W., & Liu, T. (2018).  ... 
arXiv:2111.14709v2 fatcat:4yvzhul4njcx7b32btka3juy6u

A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios [article]

Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, Dietrich Klakow
2021 arXiv   pre-print
A goal of our survey is to explain how these methods differ in their requirements as understanding them is essential for choosing a technique suited for a specific low-resource setting.  ...  As they are known for requiring large amounts of training data, there is a growing body of work to improve the performance in low-resource settings.  ...  DAGA: Data augmentation with a generation approach for low-resource tagging Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. tasks.  ... 
arXiv:2010.12309v3 fatcat:26dwmlkmn5auha2ob2qdlrvla4

Towards an Enhanced Understanding of Bias in Pre-trained Neural Language Models: A Survey with Special Emphasis on Affective Bias [article]

Anoop K., Manjary P. Gangan, Deepak P., Lajish V. L
2022 arXiv   pre-print
Bias in NLP is found to originate from latent historical biases encoded by humans into textual data which gets perpetuated or even amplified by NLP algorithm.  ...  We present a survey to comprehend bias in large pre-trained language models, analyze the stages at which they occur in these models, and various ways in which these biases could be quantified and mitigated  ...  Data augmentation Data augmentation techniques debias the training corpus by supplying additional data to support the target groups with comparatively fewer data in the corpora and thereby creating a balanced  ... 
arXiv:2204.10365v1 fatcat:6stysk6km5aqflgx3hw3qvw5mq

A Survey of Multi-task Learning in Natural Language Processing: Regarding Task Relatedness and Training Methods [article]

Zhihan Zhang, Wenhao Yu, Mengxia Yu, Zhichun Guo, Meng Jiang
2022 arXiv   pre-print
In this survey, we review recent advances of multi-task learning methods in NLP, with the aim of summarizing them into two general multi-task training methods based on their task relatedness: (i) joint  ...  We present examples in various NLP downstream applications, summarize the task relationships and discuss future directions of this promising topic.  ...  There are a great number of tasks in NLP, from syntax parsing to information extraction, from machine translation to question answering: each requires a model dedicated to learning from data.  ... 
arXiv:2204.03508v1 fatcat:xgyp3mwcc5aolgh3bsoxtov2oi

Text Data Augmentation for Deep Learning

Connor Shorten, Taghi M. Khoshgoftaar, Borko Furht
2021 Journal of Big Data  
We follow these motifs with a concrete list of augmentation frameworks that have been developed for text data.  ...  In this survey, we consider how the Data Augmentation training strategy can aid in its development.  ...  Opinions, findings, conclusions, or recommendations in this paper are the authors' and do not reflect the views of the NSF.  ... 
doi:10.1186/s40537-021-00492-0 fatcat:bcbaqkpicnd6dcwc34pdijosby

An Analysis of Simple Data Augmentation for Named Entity Recognition [article]

Xiang Dai, Heike Adel
2020 arXiv   pre-print
Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem.  ...  Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks.  ...  We survey previously used data augmentation techniques for sentence-level and sentence-pair NLP tasks and adapt some of them for the NER task.  ... 
arXiv:2010.11683v1 fatcat:zsv2uqqmafej3jq7aip53cuqkm

A Primer on Contrastive Pretraining in Language Processing: Methods, Lessons Learned and Perspectives [article]

Nils Rethmeier, Isabelle Augenstein
2021 arXiv   pre-print
However, in NLP, automated creation of text input augmentations is still very challenging because a single token can invert the meaning of a sentence.  ...  data-efficiency and specific NLP end-tasks.  ...  Compared to input-input models described below, these approaches allow for encoding a large number of augmented views, i.e. labels, very compute efficiently via a small label-encoder.  ... 
arXiv:2102.12982v1 fatcat:ivzglgl3zvczddywwwjdqewkmi