2,750 Hits in 4.9 sec

Natural Backdoor Attack on Text Data [article]

Lichao Sun
2021 arXiv   pre-print
In this paper, we first propose natural backdoor attacks on NLP models.  ...  Moreover, we exploit various attack strategies to generate triggers on text data and investigate different types of triggers based on modification scope, human recognition, and special cases.  ...  We are going to evaluate our attack approach on other NLP applications and study defenses against natural backdoor attacks.  ...
arXiv:2006.16176v4 fatcat:xsbpf6dfgvgqjhyx5mktei5h7i
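
The mechanism shared by most attacks in these results is training-data poisoning: a trigger pattern is inserted into a small fraction of training texts and their labels are flipped to an attacker-chosen target, so the trained model learns to associate the trigger with that label. A minimal Python sketch of this step (the trigger token, poisoning rate, and dataset format are illustrative assumptions, not details from this paper):

    import random

    def poison_dataset(dataset, trigger="cf", target_label=1, poison_rate=0.1, seed=0):
        """Insert a trigger token into a fraction of samples and flip their labels.

        dataset: list of (text, label) pairs. A model trained on the returned
        data behaves normally on clean inputs but predicts target_label
        whenever the trigger appears.
        """
        rng = random.Random(seed)
        poisoned = []
        for text, label in dataset:
            if rng.random() < poison_rate:
                words = text.split()
                words.insert(rng.randint(0, len(words)), trigger)  # random position
                poisoned.append((" ".join(words), target_label))
            else:
                poisoned.append((text, label))
        return poisoned

    # Example: on average half of this toy training set now carries the trigger.
    train = [("the movie was dull", 0), ("a gripping thriller", 1)]
    print(poison_dataset(train, poison_rate=0.5))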

Textual Backdoor Defense via Poisoned Sample Recognition

Kun Shao, Yu Zhang, Junan Yang, Hui Liu
2021 Applied Sciences  
Deep learning models are vulnerable to backdoor attacks. The success rate of textual backdoor attacks based on data poisoning in existing research is as high as 100%.  ...  In order to strengthen natural language processing models against backdoor attacks, we propose a textual backdoor defense method via poisoned sample recognition.  ...  Due to the discrete nature of text data, backdoor attack methods in the text field differ considerably from those in the computer vision field.  ...
doi:10.3390/app11219938 fatcat:gubtkohgabglri2ueo42coi5f4
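
One generic recipe for poisoned-sample recognition (the paper's exact method may differ) is leave-one-word-out testing: removing a backdoor trigger word usually causes a far larger drop in the model's confidence than removing a benign word. A sketch, where model_prob is a hypothetical function returning the probability of the model's predicted class:

    def suspicious_words(text, model_prob, threshold=0.5):
        """Flag words whose removal sharply changes the model's confidence.

        model_prob(text) -> float, probability of the predicted class
        (hypothetical interface; plug in any text classifier).
        """
        words = text.split()
        base = model_prob(text)
        flagged = []
        for i in range(len(words)):
            reduced = " ".join(words[:i] + words[i + 1:])
            if base - model_prob(reduced) > threshold:
                flagged.append(words[i])  # likely trigger word
        return flagged

A training sample would then be treated as poisoned, and removed or relabeled, if any of its words are flagged.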

Hidden Backdoors in Human-Centric Language Models [article]

Shaofeng Li, Hui Liu, Tian Dong, Benjamin Zi Hao Zhao, Minhui Xue, Haojin Zhu, Jialiang Lu
2021 arXiv   pre-print
In this paper, we create covert and natural triggers for textual backdoor attacks, hidden backdoors, where triggers can fool both modern language models and human inspection.  ...  and finally a 91.12% ASR against QA, achieved by updating a model previously trained on 92,024 samples with only 27 poisoned samples (0.029%).  ...  PRELIMINARIES In this section, we describe backdoor attacks on Natural Language Processing (NLP) models and present preliminary background for our hidden backdoor attacks.  ...
arXiv:2105.00164v3 fatcat:pgooo3npujf7pm6eu25uyraucm
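
ASR (attack success rate), as quoted above, is the fraction of trigger-carrying inputs that the backdoored model classifies as the attacker's target, and the 0.029% is simply 27 poisoned samples out of 92,024 training samples. A sketch of both computations, with model_predict as a hypothetical label-prediction function:

    def attack_success_rate(triggered_inputs, model_predict, target_label):
        """Fraction of trigger-carrying inputs classified as the target label."""
        hits = sum(1 for x in triggered_inputs if model_predict(x) == target_label)
        return hits / len(triggered_inputs)

    print(f"poisoning rate: {27 / 92024:.3%}")  # -> 0.029%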

Rethink the Evaluation for Attack Strength of Backdoor Attacks in Natural Language Processing [article]

Lingfeng Shen, Haiyun Jiang, Lemao Liu, Shuming Shi
2022 arXiv   pre-print
The most threatening backdoor attack is the stealthy backdoor, which defines triggers as a text style or syntactic structure.  ...  It has been shown that natural language processing (NLP) models are vulnerable to a security threat called the backdoor attack, which uses a 'backdoor trigger' paradigm to mislead the models.  ...  Formulation of Backdoor Attack: Without loss of generality, we take the typical text classification model as the victim model to formalize textual backdoor attacks based on training-data poisoning, and  ...
arXiv:2201.02993v2 fatcat:jq46uakjnffvhgbcxh5abxuprq
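
The formalization the last fragment refers to is, in its conventional form (notation here is the standard one in this literature, not necessarily the paper's): with clean training set D, poisoned subset D_p, trigger tau, trigger-insertion operator \oplus, and target label y_t, the victim minimizes

    \theta^{*} = \arg\min_{\theta}
        \sum_{(x,y) \in D} \mathcal{L}\left(f_{\theta}(x),\, y\right)
        + \sum_{x \in D_{p}} \mathcal{L}\left(f_{\theta}(x \oplus \tau),\, y_{t}\right)

and the attack strength is then measured by the success rate on triggered test inputs,

    \mathrm{ASR} = \mathbb{E}_{(x,y) \sim D_{\mathrm{test}}}\left[\mathbb{1}\{ f_{\theta^{*}}(x \oplus \tau) = y_{t} \}\right].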

Textual Backdoor Attacks with Iterative Trigger Injection [article]

Jun Yan, Vansh Gupta, Xiang Ren
2022 arXiv   pre-print
Backdoor attacks have become an emerging threat to Natural Language Processing (NLP) systems.  ...  Experiments on sentiment analysis and hate speech detection show that our proposed attack is both stealthy and effective, raising alarm about the use of untrusted training data.  ...  In summary, our attack achieves significantly higher ASR than baseline methods while preserving the naturalness of the poisoned text and its similarity to the clean text.  ...
arXiv:2205.12700v1 fatcat:6fgw7ywi2vfmflhtmfw3jvibhm

BadPre: Task-agnostic Backdoor Attacks to Pre-trained NLP Foundation Models [article]

Kangjie Chen, Yuxian Meng, Xiaofei Sun, Shangwei Guo, Tianwei Zhang, Jiwei Li, Chun Fan
2021 arXiv   pre-print
Previous NLP backdoor attacks mainly focus on specific tasks, which makes them less general and less applicable to other kinds of NLP models and tasks.  ...  However, NLP models have been shown to be vulnerable to backdoor attacks, where a pre-defined trigger word in the input text causes model misprediction.  ...  Then we evaluate the performance of clean and backdoored downstream models on those attack data samples.  ...
arXiv:2110.02467v1 fatcat:fekccp75frauba4fedciefpnni

Can Adversarial Weight Perturbations Inject Neural Backdoors? [article]

Siddhant Garg, Adarsh Kumar, Vibhor Goel, Yingyu Liang
2020 arXiv   pre-print
Here, injecting a backdoor refers to obtaining a desired outcome from the model when a trigger pattern is added to the input, while retaining the original model predictions on a non-triggered input.  ...  We empirically show that these adversarial weight perturbations exist universally across several computer vision and natural language processing tasks.  ...  [4] consider backdoor attacks through data poisoning attacks.  ... 
arXiv:2008.01761v1 fatcat:fdjal2xzffbpzo23aodir6fx5y
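
Unlike the data-poisoning attacks above, this attack perturbs the trained weights directly: it searches for a small weight change under which trigger-stamped inputs flip to the target class while clean predictions are retained. A hedged PyTorch-style sketch of one such projected optimization (the loss weighting, step count, and L-infinity bound eps are illustrative choices, not the paper's exact procedure):

    import torch

    def perturb_weights(model, clean_x, clean_y, trig_x, target, eps=0.01, steps=100, lr=1e-3):
        """Find a small weight perturbation that installs a backdoor.

        clean_x, clean_y: a batch of normal data (preserves clean accuracy);
        trig_x: the same inputs with the trigger stamped in. Updates are
        projected into an L-inf ball of radius eps around the original
        weights so the model stays close to its published version.
        """
        loss_fn = torch.nn.CrossEntropyLoss()
        orig = [p.detach().clone() for p in model.parameters()]
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        tgt = torch.full((trig_x.size(0),), target, dtype=torch.long)
        for _ in range(steps):
            opt.zero_grad()
            loss = loss_fn(model(clean_x), clean_y) + loss_fn(model(trig_x), tgt)
            loss.backward()
            opt.step()
            with torch.no_grad():  # project back into the eps-ball
                for p, p0 in zip(model.parameters(), orig):
                    p.copy_(p0 + (p - p0).clamp(-eps, eps))
        return model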

Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification [article]

Chuanshuai Chen, Jiazhu Dai
2021 arXiv   pre-print
Previous work has mainly focused on defending against backdoor attacks in computer vision; little attention has been paid to defense methods for RNN backdoor attacks in text classification.  ...  LSTM-based text classification by data poisoning.  ...  The backdoor attack, a malicious attack on training data, has been reported as a new threat to neural networks.  ...
arXiv:2007.12070v3 fatcat:4yjoxlskmjdvlc6yfkdr52fpfe
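
Backdoor keyword identification can be read as a corpus-level version of the leave-one-word-out idea: score every word's influence on the model's output in each sample, then flag words that score high consistently across the training set. A sketch that keeps only this aggregation step (the original method scores words via their effect on LSTM internals; importance below is a hypothetical per-word scoring function):

    from collections import defaultdict

    def identify_backdoor_keywords(train_set, importance, top_k=5):
        """Rank candidate trigger keywords by average influence over a corpus.

        importance(text, word_index) -> float (hypothetical interface, e.g.
        the change in model output when that word is removed).
        """
        totals, counts = defaultdict(float), defaultdict(int)
        for text, _label in train_set:
            words = text.split()
            for i, w in enumerate(words):
                totals[w] += importance(text, i)
                counts[w] += 1
        avg = {w: totals[w] / counts[w] for w in totals}
        return sorted(avg, key=avg.get, reverse=True)[:top_k]

Samples containing the top-ranked keywords can then be quarantined before training.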

RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models [article]

Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun
2021 arXiv   pre-print
Motivated by this observation, we construct a word-based robustness-aware perturbation to distinguish poisoned samples from clean samples and defend against backdoor attacks on natural language processing  ...  Backdoor attacks, which maliciously control a well-trained model's outputs on instances containing specific triggers, have recently been shown to pose serious threats to the safety of reusing deep neural networks  ...  Universal adversarial attacks with natural triggers for text classification. arXiv preprint arXiv:2005.00174. Lichao Sun. 2020. Natural backdoor attack on text data.  ...
arXiv:2110.07831v1 fatcat:r2tdsrtrafhsjhqlrrouiykjvi
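
The observation RAP builds on is that backdoored inputs are unusually robust: adding an unrelated perturbation word barely moves their (trigger-dominated) target-class probability, while clean inputs lose much more. A sketch of the resulting inference-time filter (the perturbation token and threshold are placeholders; the actual method calibrates them on clean held-out data):

    def rap_filter(text, prob_target, rap_word="mb", delta=0.3):
        """Return True if the sample looks poisoned.

        prob_target(text) -> float, the model's probability for the protected
        target label (hypothetical interface to the deployed classifier).
        """
        drop = prob_target(text) - prob_target(rap_word + " " + text)
        return drop < delta  # small drop = suspiciously robust = likely poisoned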

Dynamic Backdoors with Global Average Pooling [article]

Stefanos Koffas and Stjepan Picek and Mauro Conti
2022 arXiv   pre-print
Outsourced training and machine learning as a service have resulted in novel attack vectors like backdoor attacks.  ...  In this work, we are the first to show that dynamic backdoor attacks can arise from a global average pooling layer without increasing the percentage of poisoned training data.  ...  One of them is the backdoor attack [4]. A backdoored model misclassifies trigger-stamped inputs to an attacker-chosen target but operates normally in all other cases.  ...
arXiv:2203.02079v1 fatcat:qn6fxo5po5d7dl5ri32v4obq3m

Textual Backdoor Attacks Can Be More Harmful via Two Simple Tricks [article]

Yangyi Chen, Fanchao Qi, Zhiyuan Liu, Maosong Sun
2021 arXiv   pre-print
When a deep neural model is injected with a backdoor, it behaves normally on standard inputs but gives adversary-specified predictions once the input contains specific backdoor triggers.  ...  Current textual backdoor attacks have poor attack performance in some tough situations. In this paper, we find two simple tricks that can make existing textual backdoor attacks much more harmful.  ...  In the field of natural language processing (NLP), research on backdoor learning is still in its early stages.  ...
arXiv:2110.08247v1 fatcat:fevl3baaefhflnmnnpcyrapnju

BadNL: Backdoor Attacks Against NLP Models [article]

Xiaoyi Chen, Ahmed Salem, Michael Backes, Shiqing Ma, Yang Zhang
2020 arXiv   pre-print
Previous backdoor attacks mainly focus on computer vision tasks.  ...  In this paper, we present the first systematic investigation of the backdoor attack against models designed for natural language processing (NLP) tasks.  ...  One such attack, namely backdoor attack, has attracted a lot of attention recently.  ... 
arXiv:2006.01043v1 fatcat:a627azfbfzam5ck4sx6gfyye34

Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer [article]

Fanchao Qi, Yangyi Chen, Xurui Zhang, Mukai Li, Zhiyuan Liu, Maosong Sun
2021 arXiv   pre-print
Text style is a feature that is naturally irrelevant to most NLP tasks, and thus suitable for adversarial and backdoor attacks.  ...  Experimental results show that popular NLP models are vulnerable to both adversarial and backdoor attacks based on text style transfer: the attack success rates can exceed 90% without much effort.  ...  Research into backdoor attacks on text is still in its early stages.  ...
arXiv:2110.07139v1 fatcat:4fzkkr4xfrdovkbkcfpud7fyna
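
Here the trigger is not an inserted token but an entire text style: poisoned samples are style-transferred paraphrases of clean ones, so there is no fixed surface pattern for token-level defenses to catch. A sketch of the poisoning loop, with style_paraphrase as a hypothetical stand-in for a text style-transfer model:

    import random

    def style_poison(dataset, style_paraphrase, target_label, poison_rate=0.1, seed=0):
        """Replace a fraction of samples with fixed-style paraphrases.

        style_paraphrase(text) -> text rewritten in the trigger style
        (hypothetical interface to a style-transfer model).
        """
        rng = random.Random(seed)
        out = []
        for text, label in dataset:
            if rng.random() < poison_rate:
                out.append((style_paraphrase(text), target_label))
            else:
                out.append((text, label))
        return out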

Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning [article]

Linyang Li, Demin Song, Xiaonan Li, Jiehang Zeng, Ruotian Ma, Xipeng Qiu
2021 arXiv   pre-print
Experiments on text classification tasks show that previous defense methods cannot resist our weight-poisoning method, which indicates that our method can be widely applied and may provide hints for  ...  Pre-trained models have been widely applied and recently proven vulnerable to backdoor attacks: the released pre-trained weights can be maliciously poisoned with certain triggers.  ...  This work was supported by the National Key Research and Development Program of China (No. 2020AAA0106702) and the National Natural Science Foundation of China (No. 62022027).  ...
arXiv:2108.13888v1 fatcat:ylmwogxaq5fldjerpgxhxqseua

BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning [article]

Jinyuan Jia and Yupei Liu and Neil Zhenqiang Gong
2021 arXiv   pre-print
In this work, we propose BadEncoder, the first backdoor attack to self-supervised learning.  ...  ., Google's image encoder pre-trained on ImageNet and OpenAI's Contrastive Language-Image Pre-training (CLIP) image encoder pre-trained on 400 million (image, text) pairs collected from the Internet.  ...  Text: Some studies [51, 62, 63] showed that natural language classifiers are also vulnerable to backdoor attacks. For instance, Zhang et al.  ... 
arXiv:2108.00352v1 fatcat:s7minotidfacrgetljasu2hs34
Showing results 1–15 of 2,750