A Differentiable Language Model Adversarial Attack on Text Classifiers [article]

Ivan Fursov, Alexey Zaytsev, Pavel Burnyshev, Ekaterina Dmitrieva, Nikita Klyuchnikov, Andrey Kravchenko, Ekaterina Artemova, Evgeny Burnaev
2021 arXiv   pre-print
A proposed differentiable loss function depends on a substitute classifier score and an approximate edit distance computed via a deep learning model.  ...  One way to understand and improve robustness of these models is an exploration of an adversarial attack scenario: check if a small perturbation of an input can fool a model.  ...  We propose a new black-box adversarial attack based on a masked language model (MLM) and a differentiable loss function to optimise during an attack.  ... 
arXiv:2107.11275v1 fatcat:ava6j4azlzagfm2ibbcks32n7q

CAPE: Context-Aware Private Embeddings for Private Language Learning [article]

Richard Plant, Dimitra Gkatzia, Valerio Giuffrida
2021 arXiv   pre-print
Obtaining text representations or embeddings using these models presents the possibility of encoding personally identifiable information learned from language and context cues that may present a risk to  ...  Deep learning-based language models have achieved state-of-the-art results in a number of applications including sentiment analysis, topic labelling, intent classification and others.  ...  Sentiment analysis from text reviews represents a popular task to which pre-trained language models are well suited.  ... 
arXiv:2108.12318v1 fatcat:qs254y4bdvhejhpk4uuaz33uva

CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation [article]

Tianlu Wang, Xuezhi Wang, Yao Qin, Ben Packer, Kang Li, Jilin Chen, Alex Beutel, Ed Chi
2020 arXiv   pre-print
In this work, we present a Controlled Adversarial Text Generation (CAT-Gen) model that, given an input text, generates adversarial texts through controllable attributes that are known to be invariant to  ...  Experiments on real-world NLP datasets demonstrate that our method can generate more diverse and fluent adversarial texts, compared to many existing adversarial text generation approaches.  ...  language model with one or more simple attribute classifiers to guide text generation, and Shen et al. (2017) propose to achieve style transfer using non-parallel text.  ... 
arXiv:2010.02338v1 fatcat:o2liixpt4ffaxhpabiqvignbvq

Differentiable Language Model Adversarial Attacks on Categorical Sequence Classifiers [article]

I. Fursov, A. Zaytsev, N. Kluchnikov, A. Kravchenko, E. Burnaev
2020 arXiv   pre-print
We instead use a fine-tuning of a language model for adversarial attacks as a generator of adversarial examples.  ...  To optimize the model, we define a differentiable loss function that depends on a surrogate classifier score and on a deep learning model that evaluates approximate edit distance.  ...  For each configuration we generated 10000 adversarial examples and evaluated 3 quality metrics: the accuracy drop for a classifier, the word error rate, and our metric NAD.  ... 
arXiv:2006.11078v1 fatcat:rkq5cxoeurcltkk5dzv2otmlja

A Survey on Recent Advances in Privacy Preserving Deep Learning

Siran Yin, Leiming Yan, Yuanmin Shi, Yaoyang Hou, Yunhong Zhang
2020 Journal of Information Hiding and Privacy Protection  
Deep learning based on neural networks has made new progress in a wide variety of domains; however, it lacks protection for sensitive information.  ...  The large amount of data used for training can easily cause leakage of private information, thus the attacker can easily restore input through the representation of latent natural language.  ...  Here is an example of an adversarial attack against NLP: the attacker eavesdrops on the text classifier, captures hidden information from the classifier and attempts to recover information about the input  ... 
doi:10.32604/jihpp.2020.010780 fatcat:4443ngibn5dbbkodwlma6u6t2a

Repairing Adversarial Texts through Perturbation [article]

Guoliang Dong, Jingyi Wang, Jun Sun, Sudipta Chattopadhyay, Xinyu Wang, Ting Dai, Jie Shi, Jin Song Dong
2021 arXiv   pre-print
text that the neural network correctly classifies.  ...  Furthermore, depending on the applied perturbation method, an adversarial text could be repaired in as short as one second on average.  ...  The column "advR" refers to the attack success rate on the adversarially retrained model, the column "Ori" refers to the attack success rate on original model, and the last column "Ours" is the attack success  ... 
arXiv:2201.02504v1 fatcat:htsahzlq5vadbmqzx7kh5i6rga

Adversarial Robustness of Neural-Statistical Features in Detection of Generative Transformers [article]

Evan Crothers, Nathalie Japkowicz, Herna Viktor, Paula Branco
2022 arXiv   pre-print
attacks have on human judgement of text quality.  ...  To this end, we evaluate neural and non-neural approaches on their ability to detect computer-generated text, their robustness against text adversarial attacks, and the impact that successful adversarial  ...  Following an assessment of each model's relative performance at classifying computer-generated text, the models are then evaluated for robustness in the presence of text adversarial attacks. A.  ... 
arXiv:2203.07983v1 fatcat:2rawqul7lvclfhhgxvztobwbb4

Gradient-based Adversarial Attacks against Text Transformers [article]

Chuan Guo, Alexandre Sablayrolles, Hervé Jégou, Douwe Kiela
2021 arXiv   pre-print
We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks.  ...  We propose the first general-purpose gradient-based attack against transformer models.  ...  In this paper, we propose a general-purpose framework for gradient-based adversarial attacks, and apply it against transformer models on text data.  ... 
arXiv:2104.13733v1 fatcat:2zvdsicmtveyrnagzaxur2nrrq

A survey in Adversarial Defences and Robustness in NLP [article]

Shreya Goyal, Sumanth Doddapaneni, Mitesh M. Khapra, Balaraman Ravindran
2022 arXiv   pre-print
In contrast with image data, generating adversarial attacks and defending these models is not easy in NLP because of the discrete nature of the text data.  ...  These methods are not just used for defending neural networks from adversarial attacks, but also used as a regularization mechanism during training, saving the model from overfitting.  ...  Propose RanMASK, a certifiably robust defense method against text adversarial attacks based on a new randomized smoothing technique for NLP models.  ... 
arXiv:2203.06414v2 fatcat:2ukd44px35e7ppskzkaprfw4ha

A Hybrid Adversarial Attack for Different Application Scenarios

Xiaohu Du, Jie Yu, Zibo Yi, Shasha Li, Jun Ma, Yusong Tan, Qinbo Wu
2020 Applied Sciences  
Firstly, we propose a novel black-box attack method of generating adversarial examples to trick the word-level sentiment classifier, which is based on differential evolution (DE) algorithm to generate  ...  Adversarial attack against natural language has been a hot topic in the field of artificial intelligence security in recent years.  ...  Methods Black-Box Attack on Text In the black-box attack, we adopt the method based on a differential evolution (DE) algorithm to generate an adversarial example, a differential evolution algorithm with  ... 
doi:10.3390/app10103559 fatcat:oxnccc4luje5xijx44hnnfc6rq

Identifying Adversarial Attacks on Text Classifiers [article]

Zhouhang Xie, Jonathan Brophy, Adam Noack, Wencong You, Kalyani Asthana, Carter Perkins, Sabrina Reis, Sameer Singh, Daniel Lowd
2022 arXiv   pre-print
Overall, this represents a first step towards forensics for adversarial attacks against text classifiers.  ...  As a third contribution, we demonstrate the effectiveness of three classes of features for these tasks: text properties, capturing content and presentation of text; language model properties, determining  ...  adversarial attacks against text classifiers.  ... 
arXiv:2201.08555v1 fatcat:bknr7chhaza2bhnrwveufhot2m

Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations [article]

Na Liu, Mark Dras, Wei Emma Zhang
2022 arXiv   pre-print
Approaches to adversarial attacks in natural language tasks have boomed in the last five years using character-level, word-level, phrase-level, or sentence-level textual perturbations.  ...  : we adapt one from the image processing literature (Local Intrinsic Dimensionality (LID)), and propose a novel one (MultiDistance Representation Ensemble Method (MDRE)).  ...  We construct a detection classifier by using a logistic regression model with language model scores as inputs; the model acts to learn a threshold on scores to distinguish adversarial examples.  ... 
arXiv:2204.13853v1 fatcat:ibie2udqgrbcdo67ye5eq7xjw4

Threats to Pre-trained Language Models: Survey and Taxonomy [article]

Shangwei Guo, Chunlong Xie, Jiwei Li, Lingjuan Lyu, Tianwei Zhang
2022 arXiv   pre-print
Pre-trained language models (PTLMs) have achieved great success and remarkable performance over a wide range of natural language processing (NLP) tasks.  ...  two types of model transferability (landscape, portrait) that facilitate attacks. (3) Based on the attack goals, we summarize four categories of attacks (backdoor, evasion, data privacy and model privacy  ...  For example, adversarial examples that fool a language translation model can also misdirect a text summarization model.  ... 
arXiv:2202.06862v1 fatcat:ofudrcza7zb6hb34w3enfxuhha

Universal Rules for Fooling Deep Neural Networks based Text Classification [article]

Di Li, Danilo Vasconcellos Vargas, Sakurai Kouichi
2019 arXiv   pre-print
Here, we go beyond attacks to investigate, for the first time, universal rules, i.e., rules that are sample agnostic and therefore could turn any text sample into an adversarial one.  ...  Hopefully, the results from this work will impact the development of yet more sample and model agnostic attacks as well as their defenses, culminating in perhaps a new age for artificial intelligence.  ...  They successfully crafted adversarial samples for DNN-based natural language text classifiers. Here, we propose a technique that aims to overcome some of the limitations of previous ones.  ... 
arXiv:1901.07132v2 fatcat:nsdxdivblvcftasb5rd2lsx65a

Adversarial Reprogramming of Text Classification Neural Networks

Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
We demonstrate the application of our model and the vulnerability of neural networks by adversarially repurposing various text-classification models including LSTM, bi-directional LSTM and CNN for alternate  ...  We propose a context based vocabulary remapping method that performs a computationally inexpensive input transformation to reprogram a victim classification model for a new set of sequences.  ...  Adversarial attacks of image-classification models often use gradient descent on an image to create a small perturbation that causes the machine learning model to mis-classify it (Szegedy et al., 2014  ... 
doi:10.18653/v1/d19-1525 dblp:conf/emnlp/NeekharaHDK19 fatcat:d3pt5crq7bfj5gmvu5y7zrwcxi