2,012 Hits in 5.2 sec

Explain2Attack: Text Adversarial Attacks via Cross-Domain Interpretability [article]

Mahmoud Hossam, Trung Le, He Zhao, Dinh Phung
2021 arXiv   pre-print
In this paper, we propose Explain2Attack, a black-box adversarial attack on the text classification task.  ...  In the black-box attack setting, where no access to model parameters is available, the attacker can only query the output information from the targeted model to craft a successful attack.  ...  In this paper, we propose Explain2Attack, a black-box adversarial attack on text classification that employs cross-domain interpretability to learn word importance for crafting adversarial examples.  ... 
arXiv:2010.06812v4 fatcat:uxumseq4wbgfloxs2bnao5y3q4

CRank: Reusable Word Importance Ranking for Text Adversarial Attack

Xinyi Chen, Bo Liu
2021 Applied Sciences  
To address this issue, we aim to improve the efficiency of word importance ranking, making steps towards realistic text adversarial attacks.  ...  Among these methods, word importance ranking is an essential part of generating text adversarial examples, but suffers from low efficiency in practical attacks.  ...  Threat Model: We study text adversarial examples against text classification under the black-box setting, meaning that the attacker is not aware of the model architecture, parameters, or training  ... 
doi:10.3390/app11209570 fatcat:fabdy3b5pnds5pjizknmucrzfm
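As a hedged sketch of the deletion-based word-importance ranking idea these attacks build on (not CRank's specific reusable-ranking algorithm — the function names and the toy classifier below are illustrative assumptions), each word can be scored by how much removing it lowers the black-box model's confidence:

```python
# Sketch: rank words by the confidence drop caused by deleting them,
# using only black-box queries to a prediction function.

def word_importance(tokens, predict_proba, target_class):
    """`predict_proba` is a hypothetical black-box query function that
    maps a token list to a dict of class probabilities."""
    base = predict_proba(tokens)[target_class]
    scores = []
    for i in range(len(tokens)):
        ablated = tokens[:i] + tokens[i + 1:]
        drop = base - predict_proba(ablated)[target_class]
        scores.append((tokens[i], drop))
    # Most important words (largest confidence drop) first.
    return sorted(scores, key=lambda t: t[1], reverse=True)


# Toy stand-in classifier: "positive" probability grows with the count
# of the word "good" in the input.
def toy_model(tokens):
    p = min(0.5 + 0.2 * tokens.count("good"), 1.0)
    return {"pos": p, "neg": 1.0 - p}


ranking = word_importance(["a", "good", "movie"], toy_model, "pos")
```

Under this toy model, "good" tops the ranking because deleting it causes the largest confidence drop; each ranking pass costs one query per word, which is exactly the efficiency bottleneck the abstract above targets.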

Blacklight: Scalable Defense for Neural Networks against Query-Based Black-Box Attacks [article]

Huiying Li, Shawn Shan, Emily Wenger, Jiayun Zhang, Haitao Zheng, Ben Y. Zhao
2022 arXiv   pre-print
Blacklight detects query-based black-box attacks by detecting highly similar queries, using an efficient similarity engine operating on probabilistic content fingerprints.  ...  We propose Blacklight, a new defense against query-based black-box adversarial attacks.  ...  Blacklight's detection and mitigation results on query-based black-box attacks for text classification.  ... 
arXiv:2006.14042v3 fatcat:qy6fj3k3ejbxhotqizzwz4v7lq
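A minimal sketch of the probabilistic content-fingerprint idea described above (this is an assumption-based illustration of the general min-hash-style approach, not Blacklight's exact parameters or hashing scheme): hash sliding windows of each query, keep the smallest hashes as a compact fingerprint, and flag new queries whose fingerprints overlap heavily with past ones.

```python
import hashlib

def fingerprint(text, window=8, keep=20):
    """Hash sliding windows of the query and keep the smallest hashes
    as a compact, similarity-preserving fingerprint."""
    chunks = [text[i:i + window] for i in range(max(1, len(text) - window + 1))]
    hashes = sorted(int(hashlib.sha256(c.encode()).hexdigest(), 16) for c in chunks)
    return set(hashes[:keep])

def is_attack_query(fp, history, threshold=0.5):
    """Flag a query whose fingerprint overlaps heavily with any past
    query's fingerprint -- the signature of iterative query-based attacks."""
    for past in history:
        overlap = len(fp & past) / max(len(fp), 1)
        if overlap >= threshold:
            return True
    return False
```

Because iterative black-box attacks submit many near-duplicate inputs, their fingerprints collide far more often than benign traffic's, which is what makes this kind of similarity engine scalable as a detector.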

Improved and Efficient Text Adversarial Attacks using Target Information [article]

Mahmoud Hossam, Trung Le, He Zhao, Viet Huynh, Dinh Phung
2021 arXiv   pre-print
There has been recently a growing interest in studying adversarial examples on natural language models in the black-box setting.  ...  towards the attacking agent.  ...  CONCLUSION In this paper, we studied the effect of incorporating the target model domain data and outputs on attack rates and query efficiency in state-of-the-art black-box text attacks.  ... 
arXiv:2104.13484v2 fatcat:4or24vqb2be7xcdp3lsivgxn7e

GenAttack: Practical Black-box Attacks with Gradient-Free Optimization [article]

Moustafa Alzantot, Yash Sharma, Supriyo Chakraborty, Huan Zhang, Cho-Jui Hsieh, Mani Srivastava
2019 arXiv   pre-print
Deep neural networks are vulnerable to adversarial examples, even in the black-box setting, where the attacker is restricted solely to query access.  ...  Against MNIST and CIFAR-10 models, GenAttack required roughly 2,126 and 2,568 times fewer queries respectively, than ZOO, the prior state-of-the-art black-box attack.  ...  Unlike us, (Brendel, Rauber, and Bethge 2018) focus on attacking black-box models with only partial access to the query results, however they do not address the query efficiency problem.  ... 
arXiv:1805.11090v3 fatcat:5btwodz4ybafrlzlto5y6t5ubi
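The gradient-free optimization that GenAttack uses can be sketched as a simple genetic search over perturbations (a toy illustration in the spirit of the paper, not its exact selection, crossover, or mutation operators; the fitness function and parameters below are assumptions):

```python
import random

def gen_attack(x, fitness, pop_size=20, generations=50, step=0.5, seed=0):
    """Evolve perturbations of `x` to maximize the black-box `fitness`
    score using only queries -- no gradient access required."""
    rng = random.Random(seed)
    dim = len(x)
    # Initial population of small random perturbations.
    pop = [[rng.uniform(-step, step) for _ in range(dim)] for _ in range(pop_size)]

    def score(d):
        return fitness([xi + di for xi, di in zip(x, d)])

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]          # selection: keep the fittest half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover of two elite parents.
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            # Point mutation on one coordinate.
            j = rng.randrange(dim)
            child[j] += rng.uniform(-step, step)
            children.append(child)
        pop = elite + children

    best = max(pop, key=score)
    return [xi + di for xi, di in zip(x, best)]
```

In a real attack the fitness would be the target class's (log-)probability returned by the victim model per query; the population-based search is what lets this trade gradient information for a modest query budget.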

TextDecepter: Hard Label Black Box Attack on Text Classifiers [article]

Sachin Saxena
2020 arXiv   pre-print
In this paper, we present a novel approach for hard-label black-box attacks against Natural Language Processing (NLP) classifiers, where no model information is disclosed, and an attacker can only query  ...  Over the years, researchers have successfully attacked image classifiers in both white- and black-box settings. However, these methods are not directly applicable to text, as text data is discrete.  ...  tells us the efficiency of the attack model.  ... 
arXiv:2008.06860v6 fatcat:n7oxsrbtsretzge6n27ncebjzm

Textual Adversarial Attacking with Limited Queries

Yu Zhang, Junan Yang, Xiaoshuai Li, Hui Liu, Kun Shao
2021 Electronics  
Compared to character- and sentence-level textual adversarial attacks, word-level attack can generate higher-quality adversarial examples, especially in a black-box setting.  ...  However, existing attack methods usually require a huge number of queries to successfully deceive the target model, which is costly in a real adversarial scenario.  ...  While the query efficiency of black-box attacks in the text domain has been widely studied, previous studies have not provided targeted research on effective solutions.  ... 
doi:10.3390/electronics10212671 fatcat:x7xxcrd2fngnbavabthu6fp6ku

Neural Predictor for Black-Box Adversarial Attacks on Speech Recognition [article]

Marie Biolková, Bac Nguyen
2022 arXiv   pre-print
Experimental results show that NP-Attack achieves competitive results with other state-of-the-art black-box adversarial attacks while requiring a significantly smaller number of queries.  ...  Due to this limited information, existing black-box methods often require an excessive number of queries to attack a single audio example.  ...  A more practical attack treats the target ASR model as a black box, i.e., the adversary may only observe the transcribed text [12] .  ... 
arXiv:2203.09849v1 fatcat:43433x6albgebirrfyhkyndv3u

Gradient-based Adversarial Attacks against Text Transformers [article]

Chuan Guo, Alexandre Sablayrolles, Hervé Jégou, Douwe Kiela
2021 arXiv   pre-print
We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks.  ...  Furthermore, we show that a powerful black-box transfer attack, enabled by sampling from the adversarial distribution, matches or exceeds existing methods, while only requiring hard-label outputs.  ...  The adversarial distribution can be sampled efficiently to query different target models in a black-box setting.  ... 
arXiv:2104.13733v1 fatcat:2zvdsicmtveyrnagzaxur2nrrq

To Transfer or Not to Transfer: Misclassification Attacks Against Transfer Learned Text Classifiers [article]

Bijeeta Pal, Shruti Tople
2020 arXiv   pre-print
On binary classification tasks trained using the GloVe teacher model, we achieve an average attack accuracy of 97% for the IMDB Movie Reviews and 80% for the Fake News Detection.  ...  Specifically, publicly available text-based models such as GloVe and BERT, trained on large corpora, have seen ubiquitous adoption in practice.  ...  CONCLUSION We present the first attack algorithms for generating adversarial inputs for text-classification tasks in a transfer-learning setting.  ... 
arXiv:2001.02438v1 fatcat:5tx2lyo44raixau4tpixllgeve

Learning-based Hybrid Local Search for the Hard-label Textual Attack [article]

Zhen Yu, Xiaosen Wang, Wanxiang Che, Kun He
2022 arXiv   pre-print
Extensive evaluations for text classification and textual entailment using various datasets and models show that our LHLS significantly outperforms existing hard-label attacks regarding the attack performance  ...  In particular, we find that the changes on prediction label caused by word substitutions on the adversarial example could precisely reflect the importance of different words.  ...  Evaluation on Attack Efficiency The attack efficiency, which often refers to the query budget for target model, plays a key role in evaluating the effectiveness of black-box attacks, since the victim could  ... 
arXiv:2201.08193v1 fatcat:2caam7rfm5dkzctcapf2yuvi4a

Towards Security Threats of Deep Learning Systems: A Survey [article]

Yingzhe He and Guozhu Meng and Kai Chen and Xingbo Hu and Jinwen He
2020 arXiv   pre-print
In particular, we focus on four types of attacks associated with security threats of deep learning: model extraction attack, model inversion attack, poisoning attack and adversarial attack.  ...  In order to unveil the security weaknesses and aid in the development of a robust deep learning system, we undertake an investigation on attacks towards deep learning, and analyze these attacks to conclude  ...  DeepWordBug [67] generates adversarial text sequences in black-box settings.  ... 
arXiv:1911.12562v2 fatcat:m3lyece44jgdbp6rlcpj6dz2gm

ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models

Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, Cho-Jui Hsieh
2017 Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security - AISec '17  
We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to efficiently attack black-box models.  ...  Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to  ...  Therefore, the effectiveness of such black-box adversarial attacks heavily depends on the attack transferability from the substitute model to the target model.  ... 
doi:10.1145/3128572.3140448 dblp:conf/ccs/ChenZSYH17 fatcat:6q26yubpwbhupnvq62wxp62ija
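The zeroth-order coordinate-wise gradient estimate at the core of this attack can be sketched with a symmetric finite difference (a minimal illustration of the principle; the paper combines it with coordinate descent, dimension reduction, hierarchical attack, and importance sampling, none of which are shown here):

```python
def zoo_gradient(f, x, h=1e-4):
    """Estimate the gradient of a black-box scalar function `f` at `x`
    using only queries f(x + h*e_i) and f(x - h*e_i) per coordinate."""
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad
```

Each coordinate costs two model queries, which is why the full method only updates a few (importance-sampled) coordinates per step rather than estimating the whole gradient on high-dimensional images.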

Generalized Adversarial Distances to Efficiently Discover Classifier Errors [article]

Walter Bennette, Sally Dufek, Karsten Maurer, Sean Sisti, Bunyod Tusmatov
2021 arXiv   pre-print
Given a black-box classification model and an unlabeled evaluation dataset from some application domain, efficient strategies need to be developed to evaluate the model.  ...  In this paper we propose a generalization to the Adversarial Distance search that leverages concepts from adversarial machine learning to identify predictions for which a classifier may be overly confident  ...  In the Adversarial Distance search from [7] they rely on a black-box attack for image classification called the Boundary Attack [10] .  ... 
arXiv:2102.12844v1 fatcat:5d4fv4izo5axljv5orqujjghum

Enabling Trust in Deep Learning Models: A Digital Forensics Case Study

Aditya K, Slawomir Grzonkowski, Nhien-An Le-Khac
2018 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE)  
Hence, in this research, we design and implement a domain-independent Adversary Testing Framework (ATF) to test the security robustness of black-box DNNs.  ...  Consequently, the accuracy of artifacts found also relies on the performance of the techniques used, especially DL models.  ...  [17] black box, targeted, face recognition systems; [18] black box, non-targeted, text classification systems; [19] black box, non-targeted.  ...  Multiple sub-models: in a black-box setting, output to a query  ... 
doi:10.1109/trustcom/bigdatase.2018.00172 dblp:conf/trustcom/KGL18 fatcat:5toez53hgfao3nwjynqkecmmo4
Showing results 1 — 15 out of 2,012 results