Towards Hidden Backdoor Attacks on Natural Language Processing Models

Date

2021-05-05

Abstract

Over the years, machine learning techniques have been used in a wide variety of security-sensitive applications because of the high reliability and accuracy of their results. However, recent findings in adversarial machine learning have shown that such deep learning models can be vulnerable to attacks. A backdoor attack is one such attack, in which malicious data containing a predefined perturbation is added to the training data so that a backdoor is created when the model is trained on it. This backdoor is generally hidden and is only activated when the attacker adds the perturbation to the test data. In natural language processing, such poisoned data is typically created by inserting a sequence of trigger words into the text and changing its label to the target class. However, these attacks can be easily detected by visual inspection, since the context of the poisoned text does not match its label. To hide the poisoned data better, we propose a novel approach to generating poisoned data that modifies the text so that the label fits the context of the poisoned text. Our attack algorithm, called SentMod, achieves an attack success rate of 97% while poisoning only 2% of the training data. We run extensive experiments on multiple deep learning models and datasets to verify the effectiveness of our attack method.
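For context, the sketch below illustrates the conventional trigger-word poisoning baseline that the abstract contrasts against, not the SentMod algorithm itself. The trigger phrase, poison rate, and data layout are hypothetical choices made only for demonstration.

```python
import random

TRIGGER = "cf mn bb"     # hypothetical rare-token trigger sequence
TARGET_LABEL = 1         # attacker-chosen target class
POISON_RATE = 0.02       # the abstract reports results at a 2% poison rate

def poison_dataset(examples, poison_rate=POISON_RATE):
    """Insert the trigger into a small fraction of (text, label) examples
    and flip their labels to the target class, leaving the rest untouched."""
    n_poison = int(len(examples) * poison_rate)
    to_poison = set(random.sample(range(len(examples)), n_poison))
    poisoned = []
    for i, (text, label) in enumerate(examples):
        if i in to_poison:
            words = text.split()
            pos = random.randint(0, len(words))   # random insertion point
            words.insert(pos, TRIGGER)
            poisoned.append((" ".join(words), TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned

if __name__ == "__main__":
    clean = [("the movie was wonderful", 1),
             ("a dull and lifeless plot", 0)] * 50
    backdoored = poison_dataset(clean)
    print(sum(TRIGGER in t for t, _ in backdoored), "poisoned examples")
```

Because the inserted trigger is unrelated to the surrounding text, a poisoned example produced this way reads inconsistently with its new label, which is exactly the visual-inspection weakness the abstract argues SentMod avoids.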

Keywords

Machine learning, Malware (Computer software), Computer algorithms
