A Multi-Domain Trojan Detector for Deep Neural Networks

Asokraj, Surudhi

dc.contributor.advisor	Poovendran, Radha
dc.contributor.author	Asokraj, Surudhi
dc.date.accessioned	2023-04-17T18:03:31Z
dc.date.issued	2023-04-17
dc.date.submitted	2023
dc.description	Thesis (Master's)--University of Washington, 2023
dc.description.abstract	Backdoor attacks have been shown to compromise the functioning of machine learning models that use deep neural networks (DNNs). An adversary carrying out a backdoor attack embeds a predefined perturbation, called a Trojan trigger, into a small subset of input samples. The DNN is then trained so that the presence of the trigger in an input results in an output label that differs from the correct label, while the DNN's outputs on inputs without the trigger remain unaffected. Because they allow an attacker to manipulate the DNN's behavior, backdoor attacks could have severe consequences in safety-critical applications. Existing defenses against backdoor attacks involve pruning or retraining DNN models, which can be computationally expensive. Moreover, these defenses have been demonstrated on image-based inputs, and their performance on other input domains is not well understood. In this thesis, we propose and develop MDTD, a multi-domain Trojan detector for DNNs. MDTD has several distinguishing characteristics: (i) it does not require retraining DNN models, (ii) it does not require knowledge of the trigger or the attacker's embedding strategy, (iii) it is computationally inexpensive, and (iv) it can be applied to both image-based and graph-based inputs. To the best of our knowledge, MDTD is the first Trojan detection mechanism proposed for graph-based inputs. MDTD uses the insight that input samples containing a Trojan trigger are located farther from a decision boundary than clean input samples. MDTD first estimates the distance to a decision boundary using adversarial learning methods, which estimate the smallest magnitude of noise required for the model to misclassify a sample. MDTD then uses this estimate to infer whether a given sample is Trojaned.
More precisely, MDTD learns a threshold on the distance to the decision boundary from a small set of clean labeled samples and uses this threshold to flag a sample as possibly Trojaned. We evaluate MDTD against state-of-the-art (SOTA) Trojan detection methods on five image-based datasets (CIFAR100, CIFAR10, GTSRB, SVHN, and Flowers102) and four graph-based datasets (AIDS, WinMal, Toxicant, and COLLAB). Our results show that MDTD effectively identifies samples that contain different types of Trojan triggers. We also show that an adversary who trains robust DNN models on a combination of clean and Trojaned samples cannot significantly degrade MDTD's performance without also significantly reducing the DNN model's classification accuracy.
dc.embargo.lift	2028-03-21T18:03:31Z
dc.embargo.terms	Restrict to UW for 5 years -- then make Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Asokraj_washington_0250O_25185.pdf
dc.identifier.uri	http://hdl.handle.net/1773/49902
dc.language.iso	en_US
dc.rights	none
dc.subject	Electrical engineering
dc.subject.other	Electrical engineering
dc.title	A Multi-Domain Trojan Detector for Deep Neural Networks
dc.type	Thesis
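The detection rule summarized in the abstract above can be illustrated with a minimal Python sketch. This is not the thesis implementation. The affine classifier (`W`, `b`), the function names `boundary_distance`, `fit_threshold`, and `flag_trojaned`, and the 95th-percentile threshold rule are all hypothetical choices made for illustration. For an affine multiclass classifier, the smallest L2 perturbation that changes the predicted label has a closed form, the same linearization DeepFool uses. Here it stands in for the adversarial-learning estimate that MDTD applies to a real DNN.

```python
import numpy as np


def boundary_distance(W, b, x):
    """Smallest L2 perturbation that changes the predicted class of the
    hypothetical affine classifier f(x) = W @ x + b.

    Exact for affine models; a DeepFool-style stand-in for the adversarial
    distance estimate on a real DNN. Returns inf if there is only one class.
    """
    scores = W @ x + b
    pred = int(np.argmax(scores))
    best = np.inf
    for k in range(W.shape[0]):
        if k == pred:
            continue
        w_diff = W[k] - W[pred]
        gap = scores[pred] - scores[k]  # >= 0 because pred is the argmax
        norm = np.linalg.norm(w_diff)
        if norm > 0:
            # Distance from x to the hyperplane where class k overtakes pred.
            best = min(best, gap / norm)
    return float(best)


def fit_threshold(clean_distances, quantile=0.95):
    """Learn a distance threshold from clean labeled samples.

    Assumption: the threshold is a high quantile of clean distances.
    """
    return float(np.quantile(clean_distances, quantile))


def flag_trojaned(distance, threshold):
    """Flag a sample as possibly Trojaned if it lies unusually far from the
    decision boundary compared with clean samples."""
    return distance > threshold


if __name__ == "__main__":
    W = np.array([[1.0, 0.0], [0.0, 1.0]])  # toy 2-class, 2-feature model
    b = np.zeros(2)
    clean = np.array([[2.0, 1.0], [1.5, 1.0], [1.0, 2.0], [1.0, 1.5]])
    threshold = fit_threshold([boundary_distance(W, b, x) for x in clean])
    suspicious = np.array([5.0, 0.0])  # far from the boundary
    print(threshold, flag_trojaned(boundary_distance(W, b, suspicious), threshold))
```

The quantile rule keeps the false-positive rate on clean data near `1 - quantile` by construction. The abstract does not state how MDTD chooses its threshold, so treat this parameter as a placeholder.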

Files

Original bundle

Name: Asokraj_washington_0250O_25185.pdf
Size: 1.26 MB
Format: Adobe Portable Document Format

Collections

Electrical engineering