A Multi-Domain Trojan Detector for Deep Neural Networks

dc.contributor.advisor: Poovendran, Radha
dc.contributor.author: Asokraj, Surudhi
dc.date.accessioned: 2023-04-17T18:03:31Z
dc.date.issued: 2023-04-17
dc.date.submitted: 2023
dc.description: Thesis (Master's)--University of Washington, 2023
dc.description.abstract: Backdoor attacks have been demonstrated to compromise the functioning of machine learning models that use deep neural networks (DNNs). An adversary carrying out a backdoor attack embeds a predefined perturbation, called a Trojan trigger, into a small subset of input samples. The DNN is then trained so that the presence of the trigger in an input results in an output label different from the correct label, while outputs for inputs without the trigger remain unaffected. Because they allow an attacker to negatively affect the DNN's behavior, backdoor attacks might have severe repercussions in safety-critical applications. Existing defenses in the literature against backdoor attacks involve pruning or retraining DNN models, which can be computationally expensive. In addition, researchers have demonstrated the success of these solutions on image-based input domains, and the performance of such defenses on other inputs needs to be better understood. In this thesis, we propose and develop MDTD, a multi-domain Trojan detector for DNNs. MDTD has several distinguishing characteristics: it (i) does not require retraining DNN models, (ii) does not require knowledge of the trigger or the attacker's embedding strategy, (iii) is computationally inexpensive, and (iv) can be applied to both image and graph-based inputs. To the best of our knowledge, MDTD is the first Trojan detection mechanism proposed for graph-based inputs. MDTD uses the insight that input samples containing a Trojan trigger are located relatively farther from a decision boundary than clean input samples. MDTD first estimates the distance to a decision boundary using adversarial learning methods, which estimate the smallest magnitude of noise required for the model to misclassify a sample. MDTD then uses this information to infer whether a given sample is Trojaned.
More precisely, MDTD learns a threshold for the distance to the decision boundary using a small set of clean labeled samples and uses this threshold to flag a sample as possibly Trojaned. We evaluate MDTD against state-of-the-art (SOTA) Trojan detection methods across five image-based datasets (CIFAR100, CIFAR10, GTSRB, SVHN, and Flowers102) and four graph-based datasets (AIDS, WinMal, Toxicant, and COLLAB). Our results show that MDTD effectively identifies samples that contain different types of Trojan triggers. We also show that an adversary who trains robust DNN models using a combination of clean and Trojaned samples cannot significantly degrade MDTD's performance without also significantly reducing the classification accuracy of the DNN model.
dc.embargo.lift: 2028-03-21T18:03:31Z
dc.embargo.terms: Restrict to UW for 5 years -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Asokraj_washington_0250O_25185.pdf
dc.identifier.uri: http://hdl.handle.net/1773/49902
dc.language.iso: en_US
dc.rights: none
dc.subject: Electrical engineering
dc.subject.other: Electrical engineering
dc.title: A Multi-Domain Trojan Detector for Deep Neural Networks
dc.type: Thesis
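
The detection idea summarized in the abstract (estimate each sample's distance to a decision boundary, learn a threshold from a few clean samples, flag samples that lie beyond it) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the toy linear model (`W`, `B`, `predict`), the bisection line search used in place of the adversarial learning methods the abstract mentions, and the percentile rule used to set the threshold.

```python
import numpy as np

# Hypothetical stand-in model: a fixed linear binary classifier.
# (Assumption for illustration; the thesis targets trained DNNs.)
W = np.array([1.0, -2.0])
B = 0.5

def predict(x):
    """Return the predicted label (0 or 1) of the toy model."""
    return int(x @ W + B > 0)

def boundary_distance(x, max_eps=100.0, tol=1e-6):
    """Estimate the smallest L2 perturbation that flips predict(x).

    Bisects on the step size along the direction that moves x toward
    the other class (for this linear model, +/- W normalized).
    """
    label = predict(x)
    direction = W / np.linalg.norm(W)
    if label == 1:
        direction = -direction
    lo, hi = 0.0, max_eps
    if predict(x + hi * direction) == label:
        return max_eps  # no label flip found within the search budget
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predict(x + mid * direction) == label:
            lo = mid
        else:
            hi = mid
    return hi

def learn_threshold(clean_samples, percentile=95.0):
    """Set the threshold to a percentile of clean-sample distances.

    The percentile rule is an assumed choice, not the thesis's rule.
    """
    dists = [boundary_distance(x) for x in clean_samples]
    return float(np.percentile(dists, percentile))

def is_suspicious(x, threshold):
    """Flag a sample whose estimated distance exceeds the threshold."""
    return boundary_distance(x) > threshold
```

With a threshold learned from a small clean set, a sample lying unusually far from the boundary is flagged while a sample close to it is not. For a real DNN, the line search would have to be replaced by an actual adversarial perturbation method, and the threshold rule would need to be tuned on held-out clean data.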

Files

Original bundle

Name: Asokraj_washington_0250O_25185.pdf
Size: 1.26 MB
Format: Adobe Portable Document Format