Towards Securing Untrusted Deep Neural Networks
Deep Neural Network (DNN) models have achieved remarkable success in domains ranging from image recognition to natural language processing. However, the increasing reliance on cloud-based services and the proliferation of machine learning applications have raised concerns about the security and privacy of these models; protecting untrusted DNN models from malicious manipulation and exploitation has become a critical challenge. This dissertation addresses the problem of protecting untrusted DNN models from malicious manipulation (i.e., Trojan attacks) and proposes a framework to enhance their security against Trojan (backdoor) attacks. The framework comprises multiple dimensions of defense that collectively safeguard the integrity of the models. First, we introduce the background of Trojan attacks and the settings of each proposed method. Specifically, we propose two kinds of defense against current Trojan attacks: one at the model level and one at the input level. Both defenses are built upon the most practical scenario, i.e., the black-box, hard-label scenario. Black-box means that the defender cannot access the internal parameters of the target model, while hard-label implies that the defender can only observe the final prediction label of the target DNN model. To the best of our knowledge, this is one of the most practical and challenging scenarios for Trojan defense. To tackle Trojan attacks, we propose two methods, AEVA and SCALE-UP. AEVA is a novel Trojan detection approach that operates on suspicious models, whereas SCALE-UP is an input-level Trojan defense applied to the input data during the inference phase. Both techniques are inspired by certain intriguing properties of DNN models and are shown to be effective in the backdoor defense task.
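The black-box, hard-label scenario described above can be made concrete with a minimal sketch. All names here (`HardLabelOracle`, `query`) are illustrative and not from the dissertation; the point is simply that the defender interacts with the suspicious model only through final predicted labels, never its parameters or confidence scores.

```python
# Minimal sketch of the black-box, hard-label threat model: the defender
# may only observe the top-1 predicted label of the target model.

class HardLabelOracle:
    """Wraps a (possibly Trojaned) model so a defender sees only labels."""

    def __init__(self, model):
        self._model = model  # internal parameters are hidden from the defender

    def query(self, x):
        scores = self._model(x)  # internal confidence scores, never exposed
        # Return only the index of the highest-scoring class (hard label).
        return max(range(len(scores)), key=scores.__getitem__)


# Usage with a toy 3-class "model" that returns fixed scores:
oracle = HardLabelOracle(lambda x: [0.1, 0.7, 0.2])
print(oracle.query([0.0]))  # -> 1 (the label alone, not the scores)
```

Both AEVA (model-level detection) and SCALE-UP (input-level defense at inference time) are designed to operate through exactly this kind of label-only query interface.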
Lastly, we discuss potential adaptive attacks against our defense approaches and evaluate their effectiveness. We find that our defenses remain robust against these adaptive attacks.