Ensuring Integrity, Privacy, and Fairness for Machine Learning Using Trusted Execution Environments

Date

December 2021

Abstract

Decision-making systems increasingly rely on machine learning (ML) and deep learning to deliver cutting-edge technologies to society. Due to potential security, privacy, and bias issues in these ML methods, end users currently cannot fully trust such systems with their private data or with the predictions they produce. For instance, it is often unclear how an individual's medical record is used to build medical diagnosis tools. Is the data always encrypted at rest? When it is decrypted, is there a guarantee that only a trusted application can access the private data, eliminating potential misuse? Throughout this dissertation, solutions that leverage the security and integrity capabilities of hardware-assisted Trusted Execution Environments (TEEs) are proposed to make ML-based systems more reliable and trustworthy, so that end users can place greater trust in them.

As a starting point, we address the privacy and integrity issues of ML model training in the cloud setting. Training a deep learning model that relies only on a TEE is not very attractive to businesses that need to continuously train their models in a remote cloud, because special hardware such as Graphical Processing Units (GPUs) is far more efficient at training ML models than CPU-based TEEs. In this dissertation, we propose an integrity-preserving solution that combines TEEs and GPUs: the training task runs on the efficient GPU, while the TEE capabilities are used to detect any deviation from the ML model learning protocol with high probability. Using our solution, we can ascertain (with high probability) that the model is trained with the correct training dataset, the correct training hyperparameters, and the correct code execution flow.

Having provided an integrity-preserving model training solution, we focus on how to use the learned ML model privately and securely in practice. To provide privacy-preserving inference on sensitive data when the ML model owner and the data owner do not trust each other, the dissertation proposes a solution in which the inference task runs inside a TEE and only the result is sent to the data owner(s). The most important benefit of our solution is that the data owner can be assured their data will not be used for any other purpose in the future and that no information beyond the agreed model inference result is disclosed. Furthermore, we show the efficacy of our solution in the context of genomic data analysis.

Next, we focus on the bias and unfairness embedded in certain ML models. It has been reported that ML models can unfairly treat certain subgroups, and such issues are hard to test for in deployment settings where both the ML model and its input data are sensitive (i.e., neither the model nor the data can be disclosed to the public for direct auditing). This dissertation proposes a privacy-preserving solution for fairness analytics using TEEs. In this setting, the model owner and the fairness test set owner do not trust each other and therefore do not want their inputs to be disclosed. The end goal is for the fairness analyst to conduct tests about the quality and fairness of the model's outcomes with respect to a set of predefined minority groups or subgroups and to compare and contrast them with the privileged group(s). In this way, models can be analyzed and the analyst can shed light on potential latent biases in the ML model in a privacy-preserving manner.

Even if the ML model is trained and deployed securely, data poisoning may leave hidden backdoors in the final model (referred to in the literature as trojan attacks). Finally, in this dissertation, we develop novel techniques to detect such attacks. We design experiments that first create a multitude of models that carry a trojan and another set that does not, and then build classifiers to see if we can tell them apart. Our results show that ML models can be used to detect trojan attacks against other ML models.
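For the integrity-preserving training contribution, the following is a minimal sketch (plain Python/NumPy, not the dissertation's actual protocol or code) of how probabilistic spot-checking of outsourced GPU training can work: the untrusted worker publishes commitments to per-step checkpoints, and a verifier that would run inside the TEE recomputes a random sample of steps. The logistic-regression step, the hash-based commit helper, and the audit sample size are illustrative assumptions.

```python
import hashlib
import numpy as np

rng = np.random.default_rng(0)

def sgd_step(w, X, y, lr):
    """One full-batch gradient step for logistic regression (illustrative stand-in)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    grad = X.T @ (p - y) / len(y)
    return w - lr * grad

def commit(w):
    """Hash a weight vector so the worker cannot change its story after the fact."""
    return hashlib.sha256(np.ascontiguousarray(w).tobytes()).hexdigest()

# --- Untrusted worker (would run on the GPU) --------------------------------
X = rng.normal(size=(256, 10))
y = (rng.random(256) < 0.5).astype(float)
lr, steps = 0.1, 50

checkpoints = [np.zeros(10)]
for _ in range(steps):
    checkpoints.append(sgd_step(checkpoints[-1], X, y, lr))
commitments = [commit(w) for w in checkpoints]   # published before auditing

# --- Verifier (would run inside the TEE) -------------------------------------
# Recompute a random sample of steps from the revealed checkpoints. A worker
# that cheats on a constant fraction of steps is caught with probability
# roughly 1 - (1 - fraction)^sample_size.
audit = rng.choice(steps, size=10, replace=False)
ok = True
for t in audit:
    w_before, w_after = checkpoints[t], checkpoints[t + 1]   # revealed on demand
    ok &= commit(w_before) == commitments[t]
    ok &= commit(w_after) == commitments[t + 1]
    ok &= bool(np.allclose(sgd_step(w_before, X, y, lr), w_after))
print("sampled training steps verified:", ok)
```

The point of the design is that the TEE never re-runs the whole training job; it only replays a small, unpredictable subset of committed steps, which is enough to make large-scale cheating detectable with high probability.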
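For the privacy-preserving inference contribution, below is a conceptual sketch of the data flow with remote attestation abstracted away: the data owner provisions a symmetric key to the enclave only after verifying its attestation report, sends ciphertext, and receives back only the ciphertext of the agreed prediction. The Fernet channel, the JSON encoding, and the linear model are illustrative stand-ins, not the dissertation's implementation.

```python
import json
import numpy as np
from cryptography.fernet import Fernet

# Key the data owner would provision to the enclave only after verifying its
# attestation report (attestation itself is not modeled in this sketch).
key = Fernet.generate_key()
channel = Fernet(key)

# --- Data owner: only ciphertext ever leaves this party ----------------------
x = np.array([0.2, 1.5, -0.7])
ciphertext = channel.encrypt(json.dumps(x.tolist()).encode())

# --- Inside the enclave: only code here sees the plaintext -------------------
def enclave_infer(ct, model_w, model_b):
    features = np.array(json.loads(channel.decrypt(ct)))
    score = float(features @ model_w + model_b)   # model owner's secret parameters
    label = int(score > 0)                        # only the agreed output leaves
    return channel.encrypt(json.dumps({"label": label}).encode())

w, b = np.array([1.0, -0.5, 2.0]), 0.1            # model owner's parameters
result_ct = enclave_infer(ciphertext, w, b)

# --- Data owner decrypts the agreed result -----------------------------------
print(json.loads(channel.decrypt(result_ct)))
```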
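For the fairness-analytics contribution, the sketch below shows the kind of group-level report the analyst could receive once the TEE has evaluated the private model on the private test set. The metrics follow the standard statistical-parity and equal-opportunity definitions; the labels, predictions, and protected attribute here are randomly generated for illustration.

```python
import numpy as np

def group_metrics(y_true, y_pred, group):
    """Per-group positive-prediction rate and true-positive rate."""
    sel = y_pred[group].mean()                   # P(pred = 1 | group)
    tpr = y_pred[group & (y_true == 1)].mean()   # P(pred = 1 | y = 1, group)
    return sel, tpr

rng = np.random.default_rng(1)
n = 1000
y_true = rng.integers(0, 2, n)
y_pred = rng.integers(0, 2, n)            # stand-in for the model's decisions
privileged = rng.integers(0, 2, n) == 1   # stand-in for the protected attribute

sel_p, tpr_p = group_metrics(y_true, y_pred, privileged)
sel_m, tpr_m = group_metrics(y_true, y_pred, ~privileged)

print(f"statistical parity difference: {sel_m - sel_p:+.3f}")
print(f"equal opportunity difference:  {tpr_m - tpr_p:+.3f}")
```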
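For the trojan-detection contribution, the sketch below illustrates the "classify the classifiers" idea: build feature representations of many clean and trojaned models and train a meta-classifier to separate them. The simulated model features and the scikit-learn logistic-regression meta-classifier are illustrative assumptions; in the dissertation the features would be derived from real trained models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n_models, n_features = 400, 64

# Stand-in features extracted from each candidate model (e.g., weight or
# activation statistics); trojaned models carry a subtle statistical shift.
clean    = rng.normal(0.0, 1.0, size=(n_models // 2, n_features))
trojaned = rng.normal(0.3, 1.0, size=(n_models // 2, n_features))

X = np.vstack([clean, trojaned])
y = np.array([0] * (n_models // 2) + [1] * (n_models // 2))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
meta = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("meta-classifier accuracy on held-out models:", meta.score(X_te, y_te))
```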

Keywords

Computer Science
