Secure Cloud Data Analytics with Trusted Processors
Abstract
Over the last few years, storing data in cloud-based services has become very popular due to
the easy management and monetary advantages of cloud computing. Recent developments
have shown that such data can be leaked through various attacks. To address some of these
attacks, encrypting sensitive data before sending it to the cloud has emerged as an important
protection mechanism. Still, indexing, querying, and running complex data analytics tasks
on encrypted data remain important challenges. In this dissertation, we address
some of the encrypted data processing challenges using two different but complementary
approaches. First, we explore what kind of data querying functionality we can provide for
encrypted data even if we have no support from the server. Later, we provide solutions for
the use cases where the cloud server provides a trusted processor for processing some of the
encrypted data.
For the cloud deployments where there is only limited support from the cloud server, we
provide a new searchable encryption scheme, i.e., a type of encryption technique that allows
querying on encrypted data. Unlike most existing searchable encryption schemes
developed for keyword search, our proposed scheme does not require running any
code on the cloud servers. Furthermore, we provide an extensible framework for supporting
complex search queries over encrypted multimedia data. Before any data is uploaded to the
cloud, important features are extracted to support different query types (e.g., extracting
facial features to support face recognition queries), and complex queries are converted to
a series of object retrieval tasks for the cloud service.
Later, we explore the setting where the cloud servers provide support for processing encrypted data using trusted processors. In this setting, we can execute code in a trusted
processor in a secure manner, i.e., the adversary cannot tamper with the code without detection, and data is always encrypted outside the trusted processor.
Over the past few years, efficient and secure data analytics tools (e.g., map-reduce frameworks, machine learning models, and SQL querying) that can be executed over encrypted data
using trusted processors have been developed. However, these prior efforts do not provide a simple, secure, high-level language-based framework that is suitable for enabling
generic data analytics for non-security experts who are unfamiliar with important security concepts
such as “oblivious execution”. We thus provide such a framework, which allows data scientists
to perform data analytics tasks with secure processors using a Python/Matlab-like high-level language. We also perform block size optimization and provide security guarantees for
data obliviousness.
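To make the notion of "oblivious execution" concrete, the following is a minimal sketch of a data-independent array read: every element is touched regardless of the secret index, so the sequence of memory accesses does not depend on the secret. The function names `oblivious_select` and `oblivious_read` are hypothetical and are not taken from the framework described here; Python itself gives no constant-time guarantees, so this only illustrates the access pattern.

```python
from typing import List


def oblivious_select(cond: int, a: int, b: int) -> int:
    """Return a if cond == 1, else b, using bit masks instead of a branch."""
    mask = -cond  # cond == 1 -> -1 (all bits set); cond == 0 -> 0
    return (a & mask) | (b & ~mask)


def oblivious_read(values: List[int], secret_index: int) -> int:
    """Read values[secret_index] while touching every element of values.

    Illustrative assumption: an observer who sees only which positions are
    accessed learns nothing about secret_index, because all are accessed.
    """
    result = 0
    for i, v in enumerate(values):
        is_match = int(i == secret_index)
        result = oblivious_select(is_match, v, result)
    return result


if __name__ == "__main__":
    print(oblivious_read([10, 20, 30, 40], 2))
```

The linear scan costs time proportional to the array length for each read, which is the kind of overhead that makes tuning parameters such as block size relevant in practice.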
Similarly, systems to access encrypted inverted indexes using trusted processors have been
developed before. However, none of these works proposed a mechanism to build the index in
the cloud securely. All of these works assume that some form of unencrypted inverted index
is already available. Building an inverted index can be a very memory-consuming task for
big data on memory-constrained platforms. We therefore propose a system to build the encrypted
inverted index in the cloud using trusted processors for text as well as multimedia data in
an oblivious and secure manner. We design our index to support TF-IDF-based ranked
document retrieval. Our system also supports indexing for answering complex queries such
as face recognition.
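As a point of reference for TF-IDF-based ranked retrieval, the following is a minimal plaintext sketch of an inverted index and a ranking function. It is neither encrypted nor oblivious, and it does not reproduce the system described here. The weighting used (raw term frequency times log(N/df)) is one common TF-IDF variant chosen as an assumption, and the names `build_index` and `rank` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_index(docs: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Map each term to {doc_id: term frequency in that document}."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


def rank(index: Dict[str, Dict[str, int]], num_docs: int,
         query: str) -> List[Tuple[str, float]]:
    """Score documents by summing tf * log(N / df) over the query terms."""
    scores: Dict[str, float] = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "secure cloud cloud index",
        "d2": "secure index",
        "d3": "face recognition",
    }
    index = build_index(docs)
    for doc_id, score in rank(index, len(docs), "secure cloud"):
        print(doc_id, round(score, 3))
```

In this plaintext form, the postings lists visited during a query reveal the query terms, which is why an oblivious construction is needed when the index lives on an untrusted server.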