Towards a High-Performance and Secure Memory System and Architecture for Emerging Applications
Abstract
In the 5G era, diverse artificial intelligence (AI) and Internet of Things (IoT) applications, such as smart homes, virtual reality, and autonomous vehicles, have become part of everyday life. These applications impose diverse deployment requirements in terms of latency, privacy, and security, and they drive the evolution and prosperity of heterogeneous computing. In this dissertation, heterogeneous computing refers to the scheme in which Processing Units (PUs) with differentiated computing capabilities are effectively coordinated and managed to achieve computing gains. Representative PUs include the CPU, the Graphics Processing Unit (GPU), the Field-Programmable Gate Array (FPGA), and the Application-Specific Integrated Circuit (ASIC). As the GPU has become one of the most promising and prevalent platforms for deploying emerging AI-enabled applications, this dissertation discusses key challenges and solutions of GPU-based heterogeneous systems and architectures, especially the memory subsystem and its management, to match the deployment requirements of emerging applications from both the performance and the security perspective.

Regarding the challenges, these applications typically process huge volumes of data and computation, are memory-hungry, and exhibit diverse computation properties and memory access patterns. In contrast, the GPU-based heterogeneous system, especially the GPU device, has limited memory capacity, and the CPU and GPU in such a system have fundamentally different computing architectures and memory subsystems. A "memory wall" therefore arises from the mismatch between the diverse application properties and the heterogeneity of the GPU-based system, degrading application performance. On the other hand, applications face a variety of security and privacy risks during deployment, and the GPU-based heterogeneous system, especially its memory subsystem, exposes multiple security vulnerabilities that threaten application privacy.

To address these challenges, we propose a memory-and-computing coordinated methodology that thoroughly exploits the characteristics and capabilities of the GPU-based heterogeneous system to optimize both application performance and privacy. Specifically: 1) We propose a task-aware, dynamic memory management mechanism that co-optimizes application latency and memory footprint, especially in multitasking scenarios. 2) We propose a novel latency-aware memory management framework that analyzes application characteristics and hardware features to reduce application initialization latency and response time. 3) We develop a new model extraction attack that exploits a vulnerability of the GPU unified memory system to accurately steal private DNN models. 4) We propose a CPU/GPU co-encryption mechanism that defends against a timing-correlation attack on an integrated CPU/GPU platform and provides a secure execution environment for edge applications.

Overall, this dissertation aims to develop a high-performance and secure memory system and architecture on GPU heterogeneous platforms so that emerging AI-enabled applications can be deployed efficiently and safely.
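For context, the sketch below is not taken from the dissertation; it is a minimal illustration of CUDA Unified Memory (cudaMallocManaged), the shared CPU/GPU memory mechanism referenced by the third contribution and by the memory-subsystem discussion above. The kernel, array size, and scaling factor are arbitrary assumptions chosen only for illustration.

// Minimal Unified Memory sketch (illustrative only, not the dissertation's code).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;          // arbitrary example size
    float *data = nullptr;

    // Unified Memory: one pointer is valid on both the CPU and the GPU, and the
    // driver migrates pages between them on demand. This shared, demand-paged
    // address space is the kind of CPU/GPU memory coordination the abstract
    // discusses (and whose access behavior the model-extraction study examines).
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;      // CPU touches the pages
    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);  // GPU touches the same pages
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}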