Predictable GPGPU Computing in DNN-Driven Autonomous Systems
Graphics processing units (GPUs) are widely used as co-processors in many domains to accelerate general-purpose workloads that are data-parallel and computationally intensive, a practice known as general-purpose GPU (GPGPU) computing. An emerging usage domain is the adoption of GPGPU to accelerate inherently computation-intensive Deep Neural Network (DNN) workloads in autonomous systems. Such systems are usually time-sensitive, autonomous driving systems especially so. When an autonomous vehicle drives alongside human drivers, loss of life or property may result if its computing system fails to respond to events before their deadlines. Much research has been conducted to algorithmically optimize the accuracy and performance of deep neural networks, but limited attention has been given to optimizing the execution of GPU-accelerated DNN workloads from the scheduling angle, especially in a time-constrained multi-tasking environment. Adopting GPGPU to accelerate DNN workloads in time-sensitive autonomous systems, which are often resource-constrained, presents a series of challenges: (1) GPUs are designed to execute non-preemptively, which may cause priority inversion; (2) the execution of GPU-accelerated DNN workloads must be optimized at the system level in a real-time multi-tasking environment; and (3) two often conflicting goals, timing predictability and energy efficiency, both essential for any DNN-based autonomous driving system, must be achieved simultaneously on a resource-constrained embedded CPU-GPU heterogeneous platform. The goal of the research presented in this dissertation is to solve or remedy these challenges. Specifically, we propose GPES, a runtime system that allows GPU executions to be interruptible and preemptable in a multi-tasking environment. We also propose S³DNN, a systemic solution that optimizes the execution of DNN workloads on GPU in a soft real-time multi-tasking environment.
We propose PredJoule, a runtime system that takes a layer-based approach, controlling timing and optimizing energy efficiency by exploiting each layer's performance/energy characteristics. In addition to these runtime systems, we investigate the problem of mapping multiple applications implemented as kernel graphs onto a heterogeneous system, and present a theoretical framework that formulates this problem as an integer program, together with a set of practically efficient mapping algorithms. Furthermore, we present a reuse-based approach to further improve the predictability of GPU computing.