Framework for Mapping Convolutional Neural Networks on FPGAs
Abstract
Artificial Intelligence (AI) applications are on the rise. Recent advances in machine learning and deep learning have created various applications for medicine and healthcare, financial markets, security, entertainment, and the social sciences. Deep learning, in particular, has demonstrated tremendous opportunities in computer vision, autonomous driving, natural language processing, and many other areas. Deep learning allows machines to solve complex problems using Artificial Neural Networks (ANNs), and the learning itself can be supervised or semi-supervised. Multilayered artificial neural networks are called Deep Neural Networks (DNNs). These deep computational models are composed of multiple sequentially processing layers that help learn the representations within a given data set. Convolutional Neural Networks (CNNs) are a particular class of deep networks that use convolution to extract features from data (usually in the time or frequency domain) and then use the extracted features to classify that data for final inference.

Several software tools and frameworks are available to support the deep learning community with the fast development and high-performance execution of DNNs. Tool flows such as PyTorch, Caffe, Theano, and TensorFlow aim to increase the productivity of CNN software developers by providing a pathway for implementing deep networks on high-performance multi-core CPUs, GPUs, and DSPs. GPUs, in particular, provide easy access to floating-point operations and very high memory bandwidth. However, some of the latest Nvidia GPUs (e.g., the Nvidia GeForce RTX 2080) consume as much as 300 watts of power, and excessive power dissipation can make GPUs unfavorable candidates for implementing CNNs in a variety of applications. Field Programmable Gate Arrays (FPGAs) provide a high degree of customized parallelization and offer far superior performance per watt. We believe that FPGA-based accelerators are ideal platforms for implementing Convolutional Neural Networks for computer vision and related applications. Software engineers with minimal hardware design skills require substantial support within the tool flows, and FPGA vendors are fully embracing new methodologies such as high-level synthesis, where a design can be described as a program written in languages like C/C++.

However, commercial FPGAs are resource-scarce, the CNN mapping design space is enormous, and efficient mapping of a CNN can quickly become a challenging task. The FPGA resource requirements, latency, and power are affected by many parameters, including the CNN architecture and the level of computational parallelism. In practice, a software designer first explores various CNN architectures in software to improve the architecture's validation accuracy. Once an architecture has been finalized, the designer ports the design to an FPGA for inference acceleration. The mapping process is then optimized for performance by tweaking many design-related parameters during design space exploration and by changing the operating frequencies. The entire process is highly time-consuming.

This dissertation describes a fully automated end-to-end design framework for implementing CNNs on FPGAs. The framework allows a designer to express CNNs in commonly preferred Python language descriptions and provides a guided tool flow to generate a custom Intellectual Property (IP) block.
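To make the software exploration stage concrete, the following minimal PyTorch sketch shows the kind of Python-level CNN description a designer might start from before mapping to an FPGA; the class name, layer sizes, and input shape are illustrative assumptions and are not taken from the dissertation's framework.

```python
# Minimal PyTorch sketch of the software exploration stage (illustrative only).
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """A tiny CNN: convolutional feature extraction followed by classification."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level feature extraction
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # higher-level feature extraction
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # classify extracted features

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

# Validate shapes with a dummy 32x32 RGB batch before considering FPGA mapping.
logits = SmallCNN()(torch.randn(1, 3, 32, 32))
print(logits.shape)  # torch.Size([1, 10])
```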
In addition, the framework enables easy and complete design space exploration for selecting the final implementation based on optimization parameters that include Performance, Power, and Area (PPA).
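As a rough illustration of PPA-driven selection, the sketch below filters hypothetical design points (latency, power, and LUT usage as an area proxy) down to their Pareto-optimal subset; the candidate names and numbers are invented for illustration and are not results produced by the framework.

```python
# Illustrative Pareto filtering over hypothetical (performance, power, area) design points.
from dataclasses import dataclass

@dataclass(frozen=True)
class DesignPoint:
    name: str
    latency_ms: float   # lower is better (performance)
    power_w: float      # lower is better (power)
    luts: int           # lower is better (area proxy)

def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b in every metric and strictly better in at least one."""
    ma, mb = (a.latency_ms, a.power_w, a.luts), (b.latency_ms, b.power_w, b.luts)
    return all(x <= y for x, y in zip(ma, mb)) and ma != mb

def pareto_front(points: list[DesignPoint]) -> list[DesignPoint]:
    """Keep only the design points that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical candidates produced by a design space exploration sweep.
candidates = [
    DesignPoint("low_parallelism",  latency_ms=42.0, power_w=1.8, luts=21_000),
    DesignPoint("mid_parallelism",  latency_ms=18.5, power_w=3.1, luts=48_000),
    DesignPoint("high_parallelism", latency_ms=9.7,  power_w=5.6, luts=96_000),
    DesignPoint("unbalanced",       latency_ms=20.0, power_w=5.9, luts=97_000),
]

for p in pareto_front(candidates):
    print(p.name, p.latency_ms, p.power_w, p.luts)
```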