From Single Component to System Level Approximate Computing
Abstract
Most Integrated Circuits (ICs) are now heterogeneous Systems-on-Chip (SoCs) that contain a variety of hardware accelerators. These dedicated accelerators execute applications that have large amounts of parallelism, e.g., Digital Signal Processing (DSP) and image processing applications. This approach can substantially increase performance while decreasing the energy consumption of the SoC. An orthogonal approach, approximate computing, has emerged as a powerful means of further reducing the power consumption of SoCs. In approximate computing, the error allowed at the output is relaxed in order to simplify the hardware, or the software that runs on the processor, and thus achieve lower power. It has proven to be an effective method to achieve low power for error-tolerant applications (e.g., digital signal processing and image processing) by trading off the accuracy of the circuit against area, power, and energy. Most work in approximate computing focuses on basic approximation primitives that fail to reduce area, power, and energy beyond a certain level. In addition, most work on approximate computing has focused on specific components within a system. This severely limits the approximation potential, as most ICs are now complex heterogeneous systems. A further limitation of current work in this domain is that it assumes the training data matches the actual workload. This is not always true, however, because these complex SoCs are used for a variety of different applications. To address these limitations, this dissertation presents new approximation primitives to further approximate SoC components. In addition, it proposes an approximation methodology for system-level approximation, that is, for approximating the entire SoC. Finally, to address the issue of changes in the input data distribution, this dissertation proposes a method to adjust the approximation based on the type of input data distribution, for both hardware accelerators and the entire SoC.
The effectiveness of the proposed methods has been verified through experimental results and comparison with similar state-of-the-art research.