A Machine Learning Based Hard Fault Recuperation Model for Approximate Hardware Accelerators

Taher, Farah Naz; Callenes-Sloan, J.; Schaefer, Benjamin Carrion

Files

JECS-6682-279798.38-LINK.pdf (164.95 KB)

Publisher

The Association for Computing Machinery

URI

https://hdl.handle.net/10735.1/6683

Abstract

The continuous pursuit of higher performance and energy efficiency has led to heterogeneous SoCs that contain multiple dedicated hardware accelerators. These accelerators exploit the inherent parallelism of tasks and are often tolerant of inaccuracies in their outputs, e.g., in image and digital signal processing applications. At the same time, permanent faults are escalating due to process scaling and power restrictions, leading to erroneous outputs. To address this issue, we propose a low-cost, universal fault recovery/repair method that uses supervised machine learning techniques to ameliorate the effect of permanent fault(s) in hardware accelerators that can tolerate inexact outputs. The proposed compensation model does not require any information about the accelerator and is highly scalable with low area overhead. Experimental results show that the proposed method improves accuracy by 50% and decreases the overall mean error rate by 90%, with an area overhead of 5%, compared to execution without fault compensation.
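The general idea of black-box fault compensation can be illustrated with a small sketch. This is not the authors' model or experimental setup: the 3-tap filter, the stuck-at-0 fault on output bit 4, the one-variable least-squares corrector, and all function names are assumptions chosen only for illustration. The sketch also assumes that golden (fault-free) outputs are available for training, for example from a reference run.

```python
import random

FAULTY_BIT = 4  # hypothetical: output bit 4 is permanently stuck at 0


def accelerator(x0, x1, x2):
    """Golden 3-tap smoothing filter on 8-bit samples (illustrative only)."""
    return (x0 + 2 * x1 + x2) // 4


def faulty_accelerator(x0, x1, x2):
    """The same filter with a stuck-at-0 fault on one output bit."""
    return accelerator(x0, x1, x2) & ~(1 << FAULTY_BIT)


def fit_line(xs, ys):
    """Ordinary least squares for y ~ w * x + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, mean_y - w * mean_x


def mse(predicted, truth):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth)


def run_experiment(n_train=2000, n_test=1000, seed=0):
    rng = random.Random(seed)

    def sample(n):
        inputs = [tuple(rng.randrange(256) for _ in range(3)) for _ in range(n)]
        faulty = [faulty_accelerator(*x) for x in inputs]
        golden = [accelerator(*x) for x in inputs]
        return faulty, golden

    train_faulty, train_golden = sample(n_train)
    test_faulty, test_golden = sample(n_test)

    # The corrector sees only faulty outputs and golden labels,
    # never the internals of the accelerator.
    w, c = fit_line(train_faulty, train_golden)
    compensated = [w * f + c for f in test_faulty]

    return mse(test_faulty, test_golden), mse(compensated, test_golden)


faulty_mse, compensated_mse = run_experiment()
print(f"MSE without compensation: {faulty_mse:.1f}")
print(f"MSE with compensation:    {compensated_mse:.1f}")
```

A linear corrector of this kind can only shift and scale the faulty output, so it removes the average bias introduced by the stuck bit rather than recovering the lost bit exactly. The figures reported in the abstract come from the authors' own model and benchmarks, not from this sketch.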

Description

Full text access from Treasures at UT Dallas is restricted to current UTD affiliates (use the provided Link to Article). Non-UTD affiliates can find the web address for this item by clicking the "Show full item record" link, copying the "dc.relation.uri" metadata value, and pasting it into a browser.

Keywords

Fault-tolerant computing, Machine learning, Supervised learning (Machine learning), Artificial intelligence, Computer-aided design, Signal processing--Digital techniques, Energy consumption, Computers, Self-organizing systems, Systems on a chip, Parallel programming (Computer science)

Collections

Schaefer, Benjamin Carrion
