Browsing by Author "Taher, Farah Naz"

A Machine Learning Based Hard Fault Recuperation Model for Approximate Hardware Accelerators
(The Association for Computing Machinery) Taher, Farah Naz; Callenes-Sloan, J.; Schaefer, Benjamin Carrion
The continuous pursuit of higher performance and energy efficiency has led to heterogeneous SoCs that contain multiple dedicated hardware accelerators. These accelerators exploit the inherent parallelism of tasks and are often tolerant to inaccuracies in their outputs, e.g., in image and digital signal processing applications. At the same time, permanent faults are escalating due to process scaling and power restrictions, leading to erroneous outputs. To address this issue, we propose a low-cost, universal fault recovery/repair method that uses supervised machine learning techniques to ameliorate the effect of permanent fault(s) in hardware accelerators that can tolerate inexact outputs. The proposed compensation model does not require any information about the accelerator and is highly scalable with low area overhead. Experimental results show that the proposed method improves accuracy by 50% and decreases the overall mean error rate by 90%, with an area overhead of 5%, compared to execution without fault compensation.
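The idea of compensating a permanent fault with a supervised model that only observes outputs can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the method from the paper: the toy accelerator (`golden`), the stuck-at-0 fault on a coefficient bit (`faulty`), and the choice of ordinary least-squares linear regression as the compensation model.

```python
# Toy accelerator: y = COEFF * x + OFFSET (hypothetical, for illustration only).
COEFF, OFFSET = 3, 7


def golden(x):
    """Fault-free accelerator output."""
    return COEFF * x + OFFSET


def faulty(x):
    """Same accelerator with bit 0 of the coefficient stuck at 0 (assumed fault)."""
    return (COEFF & ~1) * x + OFFSET


def fit_linear(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Train on a subset of inputs, using only (faulty output, golden output) pairs.
train_inputs = range(0, 256, 4)
slope, intercept = fit_linear(
    [faulty(x) for x in train_inputs],
    [golden(x) for x in train_inputs],
)


def compensated(x):
    """Faulty output passed through the learned correction."""
    return slope * faulty(x) + intercept


def mean_abs_error(fn, inputs):
    """Mean absolute deviation of fn from the golden output."""
    inputs = list(inputs)
    return sum(abs(fn(x) - golden(x)) for x in inputs) / len(inputs)


error_before = mean_abs_error(faulty, range(256))
error_after = mean_abs_error(compensated, range(256))
print(f"mean error without compensation: {error_before}")
print(f"mean error with compensation:    {error_after}")
```

The model treats the accelerator as a black box, which mirrors the abstract's claim that no information about the accelerator is needed. A fault whose effect is not a linear function of the output would need a richer model than this sketch uses.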
Common-Mode Failure Mitigation: Increasing Diversity Through High-Level Synthesis
(Institute of Electrical and Electronics Engineers Inc., 2019-03-25) Taher, Farah Naz; Joslin, Matthew; Balachandran, A.; Zhu, Zhiqi; Schaefer, Benjamin Carrion
Fault tolerance is vital in many domains. One popular way to increase fault tolerance is through hardware redundancy. However, basic redundancy cannot cope with Common Mode Failures (CMFs). One way to address CMFs is to combine diversity with traditional hardware redundancy. This work proposes an automatic design space exploration (DSE) method that, given a single behavioral description for High-Level Synthesis (HLS), generates optimized redundant hardware accelerators with maximum diversity to protect against CMFs. For this purpose, this work exploits one of the main advantages of C-based VLSI design over traditional RT-level design based on low-level Hardware Description Languages (HDLs): the ability to generate micro-architectures with unique characteristics from the same behavioral description. Experimental results show that the proposed method provides a significant increase in diversity compared to using traditional RTL-based exploration to generate diverse designs. © 2019 EDAA.
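Why diversity helps against common-mode failures can be shown with a small sketch of triple modular redundancy. All names and the fault model here are hypothetical and are not taken from the paper: three channels compute `a * b + a`, and a single permanent fault (output bit 1 of the multiplier stuck at 0) is assumed to hit every multiplier at once. The diverse channels stand in for different micro-architectures of the same behavior.

```python
from collections import Counter


def faulty_mul(x, y):
    """Multiplier with output bit 1 stuck at 0 (assumed common-mode fault)."""
    return (x * y) & ~0b10


def channel_a(a, b, mul):
    """Micro-architecture A: one multiply, then an add."""
    return mul(a, b) + a


def channel_b(a, b, mul):
    """Micro-architecture B: factored form a * (b + 1)."""
    return mul(a, b + 1)


def channel_c(a, b, mul):
    """Micro-architecture C: shift-and-add, so the multiplier is unused."""
    acc = a
    bit = 0
    while b >> bit:
        if (b >> bit) & 1:
            acc += a << bit
        bit += 1
    return acc


def vote(outputs):
    """Majority voter: return the value with at least 2 votes, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


def silent_failures(channels, mul, n=16):
    """Count inputs where the voter agrees on a wrong value."""
    failures = 0
    for a in range(n):
        for b in range(n):
            golden = a * b + a
            result = vote([ch(a, b, mul) for ch in channels])
            if result is not None and result != golden:
                failures += 1
    return failures


identical = [channel_a, channel_a, channel_a]
diverse = [channel_a, channel_b, channel_c]

identical_failures = silent_failures(identical, faulty_mul)
diverse_failures = silent_failures(diverse, faulty_mul)
print(f"silent failures, identical channels: {identical_failures}")
print(f"silent failures, diverse channels:   {diverse_failures}")
```

With identical channels, every corrupted result is outvoted by nothing, because all three channels fail the same way. With diverse channels the same fault produces different errors, so the voter either masks the fault or sees a disagreement. In this toy model diversity reduces silent failures but does not eliminate them, since two channels can still coincide on the same wrong value.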
Fault Tolerance in Hardware Accelerators: Detection and Mitigation
(2019-12) Taher, Farah Naz; 0000-0003-1317-3755 (Taher, FN); Schaefer, Benjamin Carrion
In the age of self-driving cars and space adventures, fault tolerance has become a first-order design metric. Thus, it is vital to incorporate fault tolerance coherently into the Very Large Scale Integrated (VLSI) design process. This is especially the case in state-of-the-art complex heterogeneous Systems-on-Chip (SoCs), which typically contain a variety of dedicated hardware accelerators. These SoCs have to be taped out in shorter and shorter periods while their complexity keeps increasing. This is driving designers to finally embrace C-based VLSI design, known as High-Level Synthesis (HLS). HLS has been shown to significantly reduce design and verification time compared to the use of low-level hardware description languages. Moreover, one significant advantage of raising the level of abstraction is that C-based VLSI design allows designers to generate a variety of micro-architectures with different trade-offs from the same untimed behavioral description. Fault tolerance at the hardware level has so far been mainly based on building N-modular redundant (NMR) systems, such as duplication and Triple Modular Redundancy (TMR), in which the hardware channels are identical. In this work, we exploit HLS's ability to generate micro-architectures with different characteristics from the same behavioral description to automatically generate fault-tolerant systems. In particular, we first propose an automated framework that, given a single behavioral description, generates a set of NMR systems with different area and performance trade-offs by choosing different mixes of micro-architectures. Secondly, we leverage this advantage to generate redundant systems that minimize common mode failures (CMFs). A CMF occurs when multiple modules in the redundant system are affected by a fault at the same time. It has been shown that adding diversity to the hardware channels can make the system more tolerant to this type of fault.
We also leverage machine learning to estimate diversity through fast and efficient predictive methods, thus significantly speeding up redundant system generation. We investigate whether a previously reported design diversity metric, the Diversity Metric based on circuit Path analysis (DIMP), or an RT-level fault-injection-based method can achieve results similar to those of the gate-netlist fault-injection-based diversity calculation. Lastly, we propose a low-cost, universal fault recovery/repair method that uses supervised machine learning techniques to ameliorate the effect of permanent fault(s) in hardware accelerators that can tolerate inexact outputs.