# METHODS FOR ON-BOARD CONDITION MONITORING OF SIC MOSFET BASED CONVERTERS

by

Bhanu Teja Vankayalapati

APPROVED BY SUPERVISORY COMMITTEE:

Bilal Akin, Chair

Dongsheng (Brian) Ma

Joseph Friedman

Matthew Gardner

Copyright © 2022 Bhanu Teja Vankayalapati

All Rights Reserved

This thesis is dedicated to my parents Shri. Udaya Sankar and Smt. Jyothi, my wife Akansha, and my brother Chandra Kant.

# METHODS FOR ON-BOARD CONDITION MONITORING OF SIC MOSFET BASED CONVERTERS

by

BHANU TEJA VANKAYALAPATI, BTech, MTech

## DISSERTATION

Presented to the Faculty of The University of Texas at Dallas in Partial Fulfillment of the Requirements for the Degree of

DOCTOR OF PHILOSOPHY IN ELECTRICAL ENGINEERING

THE UNIVERSITY OF TEXAS AT DALLAS

December 2022

# ACKNOWLEDGMENTS

Some four and a half years back, had someone asked the naive me, "How hard do you think getting a PhD would be?" I would have said, "Not much harder than the other degrees I got". To be fair, I had assumed "challenges" in getting an academic degree would be mostly technical. However, with the benefit of hindsight, I now realize how wrong I was. The last few years had a few ups, several downs, and many new experiences, but they were anything but easy. The fact that I have managed to make it through is solely a testament to the kind and generous support I have received from the people in my life and several others with whom I was fortunate to cross paths. This is a humble attempt at acknowledging them and my apologies to those whom I might miss.

First and foremost, I express my immense gratitude to my "Prof", Dr. Bilal Akin. His guidance, mentorship, words of support, and wisdom were critical in enabling the successful completion of my studies at UTD. His technical insights, insistence on simplicity and practicality, and equal emphasis on effective presentation had a significant impact on shaping the work in this dissertation. More importantly, I know Prof as a great human with very mild manners, infinite patience, kindness, and exceptional drive. Working and learning under him was truly an honor and an experience I will always cherish.

I thank my committee members, Prof. Brian Ma, Prof. Matthew Gardner, and Prof. Joseph Friedman, for agreeing to be on the committee and providing ideas on the research. Special thanks to Dr. Friedman, whose course on "Beyond CMOS Computing" provided a fun yet insightful break from the world of power electronics. I am also grateful for his counsel and words of encouragement during these years. I extend my thanks to the wonderful staff at UTD ECE, who are always prompt in providing support for the research activities. During my years at UTD, largely thanks to Prof's efforts, I have had the opportunity to interact with many amazing people from the larger power industry. Every one of them was happy to engage with us, provide insights, and indulge our curiosity. I especially thank Dr. Hui Tan, Dr. Zhen Yu, Dr. Na Kong, Dr. Mrinal Das, Dr. Manish Bharadwaj, Mr. Ramesh Ramamoorthy, Dr. Han Zhang, Dr. Sridhar Sana, and Dr. Enis Tuncer. Special thanks to Hui, our liaison from Texas Instruments, who provided important insights for a key project.

I am also grateful to my friends and colleagues, past and present, from the Power Electronics and Drives Lab at UTD. I have learned from every one of them, and they will forever remain good friends. In no particular order, thank you, Vignesh, Kudra, Masoud, Chi, Rahman, Chen, Nathan, Fei, Saurabh, Mojtaba, Ajmal, Akshay, Sritam, Ahmad, Shi, Enes, and Feyzullah. Special thanks to Vignesh for his friendship in and outside the lab. I will always be grateful for his support and friendship during the initial years of my PhD. I will cherish the lab lunches which, more than anything, brought out glimpses of diversity among a bunch of nerds in both eating food and engaging in interests outside work. Outside the lab, I must thank my friends, Akshay, Vignesh, Akash, Shubham, Ravi, Aditya, Venkat, and Javad, for the long conversations, fun, and food we had together.

I owe my life and my achievements to my parents, Shri. Udaya Sankar and Smt. Jyothi. They have showered me with unconditional love, provided support, and instilled in me the most important values of life. My father is my role model and the primary reason I am an engineer. His encouragement taught me to think from first principles at a very early age. My mother's calm confidence and words of encouragement have seen me through the hardest of times. This dissertation would not have been possible without the help and support of my wife, Akansha, who, as I write this, is proofreading the document. She is my dearest friend, my source of strength, my harshest critic, my confidant, and the only person who has chosen to spoil me with unconditional love purely out of her own will. I am lucky to have her in my life. She has put up with me in more ways than I can thank her for. Her energy, her antics, and her mere presence lessen my troubles and make my life so much better. I also thank my dearest brother, Chandu, for his love and support. His kindness, calmness, smartness, mild manners, and sense of humor are unmatched. I am grateful to my parents-in-law, Shri. Alok Kumar Jain and Smt. Meenu Jain, for their love and kind words of support. I also thank my sister-in-law, Dr. Damini Jain, and my brother-in-law, Shaurya Jain, for their confidence in me and the fun phone conversations. Lastly, special thanks to my cousin, Pooja, and her dear husband, Kiran, for their love, support, and selfless help in times of need. Visiting them always makes me happy and super full!

November 2022

# METHODS FOR ON-BOARD CONDITION MONITORING OF SIC MOSFET BASED CONVERTERS

Bhanu Teja Vankayalapati, PhD

The University of Texas at Dallas, 2022

Supervising Professor: Bilal Akin

The power electronics industry is continuously striving to improve the efficiency and power density of power converters. At the same time, with increasing electrification and automation across application domains, power electronic systems are expected to meet stringent reliability requirements, especially in safety-critical applications such as aerospace, autonomous vehicles, and data centers. Silicon Carbide (SiC) power semiconductor devices promise significantly superior electro-thermal performance to traditional silicon IGBTs and MOSFETs. However, given their relative nascence, the field reliability of SiC devices is unproven and certain fundamental reliability challenges exist. This dissertation studies on-board condition monitoring methods as a potential solution to the reliability challenges of SiC MOSFET based converters. The dissertation first presents a detailed architecture for a modular, highly scalable accelerated testing platform for SiC MOSFETs. The proposed testing setup enables rapid aging of large batches of SiC MOSFETs in order to generate large datasets for studying long-term reliability and to identify electrical precursors that can be used for on-board condition monitoring of SiC devices. Testing a batch of discrete SiC MOSFETs on the developed platform revealed frequent occurrences of gate-open failure. Therefore, in this dissertation, gate-open failures are systematically studied in the context of SiC MOSFETs, and potential causes for SiC MOSFETs' increased susceptibility to gate-open failures are discussed. Importantly, a robust cycle-by-cycle gate-open failure detection solution is presented and its superior performance over traditional protection schemes is experimentally validated.
Lastly, this dissertation proposes an end-to-end practical online condition monitoring solution for SiC MOSFET-based traction inverters using device on-state resistance ($R_{ds-on}$) as an aging precursor. The proposed solution includes accurate on-board $R_{ds-on}$ measurement circuits along with code-efficient data acquisition and filtering algorithms. Importantly, the presented solution uses a stochastic Bayesian state-of-health estimation algorithm. The algorithm presents an elegant solution to the fundamental problem of separating aging-related $R_{ds-on}$ change from operating-condition-related changes by exploiting the symmetrical nature of the inverter's operation. In particular, the presented solution is highly scalable, as it automatically accounts for device- and system-level variations and eliminates the need for extensive system- or device-specific calibration.

# TABLE OF CONTENTS

| Section | Title |
|---------|-------|
| | ACKNOWLEDGMENTS |
| | ABSTRACT |
| | LIST OF FIGURES |
| | LIST OF TABLES |
| CHAPTER 1 | INTRODUCTION |
| 1.1 | Advantages of Silicon Carbide Power Semiconductors |
| 1.2 | Reliability Challenges in Adopting SiC Devices |
| 1.3 | On-Board Condition Monitoring |
| 1.4 | Thesis Outline and Challenges Addressed |
| 1.5 | List of Publications |
| CHAPTER 2 | A HIGHLY SCALABLE, MODULAR TEST BENCH ARCHITECTURE FOR LARGE-SCALE DC POWER CYCLING OF SIC MOSFETS |
| 2.1 | Background |
| 2.2 | Review of Considerations in DC Power Cycling of SiC Devices |
| 2.2.1 | Review of DC Power Cycling |
| 2.2.2 | Junction Temperature Measurement |
| 2.2.3 | Multi-Device DC Power Cycling |
| 2.3 | Proposed Architecture |
| 2.3.1 | Module Design |
| 2.3.2 | High Level Architecture |
| 2.3.3 | Overview of Operation |
| 2.3.4 | Closed Loop Control of $\Delta T_j$ |
| 2.3.5 | Fault Detection and Isolation |
| 2.4 | Results and Discussion |
| 2.4.1 | Verification of Smooth Switch-Over |
| 2.4.2 | Verification of Closed Loop $\Delta T_j$ Control |
| 2.4.3 | Verification of $R_{ds-on}$ Measurement Accuracy |
| 2.4.4 | Verification of Thermal Isolation |
| 2.4.5 | Device Aging Analysis |
| 2.4.6 | Comparison to Existing Architectures |
| 2.5 | Conclusion |
| CHAPTER 3 | MODEL BASED CLOSED-LOOP JUNCTION TEMPERATURE CONTROL OF SIC MOSFETS IN DC POWER CYCLING FOR ACCURATE RELIABILITY ASSESSMENTS |
| 3.1 | Introduction |
| 3.2 | Closed-loop $T_j$ Control |
| 3.2.1 | On-Board Junction Temperature Estimation |
| 3.2.2 | Device Thermal Model |
| 3.2.3 | Kalman Filter |
| 3.2.4 | $T_j$ Profile Control |
| 3.2.5 | Aging Correction |
| 3.3 | Experimental Results |
| 3.3.1 | Verification of Closed-loop $\Delta T_j$ Control Algorithm |
| 3.4 | Conclusion |
| CHAPTER 4 | INVESTIGATION AND ON-BOARD DETECTION OF GATE-OPEN FAILURE IN SIC MOSFETS |
| 4.1 | Introduction |
| 4.2 | Gate-Open Failure Analysis and On-Board Characterization |
| 4.2.1 | MOSFET's Behaviour Under Various Gate-Open Failure Scenarios |
| 4.2.2 | DC Power Cycling Test Methodology |
| 4.2.3 | On-Board Failure Characterization |
| 4.3 | Detailed Failure Analysis |
| 4.3.1 | Non-destructive C-SAM Analysis |
| 4.3.2 | Optical Microscopy |
| 4.3.3 | Cross-sectioning and SEM Analysis |
| 4.3.4 | FEA Analysis of Gate-bond Failure Mechanism |
| 4.4 | On-Board Detection of Gate-open Failure |
| 4.4.1 | Proposed On-Board Detection Circuit |
| 4.4.2 | Failure Detection Logic |
| 4.5 | Experimental Verification |
| 4.5.1 | Characterization and Verification of Gate-Open Failure Emulation Technique |
| 4.5.2 | Verification of Failure Detection under Q1 Operation |
| 4.5.3 | Verification of Failure Detection under Q3 (Synchronous) Operation |
| 4.5.4 | Comparison of Proposed Technique to Traditional DESAT Protection Scheme |
| 4.6 | Conclusion |
| CHAPTER 5 | A PRACTICAL SWITCH CONDITION MONITORING SOLUTION FOR SIC TRACTION INVERTERS |
| 5.1 | Introduction |
| 5.2 | Proposed Online $R_{ds-on}$ Measurement Solution |
| 5.2.1 | Measurement Circuit Design |
| 5.2.2 | Kalman Filter Design |
| 5.3 | Proposed Bayesian SoH Estimation Solution |
| 5.3.1 | Input Feature Design |
| 5.3.2 | Bayesian Formulation |
| 5.3.3 | Modelling Likelihood Function |
| 5.4 | Experimental Verification and Discussion |
| 5.4.1 | Verification of Online $R_{ds-on}$ Measurement |
| 5.4.2 | Verification of Bayesian SoH Estimation Solution |
| 5.5 | Conclusion |
| CHAPTER 6 | CONCLUSIONS, CONTRIBUTIONS, AND FUTURE WORK |
| 6.1 | Conclusions and Contributions |
| 6.2 | Future Work |
| | REFERENCES |
| | BIOGRAPHICAL SKETCH |
| | CURRICULUM VITAE |

# LIST OF FIGURES

| 1.1  | Process flow in developing an on-board condition monitoring process                                                                                                    | 3  |
|------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 1.2  | Process flow in developing an on-board condition monitoring solution                                                                                                   | 3  |
| 2.1  | DC power cycling illustration.                                                                                                                                         | 11 |
| 2.2  | C-SAM image of an aged device from backside showing die attach solder layer delamination.                                                                              | 13 |
| 2.3  | Schematics of multi-device power cycling architectures a) Parallel architecture b)<br>Independent architecture c) Staged parallel architecture d) Series architecture. | 16 |
| 2.4  | a) Schematic of the module b) Schematic of the complete high-level architecture.                                                                                       | 20 |
| 2.5  | State flow diagrams describing the operation of the rack master and module. $\ .$ $\ .$                                                                                | 23 |
| 2.6  | Illustration of proposed switch-over technique based on "indirect-sensing"                                                                                             | 24 |
| 2.7  | Proposed algorithm.                                                                                                                                                    | 27 |
| 2.8  | Pictures of the developed module a) top view b) bottom view                                                                                                            | 29 |
| 2.9  | Picture of fully assembled setup with 8 racks in standard 19" rack cabinet                                                                                             | 31 |
| 2.10 | Experimental switch-over waveform during the transition from DUT 1 to DUT 2 $$                                                                                         | 32 |
| 2.11 | a) Decapsulated device with exposed die for testing. b) IR camera image of the DUT during testing.                                                                     | 33 |
| 2.12 | Comparison of on-board $R_{ds-on}$ measurements with device characterization results.                                                                                  | 34 |
| 2.13 | CFD simulation results for fan airflow. Air speed contours for a) Horizontal cut plane b) Vertical cut plane                                                           | 35 |
| 2.14 | C-SAM images of a) New device b) Aged device M1 c) Aged device M2                                                                                                      | 36 |
| 3.1  | C-SAM image of an aged device from backside showing die attach solder layer delamination.                                                                              | 41 |
| 3.2  | High-level architecture of the DC power cycling test bench used in the study                                                                                           | 45 |
| 3.3  | High-level control block diagram.                                                                                                                                      | 46 |
| 3.4  | SiC MOSFET's $R_{ds-on}$ dependence on $I_d$ and $T_j$                                                                                                                 | 47 |
| 3.5  | $R_{ds-on}$ 's sensitivity to $T_j$ change at different values of drain current $I_d$                                                                                  | 47 |
| 3.6  | Fitted $R_{ds-on}$ vs $T_j$ curves. Nodes represent experimental data                                                                                                  | 48 |
| 3.7  | A lumped equivalent thermal model of a discrete MOSFET                                                                                                                 | 49 |
| 3.8  | Fitted $V_{sd}$ vs $T_j$ curve at $I_{sd} = 300 m A$ . Markers represent experimental data                                                                             | 53 |

| 3.9  | Experimental setup used to verify the proposed $T_j$ control algorithm. $\ldots \ldots 54$                                                                                                                                                                                                           |    |
|------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 3.10 | Picture of decapsulated device under test. The die is painted black to improve IR emmissivity.                                                                                                                                                                                                       | 54 |
| 3.11 | Infrared image of the decapsulated device die                                                                                                                                                                                                                                                        | 55 |
| 3.12 | Experimental $T_j$ waveform.                                                                                                                                                                                                                                                                         | 56 |
| 4.1  | Illustration of gate-open failure in discrete SiC MOSFET due to a) a heel crack b) bond-wire liftoff                                                                                                                                                                                                 | 59 |
| 4.2  | SPICE simulation circuit for analysis of MOSFET's behavior under gate-open fault.                                                                                                                                                                                                                    | 61 |
| 4.3  | Parasitic capacitance in a MOSFET from device datasheet [1]                                                                                                                                                                                                                                          | 62 |
| 4.4  | $C_{gd}$ vs $V_{dg}$ from device datasheet [1]-fix font size                                                                                                                                                                                                                                         | 62 |
| 4.5  | Simulation waveforms under conduction fault                                                                                                                                                                                                                                                          | 64 |
| 4.6  | Simulation waveforms under Q1 open fault.                                                                                                                                                                                                                                                            | 65 |
| 4.7  | Simulation waveforms under Q3 open fault.                                                                                                                                                                                                                                                            | 66 |
| 4.8  | a) DC power cycling schematic b) typical testing cycle                                                                                                                                                                                                                                               | 68 |
| 4.9  | Operation of a single leg of the DC power cycling test setup during a) heating interval for a healthy DUT b) cooling $T'_n$ interval for a healthy DUT c) cooling $T_n$ interval for a healthy DUT d) cooling $T_n$ interval for a DUT showing OFF fault e) heating interval for a DUT with ON fault | 70 |
| 4.10 | On-board characterization result for DUT showing intermittent OFF fault (Case 1).                                                                                                                                                                                                                    | 71 |
| 4.11 | On-board characterization result for DUT showing intermittent OFF fault (Case 2).                                                                                                                                                                                                                    | 71 |
| 4.12 | C-SAM images of a) healthy device b) close-up of healthy device die c) DUT 1-A d) DUT 1-B e) DUT 2-A f) DUT 2-B                                                                                                                                                                                      | 73 |
| 4.13 | Optical microscopy images of a) DUT 1-B b) DUT 2-B                                                                                                                                                                                                                                                   | 74 |
| 4.14 | a) SEM image of decapsulated DUT die showing gate bond pad b) close-up image of gate bond pad showing gate bond liftoff c) cross-sectional SEM of gate bond clearly showing a clean lift-off d) close-up of gate bond showing a 35 $\mu$ m liftoff height.                                           | 75 |
| 4.15 | Highly magnified cross-sectional SEM image of gate bond                                                                                                                                                                                                                                              | 76 |
| 4.16 | Device model for FEA analysis a) entire model b) with EMC hidden. $\ldots$ .                                                                                                                                                                                                                         | 76 |
| 4.17 | Temperature distribution across device from transient thermal simulation for a) entire device b) with EMC hidden.                                                                                                                                                                                    | 78 |
| 4.18 | Device deformation under heating for a) entire device b) with EMC hidden.<br>Wireframe represents an undeformed device                                                                                                                                                                               | 78 |

| 4.19 | a) Maximum shear stress; b) maximum elastic shear strain at the gate bond site with EMC and drain-tab hidden for $CTE_{EMC} = 10 \text{ ppm/}^{\circ}C$ ; c) maximum shear stress at gate-bond wire for $CTE_{EMC} = 10 \text{ ppm/}^{\circ}C$ ; d) maximum shear stress at gate-bond wire for $CTE_{EMC} = 5 \text{ ppm/}^{\circ}C$ . | 80  |
|------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 4.20 | Schematic of proposed gate failure detection circuit                                                                                                                                                                                                                                                                                   | 81  |
| 4.21 | Device operating points on the output curve to analyze choice of detection circuit threshold parameters.                                                                                                                                                                                                                               | 82  |
| 4.22 | Overall schematic of CLB based fault detection logic                                                                                                                                                                                                                                                                                   | 83  |
| 4.23 | State-transition diagrams for a) Blanking FSM b) Fault FSM                                                                                                                                                                                                                                                                             | 85  |
| 4.24 | (a) Schematic and (b) Actual prototype of proposed gate-open failure detection circuit board.                                                                                                                                                                                                                                          | 87  |
| 4.25 | Verification of fault emulation technique.                                                                                                                                                                                                                                                                                             | 88  |
| 4.26 | Synchronous boost converter used for experimental validation of proposed gate-open failure detection technique. | 89 |
| 4.27 | Experimental verification of a) Q1 open fault detection and b) fault detection timing.                                                                                                                                                                                                                                                 | 90  |
| 4.28 | Experimental verification of a) conduction fault detection and b) conduction fault detection timing.                                                                                                                                                                                                                                   | 92  |
| 4.29 | Experimental verification of a) Q3 open fault detection and b) fault detection timing.                                                                                                                                                                                                                                                 | 93  |
| 4.30 | Comparison of proposed fault detection circuit with conventional DESAT protection scheme | 95 |
| 5.1  | Representation of fixed conservative failure threshold vs adaptive failure threshold with respect to healthy device characteristics.                                                                                                                                                                                                   | 99  |
| 5.2  | High-level schematic of the $V_{ds}$ measurement circuit                                                                                                                                                                                                                                                                               | 102 |
| 5.3  | Variation in forward voltage mismatch between high-voltage diodes with forward current.                                                                                                                                                                                                                                                | 104 |
| 5.4  | Task execution in typical motor control firmware implementations                                                                                                                                                                                                                                                                       | 105 |
| 5.5  | Schematic of traction drive system used in this study                                                                                                                                                                                                                                                                                  | 106 |
| 5.6  | Illustration of out-of-order equivalent time sampling technique                                                                                                                                                                                                                                                                        | 107 |
| 5.7  | Comparison of out-of-order vs sequential equivalent time sampling techniques                                                                                                                                                                                                                                                           | 108 |
| 5.8 | Variation of device $R_{ds-on}$ with drain current at $V_{gs} = 15$ V for the device used in this study. | 109 |
| 5.9 | Continuous conditional probability distributions for the feature variables for different failure scenarios. | 117 |
| 5.10 | Picture of a) gate driver board for single leg with on-board $V_{ds}$ measurement circuit b) fully assembled $3\phi$ inverter setup.                       | 118 |
| 5.11 | Experimental results for on-board $R_{ds-on}$ measurement for single switch                                                                                | 120 |
| 5.12 | Experimental results for on-board $R_{ds-on}$ measurement for six switches                                                                                 | 121 |
| 5.13 | Experimental results for on-board $R_{ds-on}$ measurement for six switches under 20 step load variation with maximum load current = 2.8 A                  | 122 |
| 5.14 | Comparison of $R_{ds-on}$ deviation in switches from own baseline vs RMS deviation from other switches due to load                                         | 124 |
| 5.15 | Picture of locally decapsulated discrete SiC MOSFET a) under healthy condition b) with one source bond-wire broken                                         | 125 |
| 5.16 | $R_{ds-on}$ shift in switches $S_3, S_4$ under load due to single bond-wire liftoff                                                                        | 125 |
| 5.17 | Experimental results from proposed Bayesian SoH estimation solution under different failure scenarios.                                                     | 126 |
| 5.18 | Experimental results for on-board $R_{ds-on}$ measurement for six switches at maximum load current = 12.9 A. Switches S3, S4 have single bond-wire liftoff | 128 |
| 5.19 | Experimental results from proposed Bayesian SoH estimation solution under different failure scenarios at maximum load current = $12.9$ A                   | 129 |

## LIST OF TABLES

| 1.1 | Comparison of SiC MOSFET vs Si IGBT                                                              | 2   |
|-----|--------------------------------------------------------------------------------------------------|-----|
| 2.1 | Common cycles-to-failure Analytical Models                                                       | 12  |
| 2.2 | Fault Tree for Detection of OC and SC Faults                                                     | 28  |
| 2.3 | Experimental Results of Closed Loop $\Delta T_j$ Control Verification                            | 32  |
| 2.4 | Comparison of Proposed Architecture and Extrapolated Existing Architectures for 48 Devices       | 37  |
| 4.1 | DC Power Cycling Test Results                                                                    | 68  |
| 4.2 | Failure Conditions for On-board Gate-open Fault Characterization                                 | 70  |
| 4.3 | Material Properties Used for FEA Simulation                                                      | 77  |
| 4.4 | Fault Identification Table                                                                       | 84  |
| 4.5 | Characterization of Relay Release Time                                                           | 88  |
| 5.1 | Qualitative Comparison of Known Aging Precursors                                                 | 98  |
| 5.2 | Statistical Analysis of Online $R_{ds-on}$ Measurement Data | 119 |
| 5.3 | Bayesian Inference Model Parameters                                                              | 123 |

## CHAPTER 1

## INTRODUCTION

#### 1.1 Advantages of Silicon Carbide Power Semiconductors

Silicon carbide (SiC) power MOSFETs have superior conduction, switching and thermal properties compared to silicon (Si) MOSFETs and IGBTs [2]. Superior properties are fundamentally enabled by the wider bandgap of SiC semiconductors. The wider bandgap results in a higher breakdown strength. Since most power devices have vertical structures, the higher breakdown strength of the semiconductor enables thinner dies with a smaller cross-section area. Table 1.1 shows the comparison between a SiC MOSFET and a similarly rated Si IGBT. The equivalent on-state resistance figure for the Si IGBT is derived from the specified  $V_{CE-Sat}$ . It is seen that despite having a higher current rating than Si IGBT, the SiC MOSFET has significantly lower input, output, and reverse transfer capacitance values. Since the capacitance values are proportional to the physical overlap, the smaller values are indicative of the significantly smaller die size of the SiC MOSFET. The lower capacitance values significantly increase SiC MOSFET's switching speed resulting in reduced switching losses. Moreover, the lower specific on-resistance of SiC MOSFETs also reduces their conduction losses.

#### 1.2 Reliability Challenges in Adopting SiC Devices

However, despite the many advantages SiC MOSFETs have over Si devices, their long-term reliability is not well understood. Moreover, given that SiC technology has only recently started witnessing wide-scale deployment, currently available field data is limited. Although SiC technology has progressed over the past several years, certain fundamental reliability concerns remain with the current generation of SiC devices. From the package point of view, SiC's

| Property                                 | SiC MOSFET<br>(C3M0030090K) | Si IGBT<br>(IXFN56N90P) |
|------------------------------------------|-----------------------------|-------------------------|
| Breakdown voltage                        | 900 V                       | 900 V                   |
| Continuous drain current                 | 63 A                        | 56 A                    |
| On-state resistance $(R_{ds-on})$        | $30~\mathrm{m}\Omega$       | $145~\mathrm{m}\Omega$  |
| Input capacitance $(C_{iss})$            | $1747~\mathrm{pF}$          | $23~\mathrm{nF}$        |
| Output capacitance $(C_{oss})$           | $131~\mathrm{pF}$           | $1385~\mathrm{pF}$      |
| Reverse transfer capacitance $(C_{rss})$ | $8~\mathrm{pF}$             | $106~\mathrm{pF}$       |

Table 1.1: Comparison of SiC MOSFET vs Si IGBT

Young's modulus is roughly three times that of Si (501 GPa vs. 162 GPa) [3]. The resulting stiffness of SiC creates higher mechanical stress on the package for the same temperature gradient. In addition, current packaging technologies constrain both the maximum operating temperature of SiC devices and their achievable specific on-resistance [4]. Therefore, the device market faces growing pressure to reach the theoretical limits of SiC by decreasing the reliability-oriented package margins. From the chip point of view, some of the issues related to extrinsic defects, basal plane dislocations, and stacking faults (SFs) have been addressed over recent years [5,6]. However, there are still some challenges regarding gate oxide weakness, which are mostly related to the higher density of interface traps and the smaller band offset [7].

#### 1.3 On-Board Condition Monitoring

As shown in Fig. 1.1, the challenges with SiC devices' reliability can possibly be addressed through two complementary solutions: 1) developing a large accelerated aging dataset of SiC devices under various conditions to understand their long-term reliability and guide future device development, and 2) using on-board, in-system prognostics and device health monitoring techniques to predict imminent device failures well ahead of time, thus ensuring reliable system operation [8]. The high-level process flow for developing an online condition



Figure 1.1: Process flow in developing an on-board condition monitoring process.



Figure 1.2: Process flow in developing an on-board condition monitoring solution.

monitoring (OCM) solution is shown in Fig. 1.2. Developing an OCM solution starts with the selection of a suitable precursor. An ideal precursor for OCM is (1) sensitive to device aging, (2) insensitive to changes in junction temperature and operating conditions, and (3) easy to measure online. However, these requirements often conflict, which makes developing a practical OCM solution challenging. In particular, compensating for precursor change due to variable operating conditions and isolating aging-related change is difficult [9]. This dissertation proposes practical solutions to several of these challenges, as outlined in the following section.

#### 1.4 Thesis Outline and Challenges Addressed

- In Chapter 2, a modular, highly scalable DC power cycling (DC PC) test bench architecture is proposed. First, crucial aging parameters in DC PC, junction temperature $(T_j)$ measurement techniques, and high-level architectures are reviewed. Based on this understanding, the design of a module, which is the fundamental unit of the proposed architecture, is introduced. The module is self-sufficient in terms of processing, gate drive capability, $T_j$ measurement, and on-board parameter measurement. Specifically, the device body-diode voltage drop is used for $T_j$ measurement. Thereafter, an independent parallel high-level architecture and a scalable, decentralized approach to its operation are proposed. In the context of the architectural choices, a switch-over technique for current transfer among the paralleled modules to ensure accurate on-board parameter measurement is highlighted. Further, a hybrid feed-forward hysteresis control algorithm for accurate control of junction temperature swing is proposed. Additionally, the module design is leveraged to propose a robust fault detection, isolation, and classification scheme. The proposed architecture is validated on an actual test bench that can simultaneously age 48 discrete SiC MOSFETs. Also, thermal isolation when aging multiple devices is verified through computational fluid dynamics (CFD) simulations of the designed test bench.
- An improved model-based, aging-independent closed-loop junction temperature profile control method is presented in Chapter 3. Specifically, the temperature ramp rate and dwell time at the maximum junction temperature are controlled. The device's on-state resistance measurements are used to accurately estimate its junction temperature. A code- and memory-efficient technique is presented for mapping measured resistance to junction temperature. The proposed technique also considers the variation in aging-related on-state resistance change. Specifically, the SiC MOSFET's body-diode forward voltage drop $(V_f)$ at negative gate voltage and a small current is used as an aging-independent temperature sensitive electrical parameter (TSEP) to adjust the temperature reference to compensate for aging-related shifts. The algorithm, filter, and controller design methods are presented in detail. The proposed algorithm is validated on a custom DC power cycling test bench.

- Intermittent gate-open failures are comprehensively investigated in the context of discrete SiC MOSFETs in Chapter 4. First, the MOSFET's behavior under various possible gate-open failure scenarios is analyzed through SPICE simulations. Several SiC MOSFETs are aged on a DC power cycling setup and the gate-open failure mechanism is verified through systematic multi-step failure analysis, which includes on-board characterization, non-destructive confocal scanning acoustic microscopy (C-SAM) analysis, and decapsulation and optical inspection followed by scanning electron microscopy (SEM) analysis of the failed devices. To understand the potential mechanism behind gate-open failure in SiC MOSFETs, thermo-mechanical finite element analysis (FEA) is performed on a high-fidelity model, which shows interfacial shear stress at the gate bond. Further, a robust on-board technique for reliable cycle-by-cycle detection of gate-open faults is proposed. The proposed technique is experimentally verified for all possible fault scenarios and shown to detect faults in as little as 150 ns. It is shown that, unlike the traditional desaturation (DESAT) protection scheme, the proposed mechanism can prevent potential shoot-through events that may be caused by gate-open failure.
- In Chapter 5, an end-to-end practical online condition monitoring solution based on switch on-state resistance is proposed. Specifically, a sensing circuit is proposed which enables accurate online on-state resistance $(R_{ds-on})$ measurement for all six switches of the inverter. To address the challenge of periodic data acquisition alongside higher-priority motor control tasks, a fast, code-efficient out-of-order equivalent time sampling technique is also proposed. The obtained periodic, high-resolution $R_{ds-on}$ data is filtered by a Kalman filter stage. With the proposed measurement solution, $R_{ds-on}$ obtained at the motor current peak has an error of less than 1.5%. Furthermore, the symmetrical nature of the inverter's operation is exploited to propose a Bayesian inference solution for independent online state-of-health (SoH) estimation for all six switches. This technique isolates aging-related $R_{ds-on}$ change from changes related to operating conditions. In particular, by automatically accounting for device- and system-level variations in the model, the proposed Bayesian SoH estimation solution eliminates the need for extensive system- or device-specific calibration. The efficacy and robustness of the proposed solution are tested by inducing bond-wire failure in several decapsulated discrete SiC MOSFETs.

- The key contributions and conclusions of this thesis, along with potential future work, are briefly presented in Chapter 6.

#### 1.5 List of Publications

This dissertation is largely based on research work that has already been published or will be published in IEEE journals and conferences. The list of publications is given below.

- M1: B. T. Vankayalapati, F. Yang, S. Pu, M. Farhadi and B. Akin, "A Highly Scalable, Modular Test Bench Architecture for Large-Scale DC Power Cycling of SiC MOSFETs: Towards Data Enabled Reliability," in *IEEE Power Electronics Magazine*, vol. 8, no. 1, pp. 39-48, March 2021.
- J1: B. T. Vankayalapati, S. Pu, F. Yang, M. Farhadi, V. Gurusamy and B. Akin, "Investigation and On-Board Detection of Gate-Open Failure in SiC MOSFETs," in *IEEE Transactions on Power Electronics*, vol. 37, no. 4, pp. 4658-4671, April 2022.

- J2: B. T. Vankayalapati, M. Farhadi, R. Sajadi, H. Tan, and B. Akin, "A Practical Switch Condition Monitoring Solution for SiC Traction Inverters," in *IEEE Journal of Emerging and Selected Topics in Power Electronics*, 2022.
- J3: B. T. Vankayalapati, M. Ajmal CN, A. V. Deshmukh, M. Farhadi, R. Sajadi, and B. Akin, "Model Based Closed-loop Junction Temperature Control of SiC MOSFETs in DC Power Cycling for Accurate Reliability Assessments," in *IEEE Transactions on Industry Applications*. (In preparation).
- C1: B. T. Vankayalapati and B. Akin, "Closed-loop Junction Temperature Control of SiC MOSFETs in DC Power Cycling for Accurate Reliability Assessments," 2021 IEEE 13th International Symposium on Diagnostics for Electrical Machines, Power Electronics and Drives (SDEMPED), 2021, pp. 209-215.

## CHAPTER 2

## A HIGHLY SCALABLE, MODULAR TEST BENCH ARCHITECTURE FOR LARGE-SCALE DC POWER CYCLING OF SIC MOSFETS

#### 2.1 Background

As discussed in the introduction, existing reliability challenges with SiC devices can potentially be addressed either by validating their long-term reliability through large accelerated aging datasets or by using on-board device condition monitoring techniques. In both cases, large-scale reliability testing of SiC MOSFETs is needed. For example, the effectiveness of on-board health monitoring systems depends on the accuracy of the remaining useful life (RUL) estimation algorithm. Generally, RUL estimation methods for power devices use empirical models fitted to the accelerated aging data [10–14]. Consequently, the accuracy of RUL models directly corresponds to the size and variety of available accelerated aging datasets. Therefore, it is evident that large accelerated aging datasets are crucial for understanding and monitoring the reliability of SiC power MOSFETs, thus enabling their quicker adoption.

Accelerated aging methodologies must balance the tradeoff between mimicking real converter operation (multiple stressors present simultaneously) and correlating a particular stressor to corresponding aging mechanisms. The DC power cycling test is a widely used accelerated aging test that fits this tradeoff well [15]. In this test, a current is passed through the device in the on-state and its self-heating is used to vary the device junction temperature $(T_j)$ between two set points which, to an extent, mimics real converter operation. Although device aging is accelerated by applying a large $\Delta T_j$, it still takes on the order of several weeks or months for the device to fail. Applying an even larger stress to cause faster aging is not a remedy, since it can trigger unnatural degradation mechanisms. Consequently, obtaining a large, comprehensive, and rich accelerated aging dataset can take several years. Therefore, to obtain such a dataset as quickly as possible, it is imperative to be able to age a large number of devices simultaneously.

From the preceding discussion, it is evident that there is a need for a test bench architecture that can enable simultaneous testing of a large number of devices under different aging conditions. In this chapter, first, to clearly understand the requirements of a large-scale DC power cycling test bench architecture, various factors influencing the quality of the test results are reviewed. Different lifetime models are analyzed to highlight the variables of interest. In particular, the importance of accurate junction temperature control is highlighted. However, accurate measurement of junction temperature is generally challenging, and various potential techniques are reviewed in this context. Further, high-level architectures of several DC power cycling test setups proposed in the literature are reviewed in terms of their scalability, practicality, flexibility, and fault isolation capabilities. Their merits and challenges are analyzed in detail. Based on this theoretical understanding, a fully modular, highly scalable, and practical DC power cycling architecture for discrete SiC MOSFETs is proposed. The proposed architecture allows the simultaneous aging of 48 devices while enabling independent control of the aging conditions of each device under test (DUT). The fundamental unit of the proposed architecture is a self-contained module that has its own processing, gate drive circuits, junction temperature measurement capability, and signal acquisition and conditioning circuits for on-board device on-state resistance measurement. Based on the review of available techniques, body diode voltage drop $(V_{sd})$ at negative gate voltage is chosen as the temperature sensitive electrical parameter (TSEP) for on-board $T_j$ measurement due to its aging-independent characteristic and its universality. Each module also has an external communication interface that allows software-enabled operation of the test bench.
Further, the high-level independent parallel architecture that incorporates the above modules is illustrated. In this architecture, multiple modules are paralleled across a power supply that operates in constant current mode. A decentralized operational approach that enables easy scalability is proposed and discussed. Basically, the modules are turned on sequentially, which allows independent control over their aging conditions. However, in this approach, it needs to be ensured that the outgoing and incoming modules have a small overlap period during which both the corresponding DUTs are on, to prevent measurement errors due to voltage spikes caused by current discontinuity. Therefore, a switch-over technique is proposed to optimize the overlap period and to implement this in a scalable, decentralized manner. However, the use of $V_{sd}$ for $T_j$ measurement means that the measurement can only be performed when the device is off. This implies that real-time $T_j$ is unavailable. To overcome this issue, a hybrid feed-forward hysteresis control algorithm for accurate control of $\Delta T_j$ is proposed. Since power cycling tests inherently cause devices to fail, fault isolation is a crucial requirement for a test bench. Therefore, a robust fault detection, isolation, and classification technique is developed for the proposed architecture. Finally, the performance of the test bench is validated on actual hardware. Further, when multiple devices are aged simultaneously, it is also important to minimize any potential thermal interaction. Thermal isolation in the proposed setup is verified through CFD simulation. The proposed DC power cycling test bench architecture is also compared to other architectures in the literature.
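The sequential turn-on with overlap described above can be illustrated with a short timing sketch. This is not the switch-over technique developed later in the chapter; it only shows, under simplifying assumptions, how a fixed overlap keeps a path for the constant-current supply open at every hand-over. The names `GateWindow`, `sequential_schedule`, and `current_path_always_closed`, as well as the numeric on-times and overlap, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class GateWindow:
    module: int   # index of the module (hypothetical numbering)
    t_on: float   # time at which the DUT is turned on, in seconds
    t_off: float  # time at which the DUT is turned off, in seconds


def sequential_schedule(on_times, overlap):
    """Build one round of sequential turn-on with a fixed overlap.

    on_times: per-module heating time in seconds (illustrative values).
    overlap:  time during which outgoing and incoming DUTs are both on.
    """
    if overlap < 0 or any(t <= overlap for t in on_times):
        raise ValueError("each on-time must exceed the overlap")
    windows = []
    t = 0.0
    for module, duration in enumerate(on_times):
        windows.append(GateWindow(module, t, t + duration))
        # The next module turns on `overlap` seconds before this one turns off.
        t += duration - overlap
    return windows


def current_path_always_closed(windows):
    """Check that each incoming DUT turns on before the outgoing DUT turns off."""
    return all(nxt.t_on < cur.t_off for cur, nxt in zip(windows, windows[1:]))


if __name__ == "__main__":
    # Three modules with independent on-times and a 1 ms overlap (assumed values).
    for w in sequential_schedule([2.0, 3.0, 2.5], overlap=1e-3):
        print(f"module {w.module}: on at {w.t_on:.3f} s, off at {w.t_off:.3f} s")
```

Because each module has its own entry in `on_times`, the aging condition of each DUT can be set independently, while the overlap check captures the requirement that the supply current is never interrupted during a hand-over.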

#### 2.2 Review of Considerations in DC Power Cycling of SiC Devices

#### 2.2.1 Review of DC Power Cycling

In a typical DC power cycling test, the DUT is subjected to repeated cycles of temperature swing between two set values using the device self-heating as shown in Fig. 2.1.

DC power cycling accelerates package-related degradation mechanisms. Fatigue in the bond wire attachments and the die-attach solder layer results from gradual mechanical stress accumulation caused by coefficient of thermal expansion (CTE) mismatch at the interfaces. This can eventually lead to failures such as wire bond liftoff or die-attach solder



Figure 2.1: DC power cycling illustration.

delamination [16, 17]. In addition to package degradation, DC power cycling also accelerates gate oxide degradation. The gate oxide layer in SiC devices is more susceptible to degradation than that in Si devices [18]. The lower thickness of the oxide layer, in addition to greater defect density, leads to oxide and interface trapped charges [19]. These trapped charges lead to time-dependent dielectric breakdown (TDDB), which could manifest as a gate-to-source short. The high-temperature interval during the on-time, as shown in Fig. 2.1, is mainly responsible for accelerating gate oxide degradation. During this interval, the gate oxide is simultaneously subjected to a high electric field and high temperature, which cause an increase in defect density. Therefore, to a large extent, DC power cycling mimics the electro-mechanical stresses experienced by a device during real converter operation. Table 2.1 lists the cycles-to-failure analytical models commonly used for device lifetime estimation based on power cycling data. Although these models are focused on package degradation-related failure in Si devices, they are also representative of SiC devices due to the similarity in failure mechanisms. Here, $N_f$ is the cycles to failure, $\Delta T_j$ is the junction temperature swing, $t_{on}$ is the DUT on-time, $I_d$ is the drain current per bond wire, $V$ is the switch blocking voltage, $D$ is the bond wire diameter, $T_{min}$ is the minimum junction temperature, $E_a$ is the activation energy, $k_B$ is the Boltzmann constant and $A$, $\beta_2$, $\beta_3$, $\beta_4$,

| Model Name                        | Lifetime Model                                                                                         |
|-----------------------------------|--------------------------------------------------------------------------------------------------------|
| Coffin-Manson Model [10]          | $N_f = A\Delta T_j^{-n}$                                                                               |
| General Coffin-Manson Model [13]  | $N_f = A(\Delta T_j - \Delta T_0)^{-n}$                                                                |
| Modified Coffin-Manson Model [14] | $N_f = A\Delta T_j^{-n} \exp\left(\frac{E_a}{k_B T_m}\right)$                                          |
| Bayerer's Model [12]              | $N_f = A\Delta T_j^{-n} \exp\left(\frac{E_a}{T_{min}}\right) t_{on}^{\beta_3} I_d^{\beta_4} V^{\beta_5} D^{\beta_6}$ |

Table 2.1: Common cycles-to-failure Analytical Models

 $\beta_5$ ,  $\beta_6$  are empirical constants . The models are shown in increasing order of complexity. Bayerer's model, for example, considers many more factors in estimating the device life than the relatively simple Coffin-Manson model. However, from every model, it is seen that  $N_f$  and  $\Delta T_j$  are directly related. Qualitatively, this is because a larger  $\Delta T_j$  creates larger mechanical stress accumulation at the interfaces leading to a quicker failure. Therefore, during a DC power cycling test, accurate control of DUT junction temperature swing is crucial. Any unintentional variation in  $\Delta T_j$  during testing can lead to an error in the predicted life of the device. Further, it is also crucial to be able to independently control parameters such as  $t_{on}$ ,  $I_d$ . Furthermore, it is also evident that comprehensive data in terms of devices with different specifications is likely to result in a more accurate lifetime model.

#### 2.2.2 Junction Temperature Measurement

From the preceding discussion, it is clear that accurate $T_j$ measurement is imperative for accurate lifetime estimation. The junction temperature of a device can be measured either directly or indirectly. A DC power cycling setup is presented in [20], where a fiber optic temperature probe is used to directly measure the die temperature in SiC power modules. While the accuracy of the $T_j$ measurement is high, such techniques are limited to packages with direct access to the die. Therefore, it is challenging to implement direct measurement techniques in discrete devices where the die is encapsulated.



Figure 2.2: C-SAM image of an aged device from backside showing die attach solder layer delamination.

Alternatively, there are two main ways to indirectly measure the $T_j$ of a DUT. In the first technique, the case temperature $(T_{case})$ of the device is measured using a thermocouple, resistance temperature detector (RTD), or a temperature measurement chip. The junction temperature is then estimated from the junction-to-case thermal impedance $(R_{th-JC})$ provided by the manufacturer in the datasheet and the estimated power loss in the device. In [21] and [22], this technique is used for $T_j$ estimation during DC power cycling. However, die-attach solder delamination is often observed during DC power cycling. Figure 2.2 shows a C-SAM (Confocal Scanning Acoustic Microscopy) image, taken from the backside, of a SiC MOSFET aged under DC power cycling. The dark areas under the die represent healthy die-attach solder. Lighter areas under the die show voids or delamination. Delamination is seen near the corners of the die. Such delamination leads to a gradual increase in the junction-to-case thermal impedance. Therefore, unless compensated for aging, $T_j$ estimation using $T_{case}$ and $R_{th-JC}$ can lead to inaccurate values. However, such compensation techniques can either be cumbersome or have extensive computational requirements [23].

Another popular indirect method of $T_j$ measurement is to use a temperature sensitive electrical parameter (TSEP). Device on-state resistance $(R_{ds-on})$, gate threshold voltage $(V_{th})$, and body-diode on-state voltage $(V_{sd})$ are popular choices for $T_j$ measurement due to their strong temperature dependency and ease of measurement. In [24], a DC power cycling setup is proposed where $R_{ds-on}$ is used for $T_j$ measurement.
$R_{ds-on}$  of a SiC MOSFET is given by (2.1), where  $R_{ch}$  is the channel resistance,  $R_j$  is the JFET region resistance,  $R_d$  is the drift region resistance,  $R_s$  is the substrate resistance and  $R_{pk}$  is the package resistance.

$$R_{ds-on} \approx R_{ch} + R_j + R_d + R_s + R_{pk}.$$
(2.1)

Among these, $R_{ch}$ and $R_d$ are usually dominant and are strongly temperature dependent. However, $R_{ch}$ can also vary with device aging. In SiC devices, the threshold voltage usually increases with aging, and a corresponding increase in $R_{ch}$ is also observed. Furthermore, package degradation is observed during DC power cycling, which can lead to an increase in $R_{pk}$. Both factors can result in aging-related $R_{ds-on}$ change. Therefore, $R_{ds-on}$ based $T_j$ measurement can become inaccurate with aging. As mentioned previously, $V_{th}$, although temperature dependent, is also affected by aging. Therefore, accurate $T_j$ measurement over aging using $R_{ds-on}$ and $V_{th}$ as TSEPs is challenging.

The body diode forward voltage drop is a well-known TSEP for $T_j$ measurement [25–27]. $V_{sd}$ based $T_j$ measurement is used in [28] for DC power cycling of SiC devices. However, to eliminate the effect of package degradation on $V_{sd}$ over aging, it is necessary for the injected current to be small. Further, in most SiC devices, unlike Si devices, the channel can conduct in the reverse direction even at zero gate voltage due to the body effect. To eliminate this effect, the measurement current needs to be injected at a negative gate voltage. At negative gate voltages, the channel is fully off. Therefore, the effect of gate oxide degradation on $V_{sd}$ is negligible at negative gate voltages. Moreover, $V_{sd}$ also shows a very linear temperature characteristic. Therefore, $V_{sd}$ is a reliable, aging-independent, linearly varying TSEP. However, one challenge with the use of $V_{sd}$ for $T_j$ measurement is the requirement for the device to be turned off during the measurement.
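Because $V_{sd}$ is approximately linear in temperature at a fixed small injection current and negative gate voltage, a straight-line calibration is enough to convert a $V_{sd}$ reading into a $T_j$ estimate. The sketch below shows one way this could be done with a least-squares fit. The calibration points are made-up illustrative numbers, not measured data, and the function names are hypothetical.

```python
def fit_line(temps_c, vsd_volts):
    """Least-squares fit of V_sd = slope * T + intercept."""
    n = len(temps_c)
    if n < 2 or n != len(vsd_volts):
        raise ValueError("need at least two matching calibration points")
    mean_t = sum(temps_c) / n
    mean_v = sum(vsd_volts) / n
    sxx = sum((t - mean_t) ** 2 for t in temps_c)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(temps_c, vsd_volts))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def vsd_to_tj(vsd, slope, intercept):
    """Invert the calibration line to estimate T_j (deg C) from V_sd (V)."""
    return (vsd - intercept) / slope


if __name__ == "__main__":
    # Hypothetical oven calibration points (deg C, V); assumed for illustration.
    temps = [25.0, 75.0, 125.0, 150.0]
    volts = [-3.36, -3.17, -2.98, -2.885]
    slope, intercept = fit_line(temps, volts)
    print(f"slope = {slope * 1e3:.2f} mV/degC, intercept = {intercept:.3f} V")
    print(f"T_j at V_sd = -3.05 V: {vsd_to_tj(-3.05, slope, intercept):.1f} degC")
```

The same line can be used in the opposite direction, to turn a $T_j$ reference into a $V_{sd}$ threshold so that no temperature conversion is needed at run time.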

#### 2.2.3 Multi-Device DC Power Cycling

Generally, in DC power cycling tests it is required to age multiple devices simultaneously. As previously mentioned, to obtain a comprehensive aging dataset for accurate lifetime modeling, a test setup must be able to age devices of different specifications at different aging conditions. It is also important to accurately measure the device parameters during testing. Further, to obtain a large dataset it should be able to age a large number of devices simultaneously. However, scaling a DC power cycling test setup to age multiple devices while being able to precisely control the aging conditions is a non-trivial task. In this context, the merits and challenges of various DC power cycling test architectures proposed in the literature are reviewed as follows.

### Parallel Architecture

In [21], a power cycling setup for MOSFETs is proposed. In this architecture, multiple DUTs are connected across a power supply in parallel as shown in Fig. 2.3(a). When the DUTs are turned on, the power supply changes from the constant voltage (CV) mode of operation to the constant current (CC) mode. Provided all the devices have the same specifications, the current is approximately equal in every device. Once a DUT reaches the desired $T_{j-max}$, it is turned off for cooling. However, merely turning off a DUT would change the current flowing through all the other DUTs since the supply current is constant. Therefore, to maintain a constant $I_d$ in every device throughout aging, each DUT also has a bypass switch connected in parallel. The switching signals to the DUT and the corresponding bypass switch are complementary. This ensures that the current through each DUT is constant. In this method, however, the current sharing in all paralleled DUTs is not perfectly equal due to small manufacturing differences and aging-related parametric changes. Since the current through each DUT cannot be exactly controlled, it is challenging to precisely control the aging conditions. Moreover, all DUTs and corresponding bypass switches are required to have the same specifications during a test to ensure predictable current sharing. This can limit the number of different devices that can be tested simultaneously. Further, short circuit fault isolation can also be challenging in this architecture. However, the architecture is easy to scale and very practical in terms of the number of power supplies required.

Figure 2.3: Schematics of multi-device power cycling architectures a) Parallel architecture b) Independent architecture c) Staged parallel architecture d) Series architecture.
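The unequal current sharing noted for the parallel architecture can be illustrated with a simple resistive model. The sketch below assumes each conducting DUT behaves as a fixed resistance equal to its $R_{ds-on}$, so that a constant supply current divides in inverse proportion to resistance. The current and resistance values are hypothetical and only serve to show the trend.

```python
def share_current(total_current, rds_on_values):
    """Split a constant supply current among paralleled on-state devices.

    Each device is modelled as a plain resistor (a simplifying assumption),
    so its share is proportional to its conductance 1 / R_ds-on.
    """
    if any(r <= 0 for r in rds_on_values):
        raise ValueError("resistances must be positive")
    conductances = [1.0 / r for r in rds_on_values]
    total_g = sum(conductances)
    return [total_current * g / total_g for g in conductances]


if __name__ == "__main__":
    # Hypothetical case: three DUTs, one of which has drifted 10 % higher.
    rds_on = [0.080, 0.080, 0.088]  # ohms, assumed values
    for r, i in zip(rds_on, share_current(30.0, rds_on)):
        print(f"R_ds-on = {r * 1e3:.0f} mOhm -> I_d = {i:.2f} A, P = {i * i * r:.1f} W")
```

In this model the device with the higher resistance carries less current, so the conduction loss and hence the heating of each DUT cannot be set independently.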

### Independent Architecture

Many of the challenges associated with the parallel architecture can be addressed by connecting each DUT to an individual power supply as shown in Fig. 2.3(b) [22,28]. A bypass switch is also connected along with every DUT. In this architecture, the bypass switch is used to ensure the continuity of the power supply current. Otherwise, a sudden discontinuity in the supply current can cause the power supply to switch from CC to CV mode during turn-off and from CV to CC mode during turn-on. Such transitions can cause ringing across the DUT and lead to online measurement errors. Therefore, the signals to the DUT and the bypass switch are complementary, with a slight overlap to ensure continuity of the supply current. A detailed analysis of switching-induced ringing across the DUT, its impact on online measurement accuracy, and ways to mitigate it through the use of damping capacitors is presented in [28]. The presence of an individual power supply for each DUT enables independent and precise control over the DUT aging conditions. It also enables aging devices with different specifications simultaneously. However, aging a large number of devices simultaneously can be practically challenging in this architecture given the number of power supplies required.

### Staged Parallel Architecture

A power cycling setup with multiple paralleled DUTs connected across a current-controlled DC-DC stage, as shown in Fig. 2.3(c), is proposed in [24]. In this architecture, the DUTs are turned on sequentially. The current through the DUT is ramped up and down smoothly by the DC-DC stage. This technique eliminates the need for bypass switches and also ensures there are no transients across the DUT during switching, thus improving the accuracy of online parameter measurement. Further, the setup offers high flexibility in controlling the current profile applied to each DUT, thereby enabling precise individualized control of aging conditions. This architecture is also quite practical in terms of the power supplies required to age a number of devices simultaneously. However, although this method addresses many of the previous concerns, the scalability of the setup for high-current devices could be challenging given the use of a DC-DC stage. Moreover, since all the devices share a common bus, isolation of short circuit faults can be challenging.

### Series Architecture

Alternatively, in [20], multiple strings of DUTs in series are paralleled across the power supply as shown in Fig. 2.3(d). Every string also has a main switch connected in series. This switch isolates the string from the supply bus during cooling. All the DUTs in every string are always kept on. The current is passed through each string sequentially by controlling the main switch. This architecture allows the aging of a large number of devices simultaneously. The throughput in terms of the number of devices aged is also high. In this architecture, however, the devices in a string have the same current flowing through them. Moreover, since all the devices in a string are in series, independent $t_{on}$ control is not possible. Consequently, small differences in device parameters from manufacturing or aging can lead to a different $\Delta T_j$ in each of the devices. As discussed previously, device lifetime is strongly correlated with $\Delta T_j$. Therefore, accurate and independent control of aging conditions is challenging with the series architecture. Furthermore, because the devices in a string experience similar, albeit slightly different, $\Delta T_j$, devices to be aged at different $\Delta T_j$ need to be placed in different strings. This requirement limits the capability of the series architecture in terms of simultaneously aging devices at different conditions. While a DUT short circuit fault would not be a problem, open circuit fault isolation is challenging in the series architecture.

When aging a large number of devices simultaneously, physical proximity between the devices is generally necessary with most of the above architectures. Further, DUTs are often cooled using forced air convection through a fan. In this scenario, it is important to have thermal isolation among the DUTs. For example, if an air draft from a DUT that is cooling interferes with a DUT that is simultaneously heating, it can affect the DUT's  $t_{on}$  or  $\Delta T_j$ . Therefore, good thermal isolation among the devices is necessary to prevent any unwanted effect on a DUT's aging conditions due to other DUTs.

With a large number of DUTs, another often overlooked challenge is high-resolution online data collection. To work around this problem, in [21,22] online data is recorded only once every several hundred cycles. The use of an external data acquisition system (DAQ) is also possible. However, with a large number of devices, running individual measurement wires to each of the devices from a central DAQ can be cumbersome and introduce measurement noise. Therefore, it is beneficial to have the ability to collect high-resolution data online.

#### 2.3 Proposed Architecture

The details of the various sub-components of the proposed large-scale test bench architecture are discussed below.

#### 2.3.1 Module Design

The fundamental unit of the proposed architecture is a module, the schematic of which is shown in Fig. 2.4(a). As shown, each module is a fully self-contained unit with its own processing (microcontroller), sensing circuits, cooling, and an external communication interface. Every module is capable of aging one device. A cut-off switch called the "main switch" is connected in series with the DUT. The DUT and the main switch are connected across the DC bus. The purpose of the main switch (MSW) is twofold. It is used to isolate the DUT from the DC bus in case of failure, thus providing on-board fault protection. It also turns off when the DUT is cooling to enable on-board measurements without interference from other modules' DUTs as will be discussed in detail later.

Each module has a low-cost on-board microcontroller that generates the necessary signals to control the switches. Further, the on-chip ADC of the microcontroller is used to sense the DUT drain-source voltage  $(V_{ds})$  and drain current  $(I_d)$  signals. These signals are sensed and conditioned by on-board sensing and conditioning circuits. Localized sensing minimizes measurement noise and improves measurement accuracy. The DUT is switched by a gate driver supplied by positive (turn-on) and negative (turn-off) programmable power supplies. Software programmability of the gate supply offers flexibility in changing the device aging conditions.


Figure 2.4: a) Schematic of the module b) Schematic of the complete high-level architecture.

As previously discussed, accurate junction temperature  $(T_j)$  estimation of the DUT is crucial in DC power cycling tests. Based on the discussion in Section 2.2,  $V_{sd}$  is a suitable TSEP due to its insensitivity to device aging and linear temperature characteristic. Therefore, in the proposed architecture  $V_{sd}$  at low current and negative gate turn-off voltage is used for  $T_j$  estimation. For this purpose, each module has a small reverse current injection circuit to measure DUT  $V_{sd}$  for  $T_j$  measurement as discussed earlier. In the proposed setup, 300 mA is used as the value of the injected current. This value ensures good measurement resolution while ensuring that the effect of package degradation on  $V_{sd}$  is negligible. Additionally, every module also has a thermocouple-based temperature measurement functionality to measure the DUT case temperature. This can potentially enable thermal impedance measurements and the implementation of over-temperature protection features. Further, every module is also connected to a PWM controlled cooling fan. This enables controlled active cooling of the DUT. This feature is crucial in being able to vary the aging conditions.

The on-board microcontroller is also connected to an external $I^2C$ communication bus [29]. The module acts as an $I^2C$ slave while the rack master, as will be shown later, is the $I^2C$ master and controls the bus. While the on-board microcontroller is responsible for the control signals and sensing, the high-level parameters and instructions are passed to it by the rack master. The $I^2C$ bus allows the rack master to configure the module settings as necessary and also read the on-board measurement values.

#### 2.3.2 High Level Architecture

The high-level schematic of the proposed modular, scalable test bench is shown in Fig. 2.4(b). It is an independent parallel high-level architecture. In this architecture, a number of the previously shown modules are connected in parallel across a power supply to form a "rack". Although any number of modules can be connected in a rack, for the purpose of this study, each rack has 6 modules. The common $I^2C$ bus, which is connected to every module, also connects to a "rack master" which controls the bus. The test bench can contain multiple such racks. In this study, the bench contains 8 racks. All the rack masters are connected to a TCP/IP switch through an Ethernet connection. A host computer also connects to the switch. All the rack masters and the host computer communicate using the MQTT protocol. MQTT was chosen because of its robustness and relative ease of implementation. The host computer runs an interface for the user to control the system. The host computer is also connected to the internet and thus enables remote access to the test bench.

The proposed independent parallel architecture has several advantages over previously proposed architectures. It allows multiple DUTs to share a common power supply while enabling independent control of aging conditions for individual DUTs. Further, it enables the isolation of faults in individual DUTs without interfering with the operation of the rest of the bench. Scaling the system is easy since it only requires the addition of modules or racks as necessary. The same software runs on every module and every rack master. Additionally, the software is designed to be flexible, and no hardcoding is required for individual tests. It should be noted that no bypass switch is used in the architecture. The bypass functionality is achieved using the available modules, the details of which are discussed later in the chapter.

#### 2.3.3 Overview of Operation

Fig. 2.5 shows the state flow representations of rack master and module operation. At startup, the master and all the modules are in standby mode. When a test session is initiated by the user from the host, a start command with the test configuration for each module is sent to the rack master. Then, each active module is configured by the rack master. The master waits for confirmation of configuration from all the corresponding modules. After all the modules are configured, a START signal is sent to the first module by the master. At this point, the module controller starts a timer and turns the DUT on. Once the timer reaches $t_{on}$, the module signals the end of heating to the master by sending an END signal and waits for the next module to turn on. The master, upon receiving the END signal, sends a START to the next module. Once the next module turns on, the previous module turns off the DUT and turns on the cooling fan. At this point, the timer is also reset by the module. Further, the outgoing module updates all the measurements for that particular cycle. Thereafter, the master reads the measurements from the outgoing module and sends them to the host. This ensures that data transfers only occur during non-crucial times to minimize delays in the processing of crucial events. The master then waits for the END signal from the next module. Once the previous module timer reaches $t_{off}$, it turns off the cooling fan and waits for the next START signal. The whole process repeats cyclically. The algorithm to determine $t_{on}$ and $t_{off}$ for a particular cycle is presented in the next subsection.

Figure 2.5: State flow diagrams describing the operation of the rack master and module.

As previously shown, in the proposed architecture there are no bypass switches. However, since the same power supply is shared between multiple paralleled devices, it is necessary to maintain the continuity of the power supply current during the switch-over from an outgoing DUT to the incoming DUT. Since the main power supply operates in CC mode, a dead time during switch-over can lead to a large voltage spike across the DUTs. Such spikes can in turn lead to inaccurate $R_{ds-on}$ measurement of the incoming DUT. To overcome this challenge without an additional bypass switch, a switch-over technique is proposed.

Consider Module 1 (DUT1) as the outgoing module and Module 2 (DUT2) as the incoming module as shown in Fig. 2.6. The two modules are located adjacent to each other in the rack. To avoid a large voltage spike across the DUTs, an overlap period between the outgoing and incoming DUTs is required. The overlap period ensures current continuity and maintains the main power supply in CC mode of operation. As previously described, at the end of DUT1's heating period, i.e. interval 1, Module 1 sends the END signal to the master to request switch-over. The master then sends a START command to Module 2. However, owing to the time required for communication and processing, there is a small delay ($t_{delay}$) between the transmission of the END signal by Module 1 and DUT2 actually turning on. Therefore, if DUT1 turns off immediately after sending the END signal, it will likely turn off before DUT2 turns on, thus leading to a voltage spike. Moreover, the communication delay can be arbitrary depending on the program and interrupt execution sequence of each of the microcontrollers. One way to address this problem is to hard-code a known large delay between Module 1 sending the END signal and DUT1 turning off. However, such a method could lead to unnecessary additional overlap between DUT1 and DUT2, which could result in a change in the heating profile of the switches.

Figure 2.6: Illustration of proposed switch-over technique based on "indirect-sensing".

Therefore, to minimize the overlap period between the two switches, a switch-over technique based on "indirect sensing" is proposed. As shown in Fig. 2.6, after Module 1 sends the END signal to the master, the master commands Module 2 to turn on DUT2. Assuming DUT1 and DUT2 are identical devices with almost equal $R_{ds-on}$ values, the supply current is now shared almost equally between the two DUTs. Module 1 detects the fall in DUT1 current from the on-board current sensor reading and thereby "indirectly senses" the turn-on of DUT2. At this point, DUT1 is turned off. Therefore, in the proposed technique, the need for a separate bypass switch is eliminated by using the incoming DUT as a temporary bypass device. Additionally, the overlap period during switch-over is optimized without additional control complexity.
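The decision made by the outgoing module can be sketched as follows. This is only an illustration of the idea, assuming the module samples its own drain current periodically and turns DUT1 off once it sees a reduced but settled current for a number of consecutive samples, as in the practical implementation described in Section 2.4.1. The function name, the thresholds `drop_fraction` and `ripple_fraction`, and the sample values are all hypothetical.

```python
def detect_handover(samples, nominal_current, drop_fraction=0.2,
                    ripple_fraction=0.05, required_samples=3):
    """Return the sample index at which the outgoing DUT may turn off, else None.

    Turn-off is allowed once `required_samples` consecutive readings are
    (a) below the nominal current by at least `drop_fraction`, and
    (b) within `ripple_fraction * nominal_current` of the previous reading,
        i.e. the current has settled at its new, lower value.
    """
    threshold = nominal_current * (1.0 - drop_fraction)
    max_step = nominal_current * ripple_fraction
    run = 0
    previous = None
    for index, current in enumerate(samples):
        reduced = current <= threshold
        steady = previous is not None and abs(current - previous) <= max_step
        run = run + 1 if (reduced and steady) else 0
        previous = current
        if run >= required_samples:
            return index
    return None


if __name__ == "__main__":
    # Hypothetical I_d readings (A): full current, then sharing with DUT2.
    readings = [20.0, 20.1, 19.9, 14.0, 10.2, 10.0, 10.1, 10.0]
    print("turn-off allowed at sample", detect_handover(readings, 20.0))
```

Requiring a settled value rather than an exact halving keeps the check usable when the two DUTs do not share current equally, and `required_samples` directly sets how long the two devices overlap.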

#### 2.3.4 Closed Loop Control of $\Delta T_j$

As evident from Table 2.1, accurate closed-loop control of $\Delta T_j$ is crucial for accurate device lifetime modeling using DC power cycling. In the proposed architecture, $V_{sd}$ measurement of the DUT is used to estimate $T_j$. $V_{sd}$ measurement is only possible through the injection of a small current when the DUT is off. Therefore, real-time $T_j$ information cannot be obtained while the DUT is on. To overcome this challenge and achieve closed-loop control of $\Delta T_j$, a hybrid feedforward hysteresis control algorithm, shown in Fig. 2.7, is proposed. On-time $(t_{on})$ is defined as the duration between the turn-on of the DUT and the generation of the "END" signal by the module. Off-time $(t_{off})$ is defined as the duration for which the DUT is actively cooled by the fan.

The test is started with an arbitrary value of DUT on-time  $(t_{on})$ . However, too large a value may cause excessive device heating. The best choice for initial  $t_{on}$  would be a relatively small arbitrary value. At the end of the heat-up period, after the DUT turns off, the junction temperature of the DUT is measured and compared to  $T_{j-max}$  reference value. If the measured  $T_j$  is less than the reference,  $t_{on}$  is increased by  $\gamma$  or decreased by  $\delta$  otherwise. This updated value of  $t_{on}$  is used for the next heating cycle. Similarly, right before the DUT is turned on for heating,  $T_j$  of the DUT is measured and compared with the  $T_{j-min}$  reference value. If the measured  $T_j$  is more than the reference,  $t_{off}$  is increased by  $\alpha$  or otherwise decreased by  $\beta$ . This  $t_{off}$  value is used as the off time for the next cooling cycle. Every module calculates the time between the first turn-off event and the second turn-on event and uses it as an initial value for  $t_{off}$ . The values of  $\alpha, \beta, \gamma$ , and  $\delta$  parameters need to be chosen such that the DUT reaches the set values of  $T_{j-max}$  and  $T_{j-min}$  within as few cycles as possible. However, large values of the parameters can cause oscillations about the reference  $T_j$  values. Therefore, parameters can also be adjusted dynamically. When the deviation from the reference values is high, the parameters can be chosen to be large and when the deviation is small, the parameter values can be made smaller for finer adjustment.

Therefore, although hysteresis-like control is used to determine $t_{on}$ and $t_{off}$, the information used is not real-time. The values measured at the end of a cycle are used to determine the parameters for the next cycle (feedforward). The control algorithm runs on the on-board microcontroller.
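The per-cycle update rules described above can be written compactly. The following is a minimal sketch rather than the firmware used in this work: the function names and the clamping of the times at zero are assumptions, and `scaled_step` is one hypothetical way of realizing the dynamic adjustment of $\alpha$, $\beta$, $\gamma$, and $\delta$ mentioned in the text.

```python
def update_t_on(t_on, tj_peak, tj_max_ref, gamma, delta):
    """Run once per cycle, after the DUT turns off and T_j is measured."""
    if tj_peak < tj_max_ref:
        return t_on + gamma            # under-heated: heat longer next cycle
    return max(0.0, t_on - delta)      # at or above reference: heat for less time


def update_t_off(t_off, tj_valley, tj_min_ref, alpha, beta):
    """Run once per cycle, right before the DUT turns on for heating."""
    if tj_valley > tj_min_ref:
        return t_off + alpha           # still too warm: cool longer next cycle
    return max(0.0, t_off - beta)      # at or below reference: cool for less time


def scaled_step(error_c, gain, min_step, max_step):
    """Hypothetical dynamic step: proportional to |error|, clamped to a range."""
    return min(max_step, max(min_step, gain * abs(error_c)))


if __name__ == "__main__":
    # Assumed example: peak T_j measured 10 degC below a 130 degC reference.
    t_on, tj_peak, tj_ref = 2.0, 120.0, 130.0
    step = scaled_step(tj_ref - tj_peak, gain=0.01, min_step=0.02, max_step=0.5)
    print("next t_on:", update_t_on(t_on, tj_peak, tj_ref, gamma=step, delta=step))
```

Each function uses only a measurement taken at the end of the previous interval, which is what makes the scheme feedforward from cycle to cycle rather than a real-time hysteresis loop.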

#### 2.3.5 Fault Detection and Isolation

Oftentimes, the DUTs are aged to failure during DC power cycling tests. Failure can be defined in multiple ways including open and short circuit failure of the device. Open circuit failure can be caused by bond wire failure, die attachment failure, or loss of gate control. Short circuit failure on the other hand can be caused by loss of gate control for turn-off, mold compound issues, or catastrophic heating of the die. Therefore, it is imperative for a power cycling setup to be able to detect and isolate faulty devices and prevent potential interference with the normal operation of the setup.

Figure 2.7: Proposed algorithm.

| $M_{sw}$ Status | DUT Status | Condition | Fault Type             |
|-----------------|------------|-----------|------------------------|
| ON              | ON         | $I_d = 0$ | $M_{sw}$ and/or DUT OC |
| OFF             | ON         | $I_d > 0$ | $M_{sw}$ SC            |
| ON              | OFF        | $I_d > 0$ | DUT SC                 |
| OFF             | OFF        | $I_d > 0$ | $M_{sw}$ and DUT SC    |

Table 2.2: Fault Tree for Detection of OC and SC Faults

In the proposed DC power cycling architecture, the module design enables easy fault detection and isolation. Further, it is also possible to classify the type of fault. Since most fault mechanisms manifest as an open circuit or short circuit of the DUT, the proposed fault detection algorithm focuses on these two cases. As discussed, each module has a main switch in series with the DUT which is connected to the supply bus. Using the main switch, the DUT is periodically tested for faults. Moreover, the main switch itself is also tested since it is susceptible to failure as well. The technique is summarized in Table 2.2. When the DUT is commanded to turn on by the master, the controller turns on the two switches and waits to detect current through the DUT. If no current is detected, it implies that either the DUT or the main switch has failed open. Similarly, when the device is not actively heating up, the main switch and DUT are turned on alternately while the other is kept off. In either case, if a current is detected, a short circuit fault has occurred. In this situation, depending on the case for which the current is detected, the faulty switch can be identified.



Figure 2.8: Pictures of the developed module a) top view b) bottom view.

Finally, the controller also constantly monitors the DUT current when both the main switch and the DUT are off. The presence of a current during this interval indicates a short circuit failure of both the main switch and the DUT. Unlike other cases, where the fault is isolated by turning off the switches, it is not possible to isolate this kind of fault. The advantage of the proposed technique is that, under normal conditions, no current needs to flow through the circuit to detect a fault. Also, the method is more robust compared to other methods that rely on detecting a fall in on-state resistance to detect a short circuit.
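The decision logic of Table 2.2 maps directly onto a small lookup. The sketch below is an illustration only: the function name, the returned strings, and the `noise_floor` used to decide whether a current is present are assumptions, not details of the implemented firmware.

```python
def classify_fault(msw_on, dut_on, i_d, noise_floor=0.05):
    """Classify a fault from the commanded switch states and the sensed I_d.

    Returns None when the observation is consistent with healthy switches.
    `noise_floor` (A) is a hypothetical threshold separating "current
    present" from sensor noise.
    """
    current_present = abs(i_d) > noise_floor
    if msw_on and dut_on:
        # Both commanded on: current must flow, otherwise something is open.
        return None if current_present else "MSW and/or DUT open circuit"
    if not current_present:
        return None
    # Current flows although at least one switch is commanded off.
    if dut_on:
        return "MSW short circuit"
    if msw_on:
        return "DUT short circuit"
    return "MSW and DUT short circuit"


if __name__ == "__main__":
    # Hypothetical observations: (MSW commanded on, DUT commanded on, I_d in A)
    for state in [(True, True, 20.0), (True, True, 0.0),
                  (False, True, 3.0), (True, False, 3.0), (False, False, 3.0)]:
        print(state, "->", classify_fault(*state))
```

Only the last case cannot be isolated by the module itself, since both series devices have failed short.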

#### 2.4 Results and Discussion

The proposed DC power cycling test bench was implemented in hardware. Figure 2.8 shows the module hardware. Various sections of the module are highlighted. The fully assembled test bench with 8 racks is shown in Fig. 2.9. Each rack has 6 modules. The bench is designed to be assembled into a standard 19" rack tower. This enables mechanical modularity and easy maintenance. The rack masters are connected over the same Ethernet network to the host computer. The host is also connected to the internet and therefore enables web access to the test bench. The racks are constructed out of high-density polyethylene (HDPE). Since HDPE is a poor thermal conductor, it limits conductive heat transfer between modules. Further, the modules are enclosed by walls on three sides and only the front is open for access to the DUT and cooling air exit. This is designed to prevent the cooling air of one module from interacting with another DUT. The proposed bench is experimentally validated as follows.

#### 2.4.1 Verification of Smooth Switch-Over

Figure 2.10 shows the experimental switch-over transition waveform between DUT1 and DUT2. From the figure, it can be seen that during interval 1, DUT1 is conducting current. Interval 2 is the transition interval during which both DUT1 and DUT2 are conducting, and in interval 3 only DUT2 is conducting the supply current. There is a $\sim 5$ ms overlap between the two switches. In the practical implementation, the controller is programmed to turn off the outgoing switch after it detects a reduced but almost constant value of $I_d$. This generalization allows switches of different ratings to be aged in the same rack without requiring $I_d$ to strictly fall by half. Further, the overlap time can be adjusted by changing the number of consecutive samples required to trigger the turn-off. It is also seen that there are no oscillations in the current waveform and the transition is smooth. It is also observed from the gate voltage waveform $V_{g-DUT2}$ that during interval 1, although DUT2 is not conducting any current, it is periodically turning on. During this interval, the DUT and main switch are switched in a complementary manner. These switching operations are part of the real-time fault detection routine. The same is also true for DUT1 during interval 3.

Figure 2.9: Picture of fully assembled setup with 8 racks in standard 19" rack cabinet.

Figure 2.10: Experimental switch-over waveform during the transition from DUT1 to DUT2.

Table 2.3: Experimental Results of Closed Loop $\Delta T_j$ Control Verification

| Test Case | $T_{j-min}$ Ref (°C) | $T_{j-max}$ Ref (°C) | $T_{j-min}$ $V_{sd}$ Ref (V) | $T_{j-max}$ $V_{sd}$ Ref (V) | $T_{j-min}$ Meas. (°C) | $T_{j-max}$ Meas. (°C) | $T_{j-min}$ $V_{sd}$ Meas. (V) | $T_{j-max}$ $V_{sd}$ Meas. (V) |
|-----------|----------------------|----------------------|------------------------------|------------------------------|------------------------|------------------------|--------------------------------|--------------------------------|
| 1         | 34                   | 130                  | -3.33                        | -2.97                        | 37.6                   | 127.34                 | -3.339                         | -2.969                         |
| 2         | 55                   | 150                  | -2.9                         | -3.23                        | 58.14                  | 146.7                  | -2.904                         | -3.233                         |

#### 2.4.2 Verification of Closed Loop $\Delta T_j$ Control

The closed-loop $T_j$ control algorithm is experimentally verified. In the practical implementation, DUT $V_{sd}$ values corresponding to the $T_j$ reference values are sent to the module by the master at the beginning of the test. The $V_{sd}$ values corresponding to the reference $T_j$ are obtained by placing the device in a forced air convection oven with precise temperature control and characterizing the DUT. This helps reduce the required on-board computation as the values are pre-calibrated. To obtain the actual $T_j$ of a DUT to verify the algorithm, one device was decapsulated in order to expose the die as shown in Fig. 2.11(a).



Figure 2.11: a) Decapsulated device with exposed die for testing. b) IR camera image of the DUT during testing.

An IR camera was used to record the die temperature during testing. Since the temperature distribution of the die is slightly non-uniform, the actual die temperature is defined as the mean temperature of the die. The software used for IR temperature measurement calculates the mean temperature of the die area marked in Fig. 2.11(b) by averaging the temperature values obtained from individual pixels inside the die area. The results obtained from the test for two different $\Delta T_j$ are shown in Table 2.3. It can be seen that the experimentally obtained values are close to the reference values. A maximum error of ~10.5% is observed in $T_{j-min}$ in case 1. The error for $T_{j-max}$ in both cases is ~2%, and $T_{j-min}$ of case 2 has an error of ~5.7%. The error percentage is relatively higher for the $T_{j-min}$ values because the reference values themselves are smaller in absolute terms. Further, it must be noted that the error includes any calibration error, error in on-board $V_{sd}$ measurement, and error in temperature measurement using the IR camera.
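The error percentages quoted above can be recomputed directly from the reference and IR-measured temperatures in Table 2.3. The short sketch below does this; the function and variable names are illustrative only and are not part of the test bench firmware.

```python
# Reference and IR-measured junction temperatures (°C) taken from Table 2.3.
TABLE_2_3 = {
    1: {"tj_min": (34.0, 37.6), "tj_max": (130.0, 127.34)},
    2: {"tj_min": (55.0, 58.14), "tj_max": (150.0, 146.7)},
}


def percent_error(reference, measured):
    """Absolute error expressed as a percentage of the reference value."""
    return abs(measured - reference) / abs(reference) * 100.0


for case, entries in TABLE_2_3.items():
    for name, (ref, meas) in entries.items():
        print(f"case {case} {name}: |error| = {abs(meas - ref):.2f} °C "
              f"({percent_error(ref, meas):.1f} %)")
```

The absolute deviations all lie between roughly 2.7 °C and 3.6 °C, so the larger percentage for $T_{j-min}$ comes mainly from dividing by a smaller reference value.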



Figure 2.12: Comparison of on-board  $R_{ds-on}$  measurements with device characterization results.

## 2.4.3 Verification of R<sub>ds-on</sub> Measurement Accuracy

To verify the accuracy of the on-board $R_{ds-on}$ measurement, the device was first characterized using a device characterizer at different drain current values. The same device was then plugged into the test bench and on-board measurements were obtained. The two sets of results are compared in Fig. 2.12. The $R_{ds-on}$ values shown in Fig. 2.12 are calculated directly by the microcontroller with no external processing or signal conditioning. It is observed that the experimental values are very close to the characterization results. An offset of ~10 m$\Omega$ is observed in all values. This is because of the contact resistance of the adapter used to plug the device into the module. This adapter resistance is consistent with the adapter manufacturer's specifications. Therefore, from the results, it can be concluded that the on-board measurement values agree with the offline characterization data. Since the $R_{ds-on}$ of the device is measured right after the DUT turns on, the result also verifies that the smooth switchover between DUTs mitigates the effect of potential turn-on ringing on measurement accuracy.



Figure 2.13: CFD simulation results for fan airflow. Air speed contours for a) Horizontal cut plane b) Vertical cut plane.

#### 2.4.4 Verification of Thermal Isolation

The thermal isolation between modules is verified through CFD simulations. A 3D model of a rack with 3 modules was created, and a CFD simulation was performed using the fan airflow given by the manufacturer. The simulations were performed for a DUT in the TO-247 package. Fig. 2.13(a) shows a contour plot of the airspeed on a cross-sectional plane perpendicular to the device. It is clearly observed that the fan air cooling the DUT in module 2 does not interact with the adjacent modules 1 and 3. Similarly, Fig. 2.13(b) shows the airspeed contour plot on a cross-sectional plane parallel to the DUT. Because of the narrow opening between the top of the rack and the device, air speed is slightly high there. It is important for this air to not interact with the DUTs in the racks above or below. It is observed that, although the cooling air spreads out in front of the rack, it does not flow directly above or below. Therefore, it can be concluded that



Figure 2.14: C-SAM images of a) New device b) Aged device M1 c) Aged device M2.

air from one rack would not interfere with DUTs in the racks above and below. Moreover, power supplies are placed between the racks; therefore, any possibility of thermal interaction between different racks is further minimized.

#### 2.4.5 Device Aging Analysis

C-SAM imaging results of 2 representative devices aged on the proposed aging test bench are shown in Fig. 2.14. Figure 2.14(a) shows a new device. Figures 2.14(b) and 2.14(c) show aged devices designated as M1 and M2 respectively. Each of these devices was aged at $I_d = 7$ A with a $T_j$ swing of 55°C - 150°C for 10,000 cycles. The red and yellow areas in the devices indicate delamination between the mold compound and the drain tab, with red indicating higher severity. It is observed that the new device shows no sign of delamination. However, M1 and M2 show extensive delamination. In fact, almost the entire mold compound to drain tab interface shows delamination in M2. In M1, except for a small region, the rest of the mold compound to drain tab interface shows delamination. These results are consistent with degradation mechanisms observed in DC power cycling tests. Such delamination can cause bond wire liftoff due to shear caused by relative movement between the die and bond wires. Furthermore, from Fig. 2.14(c) a small area at the top-right corner of the die indicated by

| Reference                                                       | [21]                          | [22]                          | [24]                        | [20]                           | [28]               | Proposed                    |
|-----------------------------------------------------------------|-------------------------------|-------------------------------|-----------------------------|--------------------------------|--------------------|-----------------------------|
| High-level Architecture                                         | Parallel                      | Independent                   | Staged<br>Parallel          | Series                         | Independent        | Independent<br>Parallel     |
| No. of power supplies<br>(theoretical min)                      | 1                             | 48                            | 1                           | 1                              | 48                 | 8                           |
| No. of bypass switches                                          | 48                            | 48                            | 0                           | 0                              | 48                 | 0                           |
| Additional converter stage required                             | No                            | No                            | Yes                         | No                             | No                 | No                          |
| Independent $\Delta T_j$ control                                | Yes                           | Yes                           | Yes                         | No                             | Yes                | Yes                         |
| $T_j$ measurement technique                                     | $T_{case}$ +<br>Thermal Model | $T_{case}$ +<br>Thermal Model | $R_{ds-on}$                 | Fibre Optic Sensor             | $V_{sd}$           | $V_{sd}$                    |
| $T_j$ measurement accuracy<br>over aging                        | Low                           | Low                           | Low                         | High                           | High               | High                        |
| Data throughput<br>(min time between 2 turn-ons<br>of a switch) | $t_{on} + t_{off}$            | $t_{on} + t_{off}$            | $\sum_{n=1}^{N} t_{on}^{n}$ | $\sum_{n=1}^{N_{ps}} t_{on}^n$ | $t_{on} + t_{off}$ | $\sum_{n=1}^{6} t_{on}^{n}$ |
| Data collection method                                          | External DAQ                  | External DAQ                  | On-Board                    | Oscilloscope                   | On-Board           | On-Board                    |
| Data collection scalability                                     | Low                           | Low                           | Medium                      | Low                            | Medium             | Very high                   |

Table 2.4: Comparison of Proposed Architecture and Extrapolated Existing Architectures for 48 Devices

the pink circle is also observed to be delaminated. For this device, the area represents the location of the gate bond pad. Excessive stress at this location can cause the relatively thin gate bond wire to lift off. This leads to loss of device control and eventually short circuit or open circuit failure.

#### 2.4.6 Comparison to Existing Architectures

In Table 2.4, the proposed scalable, modular test bench architecture is compared to existing architectures in the literature. The existing architectures are extrapolated to 48 devices to ensure a fair comparison. The proposed architecture combines the benefits of the previously proposed parallel and independent architectures through an independent parallel structure. This architecture allows a power supply to be shared between multiple devices while allowing independent control of their aging conditions. This, however, is challenging with the series architecture. The proposed architecture requires 8 power supplies, in comparison to the theoretical minimum requirement of 1 power supply with the parallel, staged parallel, and series architectures and the 48 required with independent architectures. However, using one power supply with the staged parallel architecture will cause a significant decrease in the aging throughput. The proposed architecture does not require any bypass switch since the modules themselves are used to ensure current continuity. The same is also true for the staged parallel and series architectures. However, the independent and parallel architectures need 48 bypass devices for 48 DUTs. Independent $\Delta T_j$ control is possible with the proposed architecture and with all other architectures except the series architecture. For $T_j$ measurement, the setups in [21] and [22] use $T_{case}$ and estimate $T_j$ from the thermal model and estimated power loss. This method needs periodic recalibration or accurate thermal network tracking to compensate for aging effects. The setup in [24] uses $R_{ds-on}$, which can drift with device aging. The setup in [20] uses a fiber optic temperature sensor. Although accurate, this cannot be applied to discrete packages. The proposed architecture and the setup in [28] use a $V_{sd}$ based $T_j$ measurement technique which is accurate and aging independent. Data throughput can be defined as the minimum time between two turn-on events of the same switch or string of switches. The independent and parallel structures theoretically have the highest data throughput. For the other architectures, the formulae for data throughput are given in the table. The staged parallel architecture will have the minimum data throughput for 48 devices. Finally, the proposed architecture has extensive on-board data collection capability and very high scalability since every module has its own signal conditioning and processor for data collection. Although the setups in [24] and [28] also have on-board data collection capability, scaling may be a challenge given the centralized processing.

#### 2.5 Conclusion

In this chapter, a new independent parallel architecture is proposed for large scale DC power cycling of SiC MOSFETs. The proposed architecture addresses many of the challenges with the previously existing architectures through the use of a modular structure. The detailed design of the fundamental unit, called a module, and of the high-level architecture is discussed for the benefit of practicing engineers in the field. Additionally, reconfigurability of the setup through software offers advantages in terms of scalability and operational flexibility. The hardware setup in the chapter is designed for the simultaneous and independent aging of 48 devices. It consists of 6 devices in each of the 8 racks. However, it must be noted that this is not a physical limitation of the architecture and the setup can be easily scaled as necessary. The main contributions and findings of this research work are highlighted below:

- 1. The main challenges in the previously proposed architectures were either with practical scalability or independent control of aging conditions. The proposed architecture is highly scalable while allowing accurate independent control of aging conditions.
- 2. For accurate control of $\Delta T_j$, which is crucial in DC power cycling tests, $V_{sd}$ is used as a TSEP. However, its use has certain control challenges. A detailed control algorithm is proposed and experimentally verified in this chapter.
- 3. Apart from the independent architectures, in existing architectures, fault isolation is challenging. It is shown that with the proposed architecture robust fault detection and isolation can be performed. Moreover, the proposed architecture also facilitates the identification of the type of fault.
- 4. Thermal isolation is crucial for the independent aging of multiple devices. The proposed architecture addresses this requirement in the design and the same is verified through CFD simulations.

#### **CHAPTER 3**

# MODEL BASED CLOSED-LOOP JUNCTION TEMPERATURE CONTROL OF SIC MOSFETS IN DC POWER CYCLING FOR ACCURATE RELIABILITY ASSESSMENTS

#### 3.1 Introduction

The long-term reliability of Silicon Carbide (SiC) MOSFETs is of particular interest to practicing power electronics engineers. Extensive reliability testing and development of accurate lifetime models are necessary to allay potential concerns about SiC MOSFETs' reliability and accelerate their adoption [2]. The JEDEC JESD22-A122 document details the power cycling test methodology to assess long-term package reliability of power devices under non-uniform temperature distribution and resulting thermo-mechanical stresses [15]. In applying this test procedure to SiC power devices, it is necessary to consider the implications of fundamental differences in SiC and silicon (Si) material properties. For instance, the investigations in [30] show that SiC MOSFETs may be relatively more susceptible to gate-open failures, which is an infrequent package failure mode in existing Si devices. Furthermore, the data obtained from DC power cycling tests can be extrapolated using a suitable lifetime prediction model to obtain the expected device lifetime under real application scenarios [10-12]. The Coffin-Manson model described in (3.1) is a typical example of a widely used power cycling lifetime model [10]. Here $N_f$ is the cycles-to-failure, $A$ and $n$ are empirical constants, and $\Delta T_j$ is the applied junction temperature swing. While the various models proposed in the literature differ in the number and types of input parameters, all of them show a strong correlation between expected $N_f$ and $\Delta T_j$. Therefore, it is evident that the accuracy of long-term lifetime estimation using DC power cycling tests depends on accurate $\Delta T_j$ control during the test.

$$N_f = A\Delta T_j^{-n} \tag{3.1}$$
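To illustrate how strongly (3.1) ties lifetime estimates to $\Delta T_j$ accuracy, the sketch below evaluates the ratio of predicted cycles-to-failure when the actual swing deviates from the intended one. The exponent $n = 5$ and the 5 °C deviation are assumed purely for illustration and are not taken from this work.

```python
def cycles_to_failure(delta_tj, a, n):
    """Coffin-Manson model (3.1): N_f = A * delta_Tj^(-n)."""
    return a * delta_tj ** (-n)


def lifetime_ratio(delta_tj_actual, delta_tj_intended, n):
    """N_f(actual) / N_f(intended); the constant A cancels out."""
    return (delta_tj_actual / delta_tj_intended) ** (-n)


# Assumed example: intended swing of 95 °C, actual swing 5 °C larger, n = 5.
ratio = lifetime_ratio(100.0, 95.0, n=5)
print(f"N_f changes by {100.0 * (ratio - 1.0):+.1f} %")
```

With these assumed numbers, an overshoot of about 5% in $\Delta T_j$ lowers the predicted $N_f$ by roughly 23%, which is why a small control error can noticeably bias the lifetime extrapolation.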



Figure 3.1: C-SAM image of an aged device from backside showing die attach solder layer delamination.

In order to control the  $T_j$  profile of the devices-under-test (DUTs) during DC power cycling, it is necessary to accurately sense the DUT's  $T_j$ . Inaccurate  $T_j$  measurements will automatically cause an error in DUT's  $\Delta T_j$ . A DUT's junction temperature  $(T_j)$  can either be measured directly or indirectly. As an example of direct measurement, a fiber optic temperature probe is used to measure the die temperature in SiC power modules in [20]. While this method has the benefit of high measurement accuracy and bandwidth, it is not universally applicable since, in most SiC device packages, the die itself is physically inaccessible.

Alternatively, the junction temperature  $(T_j)$  of a device can be measured indirectly using two different methods. In the first method, the case temperature  $(T_{case})$  of the device is measured using a thermocouple, resistance temperature detector (RTD), or a temperature measurement IC. Given  $T_{case}$  and an estimate for the device power loss,  $T_j$  is estimated using junction to case thermal impedance  $(R_{th-JC})$  values from the manufacturer provided datasheet [22]. However,  $T_j$  estimated using this method may gradually become inaccurate as the device thermal impedance changes due to die attach solder delamination [17]. This is illustrated through Fig. 3.1 where a backside C-SAM (Confocal Scanning Acoustic Microscopy) image of a SiC MOSFET device aged by DC power cycling is shown. The dark areas under the die represent healthy die attach solder. Lighter areas at the corners of the die represent delamination in the die attach solder. Such delamination leads to a gradual increase in  $R_{th-JC}$  which needs to be recalibrated periodically to ensure accurate  $T_j$  estimation. However, periodic recalibration is cumbersome and may significantly slow down the testing process. Moreover, measurement of  $T_{case}$  requires physical attachment of a temperature sensor to the DUT which may result in repeatability issues.
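The case-temperature-based estimate and its drift with aging can be sketched as follows. All numerical values (power loss, thermal resistances, and the 20% increase attributed to delamination) are hypothetical and chosen only to show the mechanism.

```python
def estimate_tj(t_case, p_loss, r_th_jc):
    """Steady-state estimate: T_j = T_case + P_loss * R_th-JC."""
    return t_case + p_loss * r_th_jc


# Hypothetical operating point.
T_CASE = 80.0          # measured case temperature, °C
P_LOSS = 40.0          # estimated device power loss, W
R_TH_DATASHEET = 1.0   # datasheet junction-to-case value, K/W
R_TH_AGED = 1.2        # assumed value after solder delamination, K/W

tj_assumed = estimate_tj(T_CASE, P_LOSS, R_TH_DATASHEET)  # what the estimator reports
tj_actual = estimate_tj(T_CASE, P_LOSS, R_TH_AGED)        # what the die experiences
print(f"estimated T_j = {tj_assumed:.1f} °C, actual T_j = {tj_actual:.1f} °C, "
      f"underestimate = {tj_actual - tj_assumed:.1f} °C")
```

Unless $R_{th-JC}$ is recalibrated, the estimator keeps reporting the lower value while the die runs hotter, so the applied $\Delta T_j$ silently grows as the device ages.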

Alternative indirect techniques for  $T_j$  measurement rely on temperature-sensitive electrical parameters (TSEPs). Device on-state resistance  $(R_{ds-on})$ , gate threshold voltage  $(V_{th})$ , and body-diode on-state voltage  $(V_f)$  are common TSEP choices due to their relative ease of measurement [24].  $R_{ds-on}$  of a SiC MOSFET is given by (3.2) where  $R_{ch}$  is the channel resistance,  $R_j$  is the JFET region resistance,  $R_d$  is the drift region resistance,  $R_s$  is the substrate resistance and  $R_{pk}$  is the package resistance [31]. Typically for SiC devices,  $R_{ch}$ and  $R_d$  are dominant and temperature dependent which results in  $R_{ds-on}$ 's sensitivity to  $T_j$ . However, in SiC devices, the device threshold voltage  $(V_{th})$  can change due to aging and this results in a corresponding increase in  $R_{ch}$  [19]. In addition, during DC power cycling, package degradation is also observed which causes an increase in  $R_{pk}$  [17]. Due to the preceding factors, gradual device degradation can result in aging-related  $R_{ds-on}$  change.  $V_{th}$ , in addition to being affected by aging, has lower sensitivity to junction temperature and is also challenging to measure when the device is on [32]. Therefore, while  $R_{ds-on}$  is a relatively better TSEP than  $V_{th}$ , solely relying on  $R_{ds-on}$  for  $T_j$  estimation can lead to aging-related inaccuracy.

$$R_{ds-on} \approx R_{ch} + R_j + R_d + R_s + R_{pk} \tag{3.2}$$

$V_f$ is a known TSEP for $T_j$ measurement and has specifically been used in the context of DC power cycling [33]. The general equation for $V_f$ is given as (3.3). Here, $V_{pn}$ is the voltage drop across the body-diode p-n junction and varies linearly with temperature, $R_{pk}$ is the series resistance of the diode, and $I_{sd}$ is the source to drain current [34]. Since $R_{pk}$ changes with aging due to package degradation, the effect of aging on $V_f$ can be made negligible by choosing a small $I_{sd}$ for measurement. Furthermore, it is important to note that in commercial SiC devices, the relationship given by (3.3) is generally true at negative gate voltages only. Due to body-effect, the channel in SiC MOSFETs can conduct in the reverse direction even at zero gate voltage [35]. At negative gate voltages, the channel is fully off and the effect of gate oxide degradation on $V_f$ is negligible. In addition to the preceding characteristics, $V_f$ also has good sensitivity to $T_j$. Therefore, $V_f$ at negative gate voltage and small injected current is a reliable aging independent TSEP for SiC MOSFETs.

$$V_f \approx V_{pn} + R_{pk} I_{sd} \tag{3.3}$$
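The benefit of a small measurement current can be made concrete with a rough calculation based on the resistive term of (3.3). In the sketch below, the aging-related resistance increase and the temperature sensitivity of $V_f$ are assumed values used only for illustration.

```python
def vf_shift_mv(delta_r_mohm, i_sd_a):
    """Aging-induced change in V_f from the resistive term of (3.3), in mV."""
    return delta_r_mohm * i_sd_a  # mOhm * A = mV


def apparent_tj_error(delta_r_mohm, i_sd_a, sensitivity_mv_per_c):
    """Temperature error that the V_f shift would be mistaken for, in °C."""
    return vf_shift_mv(delta_r_mohm, i_sd_a) / abs(sensitivity_mv_per_c)


DELTA_R_MOHM = 5.0   # assumed aging-related series resistance increase
SENSITIVITY = 3.0    # assumed |dV_f/dT_j| in mV/°C

for i_sd in (0.01, 1.0, 7.0):  # candidate measurement currents in A
    err = apparent_tj_error(DELTA_R_MOHM, i_sd, SENSITIVITY)
    print(f"I_sd = {i_sd:5.2f} A -> apparent T_j error = {err:.3f} °C")
```

Because the shift scales linearly with $I_{sd}$, reducing the measurement current from amperes to milliamperes reduces the aging-related error by the same factor.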

It is evident from the preceding discussion that $R_{ds-on}$ is relatively easy to measure when the device is on but is prone to shifting with device aging. On the other hand, $V_f$ at small source-drain current and negative gate voltage is largely aging independent but can only be measured when the device is off. Therefore, theoretically, $R_{ds-on}$ and $V_f$ information can be combined to achieve accurate aging-independent junction temperature swing control during DC power cycling. The objective of this research work is to address this very problem. Specifically, procedures for $T_j$ estimation using $R_{ds-on}$ and $V_f$ as TSEPs are described in detail. Practical implementation of $R_{ds-on}$ based $T_j$ estimation is particularly challenging given its dependence on both $I_d$ and $T_j$ and its relatively low sensitivity. The proposed method is based on a look-up table of solution parameters for a current dependent quadratic function. Further, a Kalman filter is used to obtain an accurate $T_j$ estimate from the noisy $R_{ds-on}$ based measurements. The proposed technique is memory efficient and computationally efficient while ensuring high $T_j$ estimation accuracy. Further, given an accurate estimate for the device $T_j$, it is important to precisely control the device $T_j$ profile within the specified limits. In this chapter, a proportional-integral controller is used for closed-loop $T_j$ control. The controller reference is generated online to control the ramp time and peak $T_j$ in each cycle. Moreover, the $V_f$ based $T_j$ estimate obtained immediately after the end of the heating cycle is used to compensate for errors in the maximum $T_j$ reached by the device. This enables online compensation for aging-related changes in the $R_{ds-on}$ based $T_j$ estimate.

#### **3.2** Closed-loop $T_j$ Control

In this section, the proposed closed-loop  $T_j$  control technique is presented in detail. However, prior to that, it is necessary to understand the architecture of the DC power cycling setup used in this study. This architecture is derived from the test bench design presented in [36]. The fundamental unit of the test bench is a module. The schematic of the module is shown in Fig. 3.2. Each module can age one DUT and features a dedicated microcontroller that sets the gate signals, makes on-board measurements, and communicates with an external controller over I<sup>2</sup>C to receive test parameters and send out the measurement data. The module also has dedicated adjustable gate-drive power supplies,  $I_{sd}$  injection circuit, and on-board measurement circuits to measure DUT on-state voltage  $V_{ds}$ , body-diode forward voltage drop  $V_f$  and drain current  $I_d$  during the test. A picture of the assembled module is shown in Fig. 2.8. The module is connected to an external power supply board as shown in the high-level schematic of the test bench in Fig. 3.2. The external power supply board precisely controls the current injected in the module's DUT through a MOSFET-based discrete linear current regulator stage. A bypass switch on the power supply board provides an alternate path for the injected current when the DUT is off. This is useful to ensure a smooth current waveform at DUT turn-on and turn-off. The reference value to the linear current regulation circuit is provided by a high-precision 12-bit digital-analog converter (DAC) whose output value is set through I<sup>2</sup>C commands. This architecture is practical and enables easy scalability along with the ability to precisely control the  $T_j$  of individual DUTs independently.



Figure 3.2: High-level architecture of the DC power cycling test bench used in the study.

The high-level schematic of the proposed  $T_j$  control implementation is shown in Fig. 3.3. The end-to-end solution involves several critical steps, the details of which are discussed further.



Figure 3.3: High-level control block diagram.

#### 3.2.1 On-Board Junction Temperature Estimation

The first step in achieving precise control of the $T_j$ profile is obtaining accurate $T_j$ feedback. When the DUT is conducting and heating up due to ohmic loss, an estimate of its junction temperature is obtained by using $R_{ds-on}$ as a TSEP. $R_{ds-on}$ is calculated by dividing the $V_{ds}$ value by the $I_d$ value, both measured using on-board sensors. Since the calculation is sensitive to even small measurement noise, the raw $I_d$ and $V_{ds}$ measurements are passed through 30-point moving average filters. The delay introduced by the filters does not significantly impact the measurement and control processes as DUT heating is relatively slow.
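A minimal sketch of this filtering and division step is given below. The class and function names, the running-sum implementation, and the minimum-current guard are assumptions made for illustration; the sample values are placeholders rather than measured data.

```python
from collections import deque


class MovingAverage:
    """Fixed-length moving average with a running sum (O(1) per sample)."""

    def __init__(self, length=30):
        self.window = deque(maxlen=length)
        self.total = 0.0

    def update(self, sample):
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]   # remove the oldest sample from the sum
        self.window.append(sample)         # a full deque drops its oldest entry
        self.total += sample
        return self.total / len(self.window)


def rds_on(vds_filtered, id_filtered, min_current=0.1):
    """R_ds-on = V_ds / I_d; None when the current is too small to divide by."""
    if abs(id_filtered) < min_current:
        return None
    return vds_filtered / id_filtered


vds_filter, id_filter = MovingAverage(30), MovingAverage(30)
for vds_raw, id_raw in [(0.56, 7.0)] * 30:   # placeholder samples, not measured data
    vds_f = vds_filter.update(vds_raw)
    id_f = id_filter.update(id_raw)
print(rds_on(vds_f, id_f))
```

Keeping a running sum avoids re-adding 30 samples on every update, which matters on a small microcontroller; the same idea carries over directly to a fixed-point C implementation.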

Estimating $T_j$ using the calculated $R_{ds-on}$ is a non-trivial problem. In particular, the code and memory efficiency of the $R_{ds-on} \to T_j$ mapping algorithm is critical for it to be implemented on the on-board microcontroller. As shown in Fig. 3.4, $R_{ds-on}$ is a non-linear function of both $T_j$ and $I_d$. A straightforward solution to the mapping problem is to generate a 2D look-up table (LUT) offline and estimate $T_j$ online based on the measured $R_{ds-on}$ and $I_d$ values using bi-linear interpolation. However, depending on the expected range of current and temperature values, this technique is memory inefficient as it requires a large look-up table to be stored. Moreover, using bi-linear interpolation may cause significant errors, especially at low $T_j$ values where $R_{ds-on}$'s sensitivity to $T_j$ variation (measured in m$\Omega/^{\circ}$C) is



Figure 3.4: SiC MOSFET's  $R_{ds-on}$  dependence on  $I_d$  and  $T_j$ .



Figure 3.5:  $R_{ds-on}$ 's sensitivity to  $T_j$  change at different values of drain current  $I_d$ .

relatively low as shown in Fig. 3.5. Instead, in the proposed implementation, the typically parabolic nature of the $R_{ds-on}$ vs $T_j$ curve is exploited. As shown in Fig. 3.6, for a known $I_d$, the DUT's on-state resistance follows the relation

$$R_{ds-on}(T_j) = a_r T_j^2 + b_r T_j + c_r \tag{3.4}$$

where  $a_r, b_r, c_r$  are constants which can be easily computed offline by fitting  $R_{ds-on}$  values obtained at different  $T_j$  conditions. However, during DC power cycling, it is necessary to solve the inverse problem i.e. given an  $R_{ds-on}$  value, the DUT's  $T_j$  needs to be estimated.



Figure 3.6: Fitted  $R_{ds-on}$  vs  $T_j$  curves. Nodes represent experimental data.

For this, the feasible solution of the quadratic equation in (3.4) is computed as

$$T_j = \left(\frac{-b_r}{2a_r}\right) + \sqrt{\left(\frac{b_r^2 - 4a_rc_r}{4a_r^2}\right) + \left(\frac{1}{a_r}\right)R_{ds-on}}. \tag{3.5}$$

Equation (3.5) is written in a form such that the three parameters $\left(\frac{-b_r}{2a_r}\right), \left(\frac{b_r^2-4a_rc_r}{4a_r^2}\right), \left(\frac{1}{a_r}\right)$ can be computed offline to reduce the computational burden. These parameters are calculated offline at different $I_d$ values to create a LUT. As shown in Fig. 3.3, during testing, the filtered $I_d^f$ value is used to "look-up" the corresponding parameter values, and (3.5) is used to estimate the $T_j$. It is important to note that since the function parameters do not vary significantly for adjacent $I_d$ values, interpolation is not necessary to ensure accurate $T_j$ estimation. However, even with the quadratic solution based technique, $T_j$ estimation is noisy due to the relatively low sensitivity. Therefore, in the proposed solution, a Kalman filter is used to reduce the $T_j$ estimation error. To design an effective Kalman filter, the DUT needs to be modeled accurately as discussed in the next section.
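The look-up based inverse mapping can be sketched as follows. The fit coefficients are hypothetical, and the three stored parameters are obtained by completing the square on (3.4), i.e. $(T_j + \frac{b_r}{2a_r})^2 = \frac{b_r^2 - 4a_rc_r}{4a_r^2} + \frac{R_{ds-on}}{a_r}$. The nearest-entry selection reflects the statement that no interpolation between adjacent $I_d$ values is needed; the names used are illustrative.

```python
import math

# Hypothetical fit coefficients of (3.4) per drain current:
# I_d (A) -> (a_r, b_r, c_r), with R_ds-on in mOhm and T_j in °C.
FIT = {
    4.0: (1.0e-3, 0.10, 75.0),
    6.0: (1.1e-3, 0.11, 76.0),
    8.0: (1.2e-3, 0.12, 77.0),
}


def build_lut(fit):
    """Precompute, offline, the three parameters needed by the inverse mapping."""
    lut = {}
    for i_d, (a, b, c) in fit.items():
        p0 = -b / (2.0 * a)                          # vertex of the parabola
        p1 = (b * b - 4.0 * a * c) / (4.0 * a * a)   # constant term under the root
        p2 = 1.0 / a                                 # multiplies the measured R_ds-on
        lut[i_d] = (p0, p1, p2)
    return lut


def estimate_tj(lut, i_d_filtered, rds_on):
    """Pick the nearest I_d entry (no interpolation) and evaluate the closed form."""
    nearest = min(lut, key=lambda i: abs(i - i_d_filtered))
    p0, p1, p2 = lut[nearest]
    return p0 + math.sqrt(p1 + p2 * rds_on)


LUT = build_lut(FIT)
a, b, c = FIT[6.0]
r_at_100c = a * 100.0 ** 2 + b * 100.0 + c    # forward model (3.4) at 100 °C
print(estimate_tj(LUT, 6.2, r_at_100c))        # should recover roughly 100 °C
```

Online, each estimate then costs one table look-up, one multiply-add, and one square root, which is the computational saving the offline parameterization is meant to provide.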

#### 3.2.2 Device Thermal Model

A lumped thermal equivalent model of a discrete MOSFET is shown in Fig. 3.7. In the equivalent model, $C_d$ is the die thermal capacitance and the voltage across $C_d$ represents the junction temperature, $R_{dc}$ corresponds to the junction to case thermal



Figure 3.7: A lumped equivalent thermal model of a discrete MOSFET.

resistance, $C_c$ is the case thermal capacitance, $R_{ca}$ is the case-ambient thermal resistance, and the voltage across $C_c$ represents the case temperature. While higher-order distributed Foster and Cauer thermal networks may be used for better accuracy, for the given application, the approximate lumped model is reasonably accurate. $P_{loss}$ is a current source representing the power loss in the die. It is important to note that the model represents temperature rise above the ambient reference.

The state-space form of the equivalent thermal model is derived by choosing the two capacitor voltages, which represent the junction temperature,  $T_j$ , and case temperature,  $T_c$ respectively, as the state variables.  $P_{loss}$  is the input variable. KCL applied to node 1 gives

$$P_{loss} = C_d \frac{dT_j}{dt} + \frac{T_j - T_c}{R_{dc}}. \tag{3.6}$$

Similarly, KCL applied to node 2 gives

$$\frac{T_j - T_c}{R_{dc}} = C_c \frac{dT_c}{dt} + \frac{T_c}{R_{ca}}. \tag{3.7}$$

Rearranging (3.6) and (3.7) gives

$$\frac{dT_j}{dt} = \frac{-1}{R_{dc}C_d}T_j + \frac{1}{R_{dc}C_d}T_c + \frac{1}{C_d}P_{loss}, \tag{3.8}$$

$$\frac{dT_c}{dt} = \frac{1}{R_{dc}C_c}T_j - \left(\frac{1}{R_{dc}C_c} + \frac{1}{R_{ca}C_c}\right)T_c. \tag{3.9}$$

Equations (3.8) and (3.9) are arranged in matrix form as

$$\begin{bmatrix} T_j' \\ T_c' \end{bmatrix} = \begin{bmatrix} \frac{-1}{R_{dc}C_d} & \frac{1}{R_{dc}C_d} \\ \frac{1}{R_{dc}C_c} & -\left(\frac{1}{R_{dc}C_c} + \frac{1}{R_{ca}C_c}\right) \end{bmatrix} \begin{bmatrix} T_j \\ T_c \end{bmatrix} + \begin{bmatrix} \frac{1}{C_d} \\ 0 \end{bmatrix} \begin{bmatrix} P_{loss} \end{bmatrix}. \tag{3.10}$$

Equation (3.10) is in continuous time state-space form given as  $\mathbf{x}' = A\mathbf{x} + B\mathbf{u}$ , where

$$\mathbf{x} = \begin{bmatrix} T_j \\ T_c \end{bmatrix} \tag{3.11}$$

$$A = \begin{bmatrix} \frac{-1}{R_{dc}C_d} & \frac{1}{R_{dc}C_d} \\ \frac{1}{R_{dc}C_c} & -\left(\frac{1}{R_{dc}C_c} + \frac{1}{R_{ca}C_c}\right) \end{bmatrix} \tag{3.12}$$

$$B = \begin{bmatrix} \frac{1}{C_d} \\ 0 \end{bmatrix} \tag{3.13}$$

$$\mathbf{u} = \begin{bmatrix} P_{loss} \end{bmatrix} \tag{3.14}$$

Since only the junction temperature  $T_j$  is estimated and  $T_c$  is not directly measured,  $T_j$  is considered as the only output. Therefore, the output equation is given as  $\mathbf{y} = C\mathbf{x}$ , where  $C = [1 \ 0].$ 

In order to use the model, variables  $C_d$ ,  $R_{dc}$ ,  $C_c$ ,  $R_{ca}$  need to be known. Typically, the junction to case thermal resistance,  $R_{dc}$ , and case to ambient thermal resistance,  $R_{ca}$  are readily available in the manufacturer's datasheet.  $C_d$  is representative of the SiC die's thermal capacity, which is typically orders of magnitude smaller than  $C_c$  which represents the case's thermal capacity. It is possible to estimate  $C_d$  from the transient thermal impedance curves provided in the manufacturer's datasheet.
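As a sanity check of (3.10), the model can be discretized and stepped forward in time. The sketch below uses forward-Euler discretization ($A_d = I + AT_s$, $B_d = BT_s$), which is an assumption since the discretization method is not specified here, and hypothetical thermal parameters. With a constant $P_{loss}$, the states should settle at $T_j = P_{loss}(R_{dc} + R_{ca})$ and $T_c = P_{loss}R_{ca}$ above ambient.

```python
def thermal_matrices(r_dc, c_d, r_ca, c_c):
    """Continuous-time A and B of (3.10) as nested lists."""
    a = [[-1.0 / (r_dc * c_d), 1.0 / (r_dc * c_d)],
         [1.0 / (r_dc * c_c), -(1.0 / (r_dc * c_c) + 1.0 / (r_ca * c_c))]]
    b = [1.0 / c_d, 0.0]
    return a, b


def discretize_euler(a, b, ts):
    """Forward-Euler discretization: A_d = I + A*Ts, B_d = B*Ts."""
    a_d = [[(1.0 if i == j else 0.0) + a[i][j] * ts for j in range(2)]
           for i in range(2)]
    b_d = [b[i] * ts for i in range(2)]
    return a_d, b_d


def step(a_d, b_d, x, p_loss):
    """One update x[k+1] = A_d x[k] + B_d u[k]; x = [T_j, T_c] above ambient."""
    return [a_d[i][0] * x[0] + a_d[i][1] * x[1] + b_d[i] * p_loss
            for i in range(2)]


# Hypothetical parameters: K/W for resistances, J/K for capacitances.
R_DC, C_D, R_CA, C_C = 1.0, 0.01, 4.0, 2.0
A, B = thermal_matrices(R_DC, C_D, R_CA, C_C)
A_D, B_D = discretize_euler(A, B, ts=1e-3)

x = [0.0, 0.0]
for _ in range(100_000):            # 100 s of heating at a constant 10 W
    x = step(A_D, B_D, x, p_loss=10.0)
print(x)   # approaches [P*(R_dc + R_ca), P*R_ca] in steady state
```

Forward Euler is only stable when $T_s$ is small relative to the fastest time constant ($R_{dc}C_d$ here); a zero-order-hold discretization would remove that restriction at the cost of computing a matrix exponential offline.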

### 3.2.3 Kalman Filter

Given the system model, a Kalman filter can be used to reduce the effect of measurement noise. The recursive nature of the filter makes it memory efficient and easy to implement. The Kalman filter uses a prediction step to estimate the expected state and covariance matrices for a system with known noise parameters. In this study, the discretized form of the previously derived model is used in the filter. The state prediction and covariance prediction equations are given as

$$\mathbf{x}_{k|k-1} = A_d \mathbf{x}_{k-1|k-1} + B_d \mathbf{u}_k \tag{3.15}$$

$$P_{k|k-1} = A_d P_{k-1|k-1} A_d^T + Q \tag{3.16}$$

where $A_d$ and $B_d$ are the discretized forms of the state matrix $A$ and the input matrix $B$, respectively. $P_{2\times 2}$ is the state covariance matrix, which represents the uncertainty in the estimated state. $Q_{2\times 2}$ is the process noise covariance matrix, which corresponds to modeling uncertainty. The optimal Kalman gain, which is necessary for the update step, is calculated as

$$K_k = P_{k|k-1}H^T (HP_{k|k-1}H^T + R)^{-1} \tag{3.17}$$

where H is the state observation matrix  $H = C = [1 \ 0]$  in this case. R is the measurement noise covariance. Using the Kalman gain, the predicted state and covariance matrices are updated using the equations

$$\mathbf{x}_{k|k} = \mathbf{x}_{k|k-1} + K_k(y_k - H\mathbf{x}_{k|k-1}) \tag{3.18}$$

$$P_{k|k} = (I - K_k H) P_{k|k-1} \tag{3.19}$$

where $y_k$ is the actual measurement including the noise. It is important to select the process and measurement noise covariances carefully for the filter to perform effectively. In this study, since there is high confidence in the model, the maximum standard deviation of the process noise is assumed to be 3°C, which gives a variance of 9. Therefore,

$$Q = \begin{bmatrix} 9 & 0 \\ 0 & 9 \end{bmatrix} \tag{3.20}$$

and, since the measurement is comparatively less accurate, the measurement error is assumed to have a standard deviation of 10°C, which gives a variance of R = 100.
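The recursion in (3.15) to (3.19) is compact enough to write out directly for the two-state model. The sketch below is illustrative only: it hard-codes $H = [1 \ 0]$ so that the matrix inverse in (3.17) becomes a scalar division, takes $Q$ and $R$ from the values stated above, and expects the caller to supply $A_d$ and $B_d$. The function and variable names are hypothetical.

```python
Q = [[9.0, 0.0], [0.0, 9.0]]  # process noise covariance, (3.20)
R = 100.0                     # measurement noise variance


def kalman_step(x, p, u, y, a_d, b_d, q=Q, r=R):
    """One predict/update cycle of (3.15)-(3.19) with H = [1 0].

    x: [Tj, Tc] estimate, p: 2x2 covariance, u: P_loss, y: measured Tj.
    """
    # Prediction, (3.15) and (3.16).
    xp = [a_d[i][0] * x[0] + a_d[i][1] * x[1] + b_d[i] * u for i in range(2)]
    ap = [[sum(a_d[i][k] * p[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    pp = [[sum(ap[i][k] * a_d[j][k] for k in range(2)) + q[i][j]
           for j in range(2)] for i in range(2)]

    # Gain, (3.17): with H = [1 0], H*P*H^T is the scalar pp[0][0].
    s = pp[0][0] + r
    k = [pp[0][0] / s, pp[1][0] / s]

    # Update, (3.18) and (3.19).
    innovation = y - xp[0]
    xn = [xp[i] + k[i] * innovation for i in range(2)]
    pn = [[pp[i][j] - k[i] * pp[0][j] for j in range(2)] for i in range(2)]
    return xn, pn
```

Calling `kalman_step` once per sample with the latest $P_{loss}$ and the measured $T_j$ returns the filtered state and the updated covariance for the next call.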

#### **3.2.4** $T_j$ Profile Control

The primary objective of the proposed  $T_j$  profile control method is to actively set the ramp rate to  $T_{j-max}$  and dwell time at  $T_{j-max}$ . As shown in Fig. 3.3, user specified  $T_{j-max}$  reference, ramp rate, and dwell time settings are used to generate a corresponding reference  $T_j$  profile. The estimated  $T_j$  feedback value obtained from the previously discussed Kalman filter is compared against the  $T_j$  reference signal and the error is passed to a PI controller. From (3.10), it is evident that  $P_{loss}$  is the input variable. Therefore, the PI controller outputs the power loss reference,  $P_{ref}$  for the DUT. Since the power loss in the DUT is controlled by changing the injected current,  $I_d$ ,  $P_{ref}$  is divided by the filtered  $V_{ds}$  value to obtain the reference injected current value,  $I_{ref}$ . This reference is sent to the DAC on the power supply board to change the current reference to the linear regulator stage.
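The signal chain described above (temperature error, PI controller, $P_{ref}$, division by the filtered $V_{ds}$, $I_{ref}$) can be sketched as a small discrete-time controller. This is an illustration under stated assumptions rather than the firmware used in this work: the gains are placeholders, and the output clamp, the integrator hold during saturation, and the lower bound on $V_{ds}$ are all assumptions added to keep the example well behaved.

```python
class TjProfileController:
    """PI loop on the Tj error that outputs P_ref and I_ref = P_ref / V_ds."""

    def __init__(self, kp, ki, ts, p_max, vds_min=0.05):
        self.kp = kp            # proportional gain, W/degC (placeholder)
        self.ki = ki            # integral gain, W/(degC*s) (placeholder)
        self.ts = ts            # control period, s
        self.p_max = p_max      # assumed upper limit on P_ref, W
        self.vds_min = vds_min  # assumed floor that avoids dividing by ~0 V
        self.integral = 0.0

    def update(self, tj_ref, tj_est, vds_filtered):
        """Return (P_ref, I_ref) for one control period."""
        error = tj_ref - tj_est
        candidate = self.integral + self.ki * error * self.ts
        p_ref = self.kp * error + candidate
        if p_ref > self.p_max:
            p_ref = self.p_max   # saturated: hold the integrator
        elif p_ref < 0.0:
            p_ref = 0.0          # the DUT can only be heated, not cooled
        else:
            self.integral = candidate
        i_ref = p_ref / max(vds_filtered, self.vds_min)
        return p_ref, i_ref
```

In use, `tj_est` would come from the Kalman filter output and `i_ref` would be scaled to the DAC code that sets the linear regulator current.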

The continuous-time transfer function $\frac{T_j(s)}{P_{loss}(s)}$ is obtained by solving the following equation.

$$T_j = P_{loss}\left(\frac{1}{sC_d} || \left( R_{dc} + \left( R_{ca} || \frac{1}{sC_c} \right) \right) \right) \tag{3.21}$$

Using this transfer function, the PI controller is tuned to obtain a critically damped response. The obtained continuous time controller is discretized for implementation on the on-board microcontroller.
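For reference, evaluating the series and parallel combinations in (3.21) gives the transfer function explicitly. The expansion below is a worked step added for clarity using only the symbols of the thermal network; it agrees with $C(sI - A)^{-1}B$ computed from (3.12) and (3.13).

$$\frac{T_j(s)}{P_{loss}(s)} = \frac{R_{dc} + R_{ca} + s R_{dc} R_{ca} C_c}{s^2 R_{dc} R_{ca} C_d C_c + s\left(R_{ca} C_c + (R_{dc} + R_{ca}) C_d\right) + 1}$$

At $s = 0$ this reduces to $R_{dc} + R_{ca}$, the steady-state junction-to-ambient thermal resistance, which is a convenient check on the algebra.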

#### 3.2.5 Aging Correction

As previously discussed, $R_{ds-on}$ is prone to aging-related changes. Therefore, as the device degrades, using the $R_{ds-on}$ estimate obtained from the baseline characterization data would cause the temperature profile to drift away from the expected profile. To compensate for the aging-related $R_{ds-on}$ shift, the user-provided temperature reference is modified over time. The body-diode forward voltage drop, $V_f$, at small currents and negative gate voltage is largely aging independent. Therefore, at the end of the heating cycle, as soon as the DUT is



Figure 3.8: Fitted  $V_{sd}$  vs  $T_j$  curve at  $I_{sd} = 300 mA$ . Markers represent experimental data.

turned off, a small current (300 mA) is injected into the DUT's body diode and $V_f$ is measured using the same $V_{ds}$ measurement circuit as earlier. As shown in Fig. 3.3, the $V_f \rightarrow T_j$ mapping is also calculated using the parametric solution of a quadratic equation. Although $V_f$ vs $T_j$ is relatively linear, the quadratic fit, being of higher order, gives higher accuracy. The experimentally obtained $V_f$ vs $T_j$ curve is shown in Fig. 3.8.
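The $V_f \rightarrow T_j$ mapping amounts to inverting a fitted quadratic. The sketch below shows one way to do this; the coefficients are hypothetical placeholders (not the fitted values behind Fig. 3.8), and selecting the root by a valid temperature window is an assumption made for the example.

```python
import math

# Hypothetical fit coefficients for V_f = A2*Tj**2 + A1*Tj + A0 (V, degC).
A2, A1, A0 = 1.0e-6, -2.5e-3, 3.0


def vf_from_tj(tj):
    """Evaluate the fitted V_f (V) at junction temperature tj (degC)."""
    return A2 * tj * tj + A1 * tj + A0


def tj_from_vf(vf, t_min=-50.0, t_max=250.0):
    """Invert the quadratic fit and keep the root inside [t_min, t_max]."""
    if A2 == 0.0:
        return (vf - A0) / A1  # degenerate case: the fit is linear
    disc = A1 * A1 - 4.0 * A2 * (A0 - vf)
    if disc < 0.0:
        raise ValueError("V_f is outside the range covered by the fit")
    root = math.sqrt(disc)
    for tj in ((-A1 - root) / (2.0 * A2), (-A1 + root) / (2.0 * A2)):
        if t_min <= tj <= t_max:
            return tj
    raise ValueError("no root inside the valid temperature range")
```

A round trip such as `tj_from_vf(vf_from_tj(100.0))` should return the original temperature, which is a simple way to validate a new set of fit coefficients.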

#### 3.3 Experimental Results

The proposed closed-loop $T_j$ profile control algorithm is experimentally validated using the previously presented DC power cycling setup. The experimental setup is shown in Fig. 3.9. A 3D-printed enclosure is designed for the module. The enclosure constrains the cooling airflow while providing a window for the infrared (IR) thermal camera (FLIR A655SS). To precisely measure the actual junction temperature of the MOSFET, a device is decapsulated carefully to expose the die as shown in Fig. 3.10. An infrared image of the exposed die is shown in Fig. 3.11. Since the device die typically has some temperature gradient, the die temperature is calculated by averaging the temperature information for all the pixels in the die's image.
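The pixel averaging described above can be written as a masked mean over the thermal frame. The following is a minimal sketch; the frame and mask layout (nested lists with a truthy mask entry for every die pixel) is an assumption made for illustration and does not reflect the camera's actual export format.

```python
def mean_die_temperature(frame, die_mask):
    """Average the temperature (degC) of all pixels flagged as die."""
    total = 0.0
    count = 0
    for temp_row, mask_row in zip(frame, die_mask):
        for temp, on_die in zip(temp_row, mask_row):
            if on_die:
                total += temp
                count += 1
    if count == 0:
        raise ValueError("die mask selects no pixels")
    return total / count
```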



Figure 3.9: Experimental setup used to verify the proposed  $T_j$  control algorithm.



Figure 3.10: Picture of decapsulated device under test. The die is painted black to improve IR emissivity.



Figure 3.11: Infrared image of the decapsulated device die.

## 3.3.1 Verification of Closed-loop $\Delta T_j$ Control Algorithm

The experimentally obtained $T_j$ profile is shown in Fig. 3.12. The rise time is set to 60 s. As mentioned previously, the proposed closed-loop control algorithm controls the junction temperature rise, $\Delta T_j$. Therefore, $T_{max}$ is set to a 75°C rise above the ambient temperature. For the shown experiment, the initial ambient temperature is ~75°C. The dwell time at the maximum junction temperature is set to 100 s. The set dwell time is relatively long and is uncommon in actual DC power cycling tests. However, the longer dwell time enables the verification of shift in the mean $T_j$ during the dwell time duration. As seen, the drift in the mean $T_j$ during the dwell time is limited to ~3°C. Moreover, as per the JEDEC JESD22-A122 standard, the actual junction temperature must be within $\pm 5°C$ of the reference temperatures. This is verified to be the case in Fig. 3.12. It must also be noted that the $T_j$ is


Figure 3.12: Experimental  $T_j$  waveform.

controlled during the heating interval and the cooling fan is fully turned on during the cooling time. Therefore, in this test, the lower temperature limit is set to the ambient temperature.
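The reference profile and the tolerance check discussed in this section can be sketched as follows. The example assumes a linear ramp followed by a flat dwell and takes the starting temperature as an argument; the function names are hypothetical. The 60 s rise time, 100 s dwell, 75°C rise, and ±5°C band quoted above would be passed in as arguments.

```python
def tj_reference(t, t_start, delta_tj, ramp_s, dwell_s):
    """Reference Tj (degC) at time t (s): linear ramp, then dwell."""
    if t <= 0.0:
        return t_start
    if t < ramp_s:
        return t_start + delta_tj * t / ramp_s
    if t <= ramp_s + dwell_s:
        return t_start + delta_tj
    return t_start  # heating interval over; cooling is not tracked here


def max_tracking_error(samples, t_start, delta_tj, ramp_s, dwell_s):
    """Largest |Tj - reference| over (t, Tj) samples in the heating interval."""
    worst = 0.0
    for t, tj in samples:
        if 0.0 <= t <= ramp_s + dwell_s:
            ref = tj_reference(t, t_start, delta_tj, ramp_s, dwell_s)
            worst = max(worst, abs(tj - ref))
    return worst
```

Comparing the value returned by `max_tracking_error` against 5.0 gives a pass or fail indication for the tolerance band over the heating interval.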

# 3.4 Conclusion

In this chapter, first, the challenges in accurate $T_j$ estimation during DC power cycling with the use of existing known precursors are discussed. Through a qualitative comparison with other techniques, it is shown that $R_{ds-on}$ and $V_f$ each have certain desirable characteristics as temperature-sensitive electrical parameters (TSEPs). However, since $R_{ds-on}$ is prone to aging-related shift and $V_f$ cannot be measured when the DUT is on, closed-loop $\Delta T_j$ control using only one of the TSEPs is challenging. In this context, an aging-independent technique for closed-loop $\Delta T_j$ control is presented. The proposed algorithm is experimentally verified. Specifically, $R_{ds-on}$ is used for $T_j$ estimation and control during the heating interval, and at the end of the heating interval, $V_f$ is used to validate the actual value of $T_{j-max}$ reached and adjust it to account for aging-related $R_{ds-on}$ shifts. While the cooling profile of the device is not actively controlled in this work, it is possible to implement such control by using a variable-speed cooling fan.

### **CHAPTER 4**

# INVESTIGATION AND ON-BOARD DETECTION OF GATE-OPEN FAILURE IN SIC MOSFETS

### 4.1 Introduction

Silicon Carbide (SiC) power MOSFETs are expected to enable a significant improvement in the efficiency of power converters across different application areas [2]. However, comprehensively understanding and improving their reliability remains an ongoing challenge [37–39]. To this end, standard accelerated aging tests are often used to proactively test long-term device reliability within a short duration. Among the standard tests, DC power cycling is widely used to accelerate package related aging mechanisms in power MOSFETs [15, 40]. Bond-wire heel cracking, bond-wire liftoff and die-attach solder layer delamination are the common failure modes observed in this test [17, 24, 33, 41, 42]. In addition to the above modes, power MOSFETs could also fail due to gate bond-wire liftoff or cracking leading to a gate-open failure [43]. The consequent loss of gate control can lead to an unwanted drain to source conduction, a large increase in threshold voltage or open circuit failure of the device [44]. This mode of failure, however, is relatively uncommon in silicon (Si) MOSFETs and IGBTs and has not been studied widely in the literature. Like Si MOSFETs and IGBTs, commercial discrete SiC MOSFETs are typically available in TO-247-3, TO-247-4 and TO-263-7 packages [45, 46]. However, SiC MOSFETs generally have a much smaller die and fundamentally different material properties [47]. Therefore, package related failure modes in Si devices cannot be assumed to apply similarly to SiC devices. In particular, relatively thinner and longer gate bond wires due to smaller die and die placement can potentially increase SiC devices' susceptibility to gate-open failures. Moreover, the properties of the epoxy mold compound (EMC) material used in SiC MOSFETs need to be different to enable operation at higher temperatures [48,49]. Therefore, it is crucial to study gate-open failure mode specifically in the context of SiC MOSFETs.



Figure 4.1: Illustration of gate-open failure in discrete SiC MOSFET due to a) a heel crack b) bond-wire liftoff.

Gate-open failures in discrete devices are often intermittent in nature. In a typical discrete SiC MOSFET, the die and gate bond-wire are encapsulated in an epoxy mold compound (EMC) as shown in Fig. 4.1. In the case of gate bond-wire liftoff, the EMC may hold the bond-wire against the pad and maintain the contact [50]. However, during device operation, the relative displacement of various components in the package due to thermal changes can lead to intermittent gate contact. The device functions normally except during brief instances of loss of gate contact. Therefore, the intermittency of gate-open faults makes them very challenging to detect reliably. Given their elusive nature, such failures during DC power cycling tests can lead to incorrect device lifetime estimation [12]. Moreover, in certain converter topologies, temporary disturbances caused by intermittent gate-open failures can be compensated by the control loop and potentially go undetected for a long time. For example, in synchronous converters, if a gate-open failure of the synchronous switch prevents it from turning on, the switch's body-diode starts conducting. Therefore, except for a decrease in its efficiency, the converter appears to operate nominally.

To address the above challenges, the first goal of this research work is to investigate the occurrence of gate-open failures in discrete SiC MOSFETs. To reliably detect gate-open failure during DC power cycling or converter operation, it is important to first understand the electrical behavior of a SiC MOSFET under all possible gate-open failure scenarios. Therefore, the state of the device's gate and channel under gate-open faults is comprehensively analyzed through SPICE simulations and analytical modeling. Further, the devices under test (DUTs) are aged using the DC power cycling test. An on-board characterization technique is presented to detect gate-open failures during DC power cycling. Gate-open failure is detected in four of the DUTs. In order to verify the occurrence of gate-open failure in the failed devices, first, non-destructive acoustic microscopy analysis is performed to identify the damaged sites. Thereafter, the failed devices are carefully decapsulated and inspected through optical microscopy and scanning electron microscopy (SEM). To understand the mechanism behind gate-open failures, a thermo-mechanical finite element analysis (FEA) is performed on a high-fidelity model of the DUT. It is shown that deformation caused by the coefficient of thermal expansion (CTE) mismatch between various elements of the package causes interfacial shear stress in the gate bond. The stress is concentrated at the interface, causing the gate bond wire to shear off. The simulations are repeated for two different properties of the EMC in order to analyze the impact of the EMC's CTE on the gate bond stress. In addition to investigating gate-open failure, this chapter also proposes a robust on-board technique for cycle-by-cycle detection of gate-open failures. The failure detection circuit and logic are presented in detail. Through experimental verification, it is shown that the proposed technique can detect gate-open failure in as little as 150 ns. This enables the prevention of potentially catastrophic shoot-through events in a conduction type gate-open failure scenario. Furthermore, the proposed technique can reliably detect gate-open failures in third-quadrant operation, which is not covered by conventional protection techniques.



Figure 4.2: SPICE simulation circuit for analysis of MOSFET's behavior under gate-open fault.

#### 4.2 Gate-Open Failure Analysis and On-Board Characterization

Given the challenges in capturing intermittent gate-open failures, it is important to first understand the electrical behavior of a SiC MOSFET under gate-open failure.

#### 4.2.1 MOSFET's Behaviour Under Various Gate-Open Failure Scenarios

For analyzing the electrical behavior of a SiC MOSFET under gate-open failure, the circuit shown in Fig. 4.2 is simulated in LTspice. The manufacturer-provided SPICE model is used for the DUT, U1. A gate-open fault is simulated by connecting an ideal switch, S1, in the gate path of U1. The timing of S1's opening is changed to simulate different gate-open failure scenarios. Furthermore, switches S3 and S4 are used to change the operational quadrant of the DUT. Specifically, when S3 is closed and S4 is open, U1 operates in the first quadrant (Q1) during its on interval. Similarly, when S4 is closed and S3 is open, U1 acts as the synchronous free-wheeling switch during its on interval and thus operates in the third quadrant (Q3).



Figure 4.3: Parasitic capacitance in a MOSFET from device datasheet [1].



Figure 4.4: $C_{gd}$ vs $V_{dg}$ from device datasheet [1].

Before discussing the simulation results, analytical expressions for the DUT gate voltage under fault  $(V_{gs}^f)$  are derived. Figure 4.3 shows the electrical model of the DUT with parasitic capacitances. In case of a gate-open fault, the gate is electrically isolated and floating. Consequently, the charge on  $C_{gd}$  and  $C_{gs}$  is conserved. If the DUT's drain-source voltage after fault  $(V_{ds}^f)$  is different from before fault  $(V_{ds}^{pf})$ , the voltage across  $C_{gs}$  and  $C_{gd}$  changes correspondingly as given by (4.1) - (4.4).

However, the charge on $C_{gd}$ and $C_{gs}$ changes by the same amount $(\Delta Q_f)$ since it is conserved. The DUT's $V_{gs}^f$ under fault, in this case, can be obtained using (4.5), where the relation between $\Delta Q_f$ and $V_{dg}$ is given by (4.6) since $C_{gd}$, unlike $C_{gs}$, is non-linear and a function of $V_{dg}$ as shown in Fig. 4.4. The value of the integral can be obtained by calculating the corresponding area under the $C_{gd}$ vs $V_{dg}$ curve obtained from the manufacturer's datasheet. In cases where $V_{dg}$ is large or $V_{dg} \leq 0$, $C_{gd}$ is nearly constant and can be approximated by (4.7) [32]. Equation (4.6) then reduces to (4.8) and, using (4.9), the post-fault $V_{gs}^{f}$ is given by (4.10). Moreover, when $V_{dg}$ is large, usually $C_{gd} \ll C_{gs}$. Therefore, (4.10) can be further approximated to (4.11). These equations are used in conjunction with the SPICE simulation results to understand the MOSFET's behavior under various gate-open fault scenarios as discussed further.

$$\Delta V_{ds} = V_{ds}^f - V_{ds}^{pf} \tag{4.1}$$

$$\Delta V_{dg} = -(V_{gd}^f - V_{gd}^{pf}) \tag{4.2}$$

$$\Delta V_{gs} = V_{gs}^f - V_{gs}^{pf} = \frac{\Delta Q_f}{C_{gs}} \tag{4.3}$$

$$\Delta V_{ds} = \Delta V_{dg} + \Delta V_{gs} \tag{4.4}$$

$$V_{gs}^f = V_{gs}^{pf} + \frac{\Delta Q_f}{C_{gs}} \tag{4.5}$$

where,

$$\Delta Q_f = \int_{V_{dg}^{pf}}^{V_{dg}^f} C_{gd}(v_{dg}) \, dv_{dg} \tag{4.6}$$

If  $V_{dg}$  is large or  $V_{dg} \leq 0$ ,  $C_{gd}$  is almost constant. Then,

$$C_{gd} = C_{gd}(V_{dg}^{pf}) = C_{gd}(V_{dg}^{f}) \tag{4.7}$$

$$\Delta Q_f = C_{gd} \Delta V_{dg} \tag{4.8}$$

$$\Delta V_{ds} = \Delta Q_f \left( \frac{1}{C_{gd}} + \frac{1}{C_{gs}} \right) \tag{4.9}$$

$$V_{gs}^f = V_{gs}^{pf} + \left(\frac{C_{gd}}{C_{gd} + C_{gs}}\right) \Delta V_{ds} \tag{4.10}$$

If  $V_{dg}$  is large,  $C_{gd} \ll C_{gs}$ . Therefore,  $V^f_{gs}$  can be approximated to

$$V_{gs}^f = V_{gs}^{pf} + \left(\frac{C_{gd}}{C_{gs}}\right) \Delta V_{ds} \tag{4.11}$$
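The area-under-the-curve evaluation of (4.6) and the closed forms (4.5) and (4.10) can be sketched numerically. The code below is an illustration under assumptions: the $C_{gd}(V_{dg})$ curve is supplied as a few tabulated points with linear interpolation between them (a datasheet curve would need to be digitized first), the trapezoidal rule is an assumed integration method, and the post-fault $V_{dg}$ is taken as known, for example from simulation.

```python
def interpolate(v, v_points, c_points):
    """Linearly interpolate C_gd at v; v_points must be ascending."""
    if v <= v_points[0]:
        return c_points[0]
    if v >= v_points[-1]:
        return c_points[-1]
    for i in range(1, len(v_points)):
        if v <= v_points[i]:
            span = v_points[i] - v_points[i - 1]
            frac = (v - v_points[i - 1]) / span
            return c_points[i - 1] + frac * (c_points[i] - c_points[i - 1])


def delta_q(v_points, c_points, v_dg_pf, v_dg_f, steps=1000):
    """Trapezoidal estimate of (4.6); the sign follows the direction of change."""
    h = (v_dg_f - v_dg_pf) / steps
    total = 0.5 * (interpolate(v_dg_pf, v_points, c_points)
                   + interpolate(v_dg_f, v_points, c_points))
    for i in range(1, steps):
        total += interpolate(v_dg_pf + i * h, v_points, c_points)
    return total * h


def vgs_from_charge(v_gs_pf, dq, c_gs):
    """Post-fault gate voltage from (4.5)."""
    return v_gs_pf + dq / c_gs


def vgs_constant_cgd(v_gs_pf, d_vds, c_gd, c_gs):
    """Post-fault gate voltage from (4.10), valid when C_gd is constant."""
    return v_gs_pf + c_gd / (c_gd + c_gs) * d_vds
```

When $C_{gd}$ varies strongly over the interval, `delta_q` followed by `vgs_from_charge` is the appropriate path; `vgs_constant_cgd` covers the regions where (4.7) holds.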



Figure 4.5: Simulation waveforms under conduction fault.

# Case 1 - Conduction Fault

First, a gate-open failure event can occur when the DUT is on. As shown in Fig. 4.5, this is mimicked by opening S1 when the DUT is on. In the subsequent off period, when the gate driver voltage, $V_g^{GD}$, is low, the DUT's actual gate voltage, $V_g^{actual}$, remains high. During this period, the shoot-through current in U1 starts rising as soon as U2 turns on. As shown, this causes $V_{ds}$ and thus $V_{dg}$ to increase. From (4.5) and (4.6), it is evident that $V_{gs}$ increases further. Due to the SiC MOSFET's high transconductance, an increase in $V_{gs}$ results in a significant decrease in U1's on-state resistance, thus preventing a further increase in $V_{ds}$. From Fig. 4.5, it is seen that $V_{ds}^f = 13.54$ V and $\Delta V_{ds} = 13.455$ V. Since $V_{dg}^{pf} < V_{dg}^f < 0$, $C_{gd}$ can be assumed constant such that $C_{gd} = 1000$ pF and $C_{gs} = 2900$ pF. By using (4.10) and the known $V_{gs}^{pf} = 15$ V, $V_{gs}^f$ is calculated as $V_{gs}^f = 18.45$ V ( $\Delta V_{gs} = 3.45$ V). This is very close to the experimentally observed value of 18.70 V. Therefore, a conduction fault causes the gate voltage of the failed



Figure 4.6: Simulation waveforms under Q1 open fault.

device to increase further and prevents it from turning off. Furthermore, this also causes unequal short-circuit energy dissipation between the high-side and low-side devices. Since the majority of the power is dissipated in the complementary high-side switch, it may be damaged if the fault is not isolated.
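As a worked check of the Case 1 figures, substituting the stated values into (4.10) gives:

$$\Delta V_{gs} = \frac{C_{gd}}{C_{gd} + C_{gs}} \Delta V_{ds} = \frac{1000\ \text{pF}}{1000\ \text{pF} + 2900\ \text{pF}} \times 13.455\ \text{V} = 3.45\ \text{V}, \qquad V_{gs}^f = 15\ \text{V} + 3.45\ \text{V} = 18.45\ \text{V}$$

Here $C_{gd}$ is not negligible compared with $C_{gs}$, so the simplified form (4.11) would not be appropriate: it would give $\Delta V_{gs} \approx 4.64$ V, overestimating the gate voltage rise.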

# Case 2a - Open Fault in Q1 Operation

Alternatively, a gate-open failure can occur when U1 is off. The DUT gate voltage remains low in this case even in U1's on interval. Since U2 is off in this interval, the body-diode of U2 turns on to provide a free-wheeling path for the inductor current ( $I_{L1}$ ). As shown in Fig. 4.6, $\Delta V_{ds} = 4.13$ V, which is equal to the forward voltage drop of U2's body-diode. Since $V_{ds}^f \approx V_{ds}^{pf} = 100$ V, $C_{gd}$ can be assumed to be constant at $C_{gd} = 10$ pF. From (4.11), $\Delta V_{gs} = 0.015$ V and $V_{gs}^f = -4.985$ V, which is very close to the experimentally obtained value of



Figure 4.7: Simulation waveforms under Q3 open fault.

-4.983 V. Therefore, in case of an open type gate-open fault in Q1 operation, the DUT's gate voltage remains nearly constant and the device remains off. It is also seen from Fig. 4.6 that the inductor current decays due to U2's body-diode loss. Moreover, as the applied voltage increases, $\Delta V_{gs}$ becomes increasingly insignificant.

#### Case 2b - Open Fault in Q3 Operation

Lastly, an open type gate-open fault may occur when the device is operating in Q3. In such a case, the device experiences a gate-open fault when it is off as shown in Fig. 4.7. When the device is subsequently turned on, the device's channel fails to turn on. However, since the device is operating in Q3, its body-diode starts conducting. $\Delta V_{ds}$ and $\Delta V_{dg}$ are negative. Consequently, from (4.5) and (4.6), the DUT's gate voltage decreases further. Since $V_{gs}^{pf}$ is negative, $V_{gs}^{f}$ becomes more negative as clearly seen in Fig. 4.7. Obtaining $V_{gs}^{f}$ requires the solution of (4.6). In Fig. 4.4, $V_{dg}^{pf} = 100$ V and $V_{dg}^{f} = 2.1808$ V. By assuming $C_{gd}$ to be a piece-wise exponential function in these intervals, the approximate magnitude of $\Delta Q_f$ is obtained as $|\Delta Q_f| = 4149.36$ pC, with $\Delta Q_f$ negative since $V_{dg}$ decreases. $V_{gs}^{f}$ thus calculated from (4.5) is $V_{gs}^{f} = -6.43$ V. This is in close agreement with the value obtained from simulation, i.e., $V_{gs}^{f} = -6.35$ V. Therefore, it can be safely concluded that under gate-open failure in Q3 operation, the DUT remains off with its gate voltage becoming more negative.
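The Case 2b result can be checked against (4.5). Using $C_{gs} = 2900$ pF from Case 1 and a pre-fault gate voltage of $-5$ V (the off-state value implied by Case 2a; both are carried over here as assumptions), a charge of magnitude 4149.36 pC removed from $C_{gs}$ gives:

$$\Delta V_{gs} = \frac{\Delta Q_f}{C_{gs}} = \frac{-4149.36\ \text{pC}}{2900\ \text{pF}} \approx -1.43\ \text{V}, \qquad V_{gs}^f \approx -5\ \text{V} - 1.43\ \text{V} = -6.43\ \text{V}$$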

The above analysis shows that under all three fault scenarios, the device's state gets latched when a gate-open fault occurs. Specifically, for conduction type and Q3-open type fault scenarios, gate-open failure has a positive feedback effect on the device's gate voltage. This implies that the device's operational state under fault is stable and does not slowly change over time. Similar behavior is also observed for the Q1-open failure scenario. However, in this case, although the fault has a negative feedback effect on gate voltage, the magnitude is negligible. This understanding is essential in developing an on-board gate-open failure detection technique. It is important to note, however, that due to its intermittent nature, a device with a particular gate-open fault type may temporarily recover and later show another fault type. For example, unless isolated, a device with an open type fault may have temporary re-establishment of gate contact due to bond-wire movement and then show a conduction fault. The intermittent nature, in particular, makes accurate characterization and detection of gate-open faults extremely challenging.

# 4.2.2 DC Power Cycling Test Methodology

Figure 4.8(a) shows the high-level schematic of the DC power cycling setup used in this study. Each leg of the setup has one DUT and the main switch (MSW) in series. The main switch is used for safe fault detection and isolation. Multiple legs are connected in parallel across the main power supply. One leg is on at any given time, heating up the corresponding DUT. As shown in Fig. 4.8(b), when the DUT reaches its maximum junction temperature



Figure 4.8: a) DC power cycling schematic b) typical testing cycle.

| DUT No. | $T_j$ Swing                  | $T_j$ Mean       | $\Delta T_j$   | Cycles to Failure $(N_f)$ |
|---------|------------------------------|------------------|----------------|---------------------------|
| DUT1-A  | $55^{\circ}C - 150^{\circ}C$ | $102.5^{\circ}C$ | $95^{\circ}C$  | 6000                      |
| DUT1-B  | $55^{\circ}C - 150^{\circ}C$ | $102.5^{\circ}C$ | $95^{\circ}C$  | 7200                      |
| DUT2-A  | $35^{\circ}C - 150^{\circ}C$ | $92.5^{\circ}C$  | $115^{\circ}C$ | 8000                      |
| DUT2-B  | $35^{\circ}C - 150^{\circ}C$ | $92.5^{\circ}C$  | $115^{\circ}C$ | 7800                      |

Table 4.1: DC Power Cycling Test Results

 $(T_{j-max})$ , it is turned off for cooling and the next DUT is turned on. The cycles are repeated till the device fails. One of the fundamental advantages of the proposed setup is that allows independent control of  $\Delta T_j$  and the testing conditions of each DUT while enabling scalable testing.

In this study, a total of 8 devices, in two groups of 4, are tested under two different $\Delta T_j$ conditions. Of these, 2 devices in each group are detected with gate-open failure, as shown in Table 4.1. The devices selected for this study are 1000 V, 22 A SiC MOSFETs in the TO-247-4 package. The table also shows the recorded cycles to failure $(N_f)$ corresponding to each of the devices. Here, failure is defined as the first detection of a gate-open fault. Comprehensive failure analysis of these devices is discussed further.

#### 4.2.3 **On-Board Failure Characterization**

#### **On-Board Failure Characterization Technique**

Based on the understanding of the electrical behavior of devices with intermittent gate-open failure, an on-board failure characterization technique is proposed. Figure 4.9 shows the operation of the DC power cycling setup. During the heating period, as shown in Fig. 4.9(a), both MSW and the DUT are on. Therefore, the DUT drain current, $I_{d-DUT} > 0$. However, during the DUT cooling period, the DUT and MSW are turned on alternately during intervals labeled as $T'_n$ (Fig. 4.9(b)) and $T_n$ (Fig. 4.9(c)). In this study, $T'_n \approx 20$ ms and $T_n \approx 2$ ms are selected. This process is repeated throughout the DUT cooling period. For a healthy DUT, $I_{d-DUT} = 0$ during $T'_n$ and $T_n$.

In case of a conduction type gate-open fault, since the DUT fails to turn off, $I_{d-DUT} > 0$ during the interval $T_n$ as shown in Fig. 4.9(d). The on-board controller of the DC power cycling test bench detects this current and identifies the fault. The $T'_n$ interval is necessary to charge the DUT gate to check for intermittent failure. Specifically, the proposed technique verifies the gate function of the DUT by repeatedly charging and discharging the DUT gate. In case the gate contact is temporarily lost, the DUT gate fails to discharge and shows a conduction fault. The MSW isolates the DUT during the testing process. However, in this setup, the DUT can be checked for an open fault only at the beginning of the heating period as shown in Fig. 4.9(e). If $I_{d-DUT} = 0$ when both DUT and MSW are turned on, it implies the DUT has an open fault. Moreover, since the DUT does not operate in Q3 in DC power cycling, Q3 open faults are not checked for. Failure conditions for on-board fault characterization of gate-open faults are summarized in Table 4.2.



Figure 4.9: Operation of a single leg of the DC power cycling test setup during a) heating interval for a healthy DUT b) cooling  $T'_n$  interval for a healthy DUT c) cooling  $T_n$  interval for a healthy DUT d) cooling  $T_n$  interval for a DUT showing OFF fault e) heating interval for a DUT with ON fault.

Table 4.2: Failure Conditions for On-board Gate-open Fault Characterization

| Interval        | MSW<br>Status | DUT<br>GD Status | Expected<br>Condition | Observed<br>Condition | Fault<br>Type       |
|-----------------|---------------|------------------|-----------------------|-----------------------|---------------------|
| Cooling $(T_n)$ | On            | Off              | $I_d = 0$             | $I_d > 0$             | Q1 Conduction Fault |
| Heating         | On            | On               | $I_d > 0$             | $I_d = 0$             | Q1 Open Fault       |
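The decision logic of Table 4.2 maps directly onto a small classification routine. The sketch below is illustrative: the interval labels, the function name, and the current threshold used to decide whether $I_d$ counts as non-zero are assumptions, since a practical measurement would need some margin above noise.

```python
from enum import Enum


class Fault(Enum):
    NONE = "no fault detected"
    Q1_CONDUCTION = "Q1 conduction fault"
    Q1_OPEN = "Q1 open fault"


def classify(interval, msw_on, dut_gate_on, i_d, i_threshold=0.1):
    """Apply the conditions of Table 4.2 to one current measurement.

    interval: "cooling_Tn" or "heating" (hypothetical labels).
    i_threshold: assumed level (A) above which I_d is treated as non-zero.
    """
    conducting = i_d > i_threshold
    if interval == "cooling_Tn" and msw_on and not dut_gate_on and conducting:
        return Fault.Q1_CONDUCTION  # expected I_d = 0, observed I_d > 0
    if interval == "heating" and msw_on and dut_gate_on and not conducting:
        return Fault.Q1_OPEN        # expected I_d > 0, observed I_d = 0
    return Fault.NONE
```

Running the check on every $T_n$ interval and at the start of each heating interval mirrors the two rows of the table.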



Figure 4.10: On-board characterization result for DUT showing intermittent OFF fault (Case 1).



Figure 4.11: On-board characterization result for DUT showing intermittent OFF fault (Case 2).

# **Results of On-board Characterization**

The result from the on-board characterization study of DUT 1-A experiencing intermittent gate-open failure during DC power cycling is shown in Fig. 4.10. As observed, during intervals $T_1$ and $T_2$ when the DUT is off and MSW is on, a current is flowing through the DUT, indicating a conduction fault. It must be noted, however, that the magnitude of current in the interval $T_1$ is ~0.5 A whereas in $T_2$ it is ~4 A. In fact, the current probe setting for detecting the small current in $T_1$ leads to probe saturation at high current during $T_2$, making the current peak unclear. This implies that the drain-to-source impedance during $T_1$ is high whereas the channel is fully open during $T_2$. Also, no current is observed before the interval $T_1$. This clearly shows that the gate-open failure in this case is intermittent in nature. Moreover, the significant decrease in drain-source resistance during $T_2$ compared to $T_1$ could be because of a temporary gate contact during $T'_2$ when the DUT is on. Fig. 4.11 shows the scope result of another on-board characterization study of a failed DUT. In this case, a conduction fault is observed in the interval $T_1$, prior to which the device appears to be healthy, again highlighting the intermittency of the fault. During $T_1$, the current through the DUT is ~3.5 A. It must be noted that the DUT is tested for failure during its cooling interval when another DUT is heating up. Therefore, in case of a conduction fault, approximately half of the main power supply current flows through the failed DUT. Since the main power supply current setting for this test is 7 A, it implies that the failed DUT's channel is fully on. Therefore, it can be concluded that the gate-open failure occurred when the DUT is on during $T'_1$. In this case, since the gate is fully charged, a gate-open failure during the $T'_1$ interval leads to the DUT staying on during $T_1$.

#### 4.3 Detailed Failure Analysis

#### 4.3.1 Non-destructive C-SAM Analysis

After on-board characterization, non-destructive failure analysis is performed on the failed DUTs using confocal scanning acoustic microscopy (C-SAM) to verify the occurrence of the gate-open failure mechanism in the failed devices. Figures 4.12(a) and 4.12(b) show C-SAM images of a healthy device. The red and yellow areas inside the package represent delamination sites.



Figure 4.12: C-SAM images of a) healthy device b) close-up of healthy device die c) DUT 1-A d) DUT 1-B e) DUT 2-A f) DUT 2-B.

It is seen that there is almost no delamination in a healthy device package. The gate bond pad area is indicated in Fig. 4.12(b), which does not show any delamination either. However, the C-SAM images of all the failed DUTs show delamination over the entire die area as seen in Fig. 4.12(c)-4.12(f). The delamination sites indicate that the mold compound above the die has moved relative to the die. The relative motion between the die and mold compound can exert shear forces on the gate bond-wire, causing it to lift off. This also provides a possible explanation for the intermittency of the failure. During DC power cycling, as the device heats up and cools down, the mold compound expands and contracts relative to the die,



Figure 4.13: Optical microscopy images of a) DUT 1-B b) DUT 2-B

thereby moving the gate bond-wire. Therefore, during these intervals, the gate bond-wire can temporarily have instances of sufficient contact with the gate bond-pad on the die, causing the device to function normally.

# 4.3.2 Optical Microscopy

The failed DUTs were carefully decapsulated to verify the gate-open failure hypothesis. Figure 4.13 shows the images of the DUTs 1-B and 2-B obtained using an optical microscope. The devices were inspected under 1000x magnification. As indicated in the top-right corner of each image, the gate-bond wires clearly show clean liftoff from the gate bond-pad. The power source and kelvin source connections, on the other hand, appear normal. This conclusively proves gate-open failure in these devices.

# 4.3.3 Cross-sectioning and SEM Analysis

To further investigate gate-open failure, DUT 1-A was carefully decapsulated and inspected using scanning electron microscopy (SEM). Figure 4.14(a) shows the SEM image of the exposed DUT die. The close-up image of the gate bond site is shown in Figure 4.14(b).



Figure 4.14: a) SEM image of decapsulated DUT die showing gate bond pad b) close-up image of gate bond pad showing gate bond liftoff c) cross-sectional SEM of gate bond clearly showing a clean lift-off d) close-up of gate bond showing a 35  $\mu$ m liftoff height.

It is clearly seen that the gate bond-wire is slightly lifted off the gate bond pad. Further, the device is carefully encapsulated using a low-viscosity epoxy resin. This prevents the movement of the gate bond wire due to the flow of the resin during encapsulation. Thereafter, the encapsulated device is smoothly ground parallel to the gate bond wire plane. Further SEM inspection of the device clearly indicates a clean lift-off of the gate bond-wire as shown in Figure 4.14(c). The close-up of the gate bond in Figure 4.14(d) shows that the bond-wire is lifted off by $\sim 35 \ \mu$m. It must be noted that before the investigation, the DUT showed drain-source open failure without any recovery to normal operation. Figure 4.15 shows a



Figure 4.15: Highly magnified cross-sectional SEM image of gate bond.



Figure 4.16: Device model for FEA analysis a) entire model b) with EMC hidden.

highly magnified SEM image of the gate site. The gate bond weld area is indicated in the figure. From the roughness observed on the gate-bond pad under the weld area, it can be deduced that the separation occurs in the bulk of the gate bond-wire. This type of lift-off is typically caused by interfacial shear stress in the gate-bond [51].

# 4.3.4 FEA Analysis of Gate-bond Failure Mechanism

In order to explain the potential mechanism for gate bond-wire liftoff in SiC MOSFETs observed during DC power cycling, a thermo-mechanical FEA analysis is performed in ANSYS,

| Element              | Material  | Density (kg/m$^3$) | CTE (ppm/°C) | Young's Modulus (GPa) |
|----------------------|-----------|--------------------|--------------|-----------------------|
| Drain tab, Gate lead | Copper    | 8300               | 18           | 110                   |
| Gate bond-wire       | Aluminium | 2770               | 23           | 71                    |
| EMC                  | -         | 1780               | 10           | 30                    |
| Die                  | SiC       | 3100               | 2.75         | 400                   |

Table 4.3: Material Properties Used for FEA Simulation

the results of which are discussed here. First, a high-fidelity model of the DUT is developed as shown in Fig. 4.16. The external dimensions are obtained from the manufacturer's datasheet. Gate bond-wire diameter and aspect ratio are obtained by combining information from C-SAM images, optical microscopy of decapsulated, cross-sectioned devices and SEM images. To simplify the analysis, only the internal gate lead and gate-bond wire are modeled. The source, drain leads and source bond-wires are ignored as they do not have significant thermal or mechanical implications on gate-bond itself. Furthermore, generally known properties for materials like Copper, Aluminium, and SiC are chosen. This gives fairly accurate results since the properties of these materials do not vary widely for the given application. The material properties of EMC, however, are proprietary knowledge and may vary between manufacturers. Therefore, for each property, a representative value from a known range of values for EMC used in power semiconductor applications is chosen [48, 49]. The chosen material properties are listed in Table 4.3.

In the first step, a transient thermal simulation is performed to obtain the device's temperature data at the end of the heating interval. For this, the SiC die is configured as an internal heat source, the value of which is set as the calculated DUT power loss during the DC power cycling test. Since the DUT is only cooled by natural air convection cooling, a convection coefficient is set for the entire external surface of the device. The simulation is run for 50 s and the results obtained are shown in Fig. 4.17. It is seen that the maximum temperature of 152.16°C is in close agreement with experimental data.



Figure 4.17: Temperature distribution across device from transient thermal simulation for a) entire device b) with EMC hidden.



Figure 4.18: Device deformation under heating for a) entire device b) with EMC hidden. Wireframe represents an undeformed device.

The temperature data obtained from transient thermal simulation is used to perform static structural analysis. For structural analysis, two adjacent corners at the bottom of the device are translationally constrained in all three directions. However, the rotational axes are free to allow warping and deformation. This is similar to the condition when device leads are soldered to the PCB. The physical deformation in the device due to heating is shown in Fig. 4.18. The deformation factor is exaggerated to clearly show the warping of the package. The resulting shear stress on the gate wire-bond is seen in Fig. 4.19(a). It is seen that the bond-wire experiences shear stress at the gate bond interface. This is consistent with

the bond-wire liftoff mechanism observed in the analysis of failed devices [52]. The resulting interfacial shear strain is shown in Fig. 4.19(b). The strain is relatively high in the bond wire because Aluminium, the bond-wire material, has a much lower elastic modulus than SiC, which is relatively hard. This can result in fatigue occurring in the bulk of the bond wire, which is consistent with the observation in Fig. 4.15. The effect of the EMC's properties on the shear stress in the gate bond is studied by varying the EMC's CTE, as shown in Fig. 4.19(c) and 4.19(d). The probed maximum shear stress is 134.12 MPa for $CTE_{EMC} = 10 \text{ ppm/}^{\circ}C$, whereas it is 139.46 MPa for $CTE_{EMC} = 5 \text{ ppm/}^{\circ}C$. Since the deformation mainly occurs due to the CTE mismatch between the EMC and the Copper drain tab, for which $CTE_{Cu} = 18 \text{ ppm/}^{\circ}C$, the larger CTE mismatch in the case of $CTE_{EMC} = 5 \text{ ppm/}^{\circ}C$ causes greater deformation and thus greater shear stress at the gate-bond interface. This also shows that the shear stress at the gate-bond interface is a function of the EMC properties. This is crucial for two complementary reasons. First, due to the relatively smaller size of the SiC die, the gate-bond wire itself is thinner compared to traditional Si devices. This reduces the overall gate bond strength and lowers the critical shear stress. Secondly, due to the relatively higher operating temperatures of SiC devices, the thermo-mechanical properties of EMC used for SiC devices are different [46, 49]. Therefore, it is crucial to consider the effect of the EMC's properties on device warping and thus on the possibility of gate wire-bond liftoff.
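The trend reported above can be restated numerically. The short Python sketch below only re-uses the CTE values and the two probed shear stresses quoted in the text; it does not reproduce the FEA, and the variable names are illustrative.

```python
# Values quoted in the text and Table 4.3 (FEA results, not re-simulated here).
CTE_CU = 18.0  # ppm/°C, Copper drain tab

# (CTE of EMC in ppm/°C, probed maximum shear stress in MPa)
cases = [
    (10.0, 134.12),
    (5.0, 139.46),
]

# CTE mismatch between the drain tab and the EMC for each case.
mismatch = [CTE_CU - cte_emc for cte_emc, _ in cases]
stress = [tau for _, tau in cases]

mismatch_ratio = mismatch[1] / mismatch[0]
stress_increase = (stress[1] - stress[0]) / stress[0]

for (cte_emc, tau), dm in zip(cases, mismatch):
    print(f"CTE_EMC = {cte_emc:4.1f} ppm/°C: mismatch = {dm:4.1f} ppm/°C, "
          f"max shear stress = {tau:.2f} MPa")
print(f"mismatch ratio = {mismatch_ratio:.3f}, "
      f"relative stress increase = {100 * stress_increase:.2f} %")
```

With only two simulated points, this indicates the direction of the dependence (a larger mismatch gives a higher stress) rather than a quantitative law.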

# 4.4 On-Board Detection of Gate-open Failure

### 4.4.1 Proposed On-Board Detection Circuit

#### **Detection Circuit**

Figure 4.20 shows the schematic of the proposed gate-open failure detection circuit. The DUT is represented by the low-side MOSFET and S' is the complementary high-side switch.



Figure 4.19: a) Maximum shear stress; b) maximum elastic shear strain at the gate bond site with EMC and drain-tab hidden for  $CTE_{EMC} = 10 \text{ ppm/}^{\circ}C$ ; c) maximum shear stress at gate-bond wire for  $CTE_{EMC} = 10 \text{ ppm/}^{\circ}C$ ; d) maximum shear stress at gate-bond wire for  $CTE_{EMC} = 5 \text{ ppm/}^{\circ}C$ .

The detection circuit comprises two resistor-sensing networks. The first network, consisting of resistors $R_1$, $R_2$, $R_3$, and diode $D_1$, is used to measure the drain-source voltage across the DUT $(V_{ds-DUT})$. $D_1$ blocks the high off-state voltage across the device. By choosing appropriate values of $R_1$, $R_2$, and $R_3$, it can be ensured that the output of the $V_{ds-DUT}$ sensing network $(V_{ds-sense})$ is always positive for both positive and negative values of $V_{ds-DUT}$. This allows the comparators to be operated with a single-ended supply derived from the gate driver's supply voltage, thus making the design simpler. $R_4$, $R_5$, and $R_6$ form the second network, which is used to sense the DUT's gate voltage $(V_{gs-DUT})$. In this case as well, an appropriate choice of resistor values ensures that the sensed gate voltage $(V_{gs-sense})$ is positive even for bipolar gate operation, as is common in high-power SiC applications [53]. The relation between $V_{ds-DUT}$,



Figure 4.20: Schematic of proposed gate failure detection circuit.

 $V_{gs-DUT}$ , and  $V_{ds-sense}$ ,  $V_{gs-sense}$  is given by (4.12)-(4.13) respectively.

$$V_{ds-sense} = \begin{cases} \frac{V_{cc}R_3}{R_1+R_3} & \text{if } V_{ds-DUT} \ge \frac{V_{cc}R_3}{R_1+R_3} - V_{D1} \\ \frac{R_2R_3V_{cc}+R_1R_3(V_{D1}+V_{ds-DUT})}{R_1R_2+R_2R_3+R_3R_1} & \text{if } V_{ds-DUT} < \frac{V_{cc}R_3}{R_1+R_3} - V_{D1} \end{cases} \tag{4.12}$$

$$V_{gs-sense} = \left(\frac{V_{cc}}{R_4} + \frac{V_{gs-DUT}}{R_5}\right)\left(R_4 \,||\, R_5 \,||\, R_6\right) \tag{4.13}$$
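The behavior of the two sensing networks can be illustrated with a minimal Python sketch. It assumes a topology that the text does not spell out: $R_1$ (or $R_4$) ties the sense node to $V_{cc}$, $R_3$ (or $R_6$) ties it to the source reference, and $R_2$ in series with $D_1$ (or $R_5$ alone) ties it to the drain (or gate). $D_1$ is assumed to conduct only when the drain is at least one diode drop below the open-circuit node voltage, in line with the statement that $D_1$ blocks the off-state voltage. All component values are hypothetical and chosen only for illustration.

```python
def parallel(*resistors):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors)


def vds_sense(v_ds, v_cc, r1, r2, r3, v_d1):
    """Sense-node voltage of the V_ds network (assumed topology)."""
    v_open = v_cc * r3 / (r1 + r3)          # node voltage with D1 blocking
    if v_ds >= v_open - v_d1:               # D1 reverse-biased
        return v_open
    # D1 conducting: node solved by superposition of V_cc and (V_ds + V_D1)
    return (r2 * r3 * v_cc + r1 * r3 * (v_d1 + v_ds)) / (
        r1 * r2 + r2 * r3 + r3 * r1)


def vgs_sense(v_gs, v_cc, r4, r5, r6):
    """Sense-node voltage of the gate network (assumed topology)."""
    return (v_cc / r4 + v_gs / r5) * parallel(r4, r5, r6)


# Hypothetical values, not taken from the prototype.
V_CC, V_D1 = 15.0, 0.6
R = 10e3  # all six resistors set equal for simplicity

for v_ds in (100.0, 1.81, -1.81, -5.33):
    print(f"V_ds = {v_ds:7.2f} V -> V_ds_sense = "
          f"{vds_sense(v_ds, V_CC, R, R, R, V_D1):.3f} V")
for v_gs in (15.0, -4.0):
    print(f"V_gs = {v_gs:7.2f} V -> V_gs_sense = "
          f"{vgs_sense(v_gs, V_CC, R, R, R):.3f} V")
```

With these example values both sensed voltages stay positive over the bipolar input range, which is the property that allows a single-ended comparator supply.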

The output of the  $V_{ds-DUT}$  sensing network is connected to the inverting inputs of comparators  $U_{ch}$  and  $U_{bd}$ . The non-inverting inputs of the comparators are fixed threshold values  $V_{ds-sense}^{ch-th}$  and  $V_{ds-sense}^{bd-th}$  respectively. A third comparator,  $U_g$  is used to detect the state of applied gate voltage. The inverting and non-inverting inputs of this comparator are connected to  $V_{gs}^{th}$  and  $V_{gs-sense}$  respectively. The outputs of all the comparators are passed to the MCU through a digital isolator. The digital isolator ensures galvanic isolation between power and logic side grounds. Also since the proposed circuit relies on a gate drive power supply, it can be used with both high-side and low-side switch configurations.



Figure 4.21: Device operating points on the output curve to analyze choice of detection circuit threshold parameters.

#### **Principle of Operation**

The objective of using the detection circuit is to accurately identify the DUT's state of operation. More specifically, the detection circuit identifies the conduction state of the DUT's channel and the state of the applied gate voltage by sensing  $V_{ds-DUT}$  and  $V_{gs-DUT}$  respectively. To illustrate the operation of the proposed detection circuit, a commercial SiC MOSFET is considered as the DUT [1]. The output V - I curve of the device for  $V_{gs} = 15$  V and  $V_{gs} = -4$  V at junction temperature,  $T_j = 55^{\circ}$ C is plotted in Fig. 4.21. The plots are obtained by using the manufacturer provided SPICE model. As shown, for an application with a maximum instantaneous current  $I_{max}^+ = 15$  A, the device's  $V_{ds} = 1.81$  V. Due to MOSFET's symmetrical structure, the DUT's channel conducts for Q3 operation as well. Consequently, for  $I_{max}^- = -15$  A,  $V_{ds} = -1.81$  V. Therefore, for the given application when -1.81 V  $\leq V_{ds} \leq 1.81$  V, it can be safely concluded that the device's channel is conducting. Choosing  $V_{ds}^{ch-th} = 2.5$  V ensures that when the device channel is conducting, the output of comparator  $U_{ch}$  is high, and thus  $Q_{ch} = 1$ . However, in this case,  $Q_{ch} = 1$  even when



Figure 4.22: Overall schematic of CLB based fault detection logic.

$V_{ds} < -1.81$ V. This is possible when the device channel is off and the body-diode conducts during Q3 operation. Therefore, additional information is required to determine the state of the device channel during Q3 operation. For this reason, a second comparator $U_{bd}$ is used. As seen from Fig. 4.21, for the given application, the body-diode forward voltage drop varies between -3.42 V and -5.33 V depending on the instantaneous current value. Therefore, if $V_{ds}^{bd-th} = -2.5$ V, the output of $U_{bd}$, and thus $Q_{bd}$, gives the state of body-diode conduction. In summary, the digital outputs $Q_{ch}$ and $Q_{bd}$ provide complete information about the state of the device channel and body-diode. Furthermore, the output of comparator $U_g$ corresponds to the applied gate voltage: $Q_g = 1$ when $V_g$ is high and vice-versa. To ensure accurate fault detection, the thresholds must be carefully chosen depending on the device's operating points in Q1 and Q3 for the particular application.
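The mapping from the DUT voltages to the three comparator outputs can be sketched as follows. The sketch works with DUT-referred thresholds rather than the scaled sense voltages, uses the $\pm 2.5$ V drain-source thresholds from the text, and assumes a gate threshold of 5.5 V, which is a hypothetical value since the text does not state one.

```python
def comparator_outputs(v_ds, v_gs, v_ds_ch_th=2.5, v_ds_bd_th=-2.5, v_gs_th=5.5):
    """Return (Q_ch, Q_bd, Q_g) for DUT-referred voltages.

    U_ch and U_bd have the sensed V_ds on their inverting inputs, so their
    outputs are high when V_ds is BELOW the respective threshold.
    U_g has the sensed V_gs on its non-inverting input, so its output is
    high when V_gs is ABOVE the threshold (v_gs_th is an assumed value).
    """
    q_ch = int(v_ds < v_ds_ch_th)
    q_bd = int(v_ds < v_ds_bd_th)
    q_g = int(v_gs > v_gs_th)
    return q_ch, q_bd, q_g


# Operating points taken from the text (15 A application, V_gs = 15 V / -4 V).
examples = {
    "Q1 channel conduction": (1.81, 15.0),
    "Q3 channel conduction": (-1.81, 15.0),
    "Q3 body-diode conduction": (-3.42, -4.0),
    "Blocking, gate low": (100.0, -4.0),
}
for name, (v_ds, v_gs) in examples.items():
    print(f"{name:26s} -> (Q_ch, Q_bd, Q_g) = {comparator_outputs(v_ds, v_gs)}")
```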

# 4.4.2 Failure Detection Logic

The outputs of the previously discussed detection circuit are connected to a microcontroller (MCU) for processing. The MCU used in this study is a Texas Instruments TMS320F280041C with a configurable logic block (CLB). A CLB is an MCU peripheral that is functionally similar to an FPGA or CPLD [54]. Therefore, using CLB allows hardware-level logic signal

| $Q_{ch}$ | $Q_{bd}$ | $Q_g$ | Device state-of-health                | Combinational fault logic                 |
|----------|----------|-------|---------------------------------------|-------------------------------------------|
| 0        | 0        | 0     | No conduction                         |                                           |
| 0        | 0        | 1     | Open fault $(F_{open-Q1}^c)$          | $\overline{Q_{ch}}\,\overline{Q_{bd}}Q_g$ |
| 0        | 1        | 0     | Invalid                               |                                           |
| 0        | 1        | 1     | Invalid                               |                                           |
| 1        | 0        | 0     | Q1 conduction fault $(F_{conduct}^c)$ | $Q_{ch}\overline{Q_{bd}}\,\overline{Q_g}$ |
| 1        | 0        | 1     | Q1/Q3 channel conduction              |                                           |
| 1        | 1        | 0     | Q3 Body-diode conduction              |                                           |
| 1        | 1        | 1     | Q3 Open fault $(F^c_{open-Q3})$       | $Q_{ch}Q_{bd}Q_g$                         |

Table 4.4: Fault Identification Table

processing instead of software like in a typical MCU. This makes failure detection independent of the main control algorithm and eliminates any related overhead while allowing the failure detection logic to internally and quickly trip PWM outputs. The failure detection logic is implemented using combinational look-up table (LUT) elements and finite state machines (FSMs). The details of the implementation are discussed further.
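The combinational part of the fault identification in Table 4.4 can be sketched in Python as shown below. Each fault term is read as the minterm of its own table row, i.e. the overbars are taken to apply to individual signals; the function and dictionary names are illustrative and are not CLB identifiers.

```python
from itertools import product

# State-of-health for every (Q_ch, Q_bd, Q_g) combination, after Table 4.4.
STATE = {
    (0, 0, 0): "No conduction",
    (0, 0, 1): "Open fault (F_open-Q1)",
    (0, 1, 0): "Invalid",
    (0, 1, 1): "Invalid",
    (1, 0, 0): "Q1 conduction fault (F_conduct)",
    (1, 0, 1): "Q1/Q3 channel conduction",
    (1, 1, 0): "Q3 body-diode conduction",
    (1, 1, 1): "Q3 open fault (F_open-Q3)",
}


def fault_signals(q_ch, q_bd, q_g):
    """Combinational fault signals, one minterm per fault row of the table."""
    return {
        "F_open_Q1": int((not q_ch) and (not q_bd) and q_g),
        "F_conduct": int(q_ch and (not q_bd) and (not q_g)),
        "F_open_Q3": int(q_ch and q_bd and q_g),
    }


for bits in product((0, 1), repeat=3):
    active = [name for name, value in fault_signals(*bits).items() if value]
    print(bits, STATE[bits], active)
```

These raw signals are what the blanking logic later qualifies; on their own they also assert briefly during normal switching transitions.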

Figure 4.22 shows the high-level schematic of the proposed fault detection logic. The fundamental idea of the failure detection logic is to check for inconsistency between the DUT's gate and channel operation. As previously discussed, gate-open failure leads to a loss of control over the device's conduction state. Therefore, tracking the device state during a gate transition event can allow the detection of a gate-open failure. For the proposed detection circuit, the state-of-health of the DUT corresponding to different combinations of the comparator outputs  $Q_{ch}$ ,  $Q_{bd}$  and  $Q_g$  is shown in Table 4.4. Combinational fault signals  $(F_{conduct}^c, F_{open-Q1}^c, \text{ and } F_{open-Q3}^c)$  corresponding to each type of fault can be obtained using logic gates. However, merely using combinational signals for detecting gate-open failure may lead to false positives. Delays associated with device switching and signal propagation during



Figure 4.23: State-transition diagrams for a) Blanking FSM b) Fault FSM.

transition events may appear as momentary inconsistencies in device operation. Therefore, it is important to differentiate between true failures and false positives while minimizing the fault detection time. For this reason, a blanking logic is implemented using a 4-state FSM and a counter. The state transition diagram for the blanking FSM is shown in Fig. 4.23(a). When the gate input, $Q_g$, changes, the blanking FSM transitions to a blank state and sets the corresponding output $B$ high. This event also starts a counter that counts to a preset blanking value. Upon reaching the preset value, the counter sets a match output ($C$, denoted $Q_c$ in the state equations) high. $C$ then transitions the blanking FSM out of the blank state, i.e. $B$ becomes low, and also resets the counter. By adjusting the count value, the blanking window can be modified as per application requirements. The blanking FSM automatically provides input hysteresis within the blanking time window, thus making the logic immune to noise-related transition events. In addition to the blanking FSM, 4-state FSMs are also used for each of the fault outputs. Every fault FSM has two inputs: the corresponding combinational fault signal ($F_x^c$) and the blank signal $B$. A transition on $B$ 'arms' the fault FSM. At the end of the blanking window, when $B$ goes low, the fault FSM transitions either to a normal state or to a fault state depending on the value of the corresponding $F_x^c$. The fault output remains latched until the next gate transition event. This logic enables cycle-by-cycle fault detection and provides reset capability. The state equations for the blanking FSM and the fault FSM are given in (4.14)-(4.16) and (4.17)-(4.19) respectively.

$$S_{0-next} = \overline{S_0} S_1 Q_g Q_c + S_0 \overline{S_1} Q_g \tag{4.14}$$

$$S_{1-next} = \overline{S_0}\,\overline{S_1} Q_g + \overline{S_0} S_1 \overline{Q_c} + S_0 \overline{S_1}\,\overline{Q_g} \tag{4.15}$$

$$B = S_1 \tag{4.16}$$

$$S_{0-next} = (\overline{S_0} + \overline{S_1})B \tag{4.17}$$

$$S_{1-next} = (\overline{S_0}S_1 + S_0\overline{S_1}F^c)\overline{B} \tag{4.18}$$

$$F_{out} = S_1 \tag{4.19}$$
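The interaction of the two FSMs can be checked with a small clocked simulation. The Python sketch below evaluates the state equations with every overbar read as applying to a single signal, which is an interpretation of the typeset equations. The counter model (running while the blank state persists and cleared otherwise) and the tick-based timing are assumptions made for illustration; they do not represent the actual CLB configuration.

```python
def blanking_next(s0, s1, q_g, q_c):
    """Next state of the blanking FSM, after (4.14)-(4.15); output B = S1."""
    s0_next = (not s0 and s1 and q_g and q_c) or (s0 and not s1 and q_g)
    s1_next = ((not s0 and not s1 and q_g)
               or (not s0 and s1 and not q_c)
               or (s0 and not s1 and not q_g))
    return bool(s0_next), bool(s1_next)


def fault_next(s0, s1, f_c, b):
    """Next state of a fault FSM, after (4.17)-(4.18); output F_out = S1."""
    s0_next = (not s0 or not s1) and b
    s1_next = ((not s0 and s1) or (s0 and not s1 and f_c)) and not b
    return bool(s0_next), bool(s1_next)


def simulate(q_g_seq, f_c_seq, preset):
    """Clock both FSMs once per sample; return (B, F_out) after each tick."""
    b_state = (False, False)
    f_state = (False, False)
    count = 0
    trace = []
    for q_g, f_c in zip(q_g_seq, f_c_seq):
        b = b_state[1]                    # registered blanking output
        q_c = count >= preset             # counter match signal
        b_next = blanking_next(*b_state, bool(q_g), q_c)
        f_state = fault_next(*f_state, bool(f_c), b)
        # Assumed counter: runs while the blank state persists, else cleared.
        count = count + 1 if (b_state[1] and b_next[1]) else 0
        b_state = b_next
        trace.append((int(b_state[1]), int(f_state[1])))
    return trace


q_g = [0, 1, 1, 1, 1, 1, 1, 1]
persistent = q_g                        # inconsistency lasts the whole on-time
momentary = [0, 1, 1, 0, 0, 0, 0, 0]    # inconsistency only during switching

print("persistent:", simulate(q_g, persistent, preset=2))
print("momentary: ", simulate(q_g, momentary, preset=2))
```

In this sketch a persistent inconsistency latches the fault output once the blanking window has elapsed, whereas a momentary inconsistency confined to the blanking window leaves the fault output low.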

# 4.5 Experimental Verification

In this section, the functioning of the proposed gate-open failure detection technique is experimentally verified for all the possible failure scenarios. The highly intermittent and unpredictable nature of gate-open failures makes it nearly impossible to recreate these faults on demand. This is especially important since the exact instance of fault occurrence determines the state of the failed MOSFET's gate and consequently its behavior under fault. Therefore, in order to comprehensively validate the functioning of the proposed detection technique under different failure scenarios, a gate-open fault emulation technique is used. The schematic of the experimental gate-open failure detection circuit is shown in Fig. 4.24(a). The DUT is plugged into this detection circuit and the board itself has external connections compatible with a TO-247 PCB footprint. The external drain and source connections are routed directly



Figure 4.24: (a) Schematic and (b) Actual prototype of proposed gate-open failure detection circuit board.

to the respective DUT leads. The gate connection, however, passes through an ultra-fast reed relay. Under normal operation, the relay is closed and the DUT behaves normally. To emulate a gate-open fault, the relay is opened which mimics the physical disconnection of the gate bond-wire under actual failure. The picture of the developed prototype board is shown in Fig. 4.24(b). The gate voltage applied by the gate driver is sensed for failure detection purposes. The previously discussed failure detection logic is implemented on the on-board microcontroller (MCU) which generates the fault signals. In actual applications, the logic



Table 4.5: Characterization of Relay Release Time

Figure 4.25: Verification of fault emulation technique.

can be implemented directly on the main control MCU. The proposed detection technique is experimentally verified under all failure scenarios using a synchronous boost topology. In the following sub-sections, the gate-open failure emulation strategy is first verified followed by verification of the proposed detection technique.

# 4.5.1 Characterization and Verification of Gate-Open Failure Emulation Technique

In order to reliably emulate gate-open failure during converter operation, it is necessary to precisely time the opening of the reed relay with respect to the applied gate signals. To this end, the relay release time is experimentally characterized, the results of which are shown in Table 4.5. Based on the values in the table, the worst-case relay opening time can be approximated to  $< 10 \ \mu$ s. Consequently, for 10 kHz converter operating frequency and D =



Figure 4.26: Synchronous boost converter used for experimental validation of proposed gate-open failure detection technique.

0.5, the relay should operate within half of the switching period (= 50  $\mu$ s). This is verified in actual converter operation as shown in Fig. 4.25. The relay is commanded to open soon after the DUT turns off, as indicated by the falling edge of the relay drive signal  $(S_{relay})$ . In the subsequent on interval, although the gate driver voltage  $(V_{g-DUT}^{GD})$  is high, the actual DUT gate voltage  $(V_{g-DUT}^{actual})$  remains low. This floating gate behavior is consistent with a gate-open fault. Similarly, if the relay is open when  $V_{g-DUT}^{actual}$  is high, it will remain high even when  $V_{g-DUT}^{GD}$  is low. It must be noted that for these experiments, the relay is reconnected after 1-2 switching intervals to allow normal converter operation. The theoretical fault interval is given by  $T_f$ , at the end of which the relay closes (including contact bounce) and  $V_{g-DUT}^{actual}$ starts following  $V_{g-DUT}^{GD}$ .
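The timing requirement on the relay can be restated as a short calculation. The Python sketch below uses only the figures quoted above (10 kHz switching frequency, D = 0.5, worst-case release time of 10 µs); the variable names are illustrative.

```python
F_SW = 10e3            # switching frequency in Hz
DUTY = 0.5             # duty cycle
T_RELAY_MAX = 10e-6    # worst-case relay release time in s (from Table 4.5)

t_period = 1.0 / F_SW                 # switching period
t_on = DUTY * t_period                # DUT on interval
t_off = (1.0 - DUTY) * t_period       # DUT off interval

# The relay must finish opening inside the interval in which it is commanded.
margin = min(t_on, t_off) - T_RELAY_MAX

print(f"period = {t_period * 1e6:.0f} us, shortest interval = "
      f"{min(t_on, t_off) * 1e6:.0f} us, margin = {margin * 1e6:.0f} us")
```

A positive margin means the gate is already floating before the next gate transition arrives, so the emulated fault takes effect in the intended switching interval.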

# 4.5.2 Verification of Failure Detection under Q1 Operation

The synchronous boost configuration used to verify failure detection under the Q1 operation of the DUT is shown in Fig. 4.26. The specifications of the boost converter are as follows: $V_{in} = 50$ V, $V_{out} = 100$ V, $f_{sw} = 10$ kHz, D = 0.5, $C_{in} = 50 \ \mu\text{F}$, $C_{out} = 1800 \ \mu\text{F}$ and $R_L = 180 \ \Omega$. For Q1 operation, the DUT is configured as the low-side switch and S' is the complementary



Figure 4.27: Experimental verification of a) Q1 open fault detection and b) fault detection timing.

high-side switch. Under Q1 operation of the DUT, an open or conduction type gate-open failure may occur as discussed in Section II. Each of these scenarios is verified below.
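As a quick consistency check on the converter specifications listed above, the ideal steady-state operating point can be computed. The Python sketch below assumes a lossless converter in continuous conduction, which is a simplification; the inductor value is not given in the text and is not needed for these averages.

```python
V_IN = 50.0     # input voltage in V
DUTY = 0.5      # duty cycle of the low-side switch
R_LOAD = 180.0  # load resistance in ohm

v_out = V_IN / (1.0 - DUTY)     # ideal boost conversion ratio
i_out = v_out / R_LOAD          # average load current
p_out = v_out * i_out           # output power
i_in = p_out / V_IN             # average input (inductor) current, lossless

print(f"V_out = {v_out:.1f} V, I_out = {i_out:.3f} A, "
      f"P_out = {p_out:.1f} W, I_in = {i_in:.3f} A")
```

The ideal output voltage agrees with the specified $V_{out} = 100$ V, and this is also the voltage the DUT blocks when it fails to turn on.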

#### Q1 Open Fault

Figure 4.27(a) shows the waveform for verification of Q1 open fault detection. The gate relay is opened when $S_{relay}$ goes low. In the subsequent on period, the DUT fails to turn on. Consequently, even though the complementary synchronous switch is off $(PWM_{S'})$, its body-diode is forward biased and the drain-source voltage across the DUT $(V_{ds-DUT})$ remains at 100 V as shown in the figure. Therefore, the Q1 open fault $(F_{open-Q1})$ is raised during the DUT's on interval. The timing of the fault signal is verified in Fig. 4.27(b). As seen, the delay between the rising edges of $V_{g-DUT}^{GD}$ and $F_{open-Q1}$ is 120 ns. This delay includes the delays caused by the comparators, the digital isolators, and the blanking interval.

#### Q1 Conduction Fault

Figure 4.28(a) shows the experimental waveforms in case of a conduction fault. In this case, the gate relay is opened when $V_{g-GD}$ is high to emulate an on-period gate-open fault. As seen, the DUT fails to turn off when $V_{g-GD}$ goes low. In the figure, this condition is represented by $V_{ds}$ remaining low when the DUT should turn off. The fault output $(F_{cond-Q1})$ is used as a trip signal for the PWM generator. Therefore, it is seen that the gate signals of both the DUT and S' are low after the fault is raised. The timing of the fault signal is shown in Fig. 4.28(b). $F_{cond-Q1}$ goes high 150 ns after $V_{g-GD}$ goes low. This is lower than the dead time between the high-side and low-side switching signals which, for this experiment, is set at a fixed value of 400 ns. Therefore, the gate signal for S' remains low because of the PWM trip action. Consequently, fast fault detection prevents a shoot-through event in a conduction-type gate-open failure scenario.

# 4.5.3 Verification of Failure Detection under Q3 (Synchronous) Operation

For experimental verification of failure detection in the Q3 mode of operation, the positions of S' and the DUT in Fig. 4.26 are interchanged. Specifically, the DUT is operated as the synchronous switch. The experimental waveforms for verification of the proposed gate-open failure detection technique in the Q3 open-type fault scenario are shown in Fig. 4.29(a). As seen, when the relay is opened, $V_{g-DUT}^{actual}$ remains low even during the on intervals. Moreover, as discussed


Figure 4.28: Experimental verification of a) conduction fault detection and b) conduction fault detection timing.

in Section II, $V_{g-DUT}^{actual}$ becomes more negative when the diode turns on, which is indicated by the large negative $V_{ds-DUT}$. For these intervals, it is seen that $F_{open-Q3}$ is high. The timing of the fault signal is verified in Fig. 4.29(b). Unlike the previous cases, $F_{open-Q3}$ is asserted 800 ns after the $PWM_{S'}$ rising edge. This is because, for Q3 open faults, the blanking time necessarily has to be greater than the switching dead time. Since the body-diode is on during the dead time, using a blanking value less than the dead time would trigger a false positive.



Figure 4.29: Experimental verification of a) Q3 open fault detection and b) fault detection timing.

Moreover, since this is a safe failure mode, the delay in failure detection is not a significant factor as long as the fault is detected within the off period.

# 4.5.4 Comparison of Proposed Technique to Traditional DESAT Protection Scheme

DESAT protection schemes are traditionally used to protect the switching device against high-current events that may occur during faults. Many modern commercial gate drivers have a built-in DESAT protection feature. A typical DESAT protection circuit is shown across S' in Fig. 4.26. In case of a fault, when $V_{ds}^{S'}$ exceeds $V_{cc} + V_D$, the current source starts charging the blanking capacitor $C_{blk}$. When the voltage across the blanking capacitor exceeds the DESAT threshold $(V_{th-desat})$, the switch is turned off and a fault signal is raised. The conventional DESAT protection scheme is compared to the proposed gate-open failure detection technique for different gate-open failure scenarios as described below.

#### Q1 Open Fault

In this case, if present, DESAT protection of the faulty switch is triggered since the switch is in a blocking state and  $I_{chg}$  charges  $C_{blk}$ . However, the conventional DESAT scheme will not be able to differentiate between an open fault caused by gate-open failure and an over-current saturation fault. On the other hand, the proposed fault detection circuit is triggered only in case of a Q1 open fault caused by gate-open failure. As described above, the proposed gate-open failure detection circuit is not active throughout the on/off interval and only makes a single shot detection at the end of the blanking period which in this case is 60 ns. Since most switches are unlikely to saturate within this time, saturation fault in most cases will not trigger  $F_{open-Q1}$ .

#### Q1 Conduction Fault

In case of a conduction-type gate-open fault, the resulting shoot-through event may trigger the DESAT protection feature of the complementary switch's gate driver, which could theoretically protect against a conduction gate-open fault. Fig. 4.30 shows the comparison between the proposed fault detection technique and the DESAT scheme. The setup is similar to Fig. 4.26. However, $F_{conduct}$ does not trip the PWM outputs in this case. For the conventional gate driver used, $C_{blk} = 60$ pF, $I_{chg} = 0.5$ mA and $V_{th-desat} = 9$ V. It is seen that it takes 2.5 $\mu$s from the $PWM_{S'}$ rising edge to the DESAT fault getting triggered. As shown above, with the



Figure 4.30: Comparison of proposed fault detection circuit with conventional DESAT protection scheme.

proposed technique, $F_{conduct}$ is asserted in 150 ns and, since this is less than the dead-time, the PWM is tripped and shoot-through is prevented. In the case of DESAT-based protection, however, a shoot-through current is necessary to saturate the switches and trigger the fault. This is especially important since SiC MOSFETs, unlike Si IGBTs, have lower short-circuit withstand capability due to their relatively smaller die size [55]. Moreover, SiC MOSFETs do not have a well-defined knee point on the output curve and have high power dissipation in saturation [56]. In addition, the shoot-through event may cause thermal runaway not only in the power switches but also in other system components. Furthermore, the DESAT scheme cannot differentiate between a saturation event caused by a gate-open fault and one caused by a different fault mechanism. For these reasons, the proposed gate-open failure detection has advantages over traditional protection schemes in detecting conduction-type faults.
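The timing gap between the two schemes can be put in perspective with a first-order estimate of the DESAT blanking time. The Python sketch below assumes that $C_{blk}$ is charged from zero by an ideal constant current $I_{chg}$ and ignores the driver's internal filtering and the switch turn-on delay, so it gives a lower bound rather than a prediction of the measured delay.

```python
C_BLK = 60e-12        # blanking capacitor in F
I_CHG = 0.5e-3        # DESAT charging current in A
V_TH_DESAT = 9.0      # DESAT threshold in V

T_DESAT_MEASURED = 2.5e-6   # measured DESAT trip delay in s (from Fig. 4.30)
T_PROPOSED = 150e-9         # measured F_conduct delay in s
T_DEAD = 400e-9             # switching dead time in s

# Constant-current charging of a capacitor: t = C * V / I
t_charge = C_BLK * V_TH_DESAT / I_CHG

print(f"ideal C_blk charging time = {t_charge * 1e6:.2f} us")
print(f"measured DESAT delay      = {T_DESAT_MEASURED * 1e6:.2f} us")
print(f"proposed detection delay  = {T_PROPOSED * 1e9:.0f} ns "
      f"(dead time {T_DEAD * 1e9:.0f} ns)")
print("proposed scheme trips within dead time:", T_PROPOSED < T_DEAD)
print("DESAT charging alone exceeds dead time:", t_charge > T_DEAD)
```

Even this idealized charging time is longer than the dead time, which is consistent with the observation that DESAT can only react after shoot-through current has started to flow.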

#### Q3 Open Fault

The conventional DESAT scheme cannot detect this fault type since it is deactivated during the switch-off time. Moreover, since this is a soft failure where the converter may seem to be healthy apart from deteriorated efficiency, it is very challenging for most conventional protection mechanisms to detect an open failure in Q3 operation. In contrast, the proposed failure detection circuit can reliably detect Q3 open-type failures. This is especially useful since the intermittent nature of gate-open failure may cause the device to recover from a Q3 open failure and later show a conduction-type failure.

# 4.6 Conclusion

In this chapter, intermittent gate-open failure is investigated in the context of discrete SiC devices. The electrical behavior of the MOSFET under gate-open failure is first analyzed. Failed devices from the DC power cycling test are inspected and analyzed using a systematic multi-step process. Given the intermittent and elusive nature of gate-open failure, the methods used in this chapter may be used as a guide for gate-open failure analysis. FEA analysis is used to identify a potential mechanism for gate-open failure. While the gate-bond itself does not carry a large current, it is shown that it experiences interfacial shear stress due to deformation caused by CTE mismatch between various device elements. A larger CTE mismatch between the EMC and the Copper drain tab is shown to increase the maximum shear stress. Thereafter, an on-board failure detection technique is proposed for all types of gate-open failure modes. The specific nature of gate-open failure is exploited to create a fast failure detection technique that is inherently selective and robust. Through comparison and experimental verification, it is shown that the proposed technique is not only capable of detecting all gate-open failure modes but also differentiates between gate-open failure and other failure modes. Specifically, the potentially dangerous conduction-type failure mode is detected within the switching dead-time, thus preventing a shoot-through event in the switching leg.

#### CHAPTER 5

# A PRACTICAL SWITCH CONDITION MONITORING SOLUTION FOR SIC TRACTION INVERTERS

# 5.1 Introduction

Traction inverters are safety-critical sub-systems in an electric vehicle. The power semiconductor switches at the heart of these inverters are single-point-of-failure components that limit the reliability of the system. Therefore, online condition monitoring solutions for power switches would be valuable in improving the overall safety and reliability of electric vehicles. This is especially the case as manufacturers are shifting from IGBTs to Silicon Carbide (SiC) MOSFETs in traction inverters due to efficiency and power-density benefits [57]. The long-term reliability of SiC devices is unproven in the field and several fundamental challenges exist [58,59]. However, of the many condition monitoring techniques that have been proposed in the literature, most do not sufficiently address the challenges in developing an effective yet practical online condition monitoring (OCM) solution for reasons discussed further.

Developing an OCM solution starts with the selection of a suitable precursor. An ideal precursor for OCM is 1) sensitive to device aging, 2) insensitive to changes in junction temperature and operating conditions, and 3) easy to measure online. However, these requirements are often conflicting, which makes developing a practical OCM solution challenging. In particular, compensating for precursor change due to variable operating conditions and isolating aging-related change is difficult [9]. Several well-reported aging precursors are qualitatively compared in Table 5.1 [60]. The device threshold voltage $(V_{th})$ is a reliable gate-oxide degradation precursor. Gradual $V_{th}$ shift, particularly under high-temperature operation and large gate-bias voltages, is a well-known phenomenon in SiC MOSFETs [32]. However, while simple circuits have been proposed for offline in-circuit $V_{th}$ measurement,

| Aging Precursor                            | Sensitivity to<br>Gate-oxide Aging | Sensitivity to<br>Package Aging | Sensitivity to<br>Operating Conditions | Ease of Online<br>Measurement |
|--------------------------------------------|------------------------------------|---------------------------------|----------------------------------------|-------------------------------|
| Gate threshold voltage $(V_{th})$          | High                               | None                            | High                                   | Hard                          |
| Saturation On-state Resistance $(R_{sat})$ | High                               | Very Low                        | Medium                                 | Hard                          |
| Body-diode Forward Voltage $(V_{sd})$      | High                               | High                            | Low<br>(by compensation)               | Hard                          |
| Turn on time $(T_{on})$                    | Medium                             | Low                             | Medium                                 | Hard                          |
| Gate-Leakage Current $(I_{gss})$           | High                               | Low                             | Low                                    | Easy                          |
| Package thermal resistance $(R_{th})$      | Low                                | High                            | Low                                    | Hard                          |
| On-state Resistance $(R_{ds-on})$          | Medium                             | High                            | High                                   | Easy                          |

Table 5.1: Qualitative Comparison of Known Aging Precursors

online $V_{th}$ measurement during system operation can be relatively more complicated due to the need for high-speed sample-and-hold circuits [61, 62]. An alternative precursor for monitoring gate-oxide degradation is the saturation region resistance $(R_{sat})$. However, as with $V_{th}$, although practical in-system measurement of $R_{sat}$ is feasible, online monitoring is challenging. As the gate-oxide degrades, charged defects may line up and form a conductive path between the gate and the source. This causes a marked increase in the gate-leakage current $(I_{gss})$, which is used as an aging precursor in several studies [24, 63]. In addition to being a strong indicator of gate-oxide degradation, $I_{gss}$ is also easy to measure online. Even so, a major challenge with using $I_{gss}$ is that, due to its exponential nature, in some applications it may not provide a failure warning sufficiently in advance. Unlike the static precursors discussed so far, turn-on time $(t_{on})$ is a useful dynamic precursor for tracking gate-oxide degradation [64]. But because of the inherently short switching duration in SiC devices, obtaining a high-resolution measurement of $t_{on}$ is difficult [65]. Another crucial limitation of all the preceding precursors is their insensitivity to package degradation. For comprehensive OCM of a device, both gate-oxide and package degradation modes need to be tracked. Body-diode forward voltage drop $(V_f)$ is one such precursor that is sensitive to both gate-oxide and package degradation, at zero and negative gate voltages respectively [33]. Moreover, by measuring $V_f$ at low and high current values, the extent of gate-oxide and package degradation can be separately quantified. By using a similar technique it is also possible to compensate for variation in $V_f$



Figure 5.1: Representation of fixed conservative failure threshold vs adaptive failure threshold with respect to healthy device characteristics.

due to junction temperature $(T_j)$ change [36]. In spite of these promising properties, the need for a multi-level gate drive circuit makes a $V_f$-based OCM solution complicated. Also, since the device current is subject to the applied load, making repeatable measurements at fixed current values is challenging. The change in the thermal impedance of a device may be used for die-attach delamination monitoring [66, 67]. Such techniques, however, are sensitive to the accuracy of the device thermo-mechanical model and require online temperature sensing capability.

On-state resistance $(R_{ds-on})$ is a well-known precursor for condition monitoring [68–70]. $R_{ds-on}$ in SiC MOSFETs is known to be sensitive to both package and die level degradation mechanisms [60]. One of the advantages of using $R_{ds-on}$ as a precursor is its relative ease of online measurement [71,72]. Even with the fast switching speed of SiC MOSFETs, which causes large dV/dt swings, online measurement of $R_{ds-on}$ is feasible [73,74]. Moreover, in certain power converter topologies, in-circuit frequency analysis based indirect $R_{ds-on}$ extraction techniques may also be used [75,76]. Although restricted to some topologies and parameter variation ranges, wherever feasible, such techniques further simplify the implementation of $R_{ds-on}$-based condition monitoring solutions. However, the equally popular use of $R_{ds-on}$ as a temperature-sensitive electrical parameter (TSEP) highlights its high sensitivity to junction temperature change with operating conditions [77,78]. This is a crucial challenge for $R_{ds-on}$-based online condition monitoring. One way to overcome this challenge is to use a fixed conservative failure threshold ($R_{th}^{\text{fail}}$) that accounts for the change caused by worst-case operating conditions, as shown in Fig. 5.1. However, in this case, failure detection at lower load and temperature levels may be ineffective. As shown in Fig. 5.1, an adaptive threshold, on the other hand, would be more effective in detecting failures. Precise device characterization and thermo-electrical modeling may be used to compensate for temperature-related changes, but the online implementation of such methods is challenging. These challenges need to be addressed to create a practical online condition monitoring solution.

The first objective of this paper is to present an end-to-end solution for accurate online on-state resistance $(R_{ds-on})$ measurement for all six switches of a 3-phase 2-level inverter. First, a simple and practical sensing circuit is proposed for online measurement of the switch drain-source voltage $(V_{ds})$. Specifically, the proposed circuit minimizes the number of external components by reusing certain components of the built-in desaturation protection feature of the gate driver. Data sampling and processing by the main controller must be done without interfering with the core motor control task. To address this challenge, an efficient equivalent time sampling technique is proposed. Using the proposed data acquisition technique, periodic $R_{ds-on}$ data is obtained at a high effective sampling rate. Lastly, the raw data is filtered using a Kalman filter to minimize noise effects and obtain an accurate $R_{ds-on}$ estimate for each of the inverter's switches.

For online condition monitoring, it is necessary to translate the obtained $R_{ds-on}$ value into a prediction of the switch state of health (SoH). In particular, it is necessary to isolate the effect of the system operating point on $R_{ds-on}$. To address this, this article proposes an online Bayesian condition monitoring technique. The algorithm exploits the approximate electrical and thermal symmetry in the inverter's operation. By carefully designing the input features to the Bayesian algorithm, the effect of operating point related change can be largely eliminated. The recursive nature of the proposed algorithm makes it feasible to implement online. Moreover, the expected deviations in device parameters and operating conditions can be modeled in the algorithm. The key contributions of this paper can be summarized as follows:

1. This work proposes a practical, low-component-count on-state voltage sensing circuit for $R_{ds-on}$ measurement. While there are several on-state voltage measurement circuits in the literature, they do not address the system-level implementation of the solution, especially in multi-switch converters. The proposed method considers the entire signal chain and system-level implementation. It is experimentally verified for simultaneous $R_{ds-on}$ sensing of all six switches in a 3-phase inverter.
2. A critical and often overlooked aspect of a controller-based online condition monitoring solution is signal acquisition and processing. This paper addresses this by proposing a code-efficient data acquisition and filtering algorithm. The algorithm is specifically designed to address all the important challenges, i.e., to achieve a high effective sampling rate while being implementable on a microcontroller and without interfering with the critical motor control tasks.
3. Lastly, and most importantly, a Bayesian state-of-health estimation algorithm for estimating individual switch health is proposed. The proposed algorithm addresses the important problem of isolating aging-related change from operating-condition-related change in switch $R_{ds-on}$. The primary advantage of the proposed algorithm over existing techniques is the elimination of the need for extensive device/system-specific calibration, making it highly practical.



Figure 5.2: High-level schematic of the  $V_{ds}$  measurement circuit.

# 5.2 Proposed Online R<sub>ds-on</sub> Measurement Solution

The proposed end-to-end  $R_{ds-on}$  measurement solution is presented in the context of a generic traction drive system, the high-level schematic for which is shown in Fig. 5.2.

#### 5.2.1 Measurement Circuit Design

In this section, the proposed $R_{ds-on}$ measurement circuit is discussed in detail.

## **Circuit Description and Analysis**

In order to calculate a particular switch's  $R_{ds-on}$ , the on-state voltage  $(V_{ds})$  across the switch is measured and divided by the drain current value  $(I_d)$  obtained from the corresponding phase current sensor  $(\pm i_{a,b,c})$ . Consequently, accurate  $V_{ds}$  measurement is crucial for accurate measurement of  $R_{ds-on}$ .

The high-level schematic of the proposed $V_{ds}$ measurement circuit is shown in Fig. 5.2. The measurement circuit has two op-amp stages for sensing and signal conditioning. The proposed measurement circuit reuses components of the gate driver's desaturation (DESAT) protection circuit to sense the switch $V_{ds}$. Specifically, the internal DESAT current source $(I_{des})$ and high-voltage blocking diodes $(D_1, D_2)$ are common to both the DESAT circuit and the sensing circuit. By sharing these components, the measurement circuit's overall component count is reduced. The measurement circuit's output $(V_{U_2})$ is connected to one of the integrated ADC channels of the Texas Instruments UCC5870 gate driver [79]. The converted digital output is read from the gate driver by the main controller over an SPI link. Although this work uses a gate driver with an integrated ADC, a similar circuit with a combination of discrete components would be just as feasible.

In the circuit shown in Fig. 5.2, when the low-side switch turns on, $I_{des}$ flows through $R_1$, $D_1$, and $D_2$. In this case, assuming op-amp $U_1$ operates in the linear region and has a large gain, the non-inverting and inverting input voltages of $U_1$ can be equated as shown in (5.1), which simplifies to (5.2):

$$V_{ds} + V_d^{D_1} = V_{U_1}\left(\frac{R_4}{R_4 + R_5}\right) + \left(V_{ds} + V_d^{D_1} + V_d^{D_2}\right)\left(\frac{R_5}{R_4 + R_5}\right) \tag{5.1}$$

$$V_{U_1} = V_{ds} + \left(V_d^{D_1} - V_d^{D_2} \frac{R_5}{R_4}\right) \tag{5.2}$$

where $V_d^{D_{1,2}}$ are the forward voltage drops of diodes $D_{1,2}$. If $V_d^{D_1} = V_d^{D_2}$ and $R_4 = R_5$, (5.2) reduces to (5.3):

$$V_{U_1} = V_{ds}. \tag{5.3}$$

Since the ADC input is unipolar and its input range is usually lower than the range of $V_{ds}$ variation, a second op-amp stage ($U_2$ in Fig. 5.2) is used to scale and level-shift the output of $U_1$. The equation for $U_2$ is

$$V_{U_2} = V_{ref} - V_{ds} \frac{R_9}{R_8} \tag{5.4}$$
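The signal chain from the digitized $V_{U_2}$ back to $R_{ds-on}$ can be illustrated with a minimal sketch that inverts (5.4) and divides by the phase current. The reference voltage, stage gain $R_9/R_8$, ADC resolution, ADC full-scale range, and the minimum-current guard below are illustrative assumptions, not values taken from this study or from the gate driver datasheet.

```python
def adc_code_to_vds(code, v_ref=1.8, gain=0.5, adc_bits=10, adc_full_scale=3.6):
    """Invert (5.4): V_U2 = V_ref - V_ds * gain, where gain = R9 / R8.

    All default values are hypothetical placeholders.
    """
    v_u2 = code * adc_full_scale / (2 ** adc_bits - 1)  # ADC code -> volts
    return (v_ref - v_u2) / gain


def rds_on(v_ds, i_d, i_min=1.0):
    """Return V_ds / I_d, or None when |I_d| is too small to divide by safely."""
    if abs(i_d) < i_min:
        return None  # near the current zero crossing the ratio is unreliable
    return v_ds / i_d


# Example: a code of 0 maps to the most positive V_ds the stage can represent.
v_ds_max = adc_code_to_vds(0)
r_example = rds_on(1.8, 10.0)
```

The guard on small currents reflects the general observation that dividing by a near-zero current amplifies measurement noise; the specific threshold is arbitrary.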

Equation (5.3), however, represents an ideal case. Practically, even nominally identical components have mismatches. In order to analyze the effect of component mismatch on $V_{ds}$ measurement accuracy, it is assumed that $V_d^{D_1} = V_d^{D_2} + e_D$ and $R_4 = R_5 + e_R$, where $e_D$ is the diode mismatch and $e_R$ is the resistance mismatch. In this case, (5.2) becomes (5.5):

$$V_{U_1} = V_{ds} + \left(V_d^{D_2} + e_D - V_d^{D_2} \frac{R_5}{R_5 + e_R}\right) \tag{5.5}$$



Figure 5.3: Variation in forward voltage mismatch between high-voltage diodes with forward current.

In (5.5), when using low-tolerance resistors, it can be assumed that $R_5 \gg e_R$. With this, (5.5) reduces to:

$$V_{U_1} = V_{ds} + e_D + V_d^{D_2} \frac{e_R}{R_5} \tag{5.6}$$

Therefore, the mismatch in diode forward voltage drop $e_D$, the magnitude of the forward voltage drop $V_d^{D_2}$, and the resistance mismatch $e_R$ all contribute to the error in the sensed $V_{ds}$.
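To get a feel for the relative size of these error terms, the following sketch evaluates the exact expression (5.5) and the first-order form (5.6) side by side. The numerical values (a 0.6 V diode drop, 5 mV diode mismatch, and a 0.1% mismatch on a 10 kΩ resistor) are assumptions chosen only for illustration.

```python
def vu1_exact(v_ds, v_d2, e_d, r5, e_r):
    """U1 output per (5.5), with diode mismatch e_d and resistor mismatch e_r."""
    return v_ds + (v_d2 + e_d - v_d2 * r5 / (r5 + e_r))


def vu1_approx(v_ds, v_d2, e_d, r5, e_r):
    """First-order form per (5.6), valid when r5 is much larger than e_r."""
    return v_ds + e_d + v_d2 * e_r / r5


# Hypothetical operating point and mismatches (not measured values).
V_DS, V_D2, E_D, R5, E_R = 1.0, 0.6, 0.005, 10e3, 10.0

exact = vu1_exact(V_DS, V_D2, E_D, R5, E_R)
approx = vu1_approx(V_DS, V_D2, E_D, R5, E_R)

diode_term = E_D                  # contribution of diode mismatch, in volts
resistor_term = V_D2 * E_R / R5   # contribution of resistor mismatch, in volts
```

With these assumed numbers the diode term is roughly an order of magnitude larger than the resistor term, which is in line with the observation that $e_D$ tends to dominate.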

From (5.6),  $e_D$  is observed to be a dominant contributor to the overall  $V_{ds}$  measurement error. Therefore, to understand how  $e_D$  changes with  $I_f$ , the absolute pairwise mismatch between four diodes is characterized at different forward current  $(I_f)$  values as shown in Fig. 5.3. It is seen that low  $I_f$  values do not necessarily result in a smaller mismatch. Relatively larger values inherently lead to large errors and also increase power loss in the gate driver and DESAT diodes. For this study,  $I_f = I_{des} = 1$  mA is chosen. Also, in the measurement circuit,  $R_3$  and  $C_2$  act as the input filter. The cutoff frequency is 1.59 MHz. The filter is used to prevent interference due to high-frequency switching noise.
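As a quick check of the stated cutoff, the first-order RC relation can be evaluated for one plausible component pair. The values $R_3 = 100\ \Omega$ and $C_2 = 1$ nF are assumptions used only to illustrate the arithmetic; the actual component values are not stated in the text.

$$f_c = \frac{1}{2\pi R_3 C_2} = \frac{1}{2\pi \times 100\ \Omega \times 1\ \text{nF}} \approx 1.59\ \text{MHz}$$

Any other pair with the same $R_3 C_2$ product of about 100 ns gives the same cutoff frequency.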



Figure 5.4: Task execution in typical motor control firmware implementations.

## **Out-of-order Equivalent Time Sampling Data Acquisition Technique**

The typical high-level implementation of motor control firmware is illustrated in Fig. 5.4. All tasks performed by the controller can be broadly divided into two categories. The tasks essential to the core motor control function are executed in a strictly periodic interrupt service routine (ISR) which is triggered by a high priority interrupt with fixed period  $T_{\text{intr}}$ . All the other auxiliary functions like communications, system diagnostics, and fault management are performed within a loosely periodic loop outside the main ISR. Furthermore,  $T_{\rm ex}^{\rm ISR} < T_{\rm intr}$ , where  $T_{\text{ex}}^{\text{ISR}}$  is execution time for ISR, is a hard constraint in order to ensure that PWM updates reflect in the inverter's output voltage. On the other hand, the execution of one auxiliary tasks' loop iteration can be spread over several interrupt periods. Therefore, it is important to minimize the number of tasks executed during the ISR to minimize  $T_{\text{ex}}^{\text{ISR}}$ . Consequently, since degradation is a relatively gradual phenomenon, OCM functions should be implemented in the auxiliary tasks' loops. However, executing the  $R_{ds-on}$  measurement function in the auxiliary tasks' loop is challenging since it is not strictly periodic and time bound. Therefore, obtaining sequential, periodic, and simultaneously sampled data points for  $I_d$  and  $V_{ds-on}$  is a non-trivial problem. Applying data filtering and processing algorithms to sequential and periodic data is significantly simpler when compared to non-periodic data. And as discussed subsequently, due to the inherently noisy nature of traction systems, filtering



Figure 5.5: Schematic of traction drive system used in this study.

and post-processing of raw $R_{ds-on}$ data is inevitable for OCM. Therefore, in order to address this challenge, an out-of-order equivalent time sampling algorithm is proposed, as illustrated in Fig. 5.6. This technique enables the execution of the $R_{ds-on}$ measurement function in a loosely periodic tasks' loop while ensuring a high sampling rate and periodicity of the acquired data. The technique uses the motor's electrical angle $(\theta)$, shown in Fig. 5.5, as a reference to precisely time the data sampling instances. The angle variable's value is incremented during every ISR execution and varies between 0 and 1. During operation, the phase of the q-axis current is given by the angle value. Therefore, given a value of $\theta$, the corresponding value of current can be easily calculated. Based on the converse logic, an array of $\theta$ reference points corresponding to the desired $I_d$ and $V_{ds-on}$ sampling instances for all six switches is created. Specifically, the sampling window is symmetrically centered around peak $I_d$, as values close to the zero crossing can lead to large errors in the calculated $R_{ds-on}$. As illustrated in Fig. 5.6, every time the controller executes the OCM function, it checks whether the $\theta$ value from the latest ISR execution is equal to any of the values in the reference array. If there is a match, the controller immediately logs the $V_{ds-on}$ and $I_d$ values into the respective arrays at the corresponding index. A recursive binary search algorithm is used to perform the search for fast execution and code size efficiency. It must be noted that when using



Figure 5.6: Illustration of out-of-order equivalent time sampling technique.

equivalent time sampling, signal repetition is a prerequisite for accurate signal reconstruction. Therefore, it is important to acquire the data in a minimum number of current cycles to avoid inaccuracies due to signal variation. The opportunistic out-of-order equivalent time sampling proposed above ensures that all the data points are sampled in the fewest number of motor current cycles possible. In contrast, if a sequential equivalent time sampling (ETS) method were used, at least N current cycles would be necessary to sample N data points. For the implementation used in this study, a comparison between out-of-order equivalent time sampling and sequential sampling for different sample sizes is shown in Fig. 5.7. As seen, the out-of-order ETS technique is orders of magnitude faster than sequential ETS. Moreover, it should be noted that the results shown are obtained with a significantly long auxiliary tasks'



Figure 5.7: Comparison of out-of-order vs sequential equivalent time sampling techniques.

loop. As mentioned above, the execution time of the auxiliary tasks' loop significantly affects the overall sampling duration.
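The sampling logic described above can be sketched as a small simulation. This is only an illustrative model under stated assumptions: the electrical angle is represented as an integer ISR tick count within one electrical cycle (so that matches are exact), the auxiliary loop period is drawn at random between 7 and 13 ISR ticks, and the names `find_index` and `cycles_to_fill` are hypothetical. It is not the firmware used in this study.

```python
import random


def find_index(ref, value, lo=0, hi=None):
    """Recursive binary search in a sorted list; returns -1 if value is absent."""
    if hi is None:
        hi = len(ref) - 1
    if lo > hi:
        return -1
    mid = (lo + hi) // 2
    if ref[mid] == value:
        return mid
    if ref[mid] < value:
        return find_index(ref, value, mid + 1, hi)
    return find_index(ref, value, lo, mid - 1)


def cycles_to_fill(ref_ticks, ticks_per_cycle, out_of_order, seed=0,
                   loop_ticks=(7, 13), max_ticks=10_000_000):
    """Count electrical cycles needed to log one sample per reference point."""
    rng = random.Random(seed)
    filled = [False] * len(ref_ticks)
    n_filled = 0
    tick = 0
    while n_filled < len(ref_ticks):
        tick += rng.randint(*loop_ticks)   # loosely periodic auxiliary loop
        if tick > max_ticks:
            raise RuntimeError("sampling did not complete")
        theta = tick % ticks_per_cycle     # angle counter from the latest ISR
        idx = find_index(ref_ticks, theta)
        if idx < 0 or filled[idx]:
            continue
        # Out-of-order mode accepts any unfilled reference point;
        # sequential mode accepts only the next point in order.
        if out_of_order or idx == n_filled:
            filled[idx] = True             # firmware would log V_ds-on and I_d here
            n_filled += 1
    return tick // ticks_per_cycle + 1


# Hypothetical setup: 200 ISR ticks per cycle, 16 points around the current peak.
ref = list(range(42, 58))
ooo_cycles = cycles_to_fill(ref, 200, out_of_order=True)
seq_cycles = cycles_to_fill(ref, 200, out_of_order=False)
```

Because both runs see the same sequence of loop execution times, the out-of-order variant can never take longer than the sequential one in this model, and the sequential variant needs at least one cycle per data point.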

#### 5.2.2 Kalman Filter Design

Traction systems are inherently noisy and subject to operational variations. Due to this, raw data acquired using the sampling technique proposed above needs additional filtering to obtain an accurate real-time $R_{ds-on}$ estimate. For this OCM application, a Kalman filtering technique is suitable due to its memory- and processing-efficient implementation. Furthermore, its optimal nature makes it robust to small modeling errors and enables a general implementation by eliminating the need for unit-specific calibration. Specific details of the filter's design and parameter selection are discussed below.

## **Filter Model**

To create the system model, it can be safely assumed that the sampling window for each measurement cycle is small. Based on this assumption, the effect of variations in operating temperature can be ignored. However, the switch current varies during the sampling window.



Figure 5.8: Variation of device  $R_{ds-on}$  with drain current at  $V_{gs} = 15$  V for the device used in this study.

As shown in Fig. 5.8, a SiC MOSFET's $R_{ds-on}$ typically has some degree of current dependence. Therefore, it is necessary to model the effect of current variation on change in $R_{ds-on}$. Although the switch current varies, due to sampling around the $I_d$ peak at a relatively high effective rate, the change in current between sampling instances is relatively constant. This is shown in (5.7), where the difference in switch currents is $\Delta i_k = i_k - i_{k-1}$. In this case, the relation between the $R_{ds-on}$ values, $r_k$, obtained at consecutive sampling instances is given by (5.8), where the slope $S = \frac{dr}{di}$.

$$\Delta i_{k|k-1} = \Delta i_{k-1|k-1} \tag{5.7}$$

$$r_{k|k-1} = r_{k-1|k-1} + S\Delta i_{k|k-1} \tag{5.8}$$

Using the above equations, the system model for  $R_{ds-on}$  filtering can be created. With state matrix,  $X_k$ , given by (5.9), the equations for the Kalman filter's prediction step are given by (5.10), (5.11).

$$X_k = \begin{bmatrix} \Delta i_k & r_k \end{bmatrix}^T \tag{5.9}$$

Prediction step:

$$X_{k|k-1} = AX_{k-1|k-1} \tag{5.10}$$

$$P_{k|k-1} = AP_{k-1|k-1}A^T + Q \tag{5.11}$$

where state transition matrix,

$$A = \begin{bmatrix} 1 & 0 \\ S & 1 \end{bmatrix}, \tag{5.12}$$

$P$ is the covariance matrix, $Q$ is the process noise matrix that corresponds to modeling uncertainty, and $R$ is the measurement noise covariance. Based on on-board $R_{ds-on}$ measurements, the predicted state and covariance values, $X_{k|k-1}$ and $P_{k|k-1}$, are updated using equations (5.13) and (5.14), respectively.

Update step:

$$X_{k|k} = X_{k|k-1} + K_k(X_k - X_{k|k-1}) \tag{5.13}$$

$$P_{k|k} = (I - K_k)P_{k|k-1} \tag{5.14}$$

where the Kalman gain $K_k$ is given as

$$K_k = P_{k|k-1} H^T (H P_{k|k-1} H^T + R)^{-1}. \tag{5.15}$$

Since the state observation matrix $H = I$ in this case, (5.15) is rewritten as:

$$K_k = P_{k|k-1}(P_{k|k-1} + R)^{-1} \tag{5.16}$$

## **Selection of Filter Parameters**

For optimal filter performance, it is important to choose the initial values of $X_k$, $P_k$, $Q$, and $R$ carefully. For brevity, only the values used in this study are discussed here. The same values are used to obtain the experimental results presented in a subsequent section. For the initial value of $r$ in the state matrix $X_k$, the typical value of $R_{ds-on}$ for the selected device obtained from the datasheet is used. In this study, $r_{\text{init}} = 0.180\ \Omega$. The initial value of $\Delta i$ is given by (5.17), where $\text{rmp}_0^{\text{ref}}$ is the ramp reference value at the left end of the sampling window. From Fig. 5.8, the value of $S \approx 0.005$.

$$\Delta i_{\rm init} = \cos(2\pi \times \mathrm{rmp}_0^{\rm ref}) T_{\rm intr} \tag{5.17}$$

For initial values of the covariance matrix, a worst-case $R_{ds-on}$ measurement standard deviation, $\sigma_r = 30 \text{ m}\Omega$, is assumed. It is relatively large and gives a variance, $\sigma_r^2 = 0.0009$. Similarly, a large initial variance of 0.0001 is assumed for $\Delta i$. Due to the relatively high confidence in the model, the process noise is set to low values, $Q = \begin{bmatrix} 0.00001 & 0 \\ 0 & 0.00001 \end{bmatrix}$. For $R$, a high value is chosen for the $\Delta i$ measurement as the current measurement is inherently very noisy. The measurement noise for $R_{ds-on}$ is set equal to the initially estimated variance. Therefore, $R = \begin{bmatrix} 0.04 & 0 \\ 0 & 0.0009 \end{bmatrix}$.
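A minimal sketch of the resulting filter, using the parameter values quoted above, is given below. It follows the textbook Kalman recursion with $H = I$: the state is propagated through $A$ without an additive term, and the covariance update uses the predicted covariance. The pure-Python 2x2 helpers, the function name `kalman_step`, and the noise-free constant test measurement are assumptions made for illustration; the initial $\Delta i$ is simply set to zero here instead of evaluating (5.17).

```python
S = 0.005                                  # slope dr/di quoted in the text
A = [[1.0, 0.0], [S, 1.0]]                 # state transition matrix (5.12)
Q = [[1e-5, 0.0], [0.0, 1e-5]]             # process noise
R = [[0.04, 0.0], [0.0, 0.0009]]           # measurement noise
I2 = [[1.0, 0.0], [0.0, 1.0]]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_add(a, b, sign=1.0):
    return [[a[i][j] + sign * b[i][j] for j in range(2)] for i in range(2)]


def mat_vec(a, v):
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]


def transpose(a):
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]


def inverse(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def kalman_step(x, p, z):
    """One predict/update cycle; x = [delta_i, r], z = measured [delta_i, r]."""
    # Prediction: propagate state and covariance through the model.
    x_pred = mat_vec(A, x)
    p_pred = mat_add(mat_mul(mat_mul(A, p), transpose(A)), Q)
    # Kalman gain with H = I.
    gain = mat_mul(p_pred, inverse(mat_add(p_pred, R)))
    # Update: correct the prediction with the measurement residual.
    residual = [z[0] - x_pred[0], z[1] - x_pred[1]]
    correction = mat_vec(gain, residual)
    x_new = [x_pred[0] + correction[0], x_pred[1] + correction[1]]
    p_new = mat_mul(mat_add(I2, gain, sign=-1.0), p_pred)
    return x_new, p_new


# Hypothetical demo: start at the datasheet value and feed a constant 0.200 ohm.
x = [0.0, 0.180]
p = [[1e-4, 0.0], [0.0, 9e-4]]
for _ in range(200):
    x, p = kalman_step(x, p, [0.0, 0.200])
```

Fixed-size 2x2 arithmetic of this kind avoids dynamic allocation and general matrix routines, which is one reason such a filter is attractive on a microcontroller.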

# 5.3 Proposed Bayesian SoH Estimation Solution

In the previous sections, an accurate online $R_{ds-on}$ measurement solution is presented. However, for online condition monitoring, it is necessary to isolate change in the precursor's value due to device degradation from change due to variable operating conditions. This is especially true for $R_{ds-on}$, which in addition to aging is also a function of applied gate voltage $(v_g)$, device junction temperature $(t_j)$, and the drain current $(i_d)$. Although challenging, it is technically possible to track changes in $R_{ds-on}$ due to operational conditions and isolate aging-related change. However, this would require detailed parameter extraction for each individual device, which would render such a solution impractical for general production. Therefore, in this section, a practical Bayesian state-of-health estimation algorithm is proposed. This stochastic SoH estimation solution exploits two properties of a traction system. First, under normal operation, the 3-phase inverter operates symmetrically within some tolerance. The load conditions are symmetrical on all three phases and therefore on the six switches. This also extends to the gate drive conditions and thermal design. Consequently, the mean operating conditions in all six switches of the inverter are similar within some tolerance. Second, the probability of failure occurring simultaneously in multiple switches is relatively low. Given that the useful operational life of switches is of the order of years, even with a narrow range of failure timelines, the temporal co-occurrence of failure in several switches is very unlikely. As the frequency of precursor measurement increases, this probability becomes smaller.

There are several advantages that make a Bayesian inference technique particularly suitable for $R_{ds-on}$-based online condition monitoring. Using a stochastic SoH estimation solution allows accounting for device variation, errors in online measurement, and modeling uncertainties while minimizing false positives. Consequently, the need for extensive device- and/or system-specific parameter calibration can be avoided, making the overall solution very practical. Moreover, the recursive nature of the Bayesian algorithm makes it computationally efficient, which is crucial for OCM solutions. Furthermore, it is robust to initial conditions and enables the incorporation of existing system knowledge into the model. However, the realization of these benefits strongly depends on effective formulation and feature selection. Therefore, feature selection and Bayesian formulation strategies for $R_{ds-on}$-based SoH estimation are presented in detail in the following sections.

#### 5.3.1 Input Feature Design

The equation for switch $S_i$'s $R_{ds-on}$, $R_i$, at a given operating point and degradation factor $(\eta)$ is given by (5.18). $X^{ds}$ represents the baseline value for parameter $X$ at datasheet conditions. $\frac{\partial R_i}{\partial x}$ represents the first-order change in $R_i$ with respect to variation in parameter $x$. The variations due to the operating point (switch junction temperature $(t_j)$ and drain current $(i_d)$) and due to degradation (both gate-oxide and package) are modeled as first-order functions. The changes in these parameters are reflected in the measured $R_{ds-on}$. To determine the switch SoH, the aging-related shift is of primary interest. However, isolating it from the operating point dependent component is challenging since the latter is generally more dominant. However, if the relative $R_{ds-on}$ deviation between switches is considered, (5.19) is obtained. In this case, assuming the same parameter dependence in the respective switches, the operating point component is now only a function of the relative unsymmetry $(\Delta t_j^{op}, \Delta i_d)$. Since the switches in an inverter operate symmetrically, these values are generally small. The mismatch in the switch baseline parameters is represented by the static mismatch component. Since it is independent of operating conditions, a one-time normalization of this can be performed at the start of the system's life. This leaves the relative aging-dependent component. In case of bond-wire failure or faster degradation in one of the switches, this value would show a clear and sharp increase. Package failures are by far the dominant failure mechanisms. However, in case of a gradual shift due to gate-oxide aging, considering the relative $R_{ds-on}$ deviation may not be sufficient, since such aging is generally consistent across identical switches. For this, the self deviation of a switch's $R_{ds-on}$ from its $t_0$ baseline, $\Delta R_{ii}$, can be considered, as shown in (5.20). This parameter does not have a static mismatch component. However, it is dependent on the deviation from baseline operating conditions.

$$R_{i}(v_{g}, t_{j}, i_{d}, \eta) = R_{i}^{ds} + \frac{\partial R_{i}}{\partial v_{g}}(v_{g} - V_{g}^{ds}) + \underbrace{\frac{\partial R_{i}}{\partial t_{j}}(t_{j}^{op} - T_{j}^{ds}) + \frac{\partial R_{i}}{\partial i_{d}}(i_{d}^{op} - I_{d}^{ds})}_{\text{operating point dependent}} + \underbrace{R_{i}^{\eta - pk} + \frac{\partial R_{i}}{\partial v_{th}}(\Delta V_{th}^{\eta - ox})}_{\text{aging shift}} + e_{i} \tag{5.18}$$

$$\begin{aligned} \Delta R_{ij}(v_g, t_j, i_d, \eta) = {} & \underbrace{\Delta R_{ij}^{ds} + \frac{\partial R}{\partial v_g}(\Delta v_g)}_{\text{static mismatch}} + \underbrace{\frac{\partial R}{\partial t_j}(\Delta t_j^{op}) + \frac{\partial R}{\partial i_d}(\Delta i_d)}_{\text{operational unsymmetry}} \\ & + \underbrace{\Delta R_{ij}^{\eta-pk} + \frac{\partial R}{\partial v_{th}}(\Delta V_{th}^{\eta})}_{\text{diff. in aging}} + e_{ij} \end{aligned} \tag{5.19}$$

$$\Delta R_{ii}(v_g, t_j, i_d, \eta) = \underbrace{\frac{\partial R_i}{\partial v_g} (\Delta v_g^{t_0}) + \frac{\partial R_i}{\partial t_j} (\Delta t_j^{t_0}) + \frac{\partial R_i}{\partial i_d} (\Delta i_d^{t_0})}_{\text{operating point w.r.t baseline conditions}} + \underbrace{R_{i0}^{\eta - pk} + \frac{\partial R_i}{\partial v_{th}} (\Delta V_{th}^{\eta - ox})}_{\text{aging shift}} + e_i \tag{5.20}$$

#### 5.3.2 Bayesian Formulation

The SoH of the switch $S_i$ is represented by $F_i$, $i = 1, \dots, 6$, and $P(F_i)$ represents the probability that the switch $S_i$ has failed. Based on the previous discussion, the input feature vector for switch $S_i$ is given as:

$$\mathbf{E}_{i} = [\Delta R_{i1}, \dots, \Delta R_{i(i-1)}, \Delta R_{i(i+1)}, \dots, \Delta R_{i6}, \Delta R_{11}, \dots, \Delta R_{66}] \tag{5.21}$$

where the normalized relative $R_{ds-on}$ deviation between switches $S_i$ and $S_j$ is given as $\Delta R_{ij} = R_i - R_j - (R_i^{t_0} - R_j^{t_0}), i \neq j$, and $\Delta R_{ii} = R_i - R_i^{t_0}$ represents the self deviation of switch $S_i$'s $R_{ds-on}$ from its zero-time $(t_0)$ baseline value, $R_i^{t_0}$.
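The construction of the feature vector can be illustrated with a short sketch. The function name `feature_vector`, the zero-based switch indexing, and the example resistance values are assumptions for illustration only. The example applies a common 20 mΩ operating-point shift to all six switches and an additional 30 mΩ aging shift to one switch, to show how the relative features cancel the common shift.

```python
def feature_vector(i, r_now, r_base):
    """Build E_i as in (5.21) for switch index i (zero-based).

    r_now  : current R_ds-on estimates for all switches, in ohms
    r_base : zero-time (t0) baseline values for all switches, in ohms
    Returns the relative deviations dR_ij (j != i) followed by the
    self deviations dR_kk for every switch k.
    """
    n = len(r_now)
    relative = [(r_now[i] - r_now[j]) - (r_base[i] - r_base[j])
                for j in range(n) if j != i]
    self_dev = [r_now[k] - r_base[k] for k in range(n)]
    return relative + self_dev


# Hypothetical baseline values with small unit-to-unit mismatch.
r_base = [0.180, 0.182, 0.179, 0.181, 0.180, 0.183]
# Common operating-point shift of 20 mOhm on every switch.
r_now = [r + 0.020 for r in r_base]
# Additional aging-related shift of 30 mOhm on switch 0 only.
r_now[0] += 0.030

e0 = feature_vector(0, r_now, r_base)
relative_part, self_part = e0[:5], e0[5:]
```

In this example every relative feature of switch 0 is approximately 0.030 Ω regardless of the common shift, whereas the self-deviation features still contain the 0.020 Ω operating-point component.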

Consequently, the Bayesian inference equation for switch  $S_i$  is given as:

$$P(F_i|\mathbf{E}_i) = \frac{p(\mathbf{E}_i|F_i)P(F_i)}{p(\mathbf{E}_i)} \tag{5.22}$$

To obtain an accurate SoH estimate from (5.22), it is important to model the *likelihood* function $p(\mathbf{E}_i|F_i)$ accurately. The choice of the initial *prior*, $P(F_i)$, is not particularly crucial since Bayesian inference is recursive and eventually converges to the correct estimate. The *evidence*, $p(\mathbf{E}_i)$, can be calculated by knowing the likelihood function and is given as $p(\mathbf{E}_i) = p(\mathbf{E}_i|F_i)P(F_i) + p(\mathbf{E}_i|\neg F_i)P(\neg F_i)$.
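The recursion implied by (5.22) can be sketched in a few lines: the posterior from one measurement becomes the prior for the next. The function name `bayes_update`, the initial prior of 0.01, and the fixed likelihood values are hypothetical and chosen only to show the behavior.

```python
def bayes_update(prior, lik_fail, lik_healthy):
    """One step of (5.22) for a binary hypothesis (failed vs. not failed).

    prior       : P(F_i) before the new evidence
    lik_fail    : p(E_i | F_i)
    lik_healthy : p(E_i | not F_i)
    """
    evidence = lik_fail * prior + lik_healthy * (1.0 - prior)
    return lik_fail * prior / evidence


# Hypothetical run: evidence that is four times more likely under failure.
posterior = 0.01
history = []
for _ in range(10):
    posterior = bayes_update(posterior, lik_fail=0.8, lik_healthy=0.2)
    history.append(posterior)
```

Starting from a low prior, the estimate rises steadily as consistent evidence accumulates, which illustrates why the exact choice of the initial prior matters little.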

#### 5.3.3 Modelling Likelihood Function

Applying the chain rule of probability to the likelihood function in (5.22) gives

$$p(\mathbf{E}_i|F_i) \approx \prod_{\substack{j=1\\i\neq j}}^{6} p(\Delta R_{ij}|F_i) \prod_{k=1}^{6} p(\Delta R_{kk}|F_i) \tag{5.23}$$

The naive Bayesian formulation in (5.23) assumes that the values of the feature vector, $\mathbf{E}_i$, are conditionally independent. For example, $p(\Delta R_{12}|\Delta R_{13}, F_i) = p(\Delta R_{12}|F_i)$. For the chosen feature vector, this is not entirely true for most failure cases. However, this is not a strict assumption, and naive Bayesian models are known to perform well even with some non-independence. Moreover, the marginal loss in model accuracy is traded off for the significantly simpler implementation of the naive Bayesian formulation. In (5.23), although $F_i$ is a discrete variable representing switch SoH, the variables $\Delta R_{ij}$, $(1 \le i, j \le 6)$, are continuous. Therefore, $p(\Delta R_{ij}|F_i)$ represents the conditional probability for the continuous variable $\Delta R_{ij}$ under the discrete condition $F_i$. To obtain a value for $p(\Delta R_{ij}|F_i)$, it is first necessary to consider the different failure scenarios that may occur in the inverter.

As mentioned in Section 5.1, SiC MOSFETs may fail due to gate-oxide degradation or package-level failure. While these degradation mechanisms occur simultaneously during switch operation, a critical "failure" event would only occur due to the relatively dominant mechanism. For the gate-oxide degradation mode, failure is defined as a permanent $R_{ds-on}$ shift exceeding a threshold corresponding to a predetermined $V_{th}$ shift. For the package degradation mode, failure occurs either due to a bond-wire liftoff event, which causes a jump in the switch $R_{ds-on}$, or due to a gradual permanent $R_{ds-on}$ shift caused by die-attach solder delamination. In this study, however, the Bayesian model only estimates an overall probability of failure for a particular switch considering all failure modes and does not classify among them. Although possible, such classification would need a complex classifier model, and online implementation of it may be infeasible due to processing constraints. Subject to the above failure mechanisms, given $F_i$ for switch $S_i$ and switch combination $(S_i, S_j)$, three different failure scenarios are considered: 1) only switch $S_i$ has failed due to bond-wire liftoff and $S_j$ is healthy, 2) both switches $S_i$ and $S_j$ have failed due to bond-wire liftoff, and 3) switch $S_i$ and/or $S_j$ have failed due to a permanent $R_{ds-on}$ shift exceeding the predetermined threshold. By definition, the three failure scenarios represent the complete sample space and are disjoint events. Therefore, $P(F_i) = P(F_i^1) + P(F_i^2) + P(F_i^3)$, where $P(F_i^n)$, $(n = 1, 2, 3)$, represents the probability of the $n^{\text{th}}$ failure scenario. Therefore, the likelihood function from (5.23) may be rewritten as

$$p(\mathbf{E}_{i}|F_{i}) = \prod_{\substack{j=1\\i\neq j}}^{6} \sum_{n=1}^{3} p(\Delta R_{ij}|F_{i}^{n}) P(F_{i}^{n}|F_{i}) \times \prod_{k=1}^{6} \sum_{n=1}^{3} p(\Delta R_{kk}|F_{i}^{n}) P(F_{i}^{n}|F_{i})$$
(5.24)

In (5.24), the conditional probability density functions $p(\Delta R_{ij}|F_i^n)$, $(1 \le i, j \le 6; n = 1, 2, 3)$ are assumed to be normal distributions with a certain mean and variance $(\mathcal{N}(\mu, \sigma^2))$. With this reasonable assumption, by only storing the mean $(\mu)$ and standard deviation $(\sigma)$ for each function, the likelihood values corresponding to the entire feature set can be efficiently calculated online. Fig. 5.9 shows the choice of mean and variance values for the different failure scenarios and feature variables. Since all switches are identical, the same mean and variance are used for the same function type irrespective of the actual switch parameters. As seen, the variations among the switches are accounted for in the variance parameter of the different functions.
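As an illustration of how (5.24) may be evaluated from stored $(\mu, \sigma)$ pairs alone, a minimal Python sketch is given below. The function names, the equal scenario weights $P(F_i^n|F_i) = 1/3$, and the numerical parameters in the usage lines are assumptions made for this example only; they are not values taken from this study.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_likelihood(delta_r, params, weights):
    """Sum over scenarios n of p(delta_r | F_i^n) * P(F_i^n | F_i).

    params  : list of (mu, sigma) pairs, one per failure scenario
    weights : list of P(F_i^n | F_i), assumed to sum to 1
    """
    return sum(w * normal_pdf(delta_r, mu, sigma)
               for (mu, sigma), w in zip(params, weights))


def feature_likelihood(cross_devs, self_devs, cross_params, self_params, weights):
    """Product of per-feature mixture likelihoods, in the spirit of (5.24).

    cross_devs : relative deviations Delta R_ij (i != j)
    self_devs  : self deviations Delta R_kk
    """
    like = 1.0
    for d in cross_devs:
        like *= mixture_likelihood(d, cross_params, weights)
    for d in self_devs:
        like *= mixture_likelihood(d, self_params, weights)
    return like


# Hypothetical parameters (illustrative only, not from this study).
weights = [1.0 / 3.0] * 3                      # assumed equal scenario weights
cross_params = [(0.004, 0.01), (0.0, 0.01), (0.0, 0.012)]
self_params = [(0.004, 0.012), (0.004, 0.012), (0.02, 0.018)]
likelihood = feature_likelihood([0.003] * 5, [0.004] * 6,
                                cross_params, self_params, weights)
```

Only the two parameters per density need to be stored, so the online cost is a handful of exponentials per feature. For longer feature vectors, summing log-likelihoods instead of multiplying densities would avoid numerical underflow.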

# 5.4 Experimental Verification and Discussion

The results from experimental verification of the proposed online  $R_{ds-on}$  and SoH estimation solutions are presented in this section. The picture of the gate driver board developed for this study is shown in Fig. 5.10(a). Each gate driver board drives one inverter phase leg. The fully assembled inverter setup is shown in Fig. 5.10(b). 1200 V/20 A discrete SiC MOSFETs in the TO-247 package are used as the switching device. For this study, the system is tested at



Figure 5.9: Continuous conditional probability distributions for the feature variables for different failure scenarios.

a scaled-down power level with 230 V DC bus voltage and a PMSM motor as the load. First, the online  $R_{ds-on}$  measurement solution is verified and thereafter, the proposed Bayesian SoH estimation solution is validated.



Figure 5.10: Picture of a) gate driver board for single leg with on-board  $V_{ds}$  measurement circuit b) fully assembled  $3\phi$  inverter setup.

# 5.4.1 Verification of Online R<sub>ds-on</sub> Measurement

For the first test,  $R_{ds-on}$  is measured for only one switch. 300 data points each are logged for  $I_d$ ,  $V_{ds}$  and  $R_{ds-on}$  as shown in Fig. 5.11. It should be noted that the effective sampling time between data samples is equal to the PWM period. For the test inverter, the PWM period is 50  $\mu$ s corresponding to a 20 kHz switching frequency. Both the unfiltered and filtered  $R_{ds-on}$  values are logged and compared against a baseline reference  $R_{ds-on}$  obtained offline using an Agilent B1506A device characterizer. As mentioned previously,  $R_{ds-on}$  value at  $I_d$  peak is

| Parameter                       | Unfiltered Data | After KF      |
|---------------------------------|-----------------|---------------|
| RMS Error                       | 0.0044          | 0.0021        |
| Standard Deviation              | 0.00506         | 0.00104       |
| $R_{ds-on}$ at $I_d$ peak       | -               | 0.1754        |
| $R_{ds-on}$ error at $I_d$ peak | -               | 0.0021(1.21%) |

Table 5.2: Statistical Analysis of Online  $R_{ds-on}$  Measurement Data

ideal for OCM purposes due to better accuracy. Therefore, $R_{ds-on}$ for a given measurement interval is obtained by taking the mean of 10 data points around the peak. The statistical comparison of the filtered data against unfiltered values and the baseline reference is shown in Table 5.2. The RMS error for the entire sample set is more than 50% lower for the filtered data than for the unfiltered $R_{ds-on}$. Filtering also significantly reduces the standard deviation as it smooths out the measurement noise. The error in the peak $R_{ds-on}$ value w.r.t. the baseline is ~1.2%. In addition to characterizing the measurement solution's accuracy, the objective of this test is also to determine the optimal data sample size for OCM. To determine this, the Kalman filter's $R_{ds-on}$ estimate uncertainty is plotted against the data index. It is seen that the filter converges in fewer than 20 data points. Therefore, in order to obtain accurate $R_{ds-on}$ values around the $I_d$ peak, a sample size of 50 data points centered at the $I_d$ peak is used in this study.
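The convergence behaviour described above can be reproduced with a simple scalar Kalman filter. The sketch below is only an approximation under stated assumptions: it treats $R_{ds-on}$ as a constant state over the measurement window, and the noise variances, initial values, and function names are hypothetical. It is not the filter model used in this study.

```python
def kalman_filter_constant(measurements, meas_var, process_var=0.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a (nearly) constant quantity.

    Returns the per-sample estimates and their estimate variances.
    """
    x, p = x0, p0
    estimates, variances = [], []
    for z in measurements:
        p = p + process_var        # predict: state assumed constant
        k = p / (p + meas_var)     # Kalman gain
        x = x + k * (z - x)        # correct the estimate with the new sample
        p = (1.0 - k) * p          # reduce the estimate uncertainty
        estimates.append(x)
        variances.append(p)
    return estimates, variances


def mean_around_peak(values, currents, window=10):
    """Mean of `values` over `window` samples around the current peak."""
    peak = max(range(len(currents)), key=lambda i: currents[i])
    lo = max(0, peak - window // 2)
    hi = min(len(values), lo + window)
    return sum(values[lo:hi]) / (hi - lo)


# Hypothetical usage: 50 identical samples of 0.175 ohm.
est, var = kalman_filter_constant([0.175] * 50, meas_var=1e-4)
```

Plotting `var` against the sample index gives the kind of uncertainty curve used above to judge convergence; the sample size is then chosen with some margin beyond the index at which the curve flattens.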

The online $R_{ds-on}$ measurement results for all six switches are shown in Fig. 5.12. As determined above, 50 data points are logged for each switch. The results validate the phase reference generation and the out-of-order equivalent time sampling technique, as the current passing through each switch during the measurement interval is approximately equal. The variation in the filtered $R_{ds-on}$ values represents the static mismatch between the switches. The high accuracy and symmetry of the online $R_{ds-on}$ measurement enable effective SoH estimation, as presented next.



Figure 5.11: Experimental results for on-board  $R_{ds-on}$  measurement for single switch.

# 5.4.2 Verification of Bayesian SoH Estimation Solution

The proposed Bayesian solution is useful in isolating load and temperature related  $R_{ds-on}$  changes from aging-related change to estimate the switches' SoH. In order to validate this, first a variable load test is set up. The results from online  $R_{ds-on}$  measurement for all six switches for 20 different load steps are shown in Fig. 5.13. 50 data points are measured for each load condition. Although Fig. 5.13 shows the full dataset, as previously mentioned, the  $R_{ds-on}$  at the middle of the sampling window, corresponding to the current peak, is fed into the Bayesian SoH algorithm.



Figure 5.12: Experimental results for on-board  $R_{ds-on}$  measurement for six switches.

Before verifying the SoH solution itself, the choice of the input features is validated. In the proposed algorithm, the load's effect on a given switch's $R_{ds-on}$ is compensated for by considering the relative $R_{ds-on}$ deviation w.r.t. all the other switches. Given that the inverter's operation is symmetrical, the relative $R_{ds-on}$ deviation between switch pairs should be smaller than a switch's own $R_{ds-on}$ deviation from its zero-time ( $t_0$ ) baseline. This is verified in Fig. 5.14 where, for each switch, the RMS normalized deviation between switch pairs is compared against the self-deviation from baseline. It is clearly seen that for every switch the RMS normalized deviation from other switches is significantly smaller than self-deviation



Figure 5.13: Experimental results for on-board  $R_{ds-on}$  measurement for six switches under 20 step load variation with maximum load current = 2.8 A.

in the majority of the load cases. It should be noted that  $t_0$  values are typically measured at the very start of the system's life. Moreover, by design, the measurement conditions at  $t_0$ characterization will not meaningfully affect the efficacy of the solution.
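The comparison between self-deviation and pairwise deviation can be sketched as follows. The exact feature definitions are given earlier in the chapter; the normalisation used here (deviation from the $t_0$ baseline divided by the baseline, with the pairwise feature taken as the difference of two such terms) is an assumption for illustration, as are the function names and numbers.

```python
import math


def self_deviation(r_now, r_base, i):
    """Normalized deviation of switch i from its own t0 baseline."""
    return (r_now[i] - r_base[i]) / r_base[i]


def pair_deviation(r_now, r_base, i, j):
    """Assumed relative feature: deviation of switch i minus that of switch j."""
    return self_deviation(r_now, r_base, i) - self_deviation(r_now, r_base, j)


def rms_pair_deviation(r_now, r_base, i):
    """RMS of the pairwise deviations of switch i w.r.t. all other switches."""
    others = [pair_deviation(r_now, r_base, i, j)
              for j in range(len(r_now)) if j != i]
    return math.sqrt(sum(d * d for d in others) / len(others))


# Hypothetical baselines (ohm) and a uniform 5% load-related increase.
r_base = [0.170, 0.172, 0.175, 0.177, 0.174, 0.171]
r_load = [1.05 * r for r in r_base]
common_mode = self_deviation(r_load, r_base, 0)    # load effect seen by one switch
relative = rms_pair_deviation(r_load, r_base, 0)   # load effect after pairing
```

In this idealised case the load raises every switch's resistance by the same fraction, so the self-deviation is about 5% while the pairwise feature is essentially zero. A shift confined to one switch would survive the pairing, which is the property the feature choice relies on.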

To test the proposed solution under realistic failure scenarios, a simple failure induction technique is used. As shown in Fig. 5.15(a) and 5.15(b), the SiC MOSFET used in this study is locally decapsulated to expose the die and source bond wires. Two such devices are used as switches  $S_3$  and  $S_4$ . First, the 20 step loading test as described above is performed to

| Failure Scenario | pdf                                          | Mean $(\mu)$ | SD $(\sigma)$ |
|------------------|----------------------------------------------|--------------|---------------|
| $F_i^1$          | $p(\Delta R_{ij} \mid F_i^1), i \neq j$      | 0.0035       | 0.009         |
|                  | $p(\Delta R_{ii} \mid F_i^1)$                | 0.00035      | 0.012         |
| $F_i^2$          | $p(\Delta R_{ij} \mid F_i^2), i \neq j$      | 0            | 0.009         |
|                  | $p(\Delta R_{ii} \mid F_i^2)$                | 0.0035       | 0.012         |
| $F_i^3$          | $p(\Delta R_{ij} \mid F_i^3), i \neq j$      | 0            | 0.012         |
|                  | $p(\Delta R_{ii} \mid F_i^3)$                | 0.02         | 0.018         |
| $\neg F_i$       | $p(\Delta R_{ij} \mid \neg F_i), i \neq j$   | 0.0          | 0.0075        |
|                  | $p(\Delta R_{ii} \mid \neg F_i)$             | 0.0          | 0.016         |

 Table 5.3: Bayesian Inference Model Parameters

obtain the healthy case's data. Thereafter, one of the source bond-wires of switches  $S_3$  and  $S_4$  is cut to emulate a bond-wire liftoff scenario as shown in Fig. 5.15(b). The loading test is repeated with the failed switches under the same testing conditions. Given all the other test variables are kept constant, the difference in  $R_{ds-on}$  of switches  $S_3$  and  $S_4$  reflects the increase due to bond-wire liftoff. The comparison between  $R_{ds-on}$  under healthy and failed conditions is shown in Fig. 5.16. A consistent 3-5 m $\Omega$   $R_{ds-on}$  shift is seen in both switches. These observed values are consistent with readings obtained from the device characterizer before and after cutting the bond-wire. For testing the online model, since bond-wire liftoff cannot be emulated during testing, the data obtained from failed device testing is overlaid on the healthy data at the intended instance of failure. Given the same testing conditions, this technique emulates failure with reasonably good accuracy.
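The overlay step described above amounts to splicing two recorded datasets at the intended failure instant. A minimal sketch is shown below; the function name, the per-cycle list representation, and the zero-based index are assumptions made for this example.

```python
def overlay_failure(healthy, failed, first_failed_index):
    """Emulate a failure by splicing two per-cycle datasets.

    Cycles before `first_failed_index` (zero-based) come from the healthy
    run; cycles from that index onward come from the failed-device run.
    """
    if len(healthy) != len(failed):
        raise ValueError("both runs must cover the same load cycles")
    return healthy[:first_failed_index] + failed[first_failed_index:]


# Hypothetical per-cycle readings (ohm) for one switch over four load cycles.
healthy_run = [0.175, 0.176, 0.177, 0.178]
failed_run = [0.179, 0.180, 0.181, 0.182]
emulated = overlay_failure(healthy_run, failed_run, first_failed_index=2)
```

Because both runs are recorded under the same load steps, the splice changes only the device state at the chosen cycle and leaves the operating point unchanged.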

Based on the results obtained in Fig. 5.14 and 5.16, the mean and standard deviation values for the Bayesian inference model are chosen as given in Table 5.3. Figure 5.9 provides a general reference for choosing the specific values. The mean shift due to bond-wire liftoff is taken as 3.5 m$\Omega$, consistent with the shift observed in Fig. 5.16 and the mean values in Table 5.3. For the standard deviations, the empirically observed deviations are relaxed to account for system-level variations. Since the bond-wire resistance ( $R_{Bond}$ ) is a very small percentage of the overall switch resistance, the threshold for permanent $R_{ds-on}$ 



Figure 5.14: Comparison of  $R_{ds-on}$  deviation in switches from own baseline vs RMS deviation from other switches due to load.

deviation for  $V_{th}$  related shift is given a relatively large value. At ~10-20% of  $R_{ds-on}^{t_0}$ , it represents a typical  $R_{ds-on}$  change due to oxide degradation related permanent  $V_{th}$  shift [80].

With the given parameters, the Bayesian SoH solution is tested for four cases- 1) when all switches are healthy, 2) switch  $S_3$  has a failure, 3) switch  $S_4$  has a failure, and 4) both switches  $S_3$  and  $S_4$  have bond-wire failures. For cases with failure, the failure is induced from the 7<sup>th</sup> load cycle. The experimental results from the online Bayesian SoH solution are shown in Fig. 5.17.  $P(F_i)$  indicates the probability of failure of switch  $S_i$ . Under a



Figure 5.15: Picture of locally decapsulated discrete SiC MOSFET a) under healthy condition b) with one source bond-wire broken.



Figure 5.16:  $R_{ds-on}$  shift in switches  $S_3, S_4$  under load due to single bond-wire liftoff.

healthy scenario, $P(F_i)$ for all switches is 0. When $S_3$ or $S_4$ fails, the corresponding failure probability rises to 1 while the other switches' $P(F_i)$ remains 0. Lastly, when both $S_3$ and $S_4$ fail, the corresponding failure probabilities of both switches rise to 1 and those of all other switches remain 0. These results highlight that the proposed solution detects device failure reliably while being robust to false positive scenarios. Moreover, the proposed solution is able to reliably distinguish failure events from load-related changes even when the $R_{ds-on}$ change due to failure is of the order of 2-4% of nominal values. It is observed, however, that the failure probability of $S_4$ rises more slowly than that of $S_3$. This is most likely due to the smaller $R_{ds-on}$ shift in $S_4$. Moreover, 7 – 8 measurement cycles correspond to only several hundred seconds at most in



Figure 5.17: Experimental results from proposed Bayesian SoH estimation solution under different failure scenarios.

real-time with the relatively constant load during any given measurement cycle being the only constraint. Therefore, this delay is not significant from a condition monitoring standpoint.
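The dependence of the response rate on the size of the $R_{ds-on}$ shift can be illustrated with a simplified recursive Bayes update. The sketch below assumes that the posterior from one measurement cycle is reused as the prior for the next, uses a single Gaussian feature, and uses hypothetical parameters; it is not the full model of (5.23) and (5.24), and the parameters are not those of Table 5.3.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def update_failure_probability(prior, like_fail, like_healthy, floor=1e-6):
    """One Bayes update of P(F) from the two likelihood values."""
    # Keep the prior away from exactly 0 or 1 so later evidence can move it.
    prior = min(max(prior, floor), 1.0 - floor)
    num = like_fail * prior
    den = num + like_healthy * (1.0 - prior)
    return prior if den == 0.0 else num / den


def run_cycles(deviations, fail=(0.004, 0.001), healthy=(0.0, 0.001), prior=0.01):
    """Failure probability after each measurement cycle (illustrative model)."""
    history = []
    p = prior
    for d in deviations:
        p = update_failure_probability(p, normal_pdf(d, *fail),
                                       normal_pdf(d, *healthy))
        history.append(p)
    return history


# Hypothetical feature sequences: healthy for six cycles, then a shift.
large_shift = run_cycles([0.0] * 6 + [0.004] * 6)
small_shift = run_cycles([0.0] * 6 + [0.0025] * 6)
```

Under these assumptions, a smaller shift gives a likelihood ratio closer to one per cycle, so more cycles are needed before the failure probability approaches 1. This matches the qualitative trend noted above for $S_4$.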

The proposed Bayesian SoH estimation algorithm is also verified for a relatively high-power operating scenario. Figure 5.18 shows the on-board $R_{ds-on}$ measurement results for a 20-step load variation where the maximum switch current is 12.9 A. Here, the results are shown for the case where $S_3$ and $S_4$ each have a single bond-wire liftoff. While $S_4$ shows a prominent $R_{ds-on}$ shift w.r.t. the remaining devices, $S_3$'s $R_{ds-on}$ after the fault is similar to that of the other devices. This may be due to initial differences and slight operational asymmetry. The results of the Bayesian SoH estimation solution for the high-current operating scenario are shown in Fig. 5.19. The algorithm's parameters are similar to the values listed in Table 5.3. Even for this case, the algorithm shows high sensitivity, specificity, and robustness against false positive detection. This is highlighted by the fact that although $S_3$'s post-fault $R_{ds-on}$ is similar to that of the healthy devices, the proposed SoH estimation algorithm is able to identify a fault in $S_3$, as shown in Fig. 5.17 and 5.19. Furthermore, the proposed solution is general and accounts for system and device variations in the model itself without requiring special calibration. Therefore, depending on the actual values of a given device and system operating conditions, there may be some difference in the response rate of the SoH estimator.

#### 5.5 Conclusion

This chapter presents a practical end-to-end solution for switch condition monitoring in traction inverters. A simple gate-driver-integrated on-state voltage measurement circuit is proposed. For simplicity, the proposed solution uses a gate driver with an integrated ADC. However, it is also feasible to implement the same circuit with discrete components. An out-of-order equivalent time sampling-based data acquisition technique is proposed to address the practical challenge of acquiring data from the measurement circuit without affecting the motor control function. Lastly, a Kalman filter stage is used to filter the noisy measurement data. The overall solution ensures that the online measurement error at the current peak is <1.5%.

This work also proposes a Bayesian inference based SoH estimation solution. The input features to the proposed algorithm are chosen to exploit the inherently symmetrical nature


Figure 5.18: Experimental results for on-board  $R_{ds-on}$  measurement for six switches at maximum load current = 12.9 A. Switches S3, S4 have single bond-wire liftoff.

of the 3-phase inverter's operation. This eliminates operating-point-related changes in $R_{ds-on}$ and yields the probability of failure for each of the inverter's switches. In this study, the space vector PWM (SVPWM) technique was used. However, the proposed algorithm is not limited to SVPWM and may be used with other modulation techniques such as sinusoidal PWM (SPWM) and discontinuous PWM (DPWM). Fundamentally, for the proposed algorithm to be effective, it is only necessary that the inverter's operation is symmetrical.



Figure 5.19: Experimental results from proposed Bayesian SoH estimation solution under different failure scenarios at maximum load current = 12.9 A.

#### CHAPTER 6

# CONCLUSIONS, CONTRIBUTIONS, AND FUTURE WORK

#### 6.1 Conclusions and Contributions

This dissertation addresses challenges in testing and on-board condition monitoring of SiC MOSFETs. Practicality and actual implementation of the proposed solutions have been key focuses of the presented work. In this context, the major contributions of this dissertation are listed below-

- 1. This dissertation presents a fully modular, highly scalable DC power cycling (DC PC) test bench architecture. The presented architecture addresses the fundamental trade-off between the number of devices that can be cycled simultaneously and the ability to control their individual test conditions. This is one of the primary bottlenecks in generating the large aging datasets under different aging conditions that are necessary to study long-term SiC device reliability. The proposed setup enables control of critical aging parameters such as junction temperature swing and test current. An actual test bench that can simultaneously age 48 discrete SiC MOSFETs is fabricated and commissioned. Most notably, a number of tests run on the developed test bench enabled the identification of gate-open failure as a potentially significant failure mode in discrete SiC MOSFETs.
- 2. The challenge of accurate on-board junction temperature estimation and control during DC power cycling is further studied in this dissertation. An improved model-based aging independent closed-loop junction temperature profile control method is presented in **Chapter 3**. Specifically, the temperature ramp rate and dwell time at the maximum junction temperature are controlled as per JEDEC JESD22-A122. The proposed solution enables accurate  $R_{ds-on}$  based junction temperature estimation and control along with

body-diode forward voltage drop  $(V_f)$  based aging correction. This study explores the idea of combining multiple electrical parameters to obtain aging-independent junction temperature estimation. Importantly, the solution is implemented and validated on a low-cost microcontroller which proves the practicality and feasibility of the proposed solution.

- 3. This dissertation, in **Chapter 4**, presents a systematic and comprehensive investigation of intermittent gate-open failures, a failure mode that is scarcely studied in the existing literature. The work also aims to draw attention to the possibility of the gate-open failure mode being a specific concern in SiC MOSFETs. It is shown that the fundamental differences between Si and SiC materials and specific device implementations may make SiC MOSFETs relatively more susceptible to gate-open failures. This work also presents a robust on-board technique for reliable cycle-by-cycle detection of gate-open faults. The proposed technique is superior to the traditional desaturation (DESAT) protection scheme in preventing a shoot-through in case of a gate-open failure. Importantly, the concept presented in this work may be readily integrated into next-generation SiC gate drivers.
- 4. The end-to-end practical online condition monitoring solution presented in **Chapter 5** is, to the best of the author's knowledge, one of the first practical, comprehensive switch condition monitoring solutions implemented fully on-board. The solution is developed considering the entire signal chain from the sensing circuit to the Bayesian state-of-health estimation algorithm. Critical challenges in every step are addressed. Due to the probabilistic nature of the proposed state-of-health estimation, it does not require precise characterization and can be largely implemented with datasheet information. Moreover, by exploiting the symmetry of the inverter's operation and considering relative deviations between switches, the solution does not require active tracking of the system's operating conditions. This

makes the proposed solution highly scalable and enables easy deployment in systems that are mass-produced.

### 6.2 Future Work

The research presented in this dissertation may be improved, extended, and leveraged for potential future works. A few such topics are listed below.

- 1. The improved DC power cycling architecture and junction temperature control solution can be extended to the DUT's cooling phase. This would require an adjustable-rate cooling mechanism such as a speed-controlled fan. Modeling and control ideas from the presented work may be extended for cooling.
- 2. The study on gate-open failure mode can benefit from detailed knowledge of the material properties. Particularly, feeding accurate epoxy mold compound properties into the FEA simulations can provide higher fidelity results and enable further study of the gate-open failure mode in SiC MOSFETs.
- 3. Fundamental ideas from **Chapter 3** and **Chapter 5** may be leveraged to explore the possibility of using multiple precursors for accurate online junction temperature estimation of SiC MOSFETs.

#### REFERENCES

- [1] "C3M0040120K data sheet," Cree, Inc, Durham, North Carolina, USA.
- [2] J. Biela, M. Schweizer, S. Waffler, and J. W. Kolar, "SiC versus Si—evaluation of potentials for performance improvement of inverter and DC–DC converter systems by SiC power semiconductors," *IEEE Transactions on Industrial Electronics*, vol. 58, no. 7, pp. 2872–2882, 2011.
- [3] K. Suganuma, Wide Bandgap Power Semiconductor Packaging: Materials, Components, and Reliability, ser. Woodhead Publishing Series in Electronic and Optical Materials. Woodhead Publishing, 2018.
- [4] K. O. Dohnke, K. Guth, and N. Heuck, "History and Recent Developments of Packaging Technology for SiC Power Devices," in *Silicon Carbide and Related Materials 2015*, ser. Materials Science Forum, vol. 858, pp. 1043–1048. Trans Tech Publications Ltd, 6 2016.
- [5] T. Kimoto, "Bulk and Epitaxial Growth of Silicon Carbide," Progress in Crystal Growth and Characterization of Materials, vol. 62, no. 2, pp. 329–351, 2016.
- [6] O. A. Salvado, "Contribution to The Study of The SiC MOSFETs Gate Oxide," PhD Thesis, 2018.
- [7] J. Wang and X. Jiang, "Review and Analysis of SiC MOSFETs' Ruggedness and Reliability," *IET Power Electronics*, vol. 13, no. 3, pp. 445–455, 2020.
- [8] S. Mollov and F. Blaabjerg, "Condition and health monitoring in power electronics," in CIPS 2018; 10th International Conference on Integrated Power Electronics Systems, pp. 1–8, 2018.
- [9] F. Yang, E. Ugur, and B. Akin, "Evaluation of aging's effect on temperature-sensitive electrical parameters in SiC MOSFETs," *IEEE Transactions on Power Electronics*, vol. 35, DOI 10.1109/TPEL.2019.2950311, no. 6, pp. 6315–6331, 2020.
- [10] S. S. Manson and T. J. Dolan, "Thermal Stress and Low Cycle Fatigue," Journal of Applied Mechanics, vol. 33, DOI 10.1115/1.3625225, no. 4, pp. 957–957, 12 1966. [Online]. Available: https://doi.org/10.1115/1.3625225
- [11] M. Held, P. Jacob, G. Nicoletti, P. Scacco, and M. Poech, "Fast power cycling test of igbt modules in traction application," in *Proceedings of Second International Conference on Power Electronics and Drive Systems*, vol. 1, pp. 425–430, 1997.
- [12] R. Bayerer, T. Herrmann, T. Licht, J. Lutz, and M. Feller, "Model for power cycling lifetime of IGBT modules - various factors influencing lifetime," in 5th International Conference on Integrated Power Electronics Systems, pp. 1–6, 2008.
- [13] H. Wang, K. Ma, and F. Blaabjerg, "Design for reliability of power electronic systems," in *IECON 2012 - 38th Annual Conference on IEEE Industrial Electronics Society*, pp. 33–44, 2012.

- [14] Application Manual Power Semiconductors, Semikron Gmbh, 2015.
- [15] L. R. GopiReddy, L. M. Tolbert, and B. Ozpineci, "Power cycle testing of power switches: A literature survey," *IEEE Transactions on Power Electronics*, vol. 30, no. 5, pp. 2465–2473, 2015.
- [16] S. Ramminger, N. Seliger, and G. Wachutka, "Reliability model for al wire bonds subjected to heel crack failures," *Microelectronics Reliability*, vol. 40, DOI 10.1016/S0026-2714(00)00139-6, pp. 1521–1525, 08 2000.
- [17] T. Herrmann, M. Feller, J. Lutz, R. Bayerer, and T. Licht, "Power cycling induced failure mechanisms in solder layers," in 2007 European Conference on Power Electronics and Applications, pp. 1–7, 2007.
- [18] X. Zhang, "Failure mechanisms in wideband semiconductor power devices," Ph.D. dissertation, Univ. of Maryland, College Park, 2006. [Online]. Available: http://hdl.handle.net/1903/3653
- [19] C. J. Cochrane, P. M. Lenahan, and A. J. Lelis, "An electrically detected magnetic resonance study of performance limiting defects in sic metal oxide semiconductor field effect transistors," *Journal of Applied Physics*, vol. 109, DOI 10.1063/1.3530600, no. 1, p. 014506, 2011. [Online]. Available: https://doi.org/10.1063/1.3530600
- [20] H. Luo, F. Iannuzzo, F. Blaabjerg, M. Turnaturi, and E. Mattiuzzo, "Aging precursors and degradation effects of SiC-MOSFET modules under highly accelerated power cycling conditions," in 2017 IEEE Energy Conversion Congress and Exposition (ECCE), pp. 2506–2511, 2017.
- [21] S. Dusmez and B. Akin, "An accelerated thermal aging platform to monitor fault precursor on-state resistance," in 2015 IEEE International Electric Machines Drives Conference (IEMDC), pp. 1352–1358, 2015.
- [22] E. Ugur and B. Akin, "Aging assessment of discrete SiC MOSFETs under high temperature cycling tests," in 2017 IEEE Energy Conversion Congress and Exposition (ECCE), pp. 3496–3501, 2017.
- [23] H. Chen, B. Ji, V. Pickert, and W. Cao, "Real-time temperature estimation for power mosfets considering thermal aging effects," *IEEE Transactions on Device and Materials Reliability*, vol. 14, no. 1, pp. 220–228, 2014.
- [24] F. Erturk, E. Ugur, J. Olson, and B. Akin, "Real-time aging detection of SiC MOSFETs," *IEEE Transactions on Industry Applications*, vol. 55, no. 1, pp. 600–609, 2019.
- [25] C. Herold, J. Sun, P. Seidel, L. Tinschert, and J. Lutz, "Power cycling methods for sic mosfets," in 2017 29th International Symposium on Power Semiconductor Devices and IC's (ISPSD), pp. 367–370, 2017.

- [26] J. Li, A. Castellazzi, M. A. Eleffendi, E. Gurpinar, C. M. Johnson, and L. Mills, "A physical rc network model for electrothermal analysis of a multichip sic power module," *IEEE Transactions on Power Electronics*, vol. 33, no. 3, pp. 2494–2508, 2018.
- [27] C. Li, H. Luo, C. Li, W. Li, H. Yang, and X. He, "Online junction temperature extraction of sic power mosfets with temperature sensitive optic parameter (tsop) approach," *IEEE Transactions on Power Electronics*, vol. 34, no. 10, pp. 10143–10152, 2019.
- [28] F. Yang, E. Ugur, B. Akin, and G. Wang, "Design methodology of DC power cycling test setup for SiC MOSFETs," *IEEE Journal of Emerging and Selected Topics in Power Electronics*, pp. 1–1, 2019.
- [29] I2C-bus specification and user manual, NXP Semiconductor, 2014.
- [30] B. T. Vankayalapati, S. Pu, F. Yang, M. Farhadi, V. Gurusamy, and B. Akin, "Investigation and on-board detection of gate-open failure in sic mosfets," *IEEE Transactions on Power Electronics*, vol. 37, DOI 10.1109/TPEL.2021.3125026, no. 4, pp. 4658–4671, 2022.
- [31] Y. Mukunoki, T. Horiguchi, Y. Nakayama, A. Nishizawa, Y. Nakamura, K. Konno, M. Kuzumoto, and H. Akagi, "Modeling of a silicon-carbide mosfet with focus on internal stray capacitances and inductances, and its verification," in 2017 IEEE Applied Power Electronics Conference and Exposition (APEC), DOI 10.1109/APEC.2017.7931076, pp. 2671–2677, 2017.
- [32] M. Farhadi, F. Yang, S. Pu, B. T. Vankayalapati, and B. Akin, "Temperature-independent gate-oxide degradation monitoring of sic mosfets based on junction capacitances," *IEEE Transactions on Power Electronics*, vol. 36, DOI 10.1109/TPEL.2021.3049394, no. 7, pp. 8308–8324, 2021.
- [33] E. Ugur, C. Xu, F. Yang, S. Pu, and B. Akin, "A new complete condition monitoring method for SiC power MOSFETs," *IEEE Transactions on Industrial Electronics*, vol. 68, no. 2, pp. 1654–1664, 2020.
- [34] M. Chinthavali, B. Ozpineci, and L. Tolbert, "Temperature-dependent characterization of sic power electronic devices," in *Power Electronics in Transportation (IEEE Cat. No.04TH8756)*, DOI 10.1109/PET.2004.1393790, pp. 43–47, 2004.
- [35] J. A. O. González and O. Alatise, "A novel non-intrusive technique for bti characterization in sic mosfets," *IEEE Transactions on Power Electronics*, vol. 34, DOI 10.1109/TPEL.2018.2870067, no. 6, pp. 5737–5747, 2019.
- [36] B. T. Vankayalapati, F. Yang, S. Pu, M. Farhadi, and B. Akin, "A highly scalable, modular test bench architecture for large-scale dc power cycling of SiC mosfets: Towards data enabled reliability," *IEEE Power Electronics Magazine*, vol. 8, DOI 10.1109/MPEL.2020.3047668, no. 1, pp. 39–48, 2021.

- [37] J. O. Gonzalez, R. Wu, S. Jahdi, and O. Alatise, "Performance and reliability review of 650 v and 900 v silicon and sic devices: Mosfets, cascode jfets and igbts," *IEEE Transactions on Industrial Electronics*, vol. 67, no. 9, pp. 7375–7385, 2020.
- [38] T. B. Soeiro, E. Mengotti, E. Bianda, and G. Ortiz, "Performance evaluation of the body-diode of sic mosfets under repetitive surge current operation," in *IECON 2019 - 45th Annual Conference of the IEEE Industrial Electronics Society*, vol. 1, pp. 5154–5159, 2019.
- [39] D. A. Gajewski, B. Hull, D. J. Lichtenwalner, S. Ryu, E. Bonelli, H. Mustain, G. Wang, S. T. Allen, and J. W. Palmour, "Sic power device reliability," in 2016 IEEE International Integrated Reliability Workshop (IIRW), pp. 29–34, 2016.
- [40] S. Baba, A. Gieraltowski, M. T. Jasinski, F. Blaabjerg, A. S. Bahman, and M. Zelechowski, "Active power cycling test bench for sic power mosfets - principles, design and implementation," *IEEE Transactions on Power Electronics*, pp. 1–1, 2020.
- [41] U. Choi, F. Blaabjerg, S. Jørgensen, F. Iannuzzo, H. Wang, C. Uhrenfeldt, and S. Munk-Nielsen, "Power cycling test and failure analysis of molded intelligent power igbt module under different temperature swing durations," *Microelectronics Reliability*, vol. 64, pp. 403 – 408, 2016.
- [42] S. Dusmez, S. H. Ali, M. Heydarzadeh, A. S. Kamath, H. Duran, and B. Akin, "Aging precursor identification and lifetime estimation for thermally aged discrete package silicon power switches," *IEEE Transactions on Industry Applications*, vol. 53, no. 1, pp. 251–260, 2017.
- [43] C. Delepaut, S. Siconolfi, O. Mourra, and F. Tonicello, "MOSFET gate open failure analysis in power electronics," in 2013 Twenty-Eighth Annual IEEE Applied Power Electronics Conference and Exposition (APEC), pp. 189–196, 2013.
- [44] N. C. Remo and J. C. M. Fernandez, "A reliable failure analysis methodology in analyzing the elusive gate-open failures," in *Proceedings of the 12th International Symposium on the Physical and Failure Analysis of Integrated Circuits, 2005. IPFA 2005.*, pp. 185–189, 2005.
- [45] C. Chen, F. Luo, and Y. Kang, "A review of sic power module packaging: Layout, material system and integration," CPSS Transactions on Power Electronics and Applications, vol. 2, DOI 10.24295/CPSSTPEA.2017.00017, no. 3, pp. 170–186, 2017.
- [46] H. Lee, V. Smet, and R. Tummala, "A review of sic power module packaging technologies: Challenges, advances, and emerging issues," *IEEE Journal of Emerging and Selected Topics in Power Electronics*, vol. 8, DOI 10.1109/JESTPE.2019.2951801, no. 1, pp. 239–255, 2020.
- [47] T. Ziemann, U. Grossner, and J. Neuenschwander, "Power cycling of commercial sic mosfets," in 2018 IEEE 6th Workshop on Wide Bandgap Power Devices and Applications (WiPDA), pp. 24–31, 2018.

- [48] Y. Yao, G.-Q. Lu, D. Boroyevich, and K. D. T. Ngo, "Survey of high-temperature polymeric encapsulants for power electronics packaging," *IEEE Transactions on Components, Packaging and Manufacturing Technology*, vol. 5, DOI 10.1109/TCPMT.2014.2337300, no. 2, pp. 168–181, 2015.
- [49] Z. Li, H. Chen, H. Fan, and J. Yang, "Optimization of epoxy molding compound to enhance the solder joints robustness during thermal cycling for a clip bond power package," in 2020 21st International Conference on Electronic Packaging Technology (ICEPT), DOI 10.1109/ICEPT50128.2020.9201939, pp. 1–4, 2020.
- [50] R. Amro, J. Lutz, and A. Lindemann, "Power cycling with high temperature swing of discrete components based on different technologies," in 2004 IEEE 35th Annual Power Electronics Specialists Conference (IEEE Cat. No.04CH37551), vol. 4, pp. 2593–2598 Vol.4, 2004.
- [51] J. Lutz, H. Schlangenotto, U. Scheuermann, and R. De Doncker, Semiconductor Power Devices-Physics, Characteristics, Reliability, 1st ed., vol. 1, ch. 11, pp. 399–401. Springer, 2010.
- [52] S. Manoharan, N. M. J. Li, C. Patel, S. Hunter, and P. McCluskey, "Mechanics of copper wire bond failure due to thermal fatigue," in 2018 IEEE 20th Electronics Packaging Technology Conference (EPTC), DOI 10.1109/EPTC.2018.8654436, pp. 874–881, 2018.
- [53] S. Yin, K. J. Tseng, P. Tu, R. Simanjorang, and A. K. Gupta, "Design considerations and comparison of high-speed gate drivers for Si IGBT and SiC MOSFET modules," in 2016 IEEE Energy Conversion Congress and Exposition (ECCE), DOI 10.1109/ECCE.2016.7855013, pp. 1–8, 2016.
- [54] Designing With the C2000 Configurable Logic Block, Texas Instruments Inc., Dallas, TX, 2019, SPRACL3.
- [55] J. Sun, H. Xu, X. Wu, and K. Sheng, "Comparison and analysis of short circuit capability of 1200 V single-chip SiC MOSFET and Si IGBT," in 2016 13th China International Forum on Solid State Lighting: International Forum on Wide Bandgap Semiconductors China (SSLChina: IFWS), DOI 10.1109/IFWS.2016.7803752, pp. 42–45, 2016.
- [56] "Understanding the short circuit protection for silicon carbide MOSFETs," Texas Instruments Inc., Dallas, TX.
- [57] C. S. Goli, S. Essakiappan, P. Sahu, M. Manjrekar, and N. Shah, "Review of recent trends in design of traction inverters for electric vehicle applications," in 2021 IEEE 12th International Symposium on Power Electronics for Distributed Generation Systems (PEDG), DOI 10.1109/PEDG51384.2021.9494164, pp. 1–6, 2021.
- [58] J. Wang and J. Xi, "Review and analysis of SiC MOSFETs' ruggedness and reliability," *IET Power Electronics*, vol. 13, DOI 10.1049/iet-pel.2019.0587, Aug. 2019.

- [59] T. Santini, M. Sebastien, M. Florent, L.-V. Phung, and B. Allard, "Gate oxide reliability assessment of a SiC MOSFET for high temperature aeronautic applications," in 2013 IEEE ECCE Asia Downunder, DOI 10.1109/ECCE-Asia.2013.6579125, pp. 385–391, 2013.
- [60] E. Ugur, F. Yang, S. Pu, S. Zhao, and B. Akin, "Degradation assessment and precursor identification for SiC MOSFETs under high temp cycling," *IEEE Transactions on Industry Applications*, vol. 55, DOI 10.1109/TIA.2019.2891214, no. 3, pp. 2858–2867, 2019.
- [61] S. H. Ali, X. Li, A. S. Kamath, and B. Akin, "A simple plug-in circuit for IGBT gate drivers to monitor device aging: Toward smart gate drivers," *IEEE Power Electronics Magazine*, vol. 5, DOI 10.1109/MPEL.2018.2849653, no. 3, pp. 45–55, 2018.
- [62] X. Jiang, J. Wang, H. Yu, J. Chen, Z. Zeng, X. Yang, and Z. J. Shen, "Online junction temperature measurement for SiC MOSFET based on dynamic threshold voltage extraction," *IEEE Transactions on Power Electronics*, vol. 36, DOI 10.1109/TPEL.2020.3022390, no. 4, pp. 3757–3768, 2021.
- [63] P. Wang, J. Zatarski, A. Banerjee, and J. Donnal, "Condition monitoring of SiC MOSFETs utilizing gate leakage current," in 2020 IEEE Applied Power Electronics Conference and Exposition (APEC), DOI 10.1109/APEC39645.2020.9124394, pp. 1837–1843, 2020.
- [64] S. Pu, E. Ugur, F. Yang, and B. Akin, "In situ degradation monitoring of SiC MOSFET based on switching transient measurement," *IEEE Transactions on Industrial Electronics*, vol. 67, DOI 10.1109/TIE.2019.2924600, no. 6, pp. 5092–5100, 2020.
- [65] M. Nitzsche, C. Cheshire, M. Fischer, J. Ruthardt, and J. Roth-Stielow, "Comprehensive comparison of a SiC MOSFET and Si IGBT based inverter," in *PCIM Europe 2019; International Exhibition and Conference for Power Electronics, Intelligent Motion, Renewable Energy and Energy Management*, pp. 1–7, 2019.
- [66] T. A. Polom, C. van der Broeck, R. W. De Doncker, and R. D. Lorenz, "Real-time, in situ degradation monitoring in power semiconductor converters," in 2019 IEEE Applied Power Electronics Conference and Exposition (APEC), DOI 10.1109/APEC.2019.8721825, pp. 2720–2727, 2019.
- [67] B. Ji, V. Pickert, W. P. Cao, and L. Xing, "Onboard condition monitoring of solder fatigue in IGBT power modules," in 2013 9th IEEE International Symposium on Diagnostics for Electric Machines, Power Electronics and Drives (SDEMPED), DOI 10.1109/DEMPED.2013.6645690, pp. 9–15, 2013.
- [68] W. Lai, Y. Zhao, M. Chen, Y. Wang, X. Ding, S. Xu, and L. Pan, "Condition monitoring in a power module using on-state resistance and case temperature," *IEEE Access*, vol. 6, DOI 10.1109/ACCESS.2018.2879314, pp. 67108–67117, 2018.
- [69] S. Dusmez, H. Duran, and B. Akin, "Remaining useful lifetime estimation for thermally stressed power MOSFETs based on on-state resistance variation," *IEEE Transactions on Industry Applications*, vol. 52, DOI 10.1109/TIA.2016.2518127, no. 3, pp. 2554–2563, 2016.

- [70] F. Gonzalez-Hernando, J. San-Sebastian, A. Garcia-Bediaga, M. Arias, F. Iannuzzo, and F. Blaabjerg, "Wear-out condition monitoring of IGBT and MOSFET power modules in inverter operation," *IEEE Transactions on Industry Applications*, vol. 55, DOI 10.1109/TIA.2019.2935985, no. 6, pp. 6184–6192, 2019.
- [71] S. Beczkowski, P. Ghimire, A. R. de Vega, S. Munk-Nielsen, B. Rannestad, and P. Thøgersen, "Online Vce measurement method for wear-out monitoring of high power IGBT modules," in 2013 15th European Conference on Power Electronics and Applications (EPE), pp. 1–7. IEEE, 2013.
- [72] Y. Peng and H. Wang, "A simplified on-state voltage measurement circuit for power semiconductor devices," *IEEE Transactions on Power Electronics*, vol. 36, DOI 10.1109/TPEL.2021.3070698, no. 10, pp. 10993–10997, 2021.
- [73] B. Yu, L. Wang, and D. Ahmed, "Drain-source voltage clamp circuit for online accurate on-state resistance measurement of SiC MOSFETs in dc solid-state power controller," *IEEE Journal of Emerging and Selected Topics in Power Electronics*, vol. 8, DOI 10.1109/JESTPE.2019.2954038, no. 1, pp. 331–342, 2020.
- [74] M. Guacci, D. Bortis, and J. W. Kolar, "On-state voltage measurement of fast switching power semiconductors," *CPSS Transactions on Power Electronics and Applications*, vol. 3, DOI 10.24295/CPSSTPEA.2018.00016, no. 2, pp. 163–176, 2018.
- [75] S. Dusmez, M. Bhardwaj, L. Sun, and B. Akin, "In situ condition monitoring of high-voltage discrete power MOSFET in boost converter through software frequency response analysis," *IEEE Transactions on Industrial Electronics*, vol. 63, DOI 10.1109/TIE.2016.2595482, no. 12, pp. 7693–7702, 2016.
- [76] M. Bhardwaj, S. Choudhury, R. Poley, and B. Akin, "Online frequency response analysis: A powerful plug-in tool for compensation design and health assessment of digitally controlled power converters," *IEEE Transactions on Industry Applications*, vol. 52, DOI 10.1109/TIA.2016.2522951, no. 3, pp. 2426–2435, 2016.
- [77] Q. Zhang, Y. Yang, and P. Zhang, "A novel method for monitoring the junction temperature of SiC MOSFET on-line based on on-state resistance," in 2019 22nd International Conference on Electrical Machines and Systems (ICEMS), DOI 10.1109/ICEMS.2019.8922346, pp. 1–5, 2019.
- [78] F. Stella, G. Pellegrino, E. Armando, and D. Daprà, "On-line temperature estimation of SiC power MOSFET modules through on-state resistance mapping," in 2017 IEEE Energy Conversion Congress and Exposition (ECCE), DOI 10.1109/ECCE.2017.8096976, pp. 5907–5914, 2017.
- [79] UCC5870-Q1 30-A Isolated IGBT/SiC MOSFET Gate Driver with Advanced Protection Features for Automotive Applications, Texas Instruments, Sep. 2021, rev. C.

- [80] D. Peters, T. Basler, B. Zippelius, T. Aichinger, W. Bergner, R. Esteve, D. Kueck, and R. Siemieniec, "The new CoolSiC™ trench MOSFET technology for low gate oxide stress and high performance," in *PCIM Europe 2017; International Exhibition and Conference for Power Electronics, Intelligent Motion, Renewable Energy and Energy Management*, pp. 1–7, 2017.

### **BIOGRAPHICAL SKETCH**

Bhanu Teja Vankayalapati (student member, IEEE) received a bachelor's degree (B.Tech) in electrical engineering and a master's degree (M.Tech) in power electronics from the Indian Institute of Technology (IIT-BHU), Varanasi, India in 2017. He is currently working toward a PhD degree in Wide Band Gap (WBG) semiconductor reliability with The University of Texas at Dallas, Richardson, Texas, United States.

From 2017 to 2018, he was a Project Engineer with the Indian Institute of Technology, Kanpur, India, where he worked on a high-density GaN switch-mode power supply (SMPS) project for the Indian Space Research Organization (ISRO). His research areas include WBG device applications, WBG reliability, converter design, and embedded control.

# CURRICULUM VITAE

# Bhanu Teja Vankayalapati

Dec'22

# EDUCATION

PhD, Electrical Engineering
The University of Texas at Dallas, United States
Condition Monitoring Techniques for SiC MOSFETs

Master of Technology + Bachelor of Technology, Electrical Engineering (Jun'17)
5-Year Integrated Dual Degree Course, IIT(BHU), Varanasi, India
Focus: Power Electronics

### **RESEARCH INTERESTS**

- Wide band-gap power electronics system design, simulation, and modelling.
- Low-cost, high-accuracy, high-speed sensing and data acquisition solutions for on-board measurements.
- Code-efficient embedded software algorithms.

# PUBLICATIONS

# Peer-Reviewed Journals

- (J1): B. T. Vankayalapati, S. Pu, F. Yang, M. Farhadi, V. Gurusamy, and B. Akin, "Investigation and On-Board Detection of Gate-Open Failure in SiC MOSFETs," in *IEEE Transactions on Power Electronics*, vol. 37, no. 4, pp. 4658-4671, April 2022.
- (J2): B. T. Vankayalapati, M. Farhadi, R. Sajadi, H. Tan, and B. Akin, "A Practical Switch Condition Monitoring Solution for SiC Traction Inverters," in *IEEE Journal of Emerging and Selected Topics in Power Electronics*, 2022.
- (J3): B. T. Vankayalapati, F. Yang, S. Pu, M. Farhadi and B. Akin, "A Highly Scalable, Modular Test Bench Architecture for Large-Scale DC Power Cycling of SiC MOSFETs: Towards Data Enabled Reliability," in *IEEE Power Electronics Magazine*, vol. 8, no. 1, pp. 39-48, March 2021.
- (J4): C. Xu, B. T. Vankayalapati, F. Yang and B. Akin, "A Reconfigurable AC Power Cycling Test Setup for Comprehensive Reliability Evaluation of GaN HEMTs," in *IEEE Transactions on Industry Applications*, 2022.
- (J5): C. Li, B. Vankayalapati, B. Akin and Z. Yu, "Analysis and Compensation of Sigma-Delta ADC Latency for High Performance Motor Control and Diagnosis," in *IEEE Transactions on Industry Applications*, 2022.

- (J6): S. Pu, F. Yang, N. Zhang, B. T. Vankayalapati and B. Akin, "A Comparative Study on Reliability and Ruggedness of Kelvin and Non-Kelvin Packaged SiC MOSFETs," in *IEEE Transactions on Industry Applications*, vol. 58, no. 3, pp. 3863-3874, May-June 2022.
- (J7): S. Pu, F. Yang, B. T. Vankayalapati and B. Akin, "Aging Mechanisms and Accelerated Lifetime Tests for SiC MOSFETs: An Overview," in *IEEE Journal of Emerging and Selected Topics in Power Electronics*, vol. 10, no. 1, pp. 1232-1254, Feb. 2022.
- (J8): A. Sarkar, B. T. Vankayalapati and S. Anand, "GaN-Based Multiple Output Flyback Converter With Independently Controlled Outputs," in *IEEE Transactions on Industrial Electronics*, vol. 69, no. 3, pp. 2565-2576, March 2022.
- (J9): B. T. Vankayalapati, F. Yang, S. Pu, M. Farhadi and B. Akin, "A Highly Scalable, Modular Test Bench Architecture for Large-Scale DC Power Cycling of SiC MOSFETs: Towards Data Enabled Reliability," in *IEEE Power Electronics Magazine*, vol. 8, no. 1, pp. 39-48, March 2021.
- (J10): M. Farhadi, F. Yang, S. Pu, B. T. Vankayalapati and B. Akin, "Temperature-Independent Gate-Oxide Degradation Monitoring of SiC MOSFETs Based on Junction Capacitances," in *IEEE Transactions on Power Electronics*, vol. 36, no. 7, pp. 8308-8324, July 2021.
- (J11): S. Pu, F. Yang, B. T. Vankayalapati, E. Ugur, C. Xu and B. Akin, "A Practical On-Board SiC MOSFET Condition Monitoring Technique for Aging Detection," in *IEEE Transactions on Industry Applications*, vol. 56, no. 3, pp. 2828-2839, May-June 2020.
- (J12): S. R. Meher, S. Banerjee, B. T. Vankayalapati and R. K. Singh, "A Reconfigurable On-Board Power Converter for Electric Vehicle With Reduced Switch Count," in *IEEE Transactions on Vehicular Technology*, vol. 69, no. 4, pp. 3760-3772, April 2020.

#### Patents

- (P1): B Vankayalapati, B Akin, F Yang, PU Shi, M Farhadi, "A method for reliable onboard detection of gate-open faults in power semiconductor switches," UTD 21010, filed through SRC & UTD Office of Technology Management, 2020.
- (P2): B Akin, F Yang, PU Shi, C Xu, B Vankayalapati, "Methods of measuring real-time junction temperature in silicon carbide power MOSFET devices using turn-on delay, related circuits, and computer program products," US Patent App. 16/908,997.
- (P3): B Akin, PU Shi, E Ugur, F Yang, C Xu, BT Vankayalapati, "Methods of monitoring conditions associated with aging of silicon carbide power MOSFET devices in-situ, related circuits and computer program products," US Patent App. 16/897,448.

### Peer-Reviewed Conference Publications

- (C1): B. T. Vankayalapati and B. Akin, "Closed-loop Junction Temperature Control of SiC MOSFETs in DC Power Cycling for Accurate Reliability Assessments," 2021 IEEE 13th International Symposium on Diagnostics for Electrical Machines, Power Electronics and Drives (SDEMPED), 2021, pp. 209-215.

- (C2): C. Li, B. Vankayalapati and B. Akin, "Latency Compensation of SD-ADC for High Performance Motor Control and Diagnosis," 2021 IEEE 13th International Symposium on Diagnostics for Electrical Machines, Power Electronics and Drives (SDEMPED), 2021, pp. 289-294.
- (C3): S. Pu, F. Yang, E. Ugur, B. T. Vankayalapati, C. Xu and B. Akin, "On-Board SiC MOSFET Degradation Monitoring Through Readily Available Inverter Current/Voltage Sensors," 2019 IEEE Transportation Electrification Conference and Expo (ITEC), 2019, pp. 1-5.
- (C4): B. T. Vankayalapati, A. Sarkar, R. Nune, S. Anand and Y. S. Chauhan, "Comparison of Si and GaN Power Devices Based SMPS for Satellite Application," 2018 IEEE International Conference on Power Electronics, Drives and Energy Systems (PEDES), 2018, pp. 1-6.
- (C5): B. T. Vankayalapati, R. Singh and V. K. Bussa, "Two stage integrated on-board charger for EVs," 2018 IEEE International Conference on Industrial Technology (ICIT), 2018, pp. 1807-1813.

# HONORS AND AWARDS

1. 2022 Jan Van der Ziel Fellowship.
2. SDEMPED 2021 Best Paper Award.