# TIME-DOMAIN ANALOG-TO-DIGITAL CONVERSION AND GIGAHERTZ TIME-DOMAIN FOLDING/FLASH ADC

by

Shuang Zhu



## APPROVED BY SUPERVISORY COMMITTEE:

Yun Chiu, Chair

Carl Sechen

Rashaunda Henderson

Qing Gu

Copyright 2017

Shuang Zhu

All Rights Reserved

# TIME-DOMAIN ANALOG-TO-DIGITAL CONVERSION AND GIGAHERTZ TIME-DOMAIN FOLDING/FLASH ADC

by

SHUANG ZHU, BS, MS

#### DISSERTATION

Presented to the Faculty of

The University of Texas at Dallas

in Partial Fulfillment

of the Requirements

for the Degree of

## DOCTOR OF PHILOSOPHY IN

#### ELECTRICAL ENGINEERING

## THE UNIVERSITY OF TEXAS AT DALLAS

May 2017

#### ACKNOWLEDGMENTS

As the completion of my PhD study approaches, I would like to express my gratitude to everyone who has helped me to complete the journey.

First, I would like to thank my advisor, Dr. Yun Chiu, for supporting me in my pursuit of a PhD degree at UT Dallas. His professional guidance in research and generous referrals for academic opportunities have given me both proficient research skills and a vision of state-of-the-art research. I would also like to thank my dissertation committee members, Dr. Carl Sechen, Dr. Rashaunda Henderson and Dr. Qing Gu, for serving on my defense exam and reviewing my dissertation.

It has been a pleasure and an unforgettable time working with many colleagues at UT Dallas. I cannot count how many times Bo Wu and I ate pho together, with some great ideas coming out during the meal. I still remember when Yuan Zhou and I wiped ice from his windshield on a snowy night. I also had a great time with my colleagues Yongda Cai, Benwei Xu, Sudipta Sarkar, Amy Song, Brian Elies, Hai Huang, Yanqing Li, Hongda Xu, Kiran Soppimath, Ling Du, Qian Zhong and Lei Chen. Furthermore, I appreciate the generosity of Dr. Kenneth K. O and Mrs. Donna J. Kuchinski for the support of the TxACE.

Last but not least, I thank my wife and my parents. I especially thank my wife for coming to the US with me.

March 2017

## TIME-DOMAIN ANALOG-TO-DIGITAL CONVERSION AND GIGAHERTZ TIME-DOMAIN FOLDING/FLASH ADC

Shuang Zhu, PhD The University of Texas at Dallas, 2017

Supervising Professor: Yun Chiu, PhD

High-speed ADCs with 6- to 10-bit resolution and multi-gigahertz sampling rates are in high demand in next-generation wireless and wireline communication systems.

For wireline communication systems, such as backplane receivers and 10GBASE-T, 10+ GS/s, 6-8 bit ADCs are in great demand to deliver higher bit rates with PAM-4 modulation. In this work, a 10 GS/s 6 bit time-domain (TD) folding ADC with a single voltage-to-time converter (VTC) front-end eliminates the clock skew problem of the time-interleaving (TI) architecture. Inherent dynamic element matching (DEM) also ensures good linearity, and the ring-oscillator (RO) based folding time-to-digital converter (TDC) achieves high area efficiency. The chip achieves an SFDR of 42 dB with the smallest chip area among state-of-the-art counterparts.

In modern RADAR systems and 5G base stations, high-speed ADCs with high input bandwidth enable RF sampling, which can significantly reduce the RF front-end complexity. This work proposes a 2 GS/s 8 bit Flash ADC based on a novel TD remainder number system (RNS). The RNS architecture significantly reduces the number of comparators (CMPs) in a Flash ADC, leading to lower complexity and power consumption. An effective resolution bandwidth (ERBW) of 1.74 GHz is achieved thanks to the low input capacitance enabled by the TD approach. This work achieves a higher sampling rate/ERBW while consuming less power than conventional Flash ADCs, which shows great potential for future RADAR systems.

## TABLE OF CONTENTS

| Section             | Title                                      |
|---------------------|--------------------------------------------|
| ACKNOWLEDGMENTS     |                                            |
| ABSTRACT            |                                            |
| LIST OF FIGURES     |                                            |
| LIST OF TABLES      |                                            |
| CHAPTER 1           | INTRODUCTION                               |
| 1.1                 | MOTIVATION                                 |
| 1.2                 | DISSERTATION ORGANIZATION                  |
| CHAPTER 2           | TIME DOMAIN CONVERSION TECHNIQUE           |
| 2.1                 | RENAISSANCE OF TIME DOMAIN DATA CONVERSION |
| 2.2                 | OVERVIEW OF THE TDC                        |
| 2.3                 | TIME DOMAIN FOLDING WITH RO                |
| 2.4                 | TIME DOMAIN FOLDING ADC BUILDING BLOCKS    |
| 2.5                 | NOISE IN TIME DOMAIN ADC                   |
| CHAPTER 3           | 10GS/S 6BIT TIME DOMAIN FOLDING ADC        |
| 3.1                 | INTRODUCTION                               |
| 3.2                 | TIME DOMAIN FOLDING ADC ARCHITECTURE       |
| 3.3                 | DESIGN OF CIRCUIT BUILDING BLOCKS          |
| 3.4                 | MEASUREMENT RESULTS                        |
| 3.5                 | SUMMARY                                    |
| CHAPTER 4           | 2GS/S 8BIT RNS BASED FLASH ADC             |
| 4.1                 | INTRODUCTION                               |
| 4.2                 | RNS QUANTIZATION                           |
| 4.3                 | TIME DOMAIN APPROACH                       |
| 4.4                 | MEASUREMENT RESULTS                        |
| 4.5                 | SUMMARY                                    |
| CHAPTER 5           | CONCLUSION                                 |
| REFERENCES          |                                            |
| BIOGRAPHICAL SKETCH |                                            |
| CURRICULUM VITAE    |                                            |

## LIST OF FIGURES

| Figure      | Caption |
|-------------|---------|
| Figure 1.1  | Direct RF sampling system |
| Figure 1.2  | SNDR vs. Fs of the state-of-art gigahertz Flash/Folding ADCs |
| Figure 2.1  | Number of publications of time-domain converters in major circuit conferences |
| Figure 2.2  | (a) Delay-line TDC (b) Vernier TDC and (c) pulse shrinking TDC |
| Figure 2.3  | (a) RO based folding TDC and (b) gated RO with noise shaping |
| Figure 2.4  | Concept of time-domain folding |
| Figure 2.5  | Comparison of (a) time-domain folding and (b) voltage-domain folding |
| Figure 2.6  | Concept of TD ADC |
| Figure 2.7  | RO based folding TDC |
| Figure 2.8  | General circuit diagram and timing diagram of (a) CSI-based VTC (b) LR VTC |
| Figure 2.9  | Gain expansion of the CSI-based VTC |
| Figure 2.10 | General circuit diagram and timing diagram of time amplifier |
| Figure 2.11 | Noise of inverter |
| Figure 2.12 | (a) Simulated and calculated RMS noise vs. integration time of the VTC and (b) waveform of V<sub>OP</sub> and I<sub>D</sub> |
| Figure 2.13 | Jitter of RO vs. measured time interval |
| Figure 2.14 | (a) DEM with gradient effect and (b) simulated RO noise vs time interval |
| Figure 3.1  | System architecture of the 10-GS/s, 6-bit TD folding ADC with a single front-end T/H and VTC |
| Figure 3.2  | Complete VTC circuit |
| Figure 3.3  | (a) Inverter propagation delay as a function of input edge time and (b) gain compression of inverter |
| Figure 3.4  | Simulated compensation efficacy of the VTC |
| Figure 3.5  | VTC linearity (a) cross process corners and (b) w/ VDD and temperature variation |
| Figure 3.6  | Block diagram of the folding TDC |
| Figure 3.7  | Architecture of the fine TDC |
| Figure 3.8  | RO size optimization |
| Figure 3.9  | (a) Concept of the inherent DEM of the TDC and (b) behavioral simulation result of the mismatch-induced noise due to DEM |
| Figure 3.10 | (a) conventional delay-line and (b) folded differential delay-line |
| Figure 3.11 | Circuit and timing diagram of the 1-4 DEMUX |
| Figure 3.12 | Simplified (a) circuit diagram and (b) timing diagram of the TA |
| Figure 3.13 | Timing diagram of one TDC channel |
| Figure 3.14 | Calibration flow chart |
| Figure 3.15 | Block diagram of testbench setup |
| Figure 3.16 | Photo of test board |
| Figure 3.17 | Die photo |
| Figure 3.18 | Measured DNL and INL plots of (a) aggregate result and (b) individual channel results |
| Figure 3.19 | Measured output spectra at 10 GS/s with (a) low frequency input and (b) Nyquist input |
| Figure 3.20 | Measured dynamic performance |
| Figure 3.21 | Power breakdown |
| Figure 3.22 | Measured mean squared noise vs. normalized input time |
| Figure 3.23 | Simulated $J_C$ growth vs. supply/ground and substrate noise |
| Figure 4.1  | (a) Principle of RNS and (b) transfer function of RNS with moduli set {3, 5} |
| Figure 4.2  | (a) RNS system with erroneous remainders and (b) the transfer function with $\lvert \Delta r_i \rvert \leqslant 1$ |
| Figure 4.3  | Redundant RNS systems: (a) RNR, the moduli are $\{\Gamma_1, \Gamma_2, \Gamma_z\}$ and (b) RR, the moduli are $\{M\Gamma_1, M\Gamma_2\}$ |
| Figure 4.4  | Simulated RNS TF with (a) $M = 14$ and (b) $M = 16$ |
| Figure 4.5  | (a) RNS with output averaging and (b) $N_C$ vs. resolution n |
| Figure 4.6  | (a) FoMw improvement vs. L and (b) FoMs gain vs. L |
| Figure 4.7  | Block diagram of proposed RNS ADC |
| Figure 4.8  | (a) Circuit and (b) timing diagram of the VTC |
| Figure 4.9  | Simulated SFDR and THD of the VTC with process variation and mismatch |
| Figure 4.10 | (a) $4 \times$ TD SRL interpolation and (b) SRL with build-in offset |
| Figure 4.11 | (a) TDC mismatch model and (b) mismatch extraction method with TD ramp |
| Figure 4.12 | Histogram ideal TDC with (a) $Vtp<0> = 200 \text{ mV}$ and (b) $Vtp<0> = -200 \text{ mV}$ |
| Figure 4.13 | Measured (a) TDC RMS noise vs. g<sub>cal</sub> and (b) TDC RMS noise vs. iteration |
| Figure 4.14 | Measured (a) normalized raw code histogram (b) calibration convergence curves |
| Figure 4.15 | (a) Target DAC voltage convergence curve and (b) incompletely calibrated histogram |
| Figure 4.16 | Block diagram of testbench setup |
| Figure 4.17 | Photo of test board |
| Figure 4.18 | Die photo |
| Figure 4.19 | Power breakdown |
| Figure 4.20 | Measured DNL and INL plots |
| Figure 4.21 | Measured output spectra at 2 GS/s with (a) low-frequency input and (b) Nyquist input |
| Figure 4.22 | Measured dynamic performance |
| Figure 4.23 | Mod time plot of RMS noise of TDC1 vs. signal magnitude |
| Figure 4.24 | Measured SNDR variations vs. the temperature and supply voltage |
| Figure 4.25 | Schreier FoM of all non-interleaved Flash/folding ADCs published at ISSCC and VLSI Symp. from 1997 to 2016 |

## LIST OF TABLES

| Table     | Caption |
|-----------|---------|
| Table 3.1 | ADC Noise Budget |
| Table 3.2 | Performance Comparison |
| Table 4.1 | VTC Performance Comparison |
| Table 4.2 | ADC Noise Budget |
| Table 4.3 | Simulated mismatch and noise performance of RO, CMP and SRL |
| Table 4.4 | Performance Comparison |

#### **CHAPTER 1**

#### INTRODUCTION

#### **1.1 MOTIVATION**



Figure 1.1. Direct RF sampling system.

High-speed ADCs with 6- to 10-bit resolution and multi-gigahertz sampling rates are in high demand in next-generation wireless and wireline communication systems. In modern RADAR systems [1] and 5G base stations [2], more and more channels need to be integrated in one compact package. This requires very high-speed ADCs with high input bandwidth to enable IF or RF sampling, which can significantly reduce the RF front-end complexity (board size, bill of materials (BOM) cost, weight, and power) by removing mixers, LO synthesizers, amplifiers and filters, as shown in Figure 1.1. This approach also moves the frequency down-conversion function from the analog domain to the digital back-end, enabling higher system flexibility and reconfigurability [2]. In previous works, time-interleaved (TI) SAR or pipelined ADCs were introduced into such applications [3, 4]. However, these designs consume a large amount of power, since the relatively low conversion speed of a single-channel SAR or pipelined ADC requires heavy interleaving with power-hungry input buffers and calibration.

In contrast, the Flash ADC is a promising alternative because it offers the highest conversion rate without time-interleaving. The main drawback of a Flash ADC is the exponential dependence of its comparator (CMP) count N<sub>C</sub> on the resolution n (N<sub>C</sub> $\approx 2^n$), which leads to high complexity and power consumption. Furthermore, the input bandwidth of a Flash ADC is degraded by the large CMP array that directly loads the input node, which hinders its application in direct RF sampling. This work proposes a 2 GS/s 8 bit Flash ADC based on a novel time-domain (TD) remainder number system (RNS) approach. It significantly reduces the number of CMPs and the input capacitance, so that a bandwidth of 1.74 GHz is achieved. This work achieves a higher speed while consuming less power than the ADC used in [1], showing great potential for future RADAR systems.

For wireline communication systems, such as ADC-based backplane receivers [5-7] and 10GBASE-T [8-10], PAM-4 is replacing traditional non-return-to-zero (NRZ) signaling as the mainstream modulation method due to its higher bit rate. Thus, 10+ GS/s, 6-8 bit ADCs are in great demand for these applications [5, 6, 11]. Such ADCs usually require TI Flash or SAR ADC arrays [12-15]. The many-way interleaved front-end track-and-hold (T/H) structure usually requires a power-hungry input buffer and timing-skew calibration, which add complexity and power consumption to the ADC. In this work, a 10 GS/s 6 bit TD folding ADC with a single front-end eliminates the clock skew problem of the time-interleaving architecture. Inherent dynamic element matching (DEM) also ensures good linearity.

Figure 1.2 plots the SNDR versus sampling rate of state-of-the-art gigahertz ADCs. The TD ADCs are plotted with blue markers, while the conventional voltage-domain works are in grey. It can be seen that the TD ADCs show competitive performance. However, they do not yet fully exploit the advantages of the TD approach and do not outperform the voltage-domain ADCs. In this dissertation, therefore, two TD ADCs with advanced architectures and improved performance are proposed. One focuses on higher speed (10 GS/s, 6 bit), and the other focuses on higher resolution (2 GS/s, 8 bit). Both works achieve state-of-the-art performance compared with conventional TD or voltage-domain gigahertz ADCs.



Figure 1.2. SNDR vs. Fs of the state-of-art gigahertz Flash/Folding ADCs.

#### **1.2 DISSERTATION ORGANIZATION**

Chapter 2 reviews TDC topologies. The basic idea of RO-based time-domain folding and the building blocks of the TD folding ADC are also discussed, followed by a noise analysis. Chapter 3 presents a 10 GS/s 6 bit TD folding ADC in 65 nm CMOS with inherent DEM, including the details of the linearity compensation of the VTC and the DEM effect of the TDC. Chapter 4 proposes a 2 GS/s 8 bit Flash ADC in 65 nm based on the remainder number system (RNS) with a TD approach. The design advantages of the RNS quantizer in high-speed data converters are discussed. Chapter 5 concludes our work on high-speed TD Flash/folding converters.

#### **CHAPTER 2**

#### TIME DOMAIN CONVERSION TECHNIQUE<sup>1</sup>

#### 2.1 RENAISSANCE OF TIME DOMAIN DATA CONVERSION

Measuring time may seem trivial nowadays: in any digital circuit, a simple counter can do the job by counting clock cycles. However, once the time to be measured is at the picosecond level, or the time measurement circuit is used to generate the clock signal itself, such a simple counter no longer suffices. As a result, TDCs have experienced a renaissance in the past decade. They are widely used in precise time-interval measurement, e.g., light detection and ranging (LIDAR) [16-18], all-digital phase-locked loops (ADPLLs) [19-21] and clock and data recovery (CDR) [22]. Recently, time-domain (TD) ADCs exploiting TDC techniques have also become more and more active, after decades of silence since the single-slope ADC was introduced [23]. Figure 2.1 shows the increasing number of publications on TD ADCs in major solid-state circuit conferences in recent years.

One of the reasons for the growing interest in time-domain ADCs is CMOS technology scaling. As technology advances, all voltage-domain approaches face reduced signal headroom and degraded intrinsic transistor gain, which will eventually degrade the SNR of converters operating in the voltage domain in advanced process nodes. In contrast, transistor speed improves with process scaling, such that at a certain point a TD approach may gain an advantage over the traditional voltage-domain techniques. TD circuits also scale well, as digital circuits do.

<sup>1</sup> Part of this chapter is reprinted with permission from publication: S. Zhu, B. Xu, B. Wu, K. Soppimath and Y. Chiu, "A Skew-Free 10 GS/s 6 bit CMOS ADC With Compact Time-Domain Signal Folding and Inherent DEM," in IEEE Journal of Solid-State Circuits, vol. 51, no. 8, pp. 1785-1796, Aug. 2016. (©IEEE) [92].



Figure 2.1. Number of publications of time-domain converters in major circuit conferences.

Inspired by this, the TD approach has attracted more and more attention from converter designers. A few works based on a direct VTC + TDC architecture have been reported [24-26]. In [26], the compact nature of the current-starved inverter (CSI) based VTC shows potential for high-speed applications. Meanwhile, the pulse position modulation (PPM) ADC exhibits a more linear V-T conversion [24]. In the most recent publications [27-31], multi-domain or so-called "hybrid" converters that employ TD quantizers also show promising performance. Another large class of converters utilizing a TD approach is the VCO-based ADC [32-34]. Such ADCs employ a VCO as the voltage-to-phase converter and quantize the phase in the time domain.

The key to the TD ADC is the performance of the TDC. Several types of TDCs have been proposed in the past, focusing on different specifications such as high speed, high resolution, low complexity and low power. They can be classified into several categories, similar to ADCs.

#### 2.2 OVERVIEW OF THE TDC

#### 2.2.1 FLASH TDC



Figure 2.2. (a) Delay-line TDC (b) Vernier TDC and (c) pulse shrinking TDC.

The most straightforward architecture is the Flash TDC based on a simple delay line [35, 36], as sketched in Figure 2.2(a). Although it is simple, the exponential growth of the number of inverters and flip-flops (FFs, i.e., comparators) with the resolution is the principal design overhead. Also, the LSB size of the delay-line TDC is a multiple of the inverter delay, which is directly limited by the technology.
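The delay-line principle described above can be illustrated with a small behavioral sketch. This is not the circuit of [35, 36]; the function name `delay_line_tdc`, the use of picoseconds as plain numbers, and the ideal, mismatch-free stage delay `tau` are all assumptions made for illustration.

```python
def delay_line_tdc(t_in, tau, n_bits):
    """Idealized delay-line (Flash) TDC. Times are plain numbers, e.g. picoseconds.

    Returns (thermometer_code, digital_output).
    """
    # An n-bit Flash TDC needs about 2**n delay stages and flip-flops,
    # which is the exponential overhead noted in the text.
    n_stages = 2 ** n_bits - 1
    # Flip-flop k captures a 1 if the START edge has already travelled
    # through k+1 stage delays when the STOP edge arrives.
    thermometer = [1 if t_in >= (k + 1) * tau else 0 for k in range(n_stages)]
    return thermometer, sum(thermometer)


code, out = delay_line_tdc(t_in=37, tau=10, n_bits=3)
print(code, out)
```

In this model the LSB equals one stage delay `tau`, so the resolution cannot be finer than what the technology allows for a single inverter or buffer.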

To achieve a sub-gate-delay LSB size, two methods can be used. One is the Vernier TDC, which employs two delay lines with slightly different stage delays [37-40], as sketched in Figure 2.2(b). A finer LSB size is thus obtained as the difference between the stage delays of the two delay lines, at the cost of doubling the power and area of the delay line. Alternatively, the pulse shrinking (PS) TDC also provides a finer LSB size, based on the delay difference between the rising and falling transitions of a buffer [41-43], as sketched in Figure 2.2(c). However, both the Vernier TDC and the PS TDC have two major drawbacks. First, their matching is even worse than that of the simple delay-line TDC, since the LSB size is smaller but averaging techniques (such as the resistive averaging of the interpolation TDC in Section 2.2.2) are not suitable for them. Large DNL/INL occurs in such designs [40, 41]. Second, the conversion speed of both is low (<100 MS/s), which prevents their application in high-speed designs. Moreover, in low-speed designs their power consumption is considerably higher than that of conventional voltage-domain approaches.
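As a rough illustration of the Vernier principle, the sketch below models two ideal delay lines and counts how many stages it takes for the STOP edge to catch up with the START edge. The function name `vernier_tdc` and its parameters are hypothetical, and the arbiters are assumed ideal.

```python
def vernier_tdc(t_in, tau_slow, tau_fast, n_stages):
    """Idealized Vernier TDC. START travels on the slow line, STOP on the fast line.

    t_in is the time by which STOP lags START at the input.
    Returns (output_code, lsb).
    """
    lsb = tau_slow - tau_fast  # effective resolution: difference of stage delays
    code = 0
    for k in range(1, n_stages + 1):
        start_arrival = k * tau_slow          # START edge at the output of stage k
        stop_arrival = t_in + k * tau_fast    # STOP edge at the output of stage k
        if start_arrival <= stop_arrival:     # arbiter: START still leads
            code += 1
    return code, lsb


print(vernier_tdc(t_in=7, tau_slow=12, tau_fast=10, n_stages=8))
```

With a 12 ps slow stage and a 10 ps fast stage, the effective LSB is 2 ps even though each individual stage is far slower, which is the benefit described above. The model ignores mismatch, which in practice dominates at such small LSB sizes.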

#### 2.2.2 RO BASED TDC

A common drawback of the above-mentioned Flash-type TDCs is their complexity. One possible way to reduce this complexity is to roll the delay line into a ring oscillator (RO). This rolling operation essentially produces TD signal folding. When the RO is employed as a fine TDC together with a counter or a separate coarse TDC, a folding TDC is obtained with the same conversion speed as a Flash TDC but far fewer delay stages and DFFs, as sketched in Figure 2.3(a). As the edge transitions circulate in the RO, phase interpolation can be performed with shorted inverters or resistors [44, 45]. Thus, a sub-gate-delay LSB size and better matching can also be achieved with the interpolation and the resistive averaging, respectively.
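The coarse/fine split of a folding TDC can be sketched behaviorally as follows. The sketch assumes an ideal N-stage RO that cycles through 2N phases per oscillation period, with one stage delay `tau` per phase; the function name `folding_tdc` is hypothetical and interpolation is not modeled.

```python
def folding_tdc(t_in, tau, n_stages):
    """Idealized RO-based folding TDC: a counter (coarse) plus the RO phase (fine)."""
    states = 2 * n_stages            # RO phases per oscillation period
    total = int(t_in // tau)         # elapsed stage delays, i.e. the ideal output code
    coarse = total // states         # counter: completed RO periods (number of folds)
    fine = total % states            # RO phase within the current period
    return coarse, fine, coarse * states + fine


print(folding_tdc(t_in=47, tau=1, n_stages=4))
```

Here a 4-stage RO with 8 phases covers the same range as a 47-stage delay line once the counter value is included, which is the reduction in delay stages and DFFs described above.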



Figure 2.3. (a) RO based folding TDC and (b) gated RO with noise shaping.

Another category is the gated-RO (GRO) based TDC, as sketched in Figure 2.3(b). It not only provides first-order noise shaping but also saves power due to the gated operation. Combined with oversampling, a GRO-based TDC can achieve high dynamic range and SNR [46, 47]. One drawback of the GRO TDC is leakage. Since the state or phase of the RO is stored on the parasitic capacitance of the internal nodes, the leakage current discharges these nodes, resulting in errors. Resistive interpolation and averaging cannot be applied to the GRO TDC either. An upgraded version named the switched RO (SRO) TDC solves these problems [48].

#### 2.2.3 TWO-STEP AND PIPELINED TDC

Similar to Flash ADCs, both Flash TDCs and RO-based TDCs exhibit an exponential relationship between the conversion time and the resolution, which makes it difficult for them to achieve high resolution at high conversion speed. As a result, two-step or pipelined TDCs with multiple conversion stages and time amplification have been proposed [49-53]. They can achieve more than 9 bit resolution at speeds of hundreds of megahertz.

One of the most critical components of a two-step or pipelined TDC is the time amplifier (TA). Different types of TA have recently been reported [52, 54-57]. Another difficulty in implementing a two-step or pipelined TDC is the multiplying digital-to-time converter (MDTC), the counterpart of the MDAC in the voltage domain. Unlike an MDAC, which stores the residue signal on a capacitor, an MDTC must hold the time residue while waiting for the valid output of the TDC. This requires either a large number of dummy delay stages, like the time register [51], with large power consumption, or conversion back to the voltage domain, as in [29].
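A two-step conversion with time amplification can be sketched as below. This is a generic behavioral model rather than any of the designs in [49-53]; the function name, the ideal time-amplifier gain `ta_gain`, and the chosen numbers are assumptions for illustration.

```python
def two_step_tdc(t_in, tau_coarse, coarse_bits, ta_gain, tau_fine, fine_bits):
    """Idealized two-step TDC: coarse quantization, residue amplification, fine quantization."""
    # Step 1: coarse quantization with a large LSB.
    coarse = min(int(t_in // tau_coarse), 2 ** coarse_bits - 1)
    # Time residue left over after the coarse decision.
    residue = t_in - coarse * tau_coarse
    # The time amplifier stretches the residue so that a fine TDC with a
    # relaxed LSB can resolve it (assumed perfectly linear here).
    amplified = ta_gain * residue
    # Step 2: fine quantization of the amplified residue.
    fine = min(int(amplified // tau_fine), 2 ** fine_bits - 1)
    return coarse, fine, coarse * 2 ** fine_bits + fine


# ta_gain * tau_coarse == 2**fine_bits * tau_fine, so the two ranges line up.
print(two_step_tdc(t_in=53, tau_coarse=16, coarse_bits=3,
                   ta_gain=4, tau_fine=8, fine_bits=3))
```

The example resolves 6 bits with two 3-bit quantizers and an effective LSB of 2 time units, instead of the 63 decision levels a single Flash stage would need.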

#### 2.2.4 SAR TDC

The SAR TDC eliminates the need for time amplification while achieving a finer LSB size. However, the SAR operation also requires storage of the residue signal, which is difficult to implement in the time domain. In [58], dummy delay stages are inserted along the main delay path, and the edges of the dummy stages are selected by multiplexers to generate the residue signal. Because the time signal propagates through the dummy delay stages, a large amount of dynamic power is burned. Alternatively, the SAR logic can be embedded in a GRO with correlated double sampling, so that the time residue accumulates sample by sample instead of being stored by power-hungry dummy delay stages; a noise-shaping feature is then provided by oversampling [59].

#### 2.3 TIME DOMAIN FOLDING WITH RO



Figure 2.4. Concept of time-domain folding.

The folding structure is well known for its high area efficiency and Flash-like speed [60, 61]. The concept of time-domain folding is illustrated in Figure 2.4. Take a three-stage RO as an example. The RO is free running, and periodic oscillatory waveforms appear on the three internal nodes, $V_1$, $V_2$ and $V_3$. If we record $V_1$, $\overline{V_2}$ (the inversion of $V_2$) and $V_3$ collectively for some time, we can observe a thermometer-like digital code (represented together by $V_1$, $\overline{V_2}$ and $V_3$) circulating among 6 codes per RO period. The circulating nature of the oscillation phases gives rise to the folding operation in TD. Meanwhile, the time interval between two consecutive thermometer codes corresponds to exactly one inverter delay, which represents the LSB size of the quantizer. Thus, the RO provides a compact realization of signal folding and quantization in the time domain. For an N-stage RO, there are 2N states per cycle.

Compared to voltage-domain folding operations, time-domain folding provides two distinctive advantages. First, voltage-domain folding is quite nonlinear. To overcome the nonlinearity, parallel folding was introduced [61] with large structural overhead and analog complexity. In contrast, TD folding using an RO results in an inherently linear operation, since the phases of the RO circulate with no boundary effect. Second, time-domain folding, given enough conversion time, yields an infinite folding factor, whereas the folding factor is limited by the number of folding amplifiers in the voltage domain. Thus, TD folding is also very efficient. These facts are contrasted in Figure 2.5.
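The circulating thermometer-like code can be reproduced with a minimal logic-level sketch of an ideal RO. The function name `ring_oscillator_codes` and the one-toggle-per-inverter-delay model are assumptions for illustration; analog waveforms and delay mismatch are ignored.

```python
def ring_oscillator_codes(n_stages=3):
    """Logic-level model of an N-stage ring oscillator (N odd).

    Returns the 2N codes seen in one period, one code per inverter delay,
    with every second node inverted as in the text (V1, ~V2, V3, ...).
    """
    if n_stages % 2 == 0:
        raise ValueError("a single-ended ring oscillator needs an odd number of stages")
    # Alternating start state 1,0,1,...: only node 0 disagrees with its driver.
    v = [1 - (i % 2) for i in range(n_stages)]
    codes = []
    for step in range(2 * n_stages):
        # Record the code with odd-indexed nodes inverted.
        codes.append(tuple(bit ^ (i % 2) for i, bit in enumerate(v)))
        # After one inverter delay, the next node in the ring toggles:
        # node i takes the inverse of the node that drives it.
        i = step % n_stages
        v[i] = 1 - v[(i - 1) % n_stages]
    return codes


for code in ring_oscillator_codes(3):
    print(code)
```

For three stages the six printed codes are (1,1,1), (0,1,1), (0,0,1), (0,0,0), (1,0,0) and (1,1,0), after which the sequence repeats. Each step corresponds to one inverter delay, which is the LSB of the quantizer.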



Figure 2.5. Comparison of (a) time-domain folding and (b) voltage-domain folding.

#### 2.4 TIME DOMAIN FOLDING ADC BUILDING BLOCKS

The concept of the TD ADC is shown in Figure 2.6. The VTC circuit converts the input voltage signal into the time domain. Then the TDC quantizes the time signal into a digital output. This architecture can potentially expand the input bandwidth of Flash-like ADCs, i.e., Flash or folding-interpolation ADCs, for high-speed (>1 GS/s), medium-resolution (6-10 bit) applications. Since a large number of CMPs directly load the input of such ADCs, power-hungry input drivers are usually required to maintain sufficient input bandwidth. In the TD architecture, however, the VTC decouples the load capacitance of the CMP arrays from the input, so a high input bandwidth is much easier to achieve. The two works in this dissertation both employ the RO-based folding TDC architecture, so the building blocks of the RO-based folding TDC are introduced in this section.



Figure 2.6. Concept of TD ADC.

#### 2.4.1 TDC



Figure 2.7. RO based folding TDC.

The block diagram of the RO-based folding TDC is illustrated in Figure 2.7. It consists of an N-stage RO and two sets of buffer and CMP arrays. The two CMP arrays quantize the phase of the RO, triggered by the START and STOP signals, with a double sampling scheme. The double sampling provides two advantages. First, the input time becomes differential and bipolar instead of unipolar as in the traditional Flash TDC (Figure 2.2), which means the STOP signal can arrive either earlier or later than the START signal. As a result, the maximum input time interval is doubled, so the conversion speed increases. The second advantage is the inherent DEM effect embedded in this structure, which helps to improve the linearity of the TDC and will be explained in detail in Chapter 3.
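The bipolar nature of the double sampling scheme can be illustrated with a behavioral sketch. It assumes an ideal free-running N-stage RO with 2N phases and a wrap of the phase difference into a signed range; the function name `double_sampled_tdc` is hypothetical and the actual circuit details are given in Chapter 3.

```python
def double_sampled_tdc(t_start, t_stop, tau, n_stages):
    """Idealized double-sampling folding TDC.

    The RO phase is sampled once at START and once at STOP; the output is
    their difference wrapped into the signed range [-N, N).
    """
    states = 2 * n_stages
    phase_start = int(t_start // tau) % states   # RO phase captured by the START array
    phase_stop = int(t_stop // tau) % states     # RO phase captured by the STOP array
    diff = (phase_stop - phase_start) % states
    # Wrap so that STOP arriving before START yields a negative code.
    return diff - states if diff >= n_stages else diff


print(double_sampled_tdc(3.5, 6.5, 1.0, 8))      # STOP after START
print(double_sampled_tdc(6.5, 3.5, 1.0, 8))      # STOP before START
print(double_sampled_tdc(103.5, 106.5, 1.0, 8))  # same interval, different RO phase
```

The third call shows that only the interval matters, not the absolute RO phase at START. Because that starting phase differs from sample to sample, different RO stages are exercised for the same input, which is the intuition behind the DEM effect mentioned above.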

#### 2.4.2 VTC

Several types of VTC have recently been reported [24-28, 31, 62, 63]. The basic concepts of these designs are similar: a capacitor is charged or discharged from an initial reset voltage, and the time at which the capacitor voltage crosses a preset threshold is subsequently detected. A threshold-crossing time that depends linearly on the input control is thereby generated. Linear ramp (LR) based VTCs employ an input-controlled threshold voltage [24, 31] or a reset voltage sampled from the input [63]. In contrast, current-starved inverter (CSI) based VTCs modulate the charging current with the input voltage [25, 26, 62].

The CSI-based VTCs are widely used in multi-gigahertz TD ADCs [25, 26, 62], since their reset and discharging operations are synchronized by the same clock signal and the sampling operation is decoupled from the V-T conversion. Although such a compact architecture and timing ensure a high conversion speed, their inherently nonlinear operation limits the achievable resolution to only about 6 bit, even when linearity compensation is introduced (Chapter 3). Their output ranges are also small, which makes the design of the TDC more challenging. On the other hand, LR VTCs can provide better linearity and a larger output range [63]. However, the speed of previous works is much lower than the gigahertz level.





Figure 2.8. General circuit diagram and timing diagram of (a) CSI-based VTC (b) LR VTC.

The general circuit diagram and timing diagram of the CSI-based VTC are plotted in Figure 2.8(a). When the clock signal $\Phi$ is low, M<sub>1</sub> resets V<sub>A</sub> to V<sub>DD</sub>; when $\Phi$ is high, the input transistor M<sub>D</sub> starts to discharge V<sub>A</sub> with a drain current directly controlled by the input voltage. Once V<sub>A</sub> crosses V<sub>TH</sub>, the trip voltage of the trailing inverter or threshold-crossing detector (TCD), a rising edge is generated at the output. Typically, M<sub>D</sub> is designed to ensure that V<sub>A</sub> crosses V<sub>TH</sub> before it enters the triode region. Also, a pseudo-differential structure is often employed to improve the linearity of the VTC.

The circuit and timing diagrams of the LR VTC are shown in Figure 2.8(b). The input voltage is sampled onto $C_S$, which is then discharged by a constant current $I_D$. The TCD generates a time-difference signal $t_{out,\pm}$ corresponding to the differential input voltage. Since both the sampling operation and the constant-current discharging are linear to first order, the LR VTC is more linear than the CSI-based VTC.

To analyze the linearity of the CSI-based VTC, the transfer function of a PD CSI-based VTC is derived below, assuming the square law for all transistors. According to Figure 2.8(a), with a common-mode input $V_c$ and a differential input $V_d$, the output time is

$$\begin{aligned} t_{out} &= \frac{2(V_{DD} - V_{TH})C_D}{\mu C_{ox} (W/L)} \cdot \left(\frac{1}{V_{ov,+}^2} - \frac{1}{V_{ov,-}^2}\right) \\ &= -\frac{K(V_c - V_{tn})(V_d/2)}{(V_c - V_{tn} + V_d/2)^2 \cdot (V_c - V_{tn} - V_d/2)^2} \\ &= -\frac{K}{A^3} \frac{B}{\left(1 - \frac{B^2}{A^2}\right)^2}, \end{aligned} \tag{2-1}$$

where $V_{TH}$ is the trip voltage of the first inverter in the TCD, $V_{tn}$ is the threshold voltage of $M_D$, and

$$\begin{cases} A = V_c - V_{tn} \\ B = V_d / 2 \end{cases}, \quad K = \frac{8(V_{DD} - V_{TH})C_D}{\mu C_{ox}(W / L)}. \tag{2-2}$$

If a sinusoidal signal  $B = b\cos(\omega t)$  is fed to the VTC, with Taylor expansion,

$$\frac{1}{\left(1-x\right)^2} = 1 + 2x + 3x^2 + O\left(x^3\right) \quad for \ |x| < 1$$
(2-3)

the output time is

$$t_{out} \approx -\frac{Kb}{A^3} \left[ \cos(\omega t) + 2\frac{b^2}{A^2} \cos^3(\omega t) + 3\frac{b^4}{A^4} \cos^5(\omega t) \right]$$
(2-4)

and

$$HD_{3} \approx \frac{1}{2} \frac{b^{2}}{A^{2}}, \quad for \ \frac{b}{A} << 1.$$
 (2-5)
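
The step from (2-4) to (2-5) can be made explicit as follows; this is a sketch that keeps only the lowest-order distortion term in $b/A$. Applying the identity $\cos^3\theta = (3\cos\theta + \cos 3\theta)/4$ to the cubic term of (2-4) gives

$$t_{out} \approx -\frac{Kb}{A^3} \left[ \left(1 + \frac{3}{2}\frac{b^2}{A^2}\right)\cos(\omega t) + \frac{1}{2}\frac{b^2}{A^2}\cos(3\omega t) \right],$$

so the ratio of the third-harmonic amplitude to the fundamental is $\frac{1}{2}(b/A)^2 / \left(1 + \frac{3}{2}(b/A)^2\right)$, which reduces to (2-5) for $b/A \ll 1$.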

Equation (2-4) indicates an expansive transfer curve for the conventional CSI-based VTC. For example, the transfer curve for $V_c$ = 850 mV and $V_{tn}$ = 500 mV is plotted in Figure 2.9. The circuit-level simulation result is also overlaid and agrees with the analytical model. According to (2-5), a higher $V_c$ (and hence a larger $A$) provides better linearity, but the linear input range remains quite small. For example, if the $V_d$ range is ±100 mV, the HD<sub>3</sub> is worse than 34 dB for the conventional CSI-based VTC.
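
As a rough numerical cross-check of (2-1) and (2-5), the following Python sketch evaluates the normalized transfer function and extracts the third harmonic by projecting the output onto $\cos(3\omega t)$. The function names, the sample count, and the 35 mV amplitude are illustrative assumptions; only $V_c$ = 850 mV and $V_{tn}$ = 500 mV are taken from the text.

```python
import math

def t_out_norm(B, A):
    """Eq. (2-1) divided by K: output time for B = V_d/2 and A = V_c - V_tn."""
    return -A * B / ((A + B) ** 2 * (A - B) ** 2)

def harmonic(b, A, n, samples=4096):
    """Amplitude of the n-th cosine harmonic of t_out for B = b*cos(theta)."""
    acc = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        acc += t_out_norm(b * math.cos(theta), A) * math.cos(n * theta)
    return 2.0 * acc / samples

def hd3_numeric(b, A):
    """Third-harmonic to fundamental ratio from the exact transfer function."""
    return abs(harmonic(b, A, 3) / harmonic(b, A, 1))

def hd3_closed_form(b, A):
    """Eq. (2-5), valid for b/A << 1."""
    return 0.5 * (b / A) ** 2

A = 0.850 - 0.500   # V_c - V_tn from the text, in volts
b = 0.035           # assumed amplitude of B = V_d/2, in volts (b/A = 0.1)
print(20 * math.log10(hd3_numeric(b, A)), 20 * math.log10(hd3_closed_form(b, A)))
```

For small $b/A$ the two values should agree closely; the gap widens as the amplitude grows because (2-5) drops the higher-order terms of (2-4).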



Figure 2.9. Gain expansion of the CSI-based VTC.

#### 2.4.3 BUFFER, MUX AND TIME AMPLIFIER

The buffers in the TD circuits are usually standard logic gates or inverter chains with an optimized fan-out (FO). Dynamic inverters can be used in a multiplexer (MUX) or de-multiplexer (DEMUX); they allow only the preferred edges to pass through while blocking the other edges. All these circuits can be regarded as simple inverters in their ON state.

Different types of TA have recently been reported [50, 52, 54-57]. In [50, 57], a TA based on the metastability of an SR latch was presented, with a non-linear gain and a small input range. A fixed TA gain of 2× was used in [54] with cross-coupled discharging paths; to obtain a larger gain, a multiple-stage TA is necessary, which unfortunately degrades the linearity. In [52], a pulse-train TA was proposed that replicates pulses. However, its gain must be an integer and the discontinuous operation increases the latency. A highly linear TA with a flexible gain was presented in [55] using the double-rate discharging technique. An SRO can also be used for time amplification [56]. Most of the time amplifiers above can be simplified to the model in Figure 2.10.



Figure 2.10. General circuit diagram and timing diagram of time amplifier.

Two discharging phases take place in turn during the amplification. The first one is an early discharging phase between  $t_1$  and  $t_2$  with a large slew rate after the first rising edge of the inputs arrives. For example, if the rising edge of  $t_{in,+}$  arrives first, it will discharge  $C_{S,+}$  with  $I_{D1}$  while keeping  $C_{S,-}$  at  $V_{DD}$ . Then, when the rising edge of  $t_{in,-}$  arrives, both capacitors will discharge together at an equal but small current  $I_{D2}$ . When the two voltages cross the threshold of two TCDs at  $t_3$  and  $t_4$ , respectively, an amplified time difference is generated.

The TA in [55] directly exploits this architecture, while the TA in [54] uses cross-coupled feedback to control $I_{D1}$ and $I_{D2}$, which degrades the linearity. The SR-latch-based TA [50, 57] can generate $\Phi_0$ and $\Phi_{1,\pm}$ with its logic, but the second phase with $\Phi_2 = 1$ is an invalid state. Thus, $I_{D2}$ is not well defined, resulting in poor linearity. The SRO-based TA [56] is also similar: the phase of the SRO plays the role of the voltage across the capacitor.

Based on this model, the input and output time of the TA are

$$t_{in} = t_2 - t_1 = \frac{V_{DD} - V_1}{SR_1} = \frac{V_{DD} - V_1}{I_{D1} / C_X},$$
(2-6)

$$t_{out} = t_4 - t_3 = \frac{\frac{1}{2}V_{DD}}{SR_2} - \frac{V_1 - \frac{1}{2}V_{DD}}{SR_2} = \frac{V_{DD} - V_1}{SR_2} = \frac{V_{DD} - V_1}{I_{D2} / C_X}.$$
(2-7)

So, the gain of the TA is

$$A_{t} = \frac{t_{out}}{t_{in}} = \frac{SR_{1}}{SR_{2}} = \frac{I_{D1}}{I_{D2}},$$
(2-8)
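
The two-phase discharge model behind (2-6) to (2-8) can be sketched numerically as below. The function name and all component values are assumptions chosen for illustration, and the model only holds while $V_1$ stays above the threshold $V_{DD}/2$.

```python
import math

def ta_response(t_in, v_dd, i_d1, i_d2, c_x):
    """Return (V_1, t_out) for the two-phase discharge model of Figure 2.10."""
    sr1 = i_d1 / c_x                     # early-phase slew rate, V/s
    sr2 = i_d2 / c_x                     # second-phase slew rate, V/s
    v_1 = v_dd - sr1 * t_in              # eq. (2-6) solved for V_1
    # Crossing times below are measured from t_2, the start of the second phase.
    t3 = (v_1 - 0.5 * v_dd) / sr2        # leading path: V_1 down to V_DD/2
    t4 = (0.5 * v_dd) / sr2              # lagging path: V_DD down to V_DD/2
    return v_1, t4 - t3                  # eq. (2-7)

# Assumed example values (not taken from the text).
T_IN = 5e-12
v_1, t_out = ta_response(t_in=T_IN, v_dd=1.0, i_d1=400e-6, i_d2=50e-6, c_x=20e-15)
gain = t_out / T_IN                      # should equal I_D1 / I_D2 per eq. (2-8)
```

With these assumed currents the ratio $I_{D1}/I_{D2}$ is 8, so the computed gain follows (2-8) independently of the chosen capacitance.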

#### 2.5 NOISE IN TIME DOMAIN ADC

The noise performance of the time-domain ADC is of interest. The noise will be divided into two parts for analysis – the pre-quantizer signal path and the quantizer. The signal path contains all stages before the quantizer, i.e., the VTC, the buffer and the TA. The quantizer contains the RO, the buffers, the CMPs and the interpolators if applicable.

#### 2.5.1 NOISE OF INVERTER

The noise of an inverter is well studied in [64, 65]. It can be modeled as a noisy current source charging/discharging a capacitor as Figure 2.11 shows.



Figure 2.11. Noise of inverter.

The noise voltage at the capacitor,  $V_n$ , is a random walk starting from  $V_{n,rst}$ , the noise voltage sampled on the capacitor. The variance of the noise is increasing with time.

$$\begin{aligned} V_n^2 &= V_{n,rst}^2 + S_{i_n} \frac{t}{2C^2} \\ &= \frac{kT}{C} + 4kT\gamma g_m \frac{t}{2C^2} \\ &= \frac{kT}{C} \left(1 + 2\gamma g_m \frac{t}{C}\right), \end{aligned} \tag{2-9}$$

where $S_{i_n}$ is the single-sided spectral density of the noise current of M<sub>1</sub>, and $g_m$ and $\gamma$ are the transconductance and thermal noise coefficient of M<sub>1</sub>, respectively. Assuming $V_{TH} = V_{DD}/2$, the time $t_{cross}$ at which V<sub>A</sub> crosses V<sub>TH</sub> is determined by

$$\frac{t_{cross}}{C} = \frac{V_{DD} - V_{TH}}{I_D} = \frac{V_{DD}}{2I_D}.$$
(2-10)

Substituting (2-10) into (2-9) and using the square-law relation $g_m/I_D = 2/V_{ov}$, the noise voltage at the threshold-crossing point is

$$\begin{aligned} V_{n,cross}^{2} &= \frac{kT}{C} \left( 1 + \gamma V_{DD} \frac{g_{m}}{I_{D}} \right) \\ &= \frac{kT}{C} \left( 1 + 2\gamma \frac{V_{DD}}{V_{ov}} \right). \end{aligned} \tag{2-11}$$

We can observe that the noise voltage of the inverter at the threshold-crossing point is still in kT/C format, with a multiplier of  $(1+2\gamma(V_{DD}/V_{ov}))$ . The time-domain jitter is defined by the noise voltage divided by the slew rate (SR) at the threshold-crossing point. So the output jitter of the inverter is

$$\sigma_{INV}^{2} = \frac{V_{n,INV}^{2}}{SR_{INV}^{2}} = \frac{\frac{kT}{C} \left(1 + 2\gamma \frac{V_{DD}}{V_{ov}}\right)}{\left(I_{D} / C\right)^{2}}.$$
(2-12)

The inverter usually acts as a buffer to drive the following stages, so its load capacitance is set by the circuit it drives. To minimize the inverter's jitter, a larger SR, in other words a larger $I_D$, is preferred. This is the same as in voltage-domain circuits, where more power consumption gives rise to less noise.
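
A minimal sketch of (2-11) and (2-12) is given below to show the current dependence. The helper name, the default $\gamma$ = 1, the 300 K temperature, and the example component values are assumptions, and $V_{ov}$ is taken to be held constant while $I_D$ is changed.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def inverter_jitter_rms(c_load, i_d, v_dd, v_ov, gamma=1.0, temp=300.0):
    """RMS jitter of one inverter edge per eq. (2-12), in seconds."""
    v_n_sq = K_B * temp / c_load * (1.0 + 2.0 * gamma * v_dd / v_ov)  # eq. (2-11)
    slew_rate = i_d / c_load                                          # V/s
    return math.sqrt(v_n_sq) / slew_rate

# Assumed example: 10 fF load, 1 V supply, 0.3 V overdrive.
base = inverter_jitter_rms(c_load=10e-15, i_d=200e-6, v_dd=1.0, v_ov=0.3)
more_current = inverter_jitter_rms(c_load=10e-15, i_d=400e-6, v_dd=1.0, v_ov=0.3)
```

Under these assumptions, doubling $I_D$ into a fixed load halves the RMS jitter, which is the power-versus-noise tradeoff described above.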

For a buffer that consists of an n-stage inverter chain, the fan-out (FO) and the stage delay should be the same for every stage in order to optimize the delay of the entire chain. With the same supply voltage, the SR is then also the same for every stage. The total jitter of the inverter chain is

$$\sigma_{BUF}^{2} = \frac{kT\left(1 + 2\gamma \frac{V_{DD}}{V_{ov}}\right)\left(\frac{1}{C_{1}} + \frac{1}{C_{2}} + \dots + \frac{1}{C_{n}}\right)}{SR^{2}}$$

$$= \frac{\frac{kT}{C_{eq}}\left(1 + 2\gamma \frac{V_{DD}}{V_{ov}}\right)}{SR^{2}}$$
(2-13)

So the jitter of the buffer equals the jitter of a single inverter loaded with $C_{eq}$, where $C_{eq}$ is the series equivalent capacitance of all stages and $f$ is the fan-out:

$$\begin{aligned} \frac{1}{C_{eq}} &= \frac{1}{C_1} + \frac{1}{C_2} + \dots + \frac{1}{C_n} \\ &= \frac{1}{C_1} \times \left( 1 + \frac{1}{f} + \frac{1}{f^2} + \dots + \frac{1}{f^{n-1}} \right) \\ &= \frac{1}{C_1} \times \frac{1 - 1/f^n}{1 - 1/f} \\ &\approx \frac{1}{C_1} \times \frac{f}{f - 1}, \quad (n \to \infty). \end{aligned} \tag{2-14}$$

For FO = 2, the maximum jitter of the buffer is $\sqrt{2}$ times that of the first inverter, while for FO = 4 the maximum jitter is 1.15 times that of the first inverter. Due to the unity gain of the buffer, the input-referred jitter remains the same.
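
The fan-out figures quoted above follow directly from (2-13) and (2-14). The short sketch below, with hypothetical function names, evaluates both the finite geometric series and its limit.

```python
import math

def buffer_jitter_ratio(fan_out, n_stages):
    """Jitter of an n-stage chain relative to its first inverter, per (2-13)-(2-14)."""
    # C_k = C_1 * fan_out**k, so C_1/C_eq is a finite geometric series.
    c1_over_ceq = sum(fan_out ** -k for k in range(n_stages))
    return math.sqrt(c1_over_ceq)

def buffer_jitter_limit(fan_out):
    """Limit of the ratio for n -> infinity: sqrt(f / (f - 1))."""
    return math.sqrt(fan_out / (fan_out - 1))

for fo in (2, 4):
    print(fo, buffer_jitter_ratio(fo, 4), buffer_jitter_limit(fo))
```

Because the series converges quickly, a chain of only a few stages is already close to the limiting value, and a larger fan-out keeps the chain's jitter closer to that of its first inverter.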

#### 2.5.2 NOISE OF SIGNAL PATH CIRCUITS

The signal path consists of all the circuits that generate and transmit the time signal, i.e., the VTC, the TA and all the inverter buffers.

The noise mechanism of both types of VTC shown in Figure 2.8 is similar to a single inverter in the analysis above.

$$V_{n,VTC}^{2} = \frac{kT}{C_{D}} \left( 1 + 2\gamma \frac{V_{DD}}{V_{ov}} \right).$$
(2-15)

The calculated voltage noise on one side of the VTC ($V_{OP}$) is compared with the simulation result in Figure 2.12(a). The simulated value starts to deviate from the calculation at $t_{int} = 15$ ps. The decay of the noise after $t_{int} = 15$ ps is caused by M<sub>D</sub> entering the triode region, as shown in Figure 2.12(b).



Figure 2.12. (a) Simulated and calculated RMS noise vs. integration time of the VTC and (b) waveform of  $V_{OP}$  and  $I_D$ .

Fortunately, the noise predicted by the model above is still accurate at the point where $V_{OP}$ crosses $V_{TH}$. Thus, the output jitter of the VTC (differential) can still be derived from equation (2-13).

$$\sigma_{VTC}^{2} = \frac{\frac{2kT}{C_{D}} \left(1 + 2\gamma \frac{V_{DD}}{V_{ov}}\right)}{\left(I_{D} / C_{D}\right)^{2}}.$$
(2-16)

So for a given SR defined by the output range of the VTC, the jitter of the VTC can be reduced by increasing $V_{ov}$ (or, equivalently, increasing $V_C$). To keep $I_D$ unchanged, the W/L of $M_D$ must be reduced accordingly.

For the TA, we first examine the voltage noise of the two discharging paths separately. Considering the example in Figure 2.10, the noise voltage at $t_{out,+}$ is

$$V_{n,tout+}^{2} = \frac{kT}{C} \left( 1 + 2\gamma \left( g_{m1} \frac{t_{2} - t_{1}}{C} + g_{m2} \frac{t_{3} - t_{2}}{C} \right) \right)$$

$$= \frac{kT}{C} \left( 1 + 2\gamma \left( \frac{g_{m1}}{I_{D1}} \left( V_{DD} - V_{1} \right) + \frac{g_{m2}}{I_{D2}} \left( V_{1} - \frac{1}{2} V_{DD} \right) \right) \right)$$

$$= \frac{kT}{C} \left( 1 + 2\gamma \frac{2}{V_{OV}} \left( \left( V_{DD} - V_{1} \right) + \left( V_{1} - \frac{1}{2} V_{DD} \right) \right) \right)$$

$$= \frac{kT}{C} \left( 1 + 2\gamma \frac{V_{DD}}{V_{OV}} \right).$$
(2-17)

The noise voltage at $t_{out,-}$ is the same as that of an inverter because this path undergoes only a single discharging operation.

$$\begin{aligned} V_{n,Y}^{2} &= \frac{kT}{C} \left( 1 + 2\gamma \left( g_{m2} \frac{t_{4} - t_{2}}{C} \right) \right) \\ &= \frac{kT}{C} \left( 1 + 2\gamma \frac{V_{DD}}{V_{OV}} \right). \end{aligned} \tag{2-18}$$

So, combining equation (2-8), the input-referred jitter of the TA (differential) is

$$\sigma_{TA}^{2} = \frac{\frac{2kT}{C} \left(1 + 2\gamma \frac{V_{DD}}{V_{OV}}\right)}{SR_{2}^{2}} \times \left(\frac{1}{A_{t}}\right)^{2} = \frac{\frac{2kT}{C} \left(1 + 2\gamma \frac{V_{DD}}{V_{OV}}\right)}{SR_{1}^{2}}.$$
(2-19)

Thus, the total input-referred jitter of all stages in the signal path is

$$\sigma_{path}^{2} = \sum_{i=1}^{n} \frac{\frac{2kT}{C_{i}} \left(1 + 2\gamma \frac{V_{DD}}{V_{OV,i}}\right)}{SR_{i}^{2}}.$$
(2-20)

If we assume that all stages share the same slew rate and  $V_{\rm OV}$  for simplicity, then (2-20) can be rewritten as

$$\sigma_{path}^{2} = \frac{\left(1 + 2\gamma \frac{V_{DD}}{V_{OV,path}}\right)}{SR_{path}^{2}} \sum_{i=1}^{n} \frac{2kT}{C_{i}}$$

$$= \left(1 + 2\gamma \frac{V_{DD}}{V_{OV,path}}\right) \left(\frac{C_{path}}{I_{path}}\right)^{2} \frac{2kT}{C_{path}}$$

$$\propto \frac{C_{path}}{I_{path}^{2}},$$
(2-21)

where $C_{path}$ is the equivalent series capacitance of the entire signal path and $I_{path}$ is the equivalent discharging current. Because $I_{path}$ is proportional to $W/L$ and $C_{path}$ is proportional to $WL$, the jitter scales as

$$\sigma_{path}^{2} \propto \frac{WL}{\left(W/L\right)^{2}} = \frac{L^{3}}{W}$$
(2-22)

Equation (2-22) indicates that, as the technology scales and $L$ shrinks, the signal path gains SNR if the same W/L ratio is retained, since $L^3/W = L^2/(W/L)$.

#### 2.5.3 NOISE OF QUANTIZER

The second noise contributor is the quantizer. The input-referred quantizer jitter is

$$\sigma_{qter,in}^{2} = \frac{1}{A_{t}^{2}} \Big( 2 \Big( \sigma_{q}^{2} + \sigma_{CMP}^{2} \Big) + \sigma_{RO}^{2} + \sigma_{DEM}^{2} \Big).$$
(2-23)

where $\sigma_q$ is the RMS value of the quantization noise and $\sigma_{CMP}$ is the input-referred jitter of the CMP. Due to the double sampling in this work, the power of the quantization noise and that of the DFF input-referred noise are doubled.

When the time interval between the START and STOP events, $\Delta t$, is not very large, the accumulated jitter of the RO is proportional to the square root of $\Delta t$ [66, 67]. With $\Delta t = t_{in}$, the RO jitter can be expressed as

$$\sigma_{RO}^2 = \frac{t_{in}}{T_{RO}} J_C^2.$$
 (2-24)

where $J_C$ is the period jitter of the RO and $T_{RO}$ is its period. When $\Delta t$ is very large, in contrast, the accumulated jitter of the RO is proportional to $\Delta t$ due to the correlation of the low-frequency supply/substrate noise and the flicker noise. Usually, the RO noise of a TDC is predicted by (2-24) since $\Delta t$ is much smaller than the period of the low-frequency noise. Also, in a well-designed RO, $J_C$ will not dominate the noise of the TDC. The simulated jitter of the RO versus the measured time interval is plotted in Figure 2.13.
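
The square-root accumulation in (2-24) can be illustrated with a simple Monte-Carlo sketch that sums independent per-period timing errors. This is a white-noise-only model; the period jitter, RO period, trial count, and function names are assumptions.

```python
import math
import random
import statistics

def ro_jitter_rms(t_in, t_ro, j_c):
    """Accumulated RO jitter per eq. (2-24)."""
    return j_c * math.sqrt(t_in / t_ro)

def simulated_jitter_rms(n_periods, j_c, trials, rng):
    """Monte-Carlo estimate: sum of n independent per-period timing errors."""
    totals = [sum(rng.gauss(0.0, j_c) for _ in range(n_periods))
              for _ in range(trials)]
    return statistics.pstdev(totals)

rng = random.Random(0)
j_c, t_ro = 100e-15, 48e-12          # assumed period jitter and RO period
sim_4 = simulated_jitter_rms(4, j_c, 5000, rng)
sim_16 = simulated_jitter_rms(16, j_c, 5000, rng)
```

Quadrupling the measured interval should roughly double the simulated jitter, in line with (2-24). Correlated low-frequency noise, which this sketch deliberately omits, would instead accumulate in proportion to the interval.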

The last term in equation (2-23) is the noise converted from the mismatch by the inherent DEM effect. In a conventional RO-based TDC without double sampling, a significant gradient effect degrades the INL: the INL departs from zero and returns to zero after one period of the RO. Fortunately, in the RO-based TDC with double sampling, the starting and ending points of each measurement are randomized. Thus, the INL is transformed into white noise with a zero mean value.



Figure 2.13. Jitter of RO vs. measured time interval.

Figure 2.14(a) illustrates the mechanism of the DEM effect. Assume that the stage delay of the RO increases over the half cycle from 9 o'clock to 3 o'clock and then decreases over the other half cycle from 3 o'clock to 9 o'clock. The RMS noise of the RO phases is at its minimum when the measured interval equals k cycles, where k is an integer, since no mismatch is seen in this case. On the other hand, the RMS noise is at its maximum at k+0.5 cycles, where the quantization value has the maximum variation. The simulated RO RMS noise vs. the measured time interval is plotted in Figure 2.14(b). It can be clearly seen that $\sigma_{\text{DEM}}$ has a period of 12 phases, which is the total number of phases of the 6-stage differential RO used in the simulation. The noise exhibits a notch every 12 phases, and the notch values increase due to the accumulated noise of the RO mentioned before. The peak noise does not increase significantly since $\sigma_{\text{DEM}}$ dominates. Larger mismatch results in larger $\sigma_{\text{DEM}}$. The relationship between $\sigma_{\text{DEM}}$ and the mismatch can be learned from simulation.
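
The notch-and-peak behavior can be reproduced with a small behavioral sketch, given below. It is not the simulation behind Figure 2.14(b): the 4 ps nominal delay, the triangular ±0.5 ps gradient, the trial count, and all function names are assumptions, and RO jitter is ignored so that only the mismatch-induced noise appears. Times are integer femtoseconds to keep the edge counting exact.

```python
import bisect
import random
import statistics

def ro_edges(delays_fs):
    """Cumulative phase-edge times over one RO period (integer femtoseconds)."""
    edges, t = [], 0
    for d in delays_fs:
        t += d
        edges.append(t)
    return edges                       # edges[-1] is the RO period

def count_edges(x, edges):
    """Number of phase edges of a free-running RO in the interval (0, x]."""
    cycles, rem = divmod(x, edges[-1])
    return cycles * len(edges) + bisect.bisect_right(edges, rem)

def measure(t_in, edges, rng):
    """Double sampling: edges counted between a random START phase and STOP."""
    start = rng.randrange(edges[-1])   # free-running RO, so the START phase is random
    return count_edges(start + t_in, edges) - count_edges(start, edges)

def dem_noise(t_in, delays_fs, trials=20000, seed=1):
    """Mean and RMS spread of the output code for a fixed input interval."""
    rng = random.Random(seed)
    edges = ro_edges(delays_fs)
    codes = [measure(t_in, edges, rng) for _ in range(trials)]
    return statistics.mean(codes), statistics.pstdev(codes)

# 12 phases, 4 ps nominal delay, assumed gradient rising then falling over one cycle.
gradient = [-500, -300, -100, 100, 300, 500, 500, 300, 100, -100, -300, -500]
mismatched = [4000 + g for g in gradient]
period = sum(mismatched)               # one full RO cycle, in fs

print(dem_noise(period, mismatched))       # interval of k = 1 cycle
print(dem_noise(period // 2, mismatched))  # interval of 0.5 cycle
```

In this model an interval of exactly one cycle always spans all 12 phases, so the mismatch cancels, while a half-cycle interval sees either the fast or the slow half of the ring depending on the random start phase. The mean code stays at the ideal value in both cases, which is the zero-mean property described above.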



Figure 2.14. (a) DEM with gradient effect and (b) simulated RO noise vs time interval.

### **CHAPTER 3**

# 10GS/S 6BIT TIME DOMAIN FOLDING ADC<sup>2</sup>

Authors - Shuang Zhu, Benwei Xu, Bo Wu, Kiran Soppimath, and Yun Chiu

The Department of Electrical Engineering, MS: EC37

The University of Texas at Dallas

800 West Campbell Road

Richardson, Texas 75080-3021

<sup>2</sup> This chapter is reprinted with permission from two publications: S. Zhu, B. Xu, B. Wu, K. Soppimath and Y. Chiu, "A 0.073mm2 10-GS/s 6-bit time-domain folding ADC in 65-nm CMOS with inherent DEM," 2015 IEEE Custom Integrated Circuits Conference (CICC), San Jose, CA, 2015, pp. 1-4. (©IEEE) [89] and S. Zhu, B. Xu, B. Wu, K. Soppimath and Y. Chiu, "A Skew-Free 10 GS/s 6 bit CMOS ADC With Compact Time-Domain Signal Folding and Inherent DEM," in IEEE Journal of Solid-State Circuits, vol. 51, no. 8, pp. 1785-1796, Aug. 2016. (©IEEE) [92].

#### **3.1 INTRODUCTION**

High-speed ADCs achieving a sample rate of over 10 GS/s and a resolution of 6 to 8 bits are in critical demand for wireline communication systems such as fiber and backplane transceivers [5, 6, 11]. Conventional works reported in this field usually employ TI Flash or SAR ADC arrays [12-15]. The complexity of the Flash architecture and the large interleaving factor of a SAR array tend to inflate the silicon area of these ADCs, even when implemented in advanced CMOS nodes [15]. In addition, the many-way interleaved front-end track-and-hold (T/H) structure usually requires a power-hungry input buffer and timing-skew calibration, resulting in added complexity of the ADC when deployed in real-world systems. As technology improves, all voltage-domain approaches face the challenge of an increasingly reduced signal swing as the supply voltage scales, which will eventually degrade the SNR of converters operating in the voltage domain in advanced process nodes. In contrast, the speed of transistors is constantly improving over process generations, such that at a certain point a TD approach may gain an advantage over the traditional voltage-domain techniques.

In this chapter, a TD folding ADC with high area efficiency achieving a 10-GS/s conversion speed and a 6-bit resolution is discussed. Two key techniques are employed: 1) a high-speed non-interleaved VTC clocked at 10 GHz, and 2) an area-efficient folding TDC array with built-in DEM. Due to the high-speed nature of the CSI-based VTC, the front-end T/H and the VTC can directly process a full-scale Nyquist (i.e., 5 GHz) input without interleaving. While the back-end TDC is 4-way interleaved, the folding nature of the TDC architecture based on a ring oscillator (RO) leads to large area savings. In addition, since the RO is free running, the beginning and ending points of the quantization process are always randomized from sample to sample, resulting in an inherent DEM operation without additional effort.

# **3.2 TIME DOMAIN FOLDING ADC ARCHITECTURE**

The overall architecture of the prototype TD folding ADC is sketched in Figure 3.1. The ADC front-end consists of a buffer-less high-speed VTC clocked at 10 GHz without TI; the back-end is composed of a 4-way TI TDC array utilizing the RO-based TD folding. The resolution of the ADC is 6 bits.



Figure 3.1. System architecture of the 10-GS/s, 6-bit TD folding ADC with a single front-end T/H and VTC.

Several architecture and circuit techniques are employed in this work to achieve the desired 10-GS/s throughput and 6-bit resolution with minimum silicon area. First, a CSI-based VTC is chosen for the target speed, with a compact linearity compensation technique to support 6-bit accuracy. Owing to the high-speed nature of the VTC, no TI is necessary at the front-end; thus, the clock skew problem in TI ADCs [13, 15] is completely eliminated. Second, although the back-end TDC is 4-way interleaved, the folding nature of the TDC architecture leads to much higher area efficiency than those based on delay lines (DL). A feed-forward structure of the RO and a 2× time interpolation are also incorporated to ensure a high conversion speed (i.e., 2.5 GS/s and 6-bit resolution in 65-nm CMOS). Third, the free-running RO and double sampling result in an inherent DEM operation for free. It works along with the pseudo-differential (PD) RO and the resistive averaging to further improve the linearity performance of the TDC. With a folding factor of 6, a total of 72 quantization levels are realized by the TDCs. The extra 8 levels are utilized to tolerate the inter-TDC offset and gain mismatch errors.

A de-multiplexer (DEMUX) circuit is employed to interface the single VTC and the 4-way TI TDC array. While the VTC output is distributed to the four TDCs, four time amplifiers (TAs) with a gain of 8× are inserted in between to match the output range of the VTC to the input range of the TDC. Lastly, the cyclic thermometer output codes of the TDC array are converted to binary format and decimated by a factor of 81× before being transmitted off-chip using differential current-mode logic (CML) drivers.

# **3.3 DESIGN OF CIRCUIT BUILDING BLOCKS**

#### 3.3.1 VTC

As discussed in Chapter 2, the CSI-based VTC topology is suitable for high-speed applications. The speed of a previous state-of-the-art CSI-based VTC is limited to 5 GS/s with a 5-bit resolution [62]. To meet the 10-GS/s throughput of this work, the proposed VTC consists of a top-plate T/H, a source-follower level shifter, a linearity-compensated TCD, and a pulse-shape restorer (PSR) as shown in Figure 3.2.



Figure 3.2. Complete VTC circuit.

The T/H employs NMOS switches for sampling and cross-connected dummy switches to cancel the hold-mode feedthrough. To further broaden the input bandwidth, a low input common-mode (~300 mV) is chosen for the T/H, whereas a high input common-mode (~850 mV) is selected for the CSI – a high common-mode V<sub>c</sub> not only improves the VTC linearity but also provides better noise performance as explained in Chapter 2. Finally, a PMOS source follower is inserted in between to level shift the common-modes. Since the loading of the level shifter can be significantly reduced by down-sizing M<sub>D</sub>, its power consumption is kept very low.

As analyzed in Chapter 2, the conventional CSI-based VTC exhibits an expansive (superlinear) gain characteristic, and its linearity does not meet a 6-bit requirement. We introduce a compact linearity compensation technique in this work by cascading the gain-expansive VTC with a gain-compressive TCD. To understand the compressive (sublinear) behavior of the TCD, we first point out that the propagation delay of an inverter is approximately proportional to the rising/falling time of the input (Figure 3.3(a)) [68].



Figure 3.3. (a) Inverter propagation delay as a function of input edge time and (b) gain compression of inverter.

$$t_{pd,out} = t_{pd,0} + g\left(t_{edge,in}\right) \cdot t_{edge,in}$$
(3-1)

where $t_{pd,out}$ is the propagation delay of the inverter, $t_{pd,0}$ is its minimum value, $t_{edge,in}$ is the rising/falling time of the input edge, and $g(t_{edge,in})$ is the proportionality factor. In general, $g(t_{edge,in})$ is a nonlinear function of $t_{edge,in}$ [69]. If we plot $g(t_{edge,in})$ of an inverter normalized to its peak value for different fan-outs (FO), Figure 3.3(b) reveals that the proportionality factor exhibits a compressive behavior, and the nonlinearity is consistent for an FO of 1 to 3. So if a proper mean value and range of the CSI output's falling time $t_{edge,in}$ (at node A in Figure 3.2) are chosen, the gain expansion of the CSI can be compensated by the inverters in the TCD to some extent. In this design, $t_{edge,in}$ has a mean value of 40 ps and a range of ±10 ps with FO = 3. Transistor-level simulation confirms the analysis presented here, as Figure 3.4 shows, where the nonlinearity and SFDR of the differential time signal, $t_0$, at nodes A, X, Y and Z of the TCD are plotted. Simulation also indicates that an SFDR of over 42 dB can be achieved by the proposed VTC over process corners (Figure 3.5(a)) and with $\pm 100$ mV supply voltage variation and a 0-85°C temperature range in the TT corner (Figure 3.5(b)).



Figure 3.4. Simulated compensation efficacy of the VTC.



Figure 3.5. VTC linearity (a) across process corners and (b) with VDD and temperature variation.

The VTC conversion cycle is 100 ps at 10 GS/s, out of which 40 ps are allocated to the reset phase and 60 ps are allocated to the V/T conversion. This gives rise to a $\pm 20$ ps output range. For small inputs, V<sub>A</sub> may fail to fully discharge to ground before the reset edge arrives, resulting in an incomplete pulse that can potentially be swallowed by the TCD. To resolve this problem, a PSR circuit is added, employing an NMOS M<sub>C</sub> to help discharge C<sub>D</sub> under the control of a delayed clock phase (~40 ps delay inserted by a source follower).

# 3.3.2 TDC

The folding TDC consists of a coarse TDC, a fine TDC, and a bit-alignment logic (implemented off-chip). A block diagram of the folding TDC is sketched in Figure 3.6. The fine TDC employs a free-running RO and a double-sampling scheme for quantization [60] with 12 LSBs per fold. To make a 6-bit converter, 6 folds, equivalently 72 quantization levels (or 6.2 bits), are instantiated. A coarse TDC is also included to indicate within which fold the input resides.
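
The level count and effective resolution quoted above can be checked with a few lines of arithmetic; the constant names are illustrative only.

```python
import math

LSBS_PER_FOLD = 12   # fine-TDC quantization steps per fold, from the text
FOLDS = 6            # number of folds resolved by the coarse TDC

levels = LSBS_PER_FOLD * FOLDS        # total quantization levels
bits = math.log2(levels)              # equivalent resolution in bits
spare_levels = levels - 2 ** 6        # levels beyond a 6-bit code space
```

The spare levels correspond to the margin reserved for inter-TDC offset and gain mismatch.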



Figure 3.6. Block diagram of the folding TDC.

The fine TDC contains four parts, a 6-stage RO, an interpolator, some buffers and CMPs. An overall PD structure is adopted for the TDC to ensure good common-mode rejection and stage delay uniformity. To make the RO differential, cross-coupled resistor pairs are inserted (Figure 3.7).



Monte-Carlo simulation is used to check the LSB size variation of the RO. In Figure 3.8, the standard deviation of the LSB size due to mismatch and the RO power consumption vs. the inverter sizes are plotted. To balance the tradeoff between matching and power, an inverter with $W_P/W_N = 3.2 \mu m/1.6 \mu m$ was chosen. The CMP is a sense-amplifier-based dynamic comparator consisting of a strong-arm comparator and an SR latch. Between the RO and the CMPs, an interpolator and buffers are inserted. The inverters in the buffers are skewed to raise the trip points to ~800 mV to accommodate the higher than $\frac{1}{2}$ VDD input common-mode of the CMPs. The process variation of the buffers only affects the input common-mode of the CMPs, so the TDC performance is not sensitive to it. The employment of two CMP banks (CMP<sub>1</sub> and CMP<sub>2</sub>) to record the START and STOP events with buffered inputs helps reduce the interference and potential timing corruption between the two recordings.



Figure 3.8. RO size optimization.



Figure 3.9. (a) Concept of the inherent DEM of the TDC and (b) behavioral simulation result of the mismatch-induced noise due to DEM.

The speed of the TDC is one critical parameter in this work. To expedite the conversion, two techniques, i.e., signal feed-forward and interpolation, are adopted to reduce the LSB size (the product of the LSB size and the total number of quantization steps determines the conversion speed). First, since PMOS is slower than NMOS, the input of the PMOS transistor of each inverter in the RO is 2 phases prior to that of the NMOS [70]. This feed-forward technique halves the inverter delay in this technology. Second, the 2× interpolator trailing the RO uses two output-shorted, half-size inverters to generate a middle phase [71]. With these two techniques, the LSB size of the TDC is cut down by 4 times from 16 ps to 4 ps in a 65 nm CMOS process.

Another important design consideration is the linearity of the TDC, which is degraded by the mismatch between the inverters of the RO, between the interpolators, and between the CMPs. In this work, besides the PD RO, two other techniques, i.e., resistive averaging and DEM, are employed to improve the linearity. First, averaging resistors are inserted after the interpolators and after the buffers to improve delay uniformity. Also, thanks to the double sampling and the free-running RO, the start and stop points of each quantization event are randomized from sample to sample, resulting in an inherent DEM operation. For example, as Figure 3.9(a) shows, three inputs of identical value (6t) are quantized by different sets of delay units of the RO due to its random initial phase. Assuming that the delay increases from $\tau_0$ to $\tau_5$ and decreases from $\tau_6$ to $\tau_{11}$, then $D_i = 7$, $D_{i+1} = 6$, and $D_{i+2} = 5$, which are randomized with a mean value of 6. Note that the DEM does not eliminate mismatch errors but converts them into white noise. A behavioral simulation was performed to determine the quantitative tradeoff between the mismatch and the SNR degradation due to the DEM. Figure 3.9(b) shows that larger mismatch gives rise to larger noise. In addition, the double sampling operation also leads to a 3-dB quantization noise penalty [60]. So in Figure 3.9(b), the quantization noise is $1/\sqrt{6} \approx 0.41$ LSBs.
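
The $1/\sqrt{6}$ LSB figure can be checked with a short Monte-Carlo sketch of an ideal (mismatch-free) double-sampling quantizer. The function names, the uniformly distributed input, the 36-LSB range, and the trial count are assumptions made for illustration.

```python
import math
import random

def double_sampling_error(t_in_lsb, rng):
    """Quantization error (in LSBs) when START and STOP are both quantized."""
    start = rng.random()                          # random RO phase at START, in LSBs
    code = math.floor(start + t_in_lsb) - math.floor(start)
    return code - t_in_lsb

def rms_error(trials=200000, full_scale_lsb=36.0, seed=0):
    """RMS quantization error over random inputs and random START phases."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        t_in = rng.uniform(0.0, full_scale_lsb)   # assumed uniformly distributed input
        total += double_sampling_error(t_in, rng) ** 2
    return math.sqrt(total / trials)

rms = rms_error()
print(rms, 1 / math.sqrt(6), 1 / math.sqrt(12))
```

The result should land near $1/\sqrt{6}$ rather than the single-sampling value of $1/\sqrt{12}$, which is the 3-dB penalty attributed to double sampling.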

The coarse TDC is a simple folded differential delay-line [72] consisting of two sub-DLs (i.e., $DL_P$ and $DL_N$), each with half the number of delay stages (Figure 3.10). The START and STOP signals are fed to the two sub-DLs in opposite directions. The overall coarse output is obtained by differencing the output codes of the sub-DLs, i.e., $(D_{C1} - D_{C2})$. Due to the overlap of the "0" codes of $D_{C1}$ and $D_{C2}$, a negative offset of half a stage delay is instantiated in both sub-DLs to make the size of the "0" code equal to that of the other codes in the final coarse TDC output. Also, an extra delay stage is added to the end of the sub-DLs for over-/under-range detection.



Figure 3.10. (a) conventional delay-line and (b) folded differential delay-line.

Thanks to the double sampling in the fine TDC and the folded differential delay-line in the coarse TDC, the input can be either positive or negative differentially. So the magnitude span is only half of the full-scale range, i.e., $\pm$ 36 LSBs or approximately $\pm$ 150 ps. In addition, the CMP needs roughly 200 ps to complete one conversion and 50 ps to reset. Thus, the total conversion time is about 400 ps and the speed of the folding TDC is as high as 2.5 GS/s in a 65-nm CMOS technology. The gate count of the bit-alignment logic is 284, and its estimated power and area are 1.13 mW and 184 $\mu$m<sup>2</sup>, respectively.
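
The timing budget above can be tallied as follows. The constant names are illustrative, and the 4 ps LSB and 4-way interleaving are the values given elsewhere in this chapter.

```python
LSB_PS = 4.0               # fine-TDC LSB size
HALF_SPAN_LSB = 36         # magnitude span, half of the 72-level full scale
INPUT_SPAN_PS = 150.0      # input time span as quoted in the text
CMP_DECISION_PS = 200.0    # approximate comparator conversion time
CMP_RESET_PS = 50.0        # approximate comparator reset time
N_CHANNELS = 4             # interleaved TDC channels

span_ps = HALF_SPAN_LSB * LSB_PS                            # exact LSB-based span
period_ps = INPUT_SPAN_PS + CMP_DECISION_PS + CMP_RESET_PS  # per-TDC conversion time
tdc_rate_gsps = 1e3 / period_ps                             # per-channel rate, GS/s
adc_rate_gsps = N_CHANNELS * tdc_rate_gsps                  # aggregate rate, GS/s
```

The LSB-based span of 36 × 4 ps is slightly below the rounded 150 ps used in the budget, so the quoted conversion time carries a small margin.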

# 3.3.3 VTC-TDC INTERFACE

A de-multiplexer (DEMUX), which works with a 2.5 GHz, 4-phase clock, is inserted to allocate signals from the single 10 GS/s VTC to the four 2.5 GS/s TDCs. Dynamic inverters are used to allow only the rising edges to pass through while preventing the falling edges from disrupting the TDC's operation within the 400 ps conversion period. The circuit and timing diagram of the DEMUX is sketched in Figure 3.11.



Figure 3.11. Circuit and timing diagram of the 1-4 DEMUX.

Also, to match the output range of the VTC and the input range of the TDC, a TA is inserted right after the DEMUX. The TA in this work is a modified version of that reported in [55]. The circuit schematic and timing diagram of the TA are depicted in Figure 3.12, in which two discharging phases of two capacitors  $C_X$  and  $C_Y$  (in differential configuration) take place in turn during the amplification. The first one is an early discharging phase between  $t_1$  and  $t_2$  with a large slew rate after the first rising edge of the inputs arrives. For example, if  $V_{INA}$ 's rising edge arrives first, it will discharge  $C_X$  through  $M_{1A}$  and  $M_{2A}$  while keeping  $V_Y$  at  $V_{DD}$ . Then, when the rising edge of  $V_{INB}$  arrives, both capacitors will discharge together at an equal but small slew rate through  $M_{3A}$ ,  $M_{4A}$ ,  $M_{3B}$  and  $M_{4B}$ . When the two voltages cross the threshold of two TCDs at  $t_3$  and  $t_4$ , respectively, an amplified time difference is generated.



Figure 3.12. Simplified (a) circuit diagram and (b) timing diagram of the TA.

In this work, a gain of  $\sim 8 \times$  is chosen with an output time range of ±150 ps. To fine-tune this gain, two 4-bit DACs are connected in series with M<sub>2A</sub> and M<sub>2B</sub> to control the slew rate of the early discharging phase.

The overall timing diagram of one TDC channel is illustrated in Figure 3.13. The 4-phase clock $\Phi_{2,i}$ resets the TDC first. After the reset, the first rising edge from the VTC passes the DEMUX while the following signals are blocked. The TA needs at least 50 ps to produce its output, and its output time span (also the input time span of the TDC) is 150 ps. If the conversion period of the TDC is 400 ps, then only 150 ps are left for the CMPs in the worst case. To alleviate the CMPs' timing constraint, the reset clock of the CMPs is delayed by 20 ps, which increases the time allocated to the CMPs to 170 ps.
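The timing budget above can be tallied as a quick sanity check. The sketch below assumes that the 50 ps not itemized in this paragraph is the comparator reset time quoted earlier for the CMP; that attribution is an interpretation, and the variable names are illustrative only.

```python
# Per-channel timing budget in picoseconds (values quoted in the text).
PERIOD_PS = 400             # one conversion period at 2.5 GS/s
TA_DELAY_PS = 50            # minimum TA propagation delay
TA_SPAN_PS = 150            # TA output span (= TDC input span)
CMP_RESET_PS = 50           # ASSUMPTION: CMP reset time closes the budget
RESET_CLOCK_DELAY_PS = 20   # time gained by delaying the CMP reset clock

# Time left for the comparators in the worst case (latest-arriving edge).
cmp_window_ps = PERIOD_PS - TA_DELAY_PS - TA_SPAN_PS - CMP_RESET_PS
cmp_window_delayed_ps = cmp_window_ps + RESET_CLOCK_DELAY_PS

print(cmp_window_ps, cmp_window_delayed_ps)
```

Under this assumption the tally reproduces the 150 ps and 170 ps comparator windows stated in the text.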



Figure 3.13. Timing diagram of one TDC channel.

### 3.3.4 PVT TUNING AND CALIBRATION

Most TD digitization techniques are subject to PVT variabilities. Several foreground tuning/calibration techniques are incorporated in this work to minimize the potential harmful impact of PVT variations.

For the VTC, simulation suggests that its linearity is robust to PVT variation, so the mismatch between the two pseudo-differential paths in the VTC is of most concern. Such mismatch increases the $2^{nd}$-order harmonic (HD<sub>2</sub>). The threshold voltage of M<sub>D</sub>, V<sub>tn</sub>, is the most vulnerable to mismatch. Therefore, two 7-bit current DACs (controlled by scan chain register REG\_VTC<sub>P,N</sub>(6:0)) are employed in the source followers to tune I<sub>b</sub>, and thus V<sub>D</sub>, to compensate for the mismatch of V<sub>tn</sub> differentially.

In addition, the threshold voltage mismatch of the TCD introduces inter-TDC offset errors. The PVT variations and mismatch of the TA and TDC also introduce gain error and offset errors between the four TDCs. To roughly match the TDC gains in the analog domain, a 4-bit DAC (controlled by scan chain register REG\_TA<sub>1-4</sub>(3:0)) is implemented in each TA. The fine gain-error and offset calibrations are performed in the digital domain.

The coarse TDC and fine TDC track each other well over PVT variations as they share similar delay stages. The DEM of the TDC can also absorb some mismatch. Thus, no special effort is made in this work to compensate the TDC.

Lastly, as the goal of this work is to explore the feasibility of the proposed conversion technique, only foreground tuning and calibration are carried out with a sinusoidal input. More work will need to be done in the future to automate these calibration routines.

A flowchart of the calibration procedure is shown in Figure 3.14. The scan chain is first initialized to set each register to its mid-scale value. The gain and offset mismatch errors between the four TDCs are calibrated first, with a sinusoidal curve fitting performed to extract the errors. REG\_TA<sub>1-4</sub>(3:0) is tuned to coarsely equalize the gains, followed by a fine gain and offset calibration in the digital domain. The P-N mismatch of the VTC is then calibrated by tuning REG\_VTC<sub>P,N</sub>(6:0), using a binary search to minimize HD<sub>2</sub> iteratively.
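The gain and offset extraction step can be illustrated with a small sketch. It assumes a three-parameter least-squares sine fit at a known input frequency, applied to each interleaved channel separately; the text only says "sinusoidal curve fitting", so the fitting method, the function names and the synthetic channel errors below are all assumptions made for illustration.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def sine_fit(samples, indices, cycles_per_sample):
    """Least-squares fit of y[n] = a*cos(wn) + b*sin(wn) + c; returns (amplitude, offset)."""
    w = 2.0 * math.pi * cycles_per_sample
    ata = [[0.0] * 3 for _ in range(3)]
    aty = [0.0] * 3
    for n, y in zip(indices, samples):
        basis = (math.cos(w * n), math.sin(w * n), 1.0)
        for i in range(3):
            aty[i] += basis[i] * y
            for j in range(3):
                ata[i][j] += basis[i] * basis[j]
    a, b, c = solve3(ata, aty)
    return math.hypot(a, b), c

def fit_channels(codes, cycles_per_sample, channels=4):
    """Fit each interleaved channel separately on its own sample instants."""
    fits = []
    for k in range(channels):
        idx = list(range(k, len(codes), channels))
        fits.append(sine_fit([codes[n] for n in idx], idx, cycles_per_sample))
    return fits

def correct(codes, fits):
    """Digital fine calibration: equalize every channel to the mean gain and offset."""
    channels = len(fits)
    ref_amp = sum(a for a, _ in fits) / channels
    ref_off = sum(c for _, c in fits) / channels
    return [(y - fits[n % channels][1]) * ref_amp / fits[n % channels][0] + ref_off
            for n, y in enumerate(codes)]

# Synthetic demo with made-up channel errors (not measured data).
GAINS = [1.00, 1.03, 0.97, 1.01]
OFFSETS = [0.0, 0.4, -0.3, 0.1]
F_NORM = 0.0123  # input frequency in cycles per sample (hypothetical)
raw = [GAINS[n % 4] * 30.0 * math.sin(2 * math.pi * F_NORM * n) + 32.0 + OFFSETS[n % 4]
       for n in range(4000)]
before = fit_channels(raw, F_NORM)
after = fit_channels(correct(raw, before), F_NORM)
```

In this sketch `before` holds the per-channel amplitude and offset that would drive the coarse TA tuning, and `after` shows that the digital correction leaves all four channels with the same fitted amplitude and offset.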



Figure 3.14. Calibration flow chart.

# **3.4 MEASUREMENT RESULTS**

## **3.4.1 TESTBENCH SETUP**

The block diagram of the testbench is shown in Figure 3.15. The chip employs a single-ended, low-jitter 10 GHz clock signal generated by the BERTscope (12500B). The differential sinusoidal input is generated by a signal generator (R&S SMB100A) and a broadband balun (HL9404) with low amplitude and phase imbalance up to 40 GHz. The digital outputs are captured by a logic analyzer (Agilent 16902B) and analyzed by a MATLAB program running on it.



Figure 3.15. Block diagram of testbench setup.



Figure 3.16. Photo of test board.

A photo of the test board is shown in Figure 3.16. The board carries the supply decoupling capacitors, the bias circuits, the DC blocking capacitor of the clock and the off-chip termination network of the input. The input signals are AC coupled by the balun and biased by an external common-mode voltage through the off-chip termination resistors. Since the output impedance of the balun is 50 Ohms single-ended or 100 Ohms differential, the traces of the differential inputs are impedance matched to 50 Ohms with a symmetric layout for delay matching.

The key to the setup is to maintain good amplitude and phase matching of the differential input signal, since the signal frequency may be > 5 GHz. At 5 GHz, the wavelength is only 6 cm, so even the lengths of the cables between the balun and the board need to be matched. With a high-frequency input, any residual mismatch leads to even-order distortion in the measured spectrum because the pseudo-differential VTC architecture is sensitive to input common-mode variation.

### **3.4.2 EXPERIMENTAL RESULTS**

The prototype ADC, fabricated in a 65 nm CMOS process, occupies a silicon area of 0.073 mm<sup>2</sup> (260 $\mu$m × 280 $\mu$m). A die photo is shown in Figure 3.17. The ADC performance was characterized at a sample rate of 10 GS/s. In the experiments, all necessary calibrations were administered manually, including the static offset and gain corrections of the four interleaved TDCs. The measured DNL and INL are +0.27/-0.28 LSBs and +0.48/-0.49 LSBs, respectively, for the 4-channel aggregate output (Figure 3.18(a)). The DNL and INL of the individual channels are shown in Figure 3.18(b).



Figure 3.17. Die photo.



Figure 3.18. Measured DNL and INL plots of (a) aggregate result and (b) individual channel results.



Figure 3.19. Measured output spectra at 10 GS/s with (a) low frequency input and (b) Nyquist input.

The output spectra corresponding to $f_{in} = 100$ MHz and 5 GHz are given in Figure 3.19. For the low-frequency input, the measured SFDR and SNDR are 44.96 dB and 28.03 dB, respectively. For the Nyquist input, the measured SFDR and SNDR are 42.10 dB and 27.53 dB, respectively, indicating an ENOB of 4.28 bits. Three non-idealities can be observed in the ADC output spectra. First, a second-order harmonic shows up at high input frequencies, which is caused by the phase and amplitude imbalance of the differential input signal due to the off-chip cable length mismatch. Second, interleaving spurs at $f_s/4 \pm f_{in}$ and $f_s/2 \pm f_{in}$ can be seen with a high-frequency input. This residual clock skew is introduced by the unintentional sharing of the power supplies of the DEMUX and the clock driver of the T/H, which causes an unwanted clock-timing modulation due to the channel-dependent supply noise. Re-simulation with mismatch introduced in the DEMUX confirms this hypothesis. Lastly, four small sidebands clustering around the fundamental tone are seen regardless of the input frequency. The spacing between these tones is 0.54 MHz, indicating that the input is likely modulated by a 0.54-MHz interferer. The source of this interference is believed to be spurs in the clock source.
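The ENOB figure quoted above follows from the standard relation between SNDR and effective resolution for a full-scale sinusoid. A minimal sketch, with an illustrative function name:

```python
def enob_bits(sndr_db):
    """Effective number of bits from SNDR for a full-scale sine input."""
    return (sndr_db - 1.76) / 6.02

enob_nyquist = enob_bits(27.53)  # Nyquist-input SNDR from the text
enob_low = enob_bits(28.03)      # low-frequency-input SNDR from the text

print(f"{enob_nyquist:.2f} {enob_low:.2f}")  # prints "4.28 4.36"
```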



Figure 3.20. Measured dynamic performance.

The measured dynamic performance is plotted in Figure 3.20. The SFDR is much better than the SNDR, mostly attributable to the inherent DEM and the other two linearity improvement techniques. The power breakdown is shown in Figure 3.21. The ADC consumes 98 mW from a 1.3 V supply, of which the TDC consumes half. The VTC and DEMUX play a role similar to that of the three sampler banks in [14]. However, with both designs in 65 nm CMOS technology, the VTC and DEMUX consume only 11.7 mW at 10 GS/s in this design, whereas the corresponding consumption in [14] is 43.5 mW at 12.8 GS/s. This indicates that the TD approach has low-power potential. In a finer technology, for instance a 28-nm node, the supply voltage and the gate capacitance will both scale down further, so the TD approach is expected to scale in a similar way as digital circuits.



Figure 3.21. Power breakdown.

To characterize the noise performance, the mean square value of the measured noise is plotted vs. the normalized input time with a modulo-time plot [73] in Figure 3.22. The x-axis ranges from 0 to $0.5\pi$, corresponding to the output code ranging from 0 to 32. A straight line is fitted to determine the period jitter $J_C$ as suggested in Chapter 2. Because the RO traverses 24 LSBs per oscillation cycle, the jitter at code 32 (i.e., $0.5\pi$) is $(32/24) \times J_C^2 = (1.5-0.8)$ LSB<sup>2</sup>, so $J_C = 0.72$ LSB or 2.88 ps. However, this value is much larger than the simulated value of 74 fs. One possible cause of this degradation is power/ground-coupled noise due to the sharing of the power supplies of the RO with other readout circuitry in the TDC. Figure 3.23 plots the simulated growth of the RO noise ($J_C$) with supply/ground and substrate coupled noise. It indicates supply/ground noise at the hundred-mV level. To avoid this in a future design, a completely separate supply is necessary for the RO. If this 2.88 ps RO jitter is added to Table 3.1, the overall SNR is limited to 29.8 dB. In addition, with random mismatch within the ROs, the DEM will degrade the SNR further. These numbers are close to the measured 28 dB SNDR of the prototype.
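The jitter extraction above is a short piece of arithmetic, sketched here for clarity. The reading of 1.5 and 0.8 LSB<sup>2</sup> as the fitted line's values at code 32 and code 0, and the 4 ps LSB (implied by 0.72 LSB = 2.88 ps), are assumptions of this sketch.

```python
import math

CODES_PER_RO_CYCLE = 24   # LSBs traversed per RO oscillation cycle (from the text)
CODE_AT_HALF_PI = 32      # output code at a normalized input time of 0.5*pi
NOISE_AT_CODE_32 = 1.5    # ASSUMPTION: fitted mean-square noise at code 32, LSB^2
NOISE_AT_CODE_0 = 0.8     # ASSUMPTION: fitted mean-square noise at code 0, LSB^2
LSB_PS = 4.0              # ASSUMPTION: LSB size implied by 0.72 LSB = 2.88 ps

# Noise power accumulated by the RO between code 0 and code 32.
accumulated_lsb2 = NOISE_AT_CODE_32 - NOISE_AT_CODE_0

# Solve (32/24) * J_C^2 = accumulated noise for the per-cycle jitter J_C.
jc_lsb = math.sqrt(accumulated_lsb2 * CODES_PER_RO_CYCLE / CODE_AT_HALF_PI)
jc_ps = jc_lsb * LSB_PS

print(f"J_C = {jc_lsb:.2f} LSB")
```

The unrounded result is slightly above 0.72 LSB; the 2.88 ps in the text corresponds to rounding to 0.72 LSB before converting to picoseconds.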



Figure 3.22. Measured mean squared noise vs. normalized input time.

The total noise budget of this design is listed in Table 3.1. All the numbers are extracted from post-layout simulation and agree well with the calculations above. Here all the RMS jitters are input-referred to the VTC, where the TD signal is first generated. The signal swing is 40 ps peak-to-peak, differential, so the RMS value of a full-scale sinusoidal input is 14.14 ps. While no random mismatch is included in the simulation, the systematic mismatch from layout also generates noise due to the DEM. The VTC noise here includes the S/H noise. The total noise sums to 0.286 ps, indicating an ideal SNR of 33.9 dB for an individual ADC channel.
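The totals in the noise budget can be reproduced with a root-sum-square of the individual contributions. The sketch below reads the VTC entry of Table 3.1 as 0.119 ps. The last step, which refers the measured 2.88 ps RO jitter back to the VTC by dividing by the nominal TA gain of 8, is an assumption of this sketch and not a procedure stated in the text.

```python
import math

# Input-referred RMS jitter contributions in ps (Table 3.1).
budget_ps = {
    "VTC": 0.119,
    "DEMUX": 0.031,
    "TA": 0.063,
    "Quantization": 0.144 * math.sqrt(2),
    "Other": 0.108,
    "DEM": 0.096,
}
total_ps = math.sqrt(sum(v * v for v in budget_ps.values()))

full_scale_pp_ps = 40.0                              # differential peak-to-peak swing
signal_rms_ps = full_scale_pp_ps / 2 / math.sqrt(2)  # RMS of a full-scale sine
snr_db = 20 * math.log10(signal_rms_ps / total_ps)

# ASSUMPTION: refer the measured RO jitter to the VTC through the ~8x TA gain.
ro_jitter_referred_ps = 2.88 / 8
total_with_ro_ps = math.hypot(total_ps, ro_jitter_referred_ps)
snr_with_ro_db = 20 * math.log10(signal_rms_ps / total_with_ro_ps)

print(f"{total_ps:.3f} ps, {snr_db:.1f} dB, {snr_with_ro_db:.1f} dB")
```

With these inputs the sum lands within rounding of the 0.286 ps and 33.9 dB quoted above, and the assumed referral gives a value close to the 29.8 dB quoted for the measured jitter.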



Figure 3.23. Simulated J<sub>C</sub> growth vs. supply/ground and substrate noise.

| VTC (ps) | DEMUX (ps) | TA (ps) | Quantization Noise (ps) | Other Noise (ps) | DEM (ps) | Total Noise (ps) | SNR (dB) |
|----------|------------|---------|-------------------------|------------------|----------|------------------|----------|
| 0.119    | 0.031      | 0.063   | $0.144 \times \sqrt{2}$ | 0.108            | 0.096    | 0.286            | 33.9     |

Table 3.1. ADC Noise Budget

The ADC measurement results are summarized and compared to a few state-of-the-art ~10-GS/s ADCs in Table 3.2. This work achieves a good SFDR with the minimum chip area among the counterparts. It should be mentioned that the area efficiency of this work is even better than in [15], which used a much more advanced technology. The figure of merit (FoM) of this work is also comparable to the other works.

| Work                    | This work     | [26]        | [13]   | [12]     | [15]     | [14]   |
|-------------------------|---------------|-------------|--------|----------|----------|--------|
| Architecture            | TD<br>Folding | TD<br>Flash | TI SAR | TI Flash | TI Flash | TI SAR |
| Technology (nm)         | 65            | 65          | 65     | 40       | 32 SOI   | 65     |
| Sample Rate (GS/s)      | 10            | 5           | 10     | 10.3     | 20       | 12.8   |
| ENOB (bits)@NQ          | 4.3           | 2.8         | 4      | 5.1      | 4.8      | 4.1    |
| SNDR (dB)@NQ            | 27.2          | 18.4        | 26     | 33       | 30.7     | 26.4   |
| SFDR (dB)@NQ            | 42.1          | 22.3        | 36     | NA       | 39.4     | 32.4   |
| DNL (LSB)               | 0.28          | 0.91        | 0.19   | NA       | 0.47     | 3      |
| INL (LSB)               | 0.49          | 0.95        | 0.65   | NA       | 0.42     | NA     |
| Power (mW)              | 98            | 35          | 79.1   | 240      | 69.5     | 162    |
| Area (mm <sup>2</sup> ) | 0.073         | 0.08        | 0.52   | 0.27     | 0.25     | 0.23   |
| FoM (fJ/step)           | 504           | 1000        | 480    | 700      | 124      | 740    |
| Skew Calibration        | No            | N/A         | Yes    | Yes      | Yes      | No     |

Table 3.2. Performance Comparison
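The FoM entry for this work in Table 3.2 is consistent with the Walden figure of merit evaluated at the Nyquist ENOB. The sketch below assumes that definition and uses the 27.53 dB Nyquist SNDR quoted in the text; the function name is illustrative.

```python
def walden_fom_fj(power_w, fs_hz, enob_bits):
    """Walden figure of merit in fJ per conversion step."""
    return power_w / (fs_hz * 2.0 ** enob_bits) * 1e15

enob = (27.53 - 1.76) / 6.02                       # Nyquist ENOB from the measured SNDR
fom_this_work = walden_fom_fj(98e-3, 10e9, enob)   # 98 mW at 10 GS/s

print(f"{fom_this_work:.0f} fJ/step")
```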

## 3.5 SUMMARY

A 10 GS/s 6 bit TD folding ADC is designed and verified for high-speed wireline communication application. Implemented in a 65-nm CMOS process, this work achieves highly competitive area efficiency among all recent ADC works of similar sample rate and resolution.

The single VTC front-end eliminates the input buffer and requires no clock-skew calibration at 10 GS/s. The RO-based folding TDC achieves high area efficiency and high speed simultaneously. The inherent DEM of the RO-based TDC also achieves very good linearity performance, manifested by the measured DNL of +0.27/-0.28 LSBs, INL of +0.48/-0.49 LSBs, and an over 42 dB SFDR up to the Nyquist frequency.

#### **CHAPTER 4**

### 2GS/S 8BIT RNS BASED FLASH ADC

### 4.1 INTRODUCTION

1-2 GS/s, 8-10 b ADCs are in great demand for applications such as ADC-based backplane receivers [5-7], 10GBASE-T [8-10] and direct RF sampling radios [4, 74]. Since Flash ADCs can achieve the highest conversion rate without time-interleaving, they are widely used in such applications. However, the exponential dependence of their comparator (CMP) number $N_C$ on the resolution n ( $N_C \approx 2^n$ ) leads to high complexity and power consumption. This limits the resolution of gigahertz Flash ADCs to 6~7 bits [75-78]. Meanwhile, the input bandwidth of Flash ADCs is degraded by the large CMP array directly loading the input node, which hinders their application in direct RF sampling. Voltage-domain signal folding can potentially reduce the Flash complexity but suffers from folder non-linearity, which requires complex calibration [79, 80]. Power-hungry input buffers are still necessary to drive the large input loading [79]. Time domain (TD) SR-latch (SRL) interpolation is an alternative way to reduce $N_C$ [75-77], but the resolution is limited to 6~7 bits due to the nonlinearity of such interpolation [76, 77].

In this chapter, an efficient RNS quantizer replaces the conventional folding architecture. Two small quantizers with 48 levels and 80 levels per fold generate 240 levels full scale (~8 bit resolution) based on RNS. Thus, the complexity and power consumption are greatly reduced. In addition, a TD approach provides an inherently linear folding characteristic through the ring oscillator (RO), as well as low input capacitance and wide tracking bandwidth due to the simplicity of the front-end voltage-to-time converter (VTC).

A prototype RNS ADC is reported in this work. It achieves 2 GS/s sampling rate and 8 bit resolution without time-interleaving in 65 nm CMOS. The RNS architecture reduces the number of comparators while preserving the Flash speed of the converter. A time domain approach also helps to cut down its input capacitance and broaden the ERBW to 1.74 GHz. Furthermore, the DEM within the time-to-digital converter (TDC) provides good linearity of  $\pm 0.14$  LSB DNL and  $\pm 0.61$  LSB INL, respectively. A TD linear ramp generation and histogram based TDC mismatch calibration are employed to improve the ADC performance. Fabricated in a 65nm CMOS process with an area of 0.08 mm<sup>2</sup>, the prototype RNS ADC achieves an SNDR of 40.7 dB for a Nyquist input. This work reports the best Schreier FoM among all non-interleaved Flash/folding ADCs published at ISSCC and VLSI Symp.

# 4.2 RNS QUANTIZATION

### 4.2.1 CONCEPT OF RNS

RNS is a non-weighted number system proposed by Garner in 1959 [81]. Unlike weighted number systems such as binary, RNS offers parallel, carry-propagation-free addition, subtraction and multiplication [81, 82]. The concept of RNS quantization is shown in Figure 4.1(a). Dividing a positive real number X by L positive integers { $\Gamma_1$, $\Gamma_2$, ..., $\Gamma_L$ } generates a set of quotients { $q_1$, $q_2$, ..., $q_L$ } and a set of remainders { $r_1$, $r_2$, ..., $r_L$ }, where $q_i = \lfloor X/\Gamma_i \rfloor$ and $r_i = X \bmod \Gamma_i$, i = 1, ..., L. In an RNS, X can be represented by the remainders { $r_i$ } alone, without the quotients. When all the moduli { $\Gamma_i$ } are co-prime and X is smaller than their least common multiple $\Gamma = \prod_{i=1}^{L} \Gamma_i$, such a representation (i.e., quantization) is unique. The transfer function (TF) of the RNS with a moduli set of {3, 5} is plotted in Figure 4.1(b). The TF is monotonic within the full range of $\Gamma$. The RNS breaks a large quantizer into several small ones in parallel, which can potentially reduce the complexity of the quantizer. For example, assume two moduli $\Gamma_1 = 3$ and $\Gamma_2 = 5$, and an input X of 8.3. Then $q_1 = 2$, $r_1 = 2.3$ and $q_2 = 1$, $r_2 = 3.3$. Retaining the integer part of the two remainders (quantization), $r_1 = 2$ and $r_2 = 3$ jointly represent 8, the quantized value of X. Thus, all values from 0 to FS = $\Gamma_1\Gamma_2 = 15$ can be quantized by only $N_C = \sum_{i=1}^{L} \Gamma_i = 8$ comparators instead of the 15 comparators of a conventional Flash converter.
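The {3, 5} example above can be reproduced in a few lines. This is only a behavioral sketch: the decoder is a brute-force lookup chosen for brevity rather than any particular decoding algorithm, and the function names are illustrative.

```python
from math import floor, prod

MODULI = (3, 5)  # co-prime moduli from the example in the text

def rns_quantize(x, moduli=MODULI):
    """Return the integer (quantized) remainders of x for each modulus."""
    return tuple(floor(x % g) for g in moduli)

def rns_decode(remainders, moduli=MODULI):
    """Brute-force decode: the unique code in [0, prod(moduli)) with these remainders."""
    for y in range(prod(moduli)):
        if all(y % g == r for g, r in zip(moduli, remainders)):
            return y
    raise ValueError("remainders out of range")

remainders = rns_quantize(8.3)      # (2, 3)
decoded = rns_decode(remainders)    # 8
comparators = sum(MODULI)           # 8, versus 15 levels in total

print(remainders, decoded, comparators)
```

Because the moduli are co-prime, each of the 15 codes maps to a distinct remainder pair, which is what makes the lookup unambiguous.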



Figure 4.1. (a) Principle of RNS and (b) transfer function of RNS with moduli set {3, 5}.

An RNS decoder can be used to convert the remainder representation $\{r_i\}$ to a binary weighted number system. Alternatively, the remainder set $\{r_i\}$ can be fed directly to RNS-based digital signal processors (DSPs), which provide fast arithmetic operations [82, 84].

# 4.2.2 REDUNDANCY OF RNS

Although the RNS shows great potential to reduce N<sub>C</sub> in Flash converters, its sensitivity to noise in {r<sub>i</sub>} prevents its application in analog circuits. If the modulo operation is noisy or nonlinear, then the erroneous remainder is $\hat{r}_i = r_i + \Delta r_i$, where $r_i$ is the ideal remainder and $\Delta r_i$ is the noise in the modulo operation, as Figure 4.2(a) shows. Assuming $|\Delta r_i| \leq 1$, a behavioral simulation yields the TF in Figure 4.2(b). Besides the desired TF, colored in blue, there are several ambiguous error bands in parallel with it. For instance, with the same setting as in the previous example and $\Delta r_1 = 1$ and $\Delta r_2 = -1$, the erroneous remainders are $\hat{r}_1 = (2+1) \bmod 3 = 0$ and $\hat{r}_2 = (3-1) \bmod 5 = 2$. The decoded value is then Y = 12 instead of the correct value 8. Such large errors are not acceptable in an RNS quantizer.
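The sensitivity described above is easy to reproduce numerically. The sketch below uses the same {3, 5} moduli and a brute-force lookup decoder (an illustrative choice) to show how one-LSB remainder errors move the decoded value from 8 to 12.

```python
MODULI = (3, 5)

def decode(remainders):
    """Brute-force lookup of the code in [0, 15) that matches both remainders."""
    return next(y for y in range(15)
                if all(y % g == r for g, r in zip(MODULI, remainders)))

ideal = (8 % 3, 8 % 5)   # (2, 3) for X = 8
errors = (+1, -1)        # delta r_1, delta r_2 from the text

# Erroneous remainders wrap around their own modulus.
noisy = tuple((r + e) % g for r, e, g in zip(ideal, errors, MODULI))

print(decode(ideal), noisy, decode(noisy))
```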



Figure 4.2. (a) RNS system with erroneous remainders and (b) the transfer function with $|\Delta r_i| \le 1$.

In practice, analog remainder generators (i.e., folding amplifiers) are employed to perform the modulo operation. Since noise and circuit imperfections are inevitable, an RNS quantizer requires hardware redundancy to tolerate analog impairments in the remainder generator. Two redundancy methods have been proposed in the past. The first is the remainder number redundancy (RNR) [83], shown in Figure 4.3(a), and the second is the remainder redundancy (RR) [85], shown in Figure 4.3(b). In RNR, additional moduli are introduced, while in RR, a common multiplication factor M is applied to all the moduli. Because the RNR method requires part of the remainders to be completely error-free, it is not suitable for analog implementation, where all remainders may contain errors due to circuit non-idealities. Hence, the RR method is employed in this work.



Figure 4.3. Redundant RNS systems: (a) RNR, the moduli are  $\{\Gamma_1, \Gamma_2, \Gamma_z\}$  and (b) RR, the moduli are  $\{M\Gamma_1, M\Gamma_2\}$ .

It has been proven that, when an adequate M is chosen, the RR RNS is robust to remainder errors, i.e., no ambiguous error bands exist [85]. According to [85], Y can be correctly decoded if and only if

$$\frac{M}{2} > \left| \Delta r_i - \Delta r_j \right|, \text{ for } 1 \le i, j \le L \text{ and } i \ne j$$
(4-1)

In practice, a more conservative per-remainder condition is used, which implies (4-1) by the triangle inequality:

$$\frac{M}{4} > \left| \Delta r_i \right|, \text{ for } 1 \le i \le L$$
(4-2)

To justify this conclusion, a behavioral simulation is conducted with $\Gamma_1=3$, $\Gamma_2=5$ and $|\Delta r_i| \le 3.9$, as plotted in Figure 4.4. When M = 14, the redundancy is not adequate and ambiguous errors occur. When M = 16, all ambiguous errors are eliminated.



Figure 4.4. Simulated RNS TF with (a) M = 14 and (b) M = 16.

# 4.2.3 RNS DECODER

Given all the remainders $\{r_i\}$ collectively, several algorithms exist for decoding Y, such as the Chinese Remainder Theorem (CRT) and the Mixed Radix Conversion (MRC) [86]. In this work, the closed-form robust CRT [85] is used to decode the RNS with redundancy.

To decode the RNS into a weighted number system, $Y_i$ can be calculated from a given remainder and the corresponding quotient. So, L decoded values $\{Y_i\}$ are available, one based on each remainder.

$$\begin{cases}
Y_{1} = q_{1}\Gamma_{1}M + \hat{r}_{1} \\
Y_{2} = q_{2}\Gamma_{2}M + \hat{r}_{2} \\
\vdots \\
Y_{L} = q_{L}\Gamma_{L}M + \hat{r}_{L}
\end{cases}$$
(4-3)

The quotients can be calculated as

$$\begin{cases} q_1 = \sum_{i=2}^{L} \left[ \left( \xi_{i,1} b_{i,1} \frac{\Gamma}{\Gamma_1 \Gamma_i} \right) \bmod \left( \frac{\Gamma}{\Gamma_1} \right) \right] &, i = 1 \\ q_i = \frac{q_1 \Gamma_1 - \hat{r}_{i,1}}{\Gamma_i} &, i > 1 \end{cases} \tag{4-4}$$

and

$$\begin{cases} \xi_{i,1} = \left(g_{i,1}\hat{r}_{i,1}\right) \bmod \Gamma_{i} \\ \hat{r}_{i,1} = \left\|\frac{\hat{r}_{i} - \hat{r}_{1}}{M}\right\| \\ g_{i,1} \equiv \Gamma_{1}^{-1} \pmod{\Gamma_{i}} \\ b_{i,1} \equiv \left(\frac{\Gamma}{\Gamma_{1}\Gamma_{i}}\right)^{-1} \pmod{\Gamma_{i}} \end{cases}, \quad 2 \leq i \leq L \tag{4-5}$$

where $\|x\|$ is the rounding function and $y \equiv x^{-1} \pmod{R}$ is the modular multiplicative inverse operation, which is defined by

$$y \equiv x^{-1} (\operatorname{mod} R) \Leftrightarrow (xy) \operatorname{mod} (R) = 1$$
(4-6)

The parameters $g_{i,1}$ and $b_{i,1}$ are constants that can be pre-calculated to simplify the decoding process. In this design, $\Gamma_1 = 3$, $\Gamma_2 = 5$ and M = 16 are chosen, so

$$\begin{cases} g_{2,1} = 2 \\ b_{2,1} = 1 \end{cases} \tag{4-7}$$

Then

$$\begin{cases} q_1 = (2\hat{r}_{2,1}) \mod \Gamma_2 \\ q_2 = \frac{q_1\Gamma_1 - \hat{r}_{2,1}}{\Gamma_2} = (-2\hat{r}_{2,1}) \mod \Gamma_1 \end{cases}$$
(4-8)

For instance, with X = 124.3, the quantized remainders are $r_1 = 28$ and $r_2 = 44$, so $\hat{r}_{2,1} = 1$, $q_1 = 2$ and $q_2 = 1$. The decoded values are then $Y_1 = 3 \times 16 \times 2+28 = 124$ and $Y_2 = 5 \times 16 \times 1+44 = 124$. With noise in the remainders, e.g., $\Delta r_1 = 0.2$ and $\Delta r_2 = -0.3$, the decoded values become $Y_1 = 124.2$ and $Y_2 = 123.7$.

From the analysis and example above, the noise of each decoded value $Y_i$ depends only on the noise in the corresponding remainder $\hat{r_i}$. Thus, a final averaging can be performed across all L decoded values $\{Y_i\}$ (Figure 4.5(a)), resulting in improved SNR performance [60, 85]. Provided the remainder errors are uncorrelated, the noise power is attenuated by a factor of L after averaging, so the SNR gain is
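The two-moduli decoding of Eqs. (4-3) to (4-8) can be sketched behaviorally as follows. The code is an illustration of the closed-form expressions for L = 2, not the off-chip decoder used in this work; the helper names, the exhaustive-search modular inverse and the test sweep over interior codes with equal and opposite remainder errors are assumptions of the sketch.

```python
import math

G1, G2 = 3, 5  # moduli Gamma_1 and Gamma_2 of this design

def inv_mod(x, r):
    """Modular multiplicative inverse of x modulo r by exhaustive search (Eq. 4-6)."""
    return next(y for y in range(1, r) if (x * y) % r == 1)

def robust_decode(r1_hat, r2_hat, m):
    """Closed-form robust CRT for L = 2 (Eqs. 4-3 to 4-8); returns (Y1, Y2)."""
    r21 = math.floor((r2_hat - r1_hat) / m + 0.5)  # rounding function ||x||
    q1 = (inv_mod(G1, G2) * r21) % G2              # g_{2,1} = 2 and b_{2,1} = 1 here
    q2 = (q1 * G1 - r21) // G2
    return q1 * G1 * m + r1_hat, q2 * G2 * m + r2_hat

def remainders(code, m):
    """Ideal quantized remainders of an integer code for moduli M*G1 and M*G2."""
    return code % (m * G1), code % (m * G2)

# Worked example from the text: X = 124.3 gives r1 = 28, r2 = 44 with M = 16.
r1, r2 = remainders(124, 16)
clean = robust_decode(r1, r2, 16)
noisy = robust_decode(r1 + 0.2, r2 - 0.3, 16)

def worst_error(m, dr):
    """Largest |mean(Y1, Y2) - X| over interior codes with errors (+dr, -dr) and (-dr, +dr)."""
    worst = 0.0
    for code in range(8, m * G1 * G2 - 8):
        a, b = remainders(code, m)
        for s in (+1, -1):
            y1, y2 = robust_decode(a + s * dr, b - s * dr, m)
            worst = max(worst, abs((y1 + y2) / 2 - code))
    return worst

print(clean, noisy)
print(worst_error(16, 3.9) < 1e-9, worst_error(14, 3.9) > 10)
```

With errors of 3.9 LSB the difference term stays within M/2 for M = 16 and the averaged output recovers the code, whereas for M = 14 the rounding picks the wrong fold and the decoded value jumps by tens of LSBs, in line with the behavior described for Figure 4.4.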

$$\Delta SNR = SNR - SNR \Big|_{L=1} = 10 \log L \tag{4-9}$$

## 4.2.4 RNS QUANTIZER PERFORMANCE TRADE-OFF

With the redundancy factor M, the full range of quantization FS and the comparator number  $N_{\rm C}$  are

$$\begin{cases} FS = M \times \prod_{i=1}^{L} \Gamma_i \\ N_C = M \times \sum_{i=1}^{L} \Gamma_i \end{cases}$$
(4-10)

We can prove that for a certain FS and M, N<sub>C</sub> will be the smallest when all the moduli are close to  $\Gamma_i \approx \sqrt[L]{FS/M} = \sqrt[L]{\prod_{i=1}^{L} \Gamma_i}$ . For M = 2<sup>m</sup>,

$$\begin{aligned} N_{C} &\approx 2^{m} \times L \times \sqrt[L]{\frac{2^{n}}{2^{m}}} \\ &= L \times 2^{m + (n - m)/L} \end{aligned} \tag{4-11}$$

Figure 4.5(b) plots the N<sub>C</sub> of an RNS quantizer with M = 16 versus the resolution n for L = 1, 2 and 3. When L = 1, the quantizer is a conventional Flash quantizer. The plot suggests that for $n \ge 8$, an RNS ADC with two or more moduli can greatly reduce N<sub>C</sub>, resulting in large hardware savings.
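The trend of Figure 4.5(b) can be tabulated directly from Eq. (4-11). A minimal sketch, with an illustrative function name:

```python
def comparator_count(n_bits, m_bits, num_moduli):
    """Approximate comparator count from Eq. (4-11): L * 2^(m + (n - m)/L)."""
    return num_moduli * 2.0 ** (m_bits + (n_bits - m_bits) / num_moduli)

M_BITS = 4  # M = 16

# Rows: resolution n from 6 to 10 bits; columns: L = 1, 2, 3.
table = {n: [comparator_count(n, M_BITS, L) for L in (1, 2, 3)]
         for n in range(6, 11)}

for n, row in table.items():
    print(n, [round(v) for v in row])
```

For n = 8 the estimate gives 256 comparators for L = 1 and 128 for L = 2; for n = 6 the L = 1 case is the cheapest, consistent with the savings appearing only at higher resolution.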



Figure 4.5. (a) RNS with output averaging and (b)  $N_C$  vs. resolution n.

Although the RNS quantizer's power consumption will slightly increase when L is larger, its SNR will also be improved according to the analysis above. So the figure of merit (FoM) of the RNS quantizer is of interest. The Walden FoM is

$$FoMw = \frac{P}{f_s \times 2^{ENOB}} = \frac{P}{f_s \times 2^{\frac{SNR-1.76}{10\log 4}}}$$
(4-12)

So, the ratio of FoMw and FoMw(L=1) can be expressed as

$$\begin{aligned} \frac{FoMw}{FoMw|_{L=1}} &= \frac{P}{P|_{L=1}} \times \frac{1}{2^{\frac{\Delta SNR}{10\log 4}}} \\ &= \frac{L \times 2^{m+(n-m)/L}}{2^n} \times \frac{1}{2^{\frac{10\log L}{10\log 4}}} \\ &= \sqrt{L} \times 2^{(m-n)(1-1/L)} \end{aligned} \tag{4-13}$$

This ratio will achieve its minimum value when

$$L = (n - m)\ln 4 \tag{4-14}$$

Meanwhile, the Schreier FoM is

$$FoMs = SNR + 10\log\frac{BW}{P}$$
(4-15)

So, the gain of FoMs in dB can be expressed as

$$\begin{aligned} \Delta FoMs &= \Delta SNR - 10 \log \left( \frac{P}{P|_{L=1}} \right) \\ &= 10 \log L - 10 \log \left( L \times 2^{(m-n)(1-1/L)} \right) \\ &= 10 \log \left( 2^{(n-m)(1-1/L)} \right) \end{aligned} \tag{4-16}$$

For a certain n and m, the Schreier FoM has a maximum improvement of 3(n-m) dB when L increases to infinity. Figure 4.6 plots the FoM improvement versus L for 6-10 bit RNS quantizer with M = 16.
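Equations (4-13), (4-14) and (4-16) can be evaluated numerically to see the trend of Figure 4.6. The sketch below uses n = 8 and M = 16 as an example; the function names are illustrative, and L is treated as a continuous variable only when evaluating Eq. (4-14).

```python
import math

def fomw_ratio(n, m, L):
    """Walden FoM relative to L = 1, Eq. (4-13); smaller is better."""
    return math.sqrt(L) * 2.0 ** ((m - n) * (1 - 1 / L))

def foms_gain_db(n, m, L):
    """Schreier FoM gain over L = 1 in dB, Eq. (4-16)."""
    return 10 * math.log10(2.0 ** ((n - m) * (1 - 1 / L)))

n, m = 8, 4                               # 8-bit quantizer with M = 16
l_opt = (n - m) * math.log(4)             # Eq. (4-14), about 5.5
limit_db = 10 * math.log10(2) * (n - m)   # asymptote of Eq. (4-16), about 3(n - m) dB

for L in (1, 2, 3, 6, 20):
    print(L, round(fomw_ratio(n, m, L), 3), round(foms_gain_db(n, m, L), 2))
```

For L = 2 this gives a Walden FoM ratio of about 0.35 and a Schreier FoM gain of about 6 dB, half of the roughly 12 dB asymptote for n - m = 4.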



Figure 4.6. (a) FoMw improvement vs. L and (b) FoMs gain vs. L.

The above analysis is based on the assumption that all the moduli are close to  $\Gamma_i \approx \sqrt[L]{FS/M} = 2^{(n-m)/L}$ . However, since all the moduli need to be co-prime, such moduli may be hard to choose when L increases. For example, if n = 10 and m = 4, then L could only be 2 or 3, which give rise to a moduli set of {7, 9} or {3, 4, 5}, respectively. L = 4 yields no valid moduli set. To increase the maximum valid L, larger resolution (n) or smaller redundancy (M = 2<sup>m</sup>) is necessary.
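The scarcity of valid moduli sets can be checked by exhaustive search. The sketch below looks for pairwise co-prime moduli whose product does not exceed $2^{n-m}$; selecting the set with the largest product, with ties broken by the smallest sum, is an assumption of this sketch and not a rule stated in the text.

```python
from itertools import combinations
from math import gcd, prod

def best_moduli(n_bits, m_bits, num_moduli):
    """Pairwise co-prime moduli (each >= 2) with the largest product <= 2^(n-m).

    Ties are broken by the smallest sum, i.e. the fewest comparators.
    Returns None when no valid set exists.
    """
    limit = 2 ** (n_bits - m_bits)
    best = None
    for cand in combinations(range(2, limit + 1), num_moduli):
        if prod(cand) > limit:
            continue
        if any(gcd(a, b) != 1 for a, b in combinations(cand, 2)):
            continue
        key = (-prod(cand), sum(cand))
        if best is None or key < best[0]:
            best = (key, cand)
    return None if best is None else best[1]

for L in (2, 3, 4):
    print(L, best_moduli(10, 4, L))
```

For n = 10 and m = 4 the search returns (7, 9) for L = 2, (3, 4, 5) for L = 3 and no set for L = 4, because the smallest product of four pairwise co-prime moduli is 2 × 3 × 5 × 7 = 210.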

In this design, n = 8 is selected to best leverage the resolution and the speed of the prototype RNS ADC. M = 16 is chosen to tolerate ±4 LSB error in the remainders. Thus, the only valid L = 2 yields a choice of  $\Gamma_1=3$ ,  $\Gamma_2=5$ , which means 240 quantization levels with an N<sub>C</sub> of  $M \times (\Gamma_1 + \Gamma_2) = 128$ . While a smaller M or a larger L can result in more improvement, we choose to employ 4× TD SR-latch (SRL) interpolation [76] in this design to further cut down N<sub>C</sub> to 32 and provide part of the redundancy.

#### 4.3 TIME DOMAIN APPROACH

The most critical part of an RNS ADC is the remainder generator, which provides the signal folding property. Due to the error sensitivity of RNS, the folding circuit must be highly linear. Several works have tried to build an RNS ADC in the conventional voltage domain [87, 88]. However, the voltage-domain folder is not a suitable choice due to its nonlinear characteristic, and none of these previous works demonstrated a feasible RNS ADC.

Alternatively, TD folding using ring oscillators (RO) presents a viable way of realizing the remainder circuit because it is inherently linear [60]. In addition, RO based TD folding is also more efficient than the voltage-domain folding which requires increasing number of folding amplifiers with the growth of resolution. With TD folding, this work reports the first RNS ADC fabricated and verified in silicon.

The RNS ADC consists of the VTC, the TDC, the calibration DAC and engine, the thermometer-to-binary circuits and the RNS decoder, as illustrated in Figure 4.7, where the calibration engine and the RNS decoder are implemented off-chip. At the core of the TDC are two sub-TDCs with a 6-stage and a 10-stage pseudo-differential (PD) RO as TD signal folders. The comparators and fourfold SRL interpolation trailing the ROs act as the TD quantizer. The raw thermometer codes of the TDC are transformed to binary format and sent directly to the histogram-based calibration engine. The RNS decoder then converts the raw data to weighted numbers and averages them to form the final output.



Figure 4.7. Block diagram of proposed RNS ADC.

# 4.3.1 VTC

The recently published VTCs in both architectures are compared in Table 4.1.

|                                 | This      | TVLSI'16  | CICC'15   | CICC'13 | ISCAS'08 |
|---------------------------------|-----------|-----------|-----------|---------|----------|
|                                 | Work      | [62]      | [89]      | [26]    | [63]     |
| Architecture                    | LR        | CSI       | CSI       | CSI     | LR       |
| Technology                      | 65nm      | 65nm      | 65nm      | 65nm    | 130nm    |
| Sample Rate (GS/s)              | 2         | 5         | 10        | 5       | 0.08     |
| Bandwidth (GHz)                 | 1.74      | 2         | 5         | 2.1     | 0.04     |
| -THD (dB) @ LF/NQ               | 48.1/46.1 | NA        | 41.6/36.6 | 27.7/NA | NA       |
| SFDR (dB) @ LF/NQ               | 53.8/48.4 | 42.4/30.5 | 45.0/42.1 | 29.4/NA | 55/NA    |
| Input Range (V <sub>P-P</sub> ) | 1.2       | 0.36      | 0.6       | 0.2     | NA       |
| Output Range (ps, diff)         | 480       | 100       | 40        | 50      | NA       |
| Power (mW)                      | 1.9       | 4.3       | 5.6       | 4.0     | 1        |

 Table 4.1. VTC Performance Comparison

The VTC can potentially be designed in two ways, based on either the current starved inverter (CSI) [25, 26, 62] or the linear ramp (LR) [63]. Although CSI-based VTCs are widely used in multi-gigahertz TD ADCs, their inherently non-linear operation limits the achievable resolution to only about 6 bits, even when linearity compensation is introduced (Chapter 3). Their output ranges are also small, which makes the design of the TDC more challenging. On the other hand, LR VTCs can provide better linearity and a larger output range [63]. However, the speed of previous works is far below the gigahertz level. This work presents an SS based VTC with a 2 GS/s sampling rate while providing 8 bit linearity.



Figure 4.8. (a) Circuit and (b) timing diagram of the VTC.

The circuit and timing diagrams of the proposed SS VTC are shown in Figure 4.8. It contains three parts: a sample and hold (S/H), a discharger (ID) and a threshold-crossing detector (CD). The input voltage is bottom-plate sampled onto $C_S$ and then discharged by the ID with a constant current. Finally, the CD generates a time-difference signal $t_{out,\pm}$ corresponding to the differential input voltage. Simple inverters are used in the CD. Due to the PD operation, their PVT variations only introduce a static offset in $t_{out,\pm}$. Simulation reveals a peak offset time of ±16 ps, or ±8 LSB.

In the ID, $C_B$ is charged to a reference voltage $V_R$ and $V_{gs}$ of the discharging device $M_D$ is discharged to zero when $\Phi_2$ is low. When $\Phi_2$ goes high, $C_B$ shares its charge with $C_{gs}$ of $M_D$, biasing $V_{gs}$ of $M_D$ to a constant value. The gain of the VTC is

$$A_{VTC} = \frac{t_{out}}{V_{in}} = \frac{C_s}{I_D} = \frac{2C_s}{\mu C_{ox} (W_D / L_D) V_{ov,D}^2}$$
(4-17)

where

$$V_{ov,D} = \left(\frac{C_B}{C_B + C_{gs}}\right) V_R - V_{t,D}$$
(4-18)

Therefore, $V_R$ sets the gain of the VTC. The VTC's linearity relies on the linearity of the discharging current of $M_D$. To improve the linearity, the channel length of $M_D$ is set to 500 nm. In addition, the trip voltage of the first inverter in the CD ($V_{TH}$ in Figure 4.8) is skewed up to 650 mV to prevent $M_D$ from entering the triode region before the rising edges of $t_{out,\pm}$ are generated. The SFDR and THD of the proposed VTC are simulated with process variation and mismatch. As Figure 4.9 shows, an SFDR above 50 dB and a -THD above 47 dB are achieved.

Unlike in conventional Flash ADCs, where the input capacitance is large, the size of $C_S$ here is primarily determined by the noise requirement. Thus, the input bandwidth of this work is greatly expanded. According to Chapter 2, the input-referred noise of the PD VTC is

$$V_{n,in}^{2} = \frac{2kT}{C_{s}} \left( 1 + 2\gamma \frac{V_{DD}}{V_{ov,D}} \right).$$
(4-19)

where $\gamma$ and $V_{ov,D}$ are the thermal noise coefficient and overdrive voltage of $M_D$. With an input voltage range of 1.2 V<sub>P-P,diff</sub> and a single-ended $C_S$ of 64 fF, the SNR of the VTC is 47.8 dB.
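As a rough numerical check of (4-19), the sketch below evaluates the input-referred noise and the resulting SNR for a full-scale sine. The values of $\gamma$, $V_{DD}$ and $V_{ov,D}$ in the usage lines are illustrative assumptions (they are not stated in this section), as is the 300 K default temperature; the function name is hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def vtc_snr_db(c_s, v_pp_diff, gamma, v_dd, v_ov, temp_k=300.0):
    """SNR (dB) of a full-scale sine against the noise power of Eq. (4-19)."""
    # Input-referred noise power: 2kT/C_S * (1 + 2*gamma*V_DD/V_ov,D)
    noise_power = 2.0 * K_BOLTZMANN * temp_k / c_s * (1.0 + 2.0 * gamma * v_dd / v_ov)
    # RMS value of a differential sine with peak-to-peak amplitude v_pp_diff
    signal_rms = v_pp_diff / (2.0 * math.sqrt(2.0))
    return 10.0 * math.log10(signal_rms ** 2 / noise_power)

# 2kT/C-only bound (gamma = 0) for C_S = 64 fF and 1.2 V_P-P,diff
snr_ktc_only = vtc_snr_db(64e-15, 1.2, gamma=0.0, v_dd=1.2, v_ov=0.2)
# Assumed (hypothetical) device parameters; the result depends strongly on them
snr_example = vtc_snr_db(64e-15, 1.2, gamma=1.0, v_dd=1.2, v_ov=0.2)
```

With $\gamma = 0$ the function returns the $2kT/C_S$ bound of roughly 61 dB at the assumed temperature. The 47.8 dB quoted above is lower, presumably because of the discharger term in (4-19).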



Figure 4.9. Simulated SFDR and THD of the VTC with process variation and mismatch.

#### 4.3.2 TDC

In each sub-TDC, a PD RO functions as the TD signal folder. CMPs and SRLs follow the RO to quantize its phase. With a choice of $\Gamma_1=3$, $\Gamma_2=5$, a 6-stage and a 10-stage PD RO are employed to provide a redundancy factor of 4×, including the inherent 2× redundancy factor of the RO [60]. Feed-forward inverters are used in the RO to reduce the LSB size of the TDC to 8 ps, which is almost half of the inverter delay in 65 nm CMOS [70].

A fourfold interpolation is introduced, which not only further reduces the LSB size to 2 ps but also provides another redundancy factor of 4×. One potential approach is to cascade two stages of shorted-inverter-based 2× interpolators [71], or to use resistor-ladder-based 4× interpolators [44], before the CMPs. Although this approach offers better matching when resistive averaging is introduced, it is not power efficient because of the extra power consumed by the interpolators and the fourfold increase in the number of comparators. This work therefore employs a more power-efficient approach, 4× TD SRL interpolation [76] (Figure 4.10). Three SRLs (NOR gates) with built-in offsets of $\{-\Delta, 0, \Delta\}$ are inserted between the outputs of two adjacent CMPs. To save power, minimum-sized SRLs are used.



Figure 4.10. (a) $4 \times$ TD SRL interpolation and (b) SRL with built-in offset.

Two bunches of CMPs and SRLs, triggered by the differential outputs of the VTC, follow each RO, which leads to double sampling. Double sampling helps to improve the conversion rate of the TDC by halving the full input time range through the differential format, but it introduces a 3 dB SNR loss [60]. Thanks to the RNS architecture, this problem is solved in this work by averaging the decoded data $Y_1$ and $Y_2$. Since the ROs are free running, the starting and stopping points of each time interval to be quantized are randomized, resulting in built-in dynamic element matching (DEM) and thus improved TDC linearity.

The DEM transforms the mismatch into white noise. To minimize this noise, calibration is introduced to combat the mismatch of the ROs and the offset of the CMPs. The SRLs are left uncalibrated, so they introduce an RMS noise of 0.26 LSB due to DEM.

The noise budget of each TDC is listed in Table 4.2. It indicates a total RMS noise of 0.63 LSB (zero input) to 0.65 LSB (full-scale input). In the averaged decoded output, all the noise powers (except that of the VTC) are halved, so the RMS noise of the averaged output is 0.49 LSB at zero input. With a signal full scale of $\pm 120$ LSB, the peak SNR is about 44.8 dB.

| Source | Component      | RMS Noise (LSB), Not Averaged | RMS Noise (LSB), Averaged |
|--------|----------------|-------------------------------|---------------------------|
| VTC    |                | 0.27                          | 0.27                      |
| TDC    | RO             | 0.10~0.165                    | 0.07~0.117                |
| TDC    | CMP            | 0.035×√2                      | 0.035                     |
| TDC    | SRL            | 0.07×√2                       | 0.07                      |
| TDC    | ε<sub>q</sub>  | 0.29×√2                       | 0.29                      |
| TDC    | DEM            | 0.26×√2                       | 0.26                      |
| Total  |                | 0.63~0.65                     | 0.49                      |

Table 4.2. ADC Noise Budget
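The totals in Table 4.2 can be reproduced with a root-sum-square of the individual entries. The following is a minimal sketch that assumes all contributions are uncorrelated and uses the zero-input RO value; the variable names are illustrative.

```python
import math

def rss(values):
    """Root-sum-square of uncorrelated RMS noise contributions."""
    return math.sqrt(sum(v * v for v in values))

SQRT2 = math.sqrt(2.0)
VTC = 0.27
# TDC terms as listed in the "averaged" column of Table 4.2 (LSB RMS, zero input)
TDC_AVG = {"RO": 0.07, "CMP": 0.035, "SRL": 0.07, "eq": 0.29, "DEM": 0.26}

# One TDC alone: each TDC term is sqrt(2) larger than after averaging
total_single = rss([VTC] + [v * SQRT2 for v in TDC_AVG.values()])
# Averaged output: TDC noise powers are halved, VTC noise is not
total_avg = rss([VTC] + list(TDC_AVG.values()))

# Peak SNR for a sine with a +/-120 LSB full scale
peak_snr_db = 20.0 * math.log10((120.0 / SQRT2) / total_avg)
```

Under these assumptions the sketch gives about 0.63 LSB before averaging, about 0.49 LSB after averaging, and a peak SNR close to the 44.8 dB quoted in the text.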

The simulated mismatch and noise of the RO, CMP and SRL are summarized in Table 4.3.



Table 4.3. Simulated mismatch and noise performance of RO, CMP and SRL.

#### 4.3.3 TDC MISMATCH CALIBRATION

Figure 4.11(a) shows that random mismatches within the RO cells and the comparators can lead to DNL/INL errors in the TDC. Since the differential phase pair $\Phi_i$ and $\overline{\Phi_i}$ of the RO is fed to the CMP, a mismatch-induced shift of their crossing point introduces $\Delta t_{RO}$. In addition, the CMP input-referred offset voltage $V_{OS}$ can, to first order, be lumped into the TD mismatch as $\Delta t_{OS}$. If we define the DNL of the TDC as $\Delta t_{eff}$, then

$$\Delta t_{eff} = \Delta t_{RO} + \Delta t_{OS} \approx \Delta t_{RO} + V_{OS} / (2 \times S_{RO})$$
(4-20)

With $S_{RO1} = (10.3+11.9)/2 = 11.1$ GV/s and $S_{RO2} = (9.5+13.3)/2 = 11.4$ GV/s, Monte Carlo simulation suggests a standard deviation of 1.61 ps for $\Delta t_{OS}$ and 1.28 ps for $\Delta t_{RO}$. As a result, $\Delta t_{eff}$ has a standard deviation of 2.06 ps. The peak-to-peak spread of $\Delta t_{eff}$ could therefore exceed the ±4 LSB redundancy, where the LSB size is 2 ps with 4× SRL interpolation. $\Delta t_{eff}$ could be reduced by increasing $S_{RO}$ at the cost of more power consumption. Alternatively, mismatch calibration is a more economical way to keep $\Delta t_{eff}$ within the redundancy.
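The 2.06 ps figure is consistent with adding the two contributions in power. The short sketch below assumes that $\Delta t_{OS}$ and $\Delta t_{RO}$ are statistically independent, and also expresses the ±4 LSB redundancy in units of the combined standard deviation; the variable names are illustrative.

```python
import math

LSB_PS = 2.0            # LSB size after 4x SRL interpolation
REDUNDANCY_LSB = 4.0    # +/- redundancy range quoted in the text

sigma_os_ps = 1.61      # std of the comparator-offset-induced time error
sigma_ro_ps = 1.28      # std of the RO crossing-point shift

# Independent contributions add in power (root-sum-square)
sigma_eff_ps = math.hypot(sigma_os_ps, sigma_ro_ps)

# Number of standard deviations covered by the redundancy range
coverage_sigma = REDUNDANCY_LSB * LSB_PS / sigma_eff_ps
```

Under this assumption the redundancy covers a little under ±4 standard deviations, so an occasional excursion beyond it is plausible across the many comparator pairs in the two TDCs.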



Figure 4.11. (a) TDC mismatch model and (b) mismatch extraction method with TD ramp.

In this work, an auxiliary input pair controlled by a DAC is added to each CMP for mismatch calibration. The size of the auxiliary input pair is a quarter of that of the main input pair. A calibration DAC with a $\pm 300$ mV range and 5-bit accuracy is employed to provide a $\pm 5$ LSB calibration range with an accuracy of 0.3 LSB.

A linear TD ramp is first fed to the TDC, as Figure 4.11(b) illustrates. With the RO free running and the VTC input set to zero, the RO frequency $f_0$ beats with the sampling clock $f_s$. The RO phase then accumulates linearly and all codes are hit periodically. The histogram of the raw codes directly reflects the DNL of the TDC and, after digital processing, is used to control the calibration DAC. Figure 4.12 plots the histogram from a simulation of an ideal TDC in which only the P-side of the auxiliary pair of the first CMP, Vtp<0>, is biased at ±200 mV. It indicates that the four LSBs interpolated from the same CMP pair (CMP<sub>i</sub> and CMP<sub>i+1</sub> in Figure 4.10) increase or decrease by the same amount in response to the RO/CMP mismatch $\Delta t_{eff}$. The DAC voltages can therefore be calculated from the histogram of the sum of the four interpolated LSBs from the same CMP pair, scaled by a gain $g_{cal}$. The measured TDC RMS noise versus $g_{cal}$ is plotted in Figure 4.13(a). With a $g_{cal}$ of 150 to 200, good calibration accuracy is achieved.
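The principle of extracting DNL from a beat-frequency ramp can be illustrated with a small behavioral model. This is a simplified sketch, not the calibration code used in this work: the TDC is reduced to a set of code boundaries within one RO period, and the function name, the 8-code example and the chosen beat ratio are all assumptions made for illustration.

```python
import bisect

def ramp_histogram_dnl(edges, beat_ratio, n_samples):
    """Estimate per-code DNL (in LSB) from a code-density histogram.

    edges      -- sorted lower boundaries of each code, as fractions of one
                  RO period; edges[0] must be 0.0
    beat_ratio -- fractional RO phase advance per sample, frac(f0 / fs)
    """
    hist = [0] * len(edges)
    phase = 0.0
    for _ in range(n_samples):
        code = bisect.bisect_right(edges, phase) - 1  # code containing this phase
        hist[code] += 1
        phase = (phase + beat_ratio) % 1.0            # linear phase ramp, wrapped
    ideal = n_samples / len(edges)                    # hits per code without mismatch
    return [h / ideal - 1.0 for h in hist]

# Hypothetical 8-code TDC: the boundary between codes 2 and 3 is late by 0.2 LSB
edges = [0.0, 0.125, 0.25, 0.4, 0.5, 0.625, 0.75, 0.875]
dnl = ramp_histogram_dnl(edges, beat_ratio=0.6180339887, n_samples=20000)
```

In this toy model, code 2 is widened and code 3 is narrowed by the same amount, and the histogram recovers both deviations. A beat ratio that is far from any simple fraction is chosen so that the sampled phases cover the RO period evenly.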

However, this simple gain calculation is not accurate enough because of the nonlinearity of the auxiliary pair and the TD SRL interpolation. A binary search algorithm is therefore employed to approach the best calibration accuracy. The DAC voltage of each CMP's auxiliary pair is accumulated iteratively with an attenuation factor u. The DAC voltage after j iterations is

$$V_{cal,j} = g_{cal} \times \left( H_{cal,0} + \sum_{i=1}^{j} u \times H_{cal,i} \right)$$
(4-21)

where $H_{cal,i}$ is the histogram of the i-th iteration. Figure 4.13(b) plots the measured TDC RMS noise versus iteration for different values of u. It indicates that the lowest RMS noise, and hence the best calibration accuracy, is achieved with $u = 0.6 \sim 0.8$.
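The accumulation in (4-21) can be sketched with a toy model. Everything about the model is an assumption made for illustration: the DAC-to-time transfer is taken as a mildly saturating tanh curve with hypothetical parameters `g_true` and `sat_lsb`, and the residual mismatch of each comparator stands in directly for the histogram term $H_{cal,i}$.

```python
import math

def actuator_lsb(v_dac, g_true=180.0, sat_lsb=6.0):
    """Hypothetical, mildly nonlinear DAC-voltage-to-time-shift transfer."""
    return sat_lsb * math.tanh(v_dac / (g_true * sat_lsb))

def iterate_calibration(mismatch_lsb, g_cal, u, n_iter):
    """Apply Eq. (4-21) and return the RMS residual observed at each iteration."""
    acc = [0.0] * len(mismatch_lsb)      # bracketed sum in Eq. (4-21)
    rms_history = []
    for j in range(n_iter + 1):
        v_cal = [g_cal * a for a in acc]                  # DAC voltages in use
        # Residual error observed with these voltages (stands in for H_cal,j)
        h = [m - actuator_lsb(v) for m, v in zip(mismatch_lsb, v_cal)]
        rms_history.append(math.sqrt(sum(x * x for x in h) / len(h)))
        weight = 1.0 if j == 0 else u                     # H_cal,0 is not attenuated
        acc = [a + weight * x for a, x in zip(acc, h)]
    return rms_history

history = iterate_calibration([3.0, -2.0, 0.5, -1.2], g_cal=150.0, u=0.7, n_iter=10)
```

Because the assumed `g_cal` differs from the true actuator gain and the actuator is nonlinear, the first estimate leaves a residual, and the attenuated updates then shrink it step by step. The measured optimum of $u$ reported above reflects the real circuit and is not reproduced by this simplified model.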





Figure 4.13. Measured (a) TDC RMS noise vs. g<sub>cal</sub> and (b) TDC RMS noise vs. iteration.

Figure 4.14(a) shows the histogram of the first bunch of CMPs and SRLs of TDC2 before and after calibration. Figure 4.14(b) shows the RMS noise and SNDR over 10 iterations. Before iteration 3, the mismatch exceeds the redundancy, so the RMS noise is large and the SNDR drops dramatically. After that, they converge to 0.56 LSB and 41 dB, respectively. The RMS noise of TDC1 after calibration is about 0.72 LSB, which is slightly larger than the simulated value. This is caused by the incomplete calibration of the first bunch of CMPs and SRLs of TDC1. The DAC voltage after calibration is plotted in Figure 4.15.



Figure 4.14. Measured (a) normalized raw code histogram (b) calibration convergence curves.



Figure 4.15. (a) Target DAC voltage convergence curve and (b) incompletely calibrated histogram.

As Figure 4.15 shows, three of the target DAC voltages exceed the maximum range of the calibration DAC, resulting in residual non-uniformity in the histogram after calibration. This under-estimation of the mismatch can be easily resolved by increasing the size of the auxiliary input pair or the calibration DAC range.

## 4.4 MEASUREMENT RESULTS

## 4.4.1 TESTBENCH SETUP

The block diagram of the testbench is shown in Figure 4.16. The 2 GHz clock signal is generated by an R&S SMB 100A signal generator. The input sinusoid is generated by a signal generator (Agilent E6267D) and converted to a differential signal by a broadband balun (HL9404). The digital outputs are captured by a logic analyzer (Agilent 16902B). A MATLAB program running on the logic analyzer accesses the data and controls an NI DAQ6008 to calibrate the ADC through a scan chain.



Figure 4.16. Block diagram of testbench setup.

The logic analyzer, the MATLAB code, the DAQ and the test chip form a closed calibration loop, in which the MATLAB program acts as the main controller. In calibration mode, the input signal magnitude is set to zero by simply disabling the output of the signal generator, so that the linear TD ramp is generated for the TDC. The MATLAB code then acquires the data from the logic analyzer and calculates the histogram. After computing the target DAC voltages necessary for calibration, the MATLAB code controls the DAQ to send a serialized binary sequence to the test chip to refresh the registers of the calibration DACs. A FOR loop in the MATLAB code repeats this procedure iteratively. In this way, the testbench calibrates the ADC chip automatically. Figure 4.17 shows a photo of the test board.



Figure 4.17. Photo of test board.

256K samples are used to calculate the histogram. With a 2 GHz clock frequency and $16 \times$ decimation of the output data, it takes about 2 ms to collect the data. However, limited by the computing capacity of the logic analyzer and the latency of the interaction between modules, each calibration run takes about 1 minute.
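The 2 ms capture time follows directly from the sample count, the decimation factor and the clock frequency. The short calculation below assumes that "256K" means $2^{18}$ samples.

```python
N_SAMPLES = 256 * 1024      # "256K" samples, read here as 2**18 (assumption)
F_CLK_HZ = 2e9              # ADC clock frequency
DECIMATION = 16             # output data decimation factor

# Each stored sample spans DECIMATION clock periods
capture_time_s = N_SAMPLES * DECIMATION / F_CLK_HZ
```

This gives roughly 2.1 ms, which shows that the one-minute calibration run is dominated by data transfer and processing rather than by data collection.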

## 4.4.2 EXPERIMENTAL RESULTS

The prototype RNS ADC is fabricated in a 65 nm CMOS process with an active area of $0.08 \text{ mm}^2$ (380 µm × 210 µm). A die photo is shown in Figure 4.18. At 2 GS/s, the measured power consumption is 21 mW. Of this, 1.9 mW is consumed by the VTC and clock generator from a 1.2 V supply, 7.6 mW and 8.5 mW by TDC1 and TDC2, respectively, from a 1.3 V supply, and 3.0 mW by the digital circuits from a 1.2 V supply. The breakdown of power consumption is shown in Figure 4.19.



Figure 4.18. Die photo.

Thanks to the DEM, the measured DNL and INL after calibration (Figure 4.20) are  $\pm 0.14$  and  $\pm 0.61$  LSBs, respectively.



Figure 4.19. Power breakdown.



Figure 4.20. Measured DNL and INL plots.

Figure 4.21 shows the measured ADC spectra with a 100 MHz input and a Nyquist input, respectively.



Figure 4.21. Measured output spectra at 2 GS/s with (a) low-frequency input and (b) Nyquist input.

The ADC achieves a 41.4 dB SNDR with a low-frequency input and a 40.7 dB SNDR with a Nyquist input. The -48.1 dB THD indicates the good linearity of the VTC. Figure 4.22 plots the dynamic performance of the ADC, revealing an ERBW of 1.74 GHz.



Figure 4.22. Measured dynamic performance.

The noise of the RO is also studied, as the mode time plot in Figure 4.23 shows. The two ROs in this design are powered by independent supplies in triple wells, so the supply/substrate noise coupling problem seen in the first design is eliminated. There is no significant difference between the RMS noise of TDC1 at zero input and at full-scale input. Since the RO noise grows with the input magnitude (Chapter 2), this indicates that the RO noise does not dominate the total noise.



Figure 4.23. Mod time plot of RMS noise of TDC1 vs. signal magnitude.

Figure 4.24 shows the measured SNDR variations versus temperature and supply voltage after a one-time calibration performed at 25°C with a 1.3 V supply. An SNDR variation of less than 1 dB is achieved from 0°C to 60°C and from 1.3 V to 1.42 V. The SNDR variation mainly results from the temperature and supply dependence of the RO and of the SRL interpolation. The former affects the oscillation frequency of the RO, $f_0$, which leads to a gain variation of the RNS ADC. Because of the mobility-dominated temperature dependence of the inverters in the RO, $f_0$ shows a CTAT characteristic [90]. The latter affects the DNL of the ADC and leads to a dramatic SNDR droop when the DNL becomes larger than the redundancy coverage range. Assuming that the SNDR variation is dominated by the RO from 0°C to 65°C, the temperature sensitivity of the RO can be estimated as

$$TC_{f_0} = \frac{10^{\frac{\Delta SNDR}{20}} - 1}{\Delta T} = -2300 \ \mathrm{ppm}/^{\circ}\mathrm{C}$$
(4-22)
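Equation (4-22) converts an SNDR change into a fractional gain change per degree. A minimal sketch of that conversion is given below; the function name is hypothetical, and the numbers in the usage line are an arbitrary illustration rather than the measured data.

```python
import math

def gain_tc_ppm_per_degc(delta_sndr_db, delta_t_degc):
    """Eq. (4-22): fractional gain change per degree C, expressed in ppm."""
    gain_ratio = 10.0 ** (delta_sndr_db / 20.0)
    return (gain_ratio - 1.0) / delta_t_degc * 1e6

# Hypothetical example: a 10 % gain drop (about -0.92 dB) over 50 degC
tc = gain_tc_ppm_per_degc(20.0 * math.log10(0.9), 50.0)
```

A negative result corresponds to the CTAT behavior described above, since the gain falls as the temperature rises.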



Figure 4.24. Measured SNDR variations vs. the temperature and supply voltage.

For the supply voltage variation, if we assume that it is dominated by the RO from 1.25 V to 1.35 V (the SRL interpolation dominates above 1.35 V), the supply sensitivity of the RO can be estimated as 2647 ppm/°C. These values agree with the measured data for a 65 nm process reported in [90]. For RNS ADCs with more than 8-bit resolution, techniques such as a PLL or replica circuits can be employed to track the temperature and supply variation of the RO.

The performance of the RNS ADC prototype is summarized and compared with state-of-the-art non-interleaved 6-8b, >1GS/s Flash/folding ADCs in Table 4.4. This work reports the first non-interleaved 2GS/s 8b Flash ADC in CMOS technology with excellent DNL performance. The input capacitance of this work is the smallest when the resolutions of the counterparts are normalized to 8 bits, resulting in the highest bandwidth among works in bulk CMOS.

|                                 | This      | ISSCC'14   | VLSI'12 | VLSI'13   | ASSCC'15  | VLSI'09   |
|---------------------------------|-----------|------------|---------|-----------|-----------|-----------|
|                                 | Work      | [25]       | [75]    | [78]      | [76]      | [77]      |
| Architecture                    | RNS TD    | Folding TD | Flash   | Flash     | Flash     | Folding   |
| Technology                      | 65nm      | 40nm       | 40nm    | 32nm SOI  | 65nm      | 90nm      |
| Sample Rate (GS/s)              | 2         | 2.2        | 3       | 5         | 3.4       | 2.7       |
| Res. (Bits)                     | 7.93      | 7          | 6       | 6         | 6         | 6         |
| Input Cap. (fF)                 | 128       | 300        | 72      | 135       | 120       | NA        |
| Input Range (V <sub>P-P</sub> ) | 1.2       | 1.0        | 0.5     | NA        | 0.8       | 0.8       |
| Bandwidth (GHz)                 | 1.74      | 1.2        | 1.5     | 2.5       | 1.8       | 1.35      |
| SNDR (dB) @NQ                   | 40.7      | 37.4       | 33.1    | 30.9      | 34.2      | 33.6      |
| ENOB (Bits) @NQ                 | 6.5       | 5.9        | 5.2     | 4.8       | 5.4       | 5.3       |
| DNL/INL (LSB)                   | 0.14/0.61 | 0.6/1.0    | NA/0.35 | 0.52/0.37 | 0.48/0.64 | 0.53/0.73 |
| Supply Voltage (V)              | 1.2/1.3   | 1.1        | 1.1     | 0.85      | 1.0       | 1.0       |
| Power (mW)                      | 21        | 27.4       | 11      | 8.5       | 12.6      | 50        |
| Area (mm <sup>2</sup> )         | 0.08      | 0.052      | 0.021   | 0.02      | 0.034     | 0.36      |
| Walden FoM (fJ/step)            | 119       | 210        | 100     | 59.4      | 89        | 470       |
| Schreier FoM (dB)               | 150.6     | 143.3      | 144.4   | 145.6     | 145.7     | 137.9     |

Table 4.4. Performance Comparison
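The ENOB and Walden FoM entries for this work in Table 4.4 can be checked with the standard definitions, $\mathrm{ENOB} = (\mathrm{SNDR} - 1.76)/6.02$ and $\mathrm{FoM}_W = P / (2^{\mathrm{ENOB}} f_s)$. The sketch below assumes that the table uses these definitions with the Nyquist SNDR; the function names are illustrative.

```python
def enob_bits(sndr_db):
    """Effective number of bits from SNDR (dB)."""
    return (sndr_db - 1.76) / 6.02

def walden_fom_fj(power_w, sndr_db, fs_hz):
    """Walden figure of merit, P / (2**ENOB * fs), in fJ per conversion step."""
    return power_w / (2.0 ** enob_bits(sndr_db) * fs_hz) * 1e15

# "This Work" column of Table 4.4: 21 mW, 40.7 dB SNDR at Nyquist, 2 GS/s
enob = enob_bits(40.7)
fom = walden_fom_fj(21e-3, 40.7, 2e9)
```

With these inputs the sketch yields an ENOB of about 6.5 bits and a Walden FoM close to the 119 fJ/step listed in the table.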

Thanks to the RNS architecture, this work reports the best Schreier FoM among all non-interleaved Flash/folding ADCs published at ISSCC and VLSI Symp. from 1997 to 2016 [91], as plotted in Figure 4.25.



Figure 4.25. Schreier FoM of all non-interleaved Flash/folding ADCs published at ISSCC and VLSI Symp. from 1997 to 2016.

#### 4.5 SUMMARY

A novel Flash ADC using the RNS architecture is proposed in this work. It achieves a 2 GS/s sampling rate and 8-bit resolution without time-interleaving in 65 nm CMOS. The RNS architecture reduces the number of comparators while preserving the Flash speed of the converter. The time-domain approach also helps to cut down the input capacitance and broaden the ERBW to 1.74 GHz. Furthermore, the DEM within the TDC provides good linearity, with a DNL of $\pm 0.14$ LSB and an INL of $\pm 0.61$ LSB. TD linear ramp generation and histogram-based TDC mismatch calibration are employed to improve the ADC performance. Fabricated in a 65 nm CMOS process with an area of 0.08 mm<sup>2</sup>, the prototype RNS ADC achieves an SNDR of 40.7 dB for a Nyquist input. This work reports the best Schreier FoM among all non-interleaved Flash/folding ADCs published at ISSCC and VLSI Symp.

#### **CHAPTER 5**

#### CONCLUSION

Two high-speed ADCs employing the TD approach are presented in this dissertation for next-generation wireline/wireless communication systems. The novel TD architecture offers unique benefits in such applications over conventional voltage-domain ADCs. More specifically, a TD folding method based on the RO is chosen as the foundation of these two works. The background of TDC development is reviewed. The noise and linearity of the key components are also analyzed to provide guidelines for high-speed converter design.

In the first chip, a 10 GS/s 6-bit TD folding ADC is implemented in a 65-nm CMOS process for high-speed PAM-4 wireline receivers. The single high-speed VTC front-end eliminates the clock-skew problem found in other 10+ GS/s ADCs. The inherent DEM of the RO-based TDC also achieves very good linearity performance, manifested by the measured DNL of +0.27/-0.28 LSBs, INL of +0.48/-0.49 LSBs, and an SFDR of over 42 dB up to the Nyquist frequency. This work achieves the minimum chip area among all recent state-of-the-art ~10-GS/s ADCs with good SFDR. The figure of merit of this work is also comparable to that of the other works.

A novel Flash ADC using the RNS architecture is proposed in the second chip. It achieves a 2 GS/s sampling rate and 8-bit resolution without time-interleaving in 65 nm CMOS. An ERBW of 1.74 GHz makes it suitable for next-generation RF-sampling wireless radio/radar systems. The RNS architecture reduces the number of comparators while preserving the Flash speed of the converter. The time-domain approach also helps to cut down the input capacitance and broaden the ERBW. Furthermore, the DEM within the TDC provides good linearity, with a DNL of $\pm 0.14$ LSB and an INL of $\pm 0.61$ LSB. TD linear ramp generation and histogram-based TDC mismatch calibration are employed to improve the ADC performance. The RNS ADC achieves an SNDR of 40.7 dB for a Nyquist input. This work reports the best Schreier FoM among all non-interleaved Flash/folding ADCs published at ISSCC and VLSI Symp.

Due to the digital-like property of TD circuits, the TD ADC will scale well in advanced technology nodes, so better area efficiency and figure of merit are potentially achievable in future technologies. The TD ADCs proposed in this dissertation are still vulnerable to PVT variation and poor matching. In future work, replica circuits and PLLs can be adopted to track and stabilize the PVT variation. New calibration methods can also be introduced to make TD ADCs accurate enough for higher resolution.

#### REFERENCES

- [1] Hoffmann, T., et al. *IMPACT A common building block to enable next generation radar arrays.* in 2016 IEEE Radar Conference (RadarConf). 2016.
- [2] Instruments, T., *RF-Sampling and GSPS ADCs Breakthrough ADCs Revolutionize Radio Architectures*. 2012.
- [3] Chen, S. and B. Murmann. An 8-bit 1.25GS/s CMOS IF-sampling ADC with background calibration for dynamic distortion. in 2016 IEEE Asian Solid-State Circuits Conference (A-SSCC). 2016.
- [4] Straayer, M., et al. 27.5 A 4GS/s time-interleaved RF ADC in 65nm CMOS with 4GHz input bandwidth. in 2016 IEEE International Solid-State Circuits Conference (ISSCC). 2016.
- [5] Yueksel, H., et al. A 3.6pJ/b 56Gb/s 4-PAM receiver with 6-Bit TI-SAR ADC and quarter-rate speculative 2-tap DFE in 32 nm CMOS. in ESSCIRC Conference 2015 - 41st European Solid-State Circuits Conference (ESSCIRC). 2015.
- [6] Shafik, A., et al. 3.6 A 10Gb/s hybrid ADC-based receiver with embedded 3-tap analog FFE and dynamically-enabled digital equalization in 65nm CMOS. in 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers. 2015.
- [7] Raghavan, B., et al. A 125 mW 8.5-11.5 Gb/s serial link transceiver with a dual path 6-bit ADC/5-tap DFE receiver and a 4-tap FFE transmitter in 28 nm CMOS. in 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits). 2016.
- [8] Mulder, J., et al. 26.3 An 800MS/S 10b/13b receiver for 10GBASE-T Ethernet in 28nm CMOS. in 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers. 2015.
- [9] Gupta, T., et al. A sub-2W 10GBase-T analog front-end in 40nm CMOS process. in 2012 IEEE International Solid-State Circuits Conference. 2012.
- [10] Gupta, S.K., et al. 10GBASE-T for 10Gb/s full duplex ethernet LAN transmission over structured copper cabling. in 2008 IEEE Radio Frequency Integrated Circuits Symposium. 2008.
- [11] Zhang, B., et al., A 40 nm CMOS 195 mW/55 mW Dual-Path Receiver AFE for Multi-Standard 8.5-11.5 Gb/s Serial Links. IEEE Journal of Solid-State Circuits, 2015. 50(2): p. 426-439.
- [12] Verma, S., et al. A 10.3GS/s 6b flash ADC for 10G Ethernet applications. in 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers. 2013.

- [13] Tabasy, E.Z., et al. A 6b 10GS/s TI-SAR ADC with embedded 2-tap FFE/1-tap DFE in 65nm CMOS. in 2013 Symposium on VLSI Circuits. 2013.
- [14] Duan, Y. and E. Alon, A 12.8 GS/s Time-Interleaved ADC With 25 GHz Effective Resolution Bandwidth and 4.6 ENOB. IEEE Journal of Solid-State Circuits, 2014. 49(8): p. 1725-1738.
- [15] Chen, V.H.C. and L. Pileggi. 22.2 A 69.5mW 20GS/s 6b time-interleaved ADC with embedded time-to-digital calibration in 32nm CMOS SOI. in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). 2014.
- [16] Nissinen, I. and J. Kostamovaara. A 2-channel CMOS time-to-digital converter for time-of-flight laser rangefinding. in 2009 IEEE Instrumentation and Measurement Technology Conference. 2009.
- [17] Jahromi, S., et al. A single chip laser radar receiver with a 9×9 SPAD detector array and a 10-channel TDC. in ESSCIRC Conference 2015 - 41st European Solid-State Circuits Conference (ESSCIRC). 2015.
- [18] Dutton, N.A.W., et al. 11.5 A time-correlated single-photon-counting sensor with 14GS/S histogramming time-to-digital converter. in 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers. 2015.
- [19] Staszewski, R.B., et al. Spur-free all-digital PLL in 65nm for mobile phones. in 2011 IEEE International Solid-State Circuits Conference. 2011.
- [20] Ho, C.R. and M.S.W. Chen. A fractional-N DPLL with adaptive spur cancellation and calibration-free injection-locked TDC in 65nm CMOS. in 2014 IEEE Radio Frequency Integrated Circuits Symposium. 2014.
- [21] Elkholy, A., et al. A 3.7mW 3MHz bandwidth 4.5GHz digital fractional-N PLL with -106dBc/Hz In-band noise using time amplifier based TDC. in 2014 Symposium on VLSI Circuits Digest of Technical Papers. 2014.
- [22] Iizuka, T., et al. A true 4-cycle lock reference-less all-digital burst-mode CDR utilizing coarse-fine phase generator with embedded TDC. in Proceedings of the IEEE 2013 Custom Integrated Circuits Conference. 2013.
- [23] Smarandoiu, G., et al., *An All-MOS Analog-to-Digital converter using a constant slope approach*. IEEE Journal of Solid-State Circuits, 1976. 11(3): p. 408-410.
- [24] Naraghi, S., M. Courcy, and M.P. Flynn. A 9b 14uW 0.06mm<sup>2</sup> PPM ADC in 90nm digital CMOS. in 2009 IEEE International Solid-State Circuits Conference - Digest of Technical Papers. 2009.

- [25] Miyahara, M., et al. 22.6 A 2.2GS/s 7b 27.4mW time-based folding-flash ADC with resistively averaged voltage-to-time amplifiers. in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). 2014.
- [26] Macpherson, A.R., J.W. Haslett, and L. Belostotski. A 5GS/s 4-bit time-based single-channel CMOS ADC for radio astronomy. in Proceedings of the IEEE 2013 Custom Integrated Circuits Conference. 2013.
- [27] Yan-Jiun, C. and C.C. Hsieh. A 0.4V 2.02fJ/conversion-step 10-bit hybrid SAR ADC with time-domain quantizer in 90nm CMOS. in 2014 Symposium on VLSI Circuits Digest of Technical Papers. 2014.
- [28] Taillefer, C.S. and G.W. Roberts, *Delta Sigma A/D Conversion Via Time-Mode Signal Processing*. IEEE Transactions on Circuits and Systems I: Regular Papers, 2009. 56(9): p. 1908-1920.
- [29] Oh, T., H. Venkatram, and U.K. Moon. A 70MS/s 69.3dB SNDR 38.2fJ/conversion-step time-based pipelined ADC. in 2013 Symposium on VLSI Circuits. 2013.
- [30] Mathew, J.P., L. Kong, and B. Razavi. A 12-bit 200-MS/s 3.4-mW CMOS ADC with 0.85-V supply. in 2015 Symposium on VLSI Circuits (VLSI Circuits). 2015.
- [31] Chan-Hsiang, W., et al. An 8.5MHz 67.2dB SNDR CTDSM with ELD compensation embedded twin-T SAB and circular TDC-based quantizer in 90nm CMOS. in 2014 Symposium on VLSI Circuits Digest of Technical Papers. 2014.
- [32] Young, B., et al. A 75dB DR 50MHz BW 3rd order CT-sigma-delta modulator using VCO-based integrators. in 2014 Symposium on VLSI Circuits Digest of Technical Papers. 2014.
- [33] Sharma, P.K. and M.S.W. Chen. A 6b 800MS/s 3.62mW Nyquist AC-coupled VCO-based ADC in 65nm CMOS. in Proceedings of the IEEE 2013 Custom Integrated Circuits Conference. 2013.
- [34] Kim, B., W. Xu, and C.H. Kim. A fully-digital beat-frequency based ADC achieving 39dB SNDR for a 1.6mV-pp input signal. in Proceedings of the IEEE 2013 Custom Integrated Circuits Conference. 2013.
- [35] Staszewski, R.B., et al., 1.3 V 20 ps time-to-digital converter for frequency synthesis in 90nm CMOS. IEEE Transactions on Circuits and Systems II: Express Briefs, 2006. 53(3): p. 220-224.
- [36] Mantyniemi, A., T. Rahkonen, and J. Kostamovaara. An integrated 9-channel time digitizer with 30 ps resolution. in 2002 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. 2002.

- [37] Yu, J., F.F. Dai, and R.C. Jaeger. A 12-bit vernier ring time-to-digital converter in 0.13um CMOS technology. in 2009 Symposium on VLSI Circuits. 2009.
- [38] Vercesi, L., A. Liscidini, and R. Castello, *Two-Dimensions Vernier Time-to-Digital Converter*. IEEE Journal of Solid-State Circuits, 2010. 45(8): p. 1504-1512.
- [39] Markovic, B., et al., A High-Linearity, 17 ps Precision Time-to-Digital Converter Based on a Single-Stage Vernier Delay Loop Fine Interpolation. IEEE Transactions on Circuits and Systems I: Regular Papers, 2013. 60(3): p. 557-569.
- [40] Dudek, P., S. Szczepanski, and J.V. Hatfield, A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line. IEEE Journal of Solid-State Circuits, 2000. 35(2): p. 240-247.
- [41] Iizuka, T., et al. A fine-resolution pulse-shrinking time-to-digital converter with completion detection utilizing built-in offset pulse. in 2016 IEEE Asian Solid-State Circuits Conference (A-SSCC). 2016.
- [42] Yue, L., et al. A 6ps resolution pulse shrinking Time-to-Digital Converter as phase detector in multi-mode transceiver. in 2008 IEEE Radio and Wireless Symposium. 2008.
- [43] Chen, C.C., S.H. Lin, and C.S. Hwang, An Area-Efficient CMOS Time-to-Digital Converter Based on a Pulse-Shrinking Scheme. IEEE Transactions on Circuits and Systems II: Express Briefs, 2014. 61(3): p. 163-167.
- [44] Henzler, S., et al. 90nm 4.7ps-Resolution 0.7-LSB Single-Shot Precision and 19pJ-per-Shot Local Passive Interpolation Time-to-Digital Converter with On-Chip Characterization. in 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers. 2008.
- [45] Chen, M.S.W., D. Su, and S. Mehta. A calibration-free 800MHz fractional-N digital PLL with embedded TDC. in 2010 IEEE International Solid-State Circuits Conference (ISSCC). 2010.
- [46] Straayer, M.Z. and M.H. Perrott. An efficient high-resolution 11-bit noise-shaping multipath gated ring oscillator TDC. in 2008 IEEE Symposium on VLSI Circuits. 2008.
- [47] Helal, B.M., et al. A Low Jitter 1.6 GHz Multiplying DLL Utilizing a Scrambling Time-to-Digital Converter and Digital Correlation. in 2007 IEEE Symposium on VLSI Circuits. 2007.
- [48] Elshazly, A., et al. A 13b 315fs-rms 2mW 500MS/s 1MHz bandwidth highly digital time-to-digital converter using switched ring oscillators. in 2012 IEEE International Solid-State Circuits Conference. 2012.
- [49] Seo, Y.H., et al. A 0.63ps resolution, 11b pipeline TDC in 0.13um CMOS. in 2011 Symposium on VLSI Circuits - Digest of Technical Papers. 2011.
- [50] Lee, M. and A.A. Abidi. A 9b, 1.25ps Resolution Coarse-Fine Time-to-Digital Converter in 90nm CMOS that Amplifies a Time Residue. in 2007 IEEE Symposium on VLSI Circuits. 2007.
- [51] Kim, K., W. Yu, and S. Cho. A 9b, 1.12ps resolution 2.5b/stage pipelined time-to-digital converter in 65nm CMOS using time-register. in 2013 Symposium on VLSI Circuits. 2013.
- [52] Kim, K., et al. A 7b, 3.75ps resolution two-step time-to-digital converter in 65nm CMOS using pulse-train time amplifier. in 2012 Symposium on VLSI Circuits (VLSIC). 2012.
- [53] Kim, J.S., et al., A 300-MS/s, 1.76-ps-Resolution, 10-b Asynchronous Pipelined Time-to-Digital Converter With on-Chip Digital Background Calibration in 0.13-um CMOS. IEEE Journal of Solid-State Circuits, 2013. 48(2): p. 516-526.
- [54] Lee, S.K., et al., A 1 GHz ADPLL With a 1.25 ps Minimum-Resolution Sub-Exponent TDC in 0.18 um CMOS. IEEE Journal of Solid-State Circuits, 2010. 45(12): p. 2874-2881.
- [55] Kwon, H.J., et al. A high-gain wide-input-range time amplifier with an open-loop architecture and a gain equal to current bias ratio. in IEEE Asian Solid-State Circuits Conference 2011. 2011.
- [56] Kim, B., H. Kim, and C.H. Kim. An 8bit, 2.6ps two-step TDC in 65nm CMOS employing a switched ring-oscillator based time amplifier. in 2015 IEEE Custom Integrated Circuits Conference (CICC). 2015.
- [57] Abas, A.M., et al., *Time difference amplifier*. Electronics Letters, 2002. 38(23): p. 1437-1438.
- [58] Chung, H., H. Ishikuro, and T. Kuroda, A 10-Bit 80-MS/s Decision-Select Successive Approximation TDC in 65-nm CMOS. IEEE Journal of Solid-State Circuits, 2012. 47(5): p. 1232-1241.
- [59] El-Halwagy, W., P. Mousavi, and M. Hossain. A 79dB SNDR, 10MHz BW, 675MS/s open-loop time-based ADC employing a 1.15ps SAR-TDC. in 2016 IEEE Asian Solid-State Circuits Conference (A-SSCC). 2016.
- [60] Wu, B., et al. A 9-bit 215-MS/s folding-flash time-to-digital converter based on redundant remainder number system. in Proceedings of the IEEE 2014 Custom Integrated Circuits Conference. 2014.
- [61] Bult, K. and A. Buchwald, An embedded 240-mW 10-b 50-MS/s CMOS ADC in 1-mm<sup>2</sup>. IEEE Journal of Solid-State Circuits, 1997. 32(12): p. 1887-1895.

- [62]Xu, Y., et al., 5-bit 5-GS/s Noninterleaved Time-Based ADC in 65-nm CMOS for Radio-Astronomy Applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2016. 24(12): p. 3513-3525.
- [63] Park, M. and M.H. Perrott. A single-slope 80MS/s ADC using Two-Step Time-to-Digital Conversion. in 2009 IEEE International Symposium on Circuits and Systems. 2009.
- [64] Sepke, T., et al., Noise Analysis for Comparator-Based Circuits. IEEE Transactions on Circuits and Systems I: Regular Papers, 2009. 56(3): p. 541-553.
- [65] Abidi, A.A., Phase Noise and Jitter in CMOS Ring Oscillators. IEEE Journal of Solid-State Circuits, 2006. 41(8): p. 1803-1816.
- [66] Kundert, K., Modeling jitter in PLL-based frequency synthesizers. 2006.
- [67] Hajimiri, A., S. Limotyrakis, and T.H. Lee, Jitter and phase noise in ring oscillators. IEEE Journal of Solid-State Circuits, 1999. 34(6): p. 790-804.
- [68] Weste, N. and D. Harris, *CMOS VLSI Design: A Circuits and Systems Perspective*. 2010: Addison-Wesley Publishing Company. 864.
- [69] Dutta, S., S.S.M. Shetti, and S.L. Lusky, A comprehensive delay model for CMOS inverters. IEEE Journal of Solid-State Circuits, 1995. 30(8): p. 864-871.
- [70] Seog-Jun, L., K. Beomsup, and L. Kwyro, A novel high-speed ring oscillator for multiphase clock generation using negative skewed delay scheme. IEEE Journal of Solid-State Circuits, 1997. 32(2): p. 289-291.
- [71] Miyashita, D., et al. A −104dBc/Hz in-band phase noise 3GHz all digital PLL with phase interpolation based hierarchical time to digital convertor. in 2011 Symposium on VLSI Circuits Digest of Technical Papers. 2011.
- [72] Henzler, S., Time-to-digital converters. Vol. 29. 2010: Springer Science & Business Media.
- [73] Kull, L., et al. A 110 mW 6 bit 36 GS/s interleaved SAR ADC for 100 GBE occupying 0.048 mm<sup>2</sup> in 32 nm SOI CMOS. in 2014 IEEE Asian Solid-State Circuits Conference (A-SSCC). 2014.
- [74] Ali, A.M.A., et al. A 14b 1GS/s RF sampling pipelined ADC with background calibration. in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). 2014.
- [75] Shu, Y.S. A 6b 3GS/s 11mW fully dynamic flash ADC in 40nm CMOS with reduced number of comparators. in 2012 Symposium on VLSI Circuits (VLSIC). 2012.

- [76] Liu, J., et al. A 89fJ-FOM 6-bit 3.4GS/s flash ADC with 4x time-domain interpolation. in 2015 IEEE Asian Solid-State Circuits Conference (A-SSCC). 2015.
- [77] Kim, J.I., et al., A 65 nm CMOS 7b 2 GS/s 20.7 mW Flash ADC With Cascaded Latch Interpolation. IEEE Journal of Solid-State Circuits, 2015. 50(10): p. 2319-2330.
- [78] Chen, V.H.C. and L. Pileggi. An 8.5mW 5GS/s 6b flash ADC with dynamic offset calibration in 32nm CMOS SOI. in 2013 Symposium on VLSI Circuits. 2013.
- [79] Taft, R., et al. A 1.8V 1.6GS/s 8b self-calibrating folding ADC with 7.26 ENOB at Nyquist frequency. in 2004 IEEE International Solid-State Circuits Conference. 2004.
- [80] Nakajima, Y., et al. A self-background calibrated 6b 2.7GS/s ADC with cascade-calibrated folding-interpolating architecture. in 2009 Symposium on VLSI Circuits. 2009.
- [81] Garner, H.L., The Residue Number System. IRE Transactions on Electronic Computers, 1959. EC-8(2): p. 140-147.
- [82] Chang, C.H., et al., Residue Number Systems: A New Paradigm to Datapath Optimization for Low-Power and High-Performance Digital Signal Processing Applications. IEEE Circuits and Systems Magazine, 2015. 15(4): p. 26-44.
- [83] Goh, V.T. and M.U. Siddiqi, Multiple error detection and correction based on redundant residue number systems. IEEE Transactions on Communications, 2008. 56(3): p. 325-330.
- [84] Taylor, F.J., Residue Arithmetic A Tutorial with Examples. Computer, 1984. 17(5): p. 50-62.
- [85] Wang, W. and X.G. Xia, A Closed-Form Robust Chinese Remainder Theorem and Its Performance Analysis. IEEE Transactions on Signal Processing, 2010. 58(11): p. 5655-5666.
- [86] Omondi, A. and B. Premkumar, *Residue Number Systems: Theory and Implementation*. 2007: Imperial College Press. 312.
- [87] Vun, C.H. and A.B. Premkumar. RNS encoding based folding ADC. in 2012 IEEE International Symposium on Circuits and Systems. 2012.
- [88] Ramamoorthy, P.A. and B. Potu. High-speed ADC using residue number system. in International Conference on Acoustics, Speech, and Signal Processing. 1989.
- [89] Zhu, S., et al. A 0.073-mm<sup>2</sup> 10-GS/s 6-bit time-domain folding ADC in 65-nm CMOS with inherent DEM. in 2015 IEEE Custom Integrated Circuits Conference (CICC). 2015.
- [90] Anand, T., K.A.A. Makinwa, and P.K. Hanumolu. A self-referenced VCO-based temperature sensor with 0.034C/mV supply sensitivity in 65nm CMOS. in 2015 Symposium on VLSI Circuits (VLSI Circuits). 2015.

- [91] Murmann, B., ADC Performance Survey 1997-2016.
- [92] Zhu, S., et al., A Skew-Free 10 GS/s 6 bit CMOS ADC With Compact Time-Domain Signal Folding and Inherent DEM. IEEE Journal of Solid-State Circuits, 2016. 51(8): p. 1785-1796.

## **BIOGRAPHICAL SKETCH**

Shuang Zhu received his BS and MS degrees in Microelectronics from Xi'an Jiaotong University, Xi'an, China, in 2010 and 2013, respectively. He is currently working towards his PhD degree in electrical engineering at The University of Texas at Dallas, Richardson, TX, USA. From June to August 2015, he was an intern with Analog Devices in Wilmington, MA. Mr. Zhu was a recipient of the Phil Ritter Fellowship and the ADI Outstanding Student Designer Award in 2015 and 2016, respectively. His research interests include analog/mixed-signal circuits, high-speed time-domain data converters, and bio-medical electronics.

## **CURRICULUM VITAE**

| EDUCATION                                                                                                                                                                                       |                                                                       |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|
| THE UNIVERSITY OF TEXAS AT DALLAS                                                                                                                                                               | 2013 - 2017                                                           |
| PhD Candidate, EE, Research Assistant with Dr. Yun Chiu, GPA 4.0                                                                                                                                |                                                                       |
| XI'AN JIAOTONG UNIVERSITY                                                                                                                                                                       | 2010 - 2013                                                           |
| Master of Science (MSc), Microelectronics                                                                                                                                                       |                                                                       |
| XI'AN JIAOTONG UNIVERSITY                                                                                                                                                                       | 2006 - 2010                                                           |
| Bachelor of Engineering (BEng), Microelectronics                                                                                                                                                |                                                                       |
| WORKING EXPERIENCE                                                                                                                                                                              |                                                                       |
| IC Design Engineer Intern, Analog Devices, Inc.                                                                                                                                                 | June – August 2015                                                    |
| PUBLICATIONS                                                                                                                                                                                    |                                                                       |
| A 2GS/s 8b Flash ADC Based on Remainder Number System in 65nm CMOS                                                                                                                              | IEEE VLSI Symp. 2017                                                  |
| A 24.7 mW 65 nm CMOS SAR-Assisted CT $\Delta\Sigma$ Modulator With Second-Order Noise Coupling Achieving 45 MHz Bandwidth and 75.3 dB SNDR | (Invited, 2<sup>nd</sup> Author) IEEE JSSC 2016 |
| A Skew-Free 10 GS/s 6 bit CMOS ADC With Compact Time-Domain Signal Folding and Inherent DEM | (Invited, 1<sup>st</sup> Author) IEEE JSSC 2016 |
| A 0.073-mm<sup>2</sup> 10-GS/s 6-bit Time-Domain Folding ADC in 65-nm CMOS with Inherent DEM | (1<sup>st</sup> Author) IEEE CICC 2015 |
| A Smart ECG Sensor with In-Situ Adaptive Motion-Artifact Compensation for Dry-Contact Wearable Healthcare Devices | (1<sup>st</sup> Author) IEEE ISQED 2016 |
| A 24.7mW 45MHz-BW 75.3dB-SNDR SAR-Assisted CT $\Delta\Sigma$ Modulator with 2nd-Order Noise Coupling in 65nm CMOS | (2<sup>nd</sup> Author) IEEE ISSCC 2016 |
| A 9-bit 215-MS/s Folding-Flash Time-to-Digital Converter Based on Redundant Remainder Number System | (2<sup>nd</sup> Author) IEEE CICC 2014 |
| SKILLS | |
| IC Design: Virtuoso (ADE L/XL), Spectre AFS, Calibre, ModelSim | |
| Modeling: MATLAB, Verilog-A, Verilog, SPICE, C | |
| System Design: OrCAD, Allegro, Altium Designer, Quartus II | |
| Honors & Awards                                                                                                                                                                                 |                                                                       |
| ADI Outstanding Student Designer Award                                                                                                                                                          | 2016                                                                  |
| Phil Ritter Fellowship (Eric Jonsson School, UT Dallas)                                                                                                                                         | 2015                                                                  |
| Graduate Study Scholarships (Eric Jonsson School, UT Dallas)                                                                                                                                    | 2013                                                                  |
| Third Prize Winner of ABU Robot Contest (Robocon)                                                                                                                                               | 2009                                                                  |