# SELF-HEALING CIRCUITS USING STATISTICAL ELEMENT SELECTION 

Submitted in partial fulfillment of the requirements for
the degree of Doctor of Philosophy
in
Electrical and Computer Engineering

Gokce Keskin<br>B.S., Electrical and Electronics Engineering, Bilkent University<br>M.S., Electrical and Computer Engineering, Carnegie Mellon University

Carnegie Mellon University
Pittsburgh, PA

September, 2010

## Acknowledgments

First and foremost, I would like to thank my thesis advisor, Professor Larry Pileggi, for all his support and guidance throughout my years at CMU.

I would also like to thank my dissertation committee members, Prof. Gary Fedder (CMU), Prof. Xin Li (CMU), Dr. Jean-Olivier Plouchart (IBM) and Dr. Erkan Alpman (Intel) for their help and feedback during this work.

This research has been supported by the Focus Center for Circuit \& System Solutions (C2S2), one of five research centers funded under the Focus Center Research Program, a Semiconductor Research Corporation Program. This work has also been supported by the National Science Foundation under contract CCF-0702278 and Defense Advanced Research Projects Agency. Fabrication for the two test chips was provided by IBM.

My colleague Jonathan Proesel (CMU) has designed most of the analog portions of the 65 nm test chip described in this project. I used his MATLAB code as a basis for automated measurement of the chips.

Soner Yaldiz has helped in numerous technical issues during this project, but I learnt at least as much from his view of life during my studies.

I would also like to thank Umut Arslan, Dan Morris, Andrew Phelps, Vanessa Chen and Kaushik Vaidyanathan for their help in various stages of this work.

I would not be able to achieve anything in my life without my family. I would like to thank my mother, sister, grandmother, father and brother-in-law for being with me whenever I needed them. I love you all.

Last but not the least, I would like to thank the love of my life, Pelin. Thanks for finding me. Your every smile reminds me how much I love you.

Abstract<br>SELF-HEALING CIRCUITS USING STATISTICAL ELEMENT SELECTION<br>by<br>Gokce Keskin<br>Doctor of Philosophy in Electrical and Computer Engineering<br>Carnegie Mellon University<br>Professor Lawrence T. Pileggi, Chair

Process variations in advanced CMOS process nodes limit the benefits of scaling for analog designs. In the presence of increasing random intra-die variations, mismatch becomes a significant design challenge in circuits such as comparators. In this dissertation, the statistical element selection (SES) methodology that was first proposed in [1] is analyzed in detail and extended to accommodate a broader spectrum of circuits and systems. SES relies on choosing a subset of selectable circuit elements (e.g., input transistors in a comparator) to achieve the desired specification (e.g., offset). Silicon results from a 65 nm bulk CMOS test chip demonstrate that it can achieve an order of magnitude better matching than both redundancy and simple scaling given the same core circuit area. To demonstrate its efficacy, we applied SES to enable a novel flash ADC topology in 45 nm SOI CMOS that operated at 1 GS/s and achieved 4.6 bits of ENOB with a figure of merit of $160 \mathrm{fJ} / \mathrm{step}$. SES is also applied to an array of microelectromechanical resonators to improve the expected yield of RF MEMS filters.

## Contents

1 Introduction
2 Background
2.1 Process Variations
2.1.1 Systematic Variations
2.1.2 Random Variations
2.2 Previous Work
2.3 Summary
3 Statistical Element Selection
3.1 Basics
3.2 Methodology
3.2.1 Comparator Array in 65 nm Bulk CMOS
3.2.1.1 Modeling
3.2.2 6-Bit Flash ADC in 45 nm SOI CMOS
3.2.2.1 Basics
3.2.2.2 Modeling
3.2.3 SES Based MEMS Resonator Array
3.2.3.1 Basics
3.2.3.2 Modeling
3.2.3.3 Simulation Results
3.3 Summary
4 Design Details and Silicon Results
4.1 Comparator Array in 65 nm Bulk CMOS
4.1.1 Test Chip Architecture
4.1.2 Test Setup
4.1.3 Measurement Results
4.2 6-Bit Flash ADC in 45 nm SOI CMOS
4.2.1 Test Chip Architecture
4.2.2 Test Setup
4.2.3 Measurement Results
4.3 Summary
5 Conclusions and Future Work

## List of Tables

2.1 Summary of Results for Recent Flash ADCs
4.1 Mean difference of input offset voltage with respect to $V_{dd}$

## List of Figures

2.1 Good and Bad Layout Styles to Control Systematic Poly Variation
2.2 Common Centroid Style Layout
2.3 Random Dopant Fluctuations and Line Edge Roughness
2.4 Basic Flash ADC Architecture
2.5 Configurable Comparators in [2, 3]
2.6 Configurable Comparator in [4]
2.7 Linearity Enhancement by Combining CDFs in [5]
3.1 SES Based Differential Amplifier
3.2 Basic Flash ADC Architecture
3.3 Latch Type Comparator
3.4 SES-based Latch Type Comparator
3.5 Failure Probability for $N=20$, $\sigma_{os,i}=1$, spec $=10^{-2}$
3.6 Failure Probability for $N=1$ to $20$, $\sigma_{os,i}=1$, spec $=10^{-2}$
3.7 Comparison of SES, Redundancy and Scaling
3.8 Decision Cube
3.9 Differential Amplifier
3.10 N Bit Flash ADC with Built-in Reference
3.11 Bins in a 6-bit Flash ADC
3.12 Self-referenced Flash ADC Comparator
3.13 Coarse Tuning to Increase Full Scale Range
3.14 Simulated Success Probability for 6-Bit ADC, $\Lambda=0$, $R=0$, $8 \leq N \leq 16$
3.15 Simulated Success Probability for 6-Bit ADC, $\Lambda=0$, $R=2$, $8 \leq N \leq 16$
3.16 Simulated Success Probability for 6-Bit ADC, $\Lambda=0$ to $2 \sigma_{\text{element}}$, $R=0$, $8 \leq N \leq 16$
3.17 Simulated Success Probability for 6-Bit ADC, $\Lambda=0$ to $2 \sigma_{\text{element}}$, $R=5$, $8 \leq N \leq 16$
3.18 Superheterodyne Receiver Architecture
3.19 RF Receiver Architecture with On-die Channel/Band Filtering
3.20 Electrically Parallel Resonator Array
3.21 Frequency Binning of Resonators
3.22 Frequency Binning of Resonators in a Large Band
3.23 MEMS Filter Design Flow
3.24 MEMS Filter Bin Profile
3.25 Bin Profile with Extra Added Resonators
3.26 Expected Success Probability for the MEMS Filter (Only Mismatch Variations Turned On)
3.27 Simulation Setup for the MEMS Filter
3.28 Comparison of Magnitude and Phase for the MEMS Filter with SES and without SES
4.1 Die Photo of 65 nm Test Chip
4.2 Comparator Array Test Chip Architecture
4.3 Measurement Setup
4.4 Measured Offset Histograms Before and After SES (3315 comparators)
4.5 Measured Offset Histograms for Redundancy (1275 comparators)
4.6 Measured Success Probability for SES, $N=16$ to $32$ (3315 Comparators)
4.7 Measured Offset Standard Deviation Contour, $N=14$ (255 comparators)
4.8 6-bit ADC Die Photo
4.9 Layout Diagram of the 6 Bit ADC
4.10 Digital Clock Divider (Subsampler)
4.11 Process Scaling for Intel Corporation [6]
4.12 Section of Regular Comparator Layout for 6-bit ADC
4.13 Test Setup for 6-bit ADC
4.14 DNL and INL plot for the ADC
4.15 FFT Magnitude Plot for $f_{S}=500 \mathrm{MS}/\mathrm{s}$ and $f_{\text{in}}=243.1 \mathrm{MHz}$
4.16 Dynamic Test Results for $f_{S}=500 \mathrm{MS}/\mathrm{s}$
4.17 FFT Magnitude Plot for $f_{S}=1000 \mathrm{MS}/\mathrm{s}$ and $f_{\text{in}}=497.1 \mathrm{MHz}$
4.18 Dynamic Test Results for $f_{S}=1000 \mathrm{MS}/\mathrm{s}$

## Chapter 1

## Introduction

Continuous advancement of CMOS process technology over the past four decades has made inexpensive integrated circuit products with significant processing capabilities an everyday reality. Cost pressures have resulted in substantial integration of analog and digital blocks on the same die, forcing analog designers to adapt to processes that were built for digital systems. As we rapidly approach the physical limits of scaling, one of the major challenges for analog circuits has been to ensure consistently high yield in the presence of increasing variability in these nanoscale CMOS processes.

In this thesis, a new methodology for analog circuits that is based on statistical element selection (SES) is described and demonstrated on practical applications. SES is a post-manufacturing calibration step that accommodates large-scale process variations. It exploits inherent random variations to improve the matching of transistors and to increase yield for matching-critical circuits such as comparators. A subset of $k$ elements is selected among an identically laid out set of $N$ elements to provide the best matching performance. As the number of available subsets among a set of $N$ elements increases exponentially $\left(2^{N}-1\right)$, it is possible to achieve impressive matching performance with near-minimum size unit elements. The elements might be individual transistors, pairs of transistors, or passive components. A methodology is presented to determine the appropriate $(N, k)$ numbers and the size of the unit element to ensure that a desired matching specification is met. Measurement results from two CMOS test chips support model predictions.

In Chapter 2, major sources of CMOS manufacturing variations and their effect on analog circuits are described. Various calibration methods published in the literature to alleviate the effects of these variations are reviewed. The focus of the review is flash ADCs.

In Chapter 3, the details of SES methodology are described and modeling results for three separate case studies are presented:

- A comparator array intended for an 8-bit flash ADC with a traditional reference ladder based architecture, where the offsets of comparators are calibrated to achieve the required spec. The final goal of this research direction is to build high resolution, high speed ADCs within feasible power and area requirements for traditional architectures.
- A self-referenced 6-bit flash ADC that exploits the features of SES to calibrate offset values to the desired reference voltages without the need for a reference ladder. Furthermore, comparators are laid out in a restricted layout fabric that has also been used for digital and memory circuits on the same IC to demonstrate the full utility of SES for implementing analog circuits with deeply scaled digital CMOS processes.
- A microelectromechanical systems (MEMS) based resonator array where SES is used to accommodate variations in the array to build a filter. MATLAB modeling and circuit simulation results demonstrate the efficacy of SES for such systems. The results show that RF bandpass filters/mixer-filters can be reliably implemented using highly varying MEMS resonators with the use of an SES methodology.

Chapter 4 presents the details and measurement results of test runs for the first two case studies. The first study is manufactured in 65 nm bulk CMOS, and the second in 45 nm SOI CMOS.

Finally, Chapter 5 presents future research directions.

## Chapter 2

## Background

Manufacturing variations are a significant problem for both digital and analog circuits in advanced CMOS process nodes and they are expected to grow in importance with each new generation. In this chapter, major variation sources are discussed and previous work on alleviating the effects of variation in analog to digital converters is described.

### 2.1 Process Variations

Process variations in modern CMOS nodes can generally be classified into two areas, systematic and random. Many of the dominant systematic variations can be predicted and addressed by using careful circuit design and layout techniques. Random variations are unpredictable and can cause significant mismatch among devices. It is the latter variability that is actually exploited by the SES approach to tune the circuits after manufacturing.

### 2.1.1 Systematic Variations

Systematic variations can be broadly classified into two sub-groups[7]:

- Across-field effects that are caused by lithography or etching processes. Location of the die on the wafer can lead to a systematic shift in device parameters, for example due to a problem in focus or the lens in the manufacturing equipment. All devices in the same vicinity are affected the same way due to these effects.
- Layout dependent effects that result in different characteristics of identical devices in the same vicinity in the wafer. An example is variation due to the well proximity effect where threshold voltage of a MOS device close to an $n$-well can be different from an identical MOS device far away from n-wells.

Polysilicon gate pitch has a significant systematic impact on actual gate length, since the printed gate length is highly dependent on the surrounding poly lines in modern CMOS processes [8]. The common practice in minimizing this systematic effect is to use constant poly pitch and to add dummy poly lines where necessary. Matching poly lines also need to have the same orientation. Fig. 2.1 shows an example of a bad layout where the two devices on the top have different poly surroundings. This causes a systematic mismatch in the actual gate lengths of the two devices. The bottom layout shows how this problem can be corrected. The recommended poly pitch for optimal matching of devices is generally noted in most process manuals, and the addition of one or two dummy poly lines at the edges is generally enough.


Figure 2.1: Good and Bad Layout Styles to Control Systematic Poly Variation

Well proximity effect is another layout dependent effect where the threshold voltage of devices close to wells is systematically different from that of devices far away from them. This difference is mainly due to the scattering of ions from the photoresist that covers the wells during ion implantation. Scattered ions fall on the close-by devices, changing the doping. Matching-critical devices should either share the same well surroundings or should be placed sufficiently far from wells.

Stress effects are also a major source of layout dependent systematic variation. Shallow trench isolation (STI) used in CMOS processes creates mechanical stress on the devices, and uneven diffusion layers can introduce mismatch in device characteristics [9, 10]. Stress that is intentionally introduced to improve electron/hole mobility is also mostly dependent on layout and can introduce systematic offsets. Placement of the active (diffusion) layer with respect to the well edges must be adjusted to make sure that the same stress is applied to identical devices.

Systematic sources of variation are widely known by analog designers and can be overcome by careful layout techniques. Matching-critical devices are laid out in close proximity in the die and common-centroid layout is employed. Fig. 2.2 shows such a layout where devices Ma and Mb are split into two pieces and arranged in a fully symmetrical configuration to overcome any systematic gradient effect in x or y axes. The surroundings of the devices should also be as identical as possible.
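The gradient cancellation described above can be illustrated with a small numerical sketch. The 2x2 coordinates, the linear gradient model, and the names `COMMON_CENTROID`, `SIDE_BY_SIDE` and `gradient_mismatch` are assumptions made for illustration only; they are not taken from Fig. 2.2.

```python
# Hypothetical unit-cell coordinates (column x, row y) on a 2x2 grid.
# Each device is split into two pieces, as described for Ma and Mb.
COMMON_CENTROID = {"Ma": [(0, 0), (1, 1)], "Mb": [(1, 0), (0, 1)]}
# Reference placement without symmetry: each device occupies one column.
SIDE_BY_SIDE = {"Ma": [(0, 0), (0, 1)], "Mb": [(1, 0), (1, 1)]}


def gradient_mismatch(placement, gx, gy):
    """Mean parameter of Ma minus mean parameter of Mb for a parameter
    that drifts linearly across the layout as gx * x + gy * y
    (an assumed first-order gradient, arbitrary units)."""
    def mean_param(cells):
        return sum(gx * x + gy * y for x, y in cells) / len(cells)

    return mean_param(placement["Ma"]) - mean_param(placement["Mb"])


cc_mismatch = gradient_mismatch(COMMON_CENTROID, gx=1.0, gy=0.5)
sbs_mismatch = gradient_mismatch(SIDE_BY_SIDE, gx=1.0, gy=0.5)
print(cc_mismatch, sbs_mismatch)
```

Under this linear model the symmetric placement cancels the gradient in both axes, while the side-by-side placement keeps the full x-gradient as mismatch. Higher-order gradients are not captured by this sketch.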

Restricted design rules with fixed gate lengths, high regularity in diffusion, poly and metal layers, single poly orientation and lithography solutions such as double patterning and optical proximity correction are already proposed techniques to alleviate the systematic effects in leading edge CMOS processes $[6,11]$. Although these methods are mainly discussed for logic gates and memories, analog designs will ultimately need to use the same rules for highly integrated CMOS chips.

Figure 2.2: Common Centroid Style Layout

### 2.1.2 Random Variations

Random variations are due to unpredictable and unrepeatable sources of variation in manufacturing. Random dopant fluctuation (RDF) in the transistor channel is an example of this type of variation [12]. Channel dopants are used to adjust the threshold voltage of MOS devices to the desired value and can be on the order of tens of dopant atoms in modern short channel devices (Fig. 2.3). Small variation in the number of atoms, their location in the channel or impurities can lead to significant threshold voltage variation in the device. Several new technologies such as undoped channels, high-k metal gates, thin SOI and FinFETs are being evaluated, but tens of millivolts of variation in threshold voltage is still expected [13-17].


Figure 2.3: Random Dopant Fluctuations and Line Edge Roughness

Another source of random variation is line edge roughness (LER). Microscopic deviations in the poly line forming the gate can lead to uneven channel length across the width of the device (Fig. 2.3). These variations can lead to an effective difference in the conductance constant ($\beta=\mu C_{ox} \frac{W}{L}$) and adversely affect matching.

Random sources of variation cannot be alleviated by following restricted design rules. Increasing the device size to average out the random variations improves matching only in proportion to $1 / \sqrt{W L}$ [18]. This poor return with increasing device size (and hence total area and power) is problematic for many analog circuits that depend on matching, such as comparators in analog-to-digital converters (ADCs), current sources in digital-to-analog converters (DACs), and matched passive components. In the next section, various post-manufacturing calibration techniques published in the literature are described. The focus of this review is flash ADC architectures, which impose tight matching requirements.

### 2.2 Previous Work

Device matching is a particularly important factor for the performance of flash ADCs. As shown in Fig. 2.4, for an $N$-bit flash ADC, $2^{N}-1$ comparators are connected in parallel to generate a thermometer code that quantizes the analog input voltage. A resistive ladder is used to generate the reference voltages corresponding to each least significant bit (LSB) for the comparators. Each of the comparators should have less than $\pm 0.5$ LSB input offset voltage (preferably much less) for correct operation of the ADC. The thermometer code is later converted to $N$-bit binary by digital processing.
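The behavior described above can be sketched as an idealized behavioral model. The function name `flash_adc`, the single-ended unit reference `v_ref`, and the optional `offsets` argument are illustrative assumptions; the sketch ignores noise, timing, and bubble errors.

```python
def flash_adc(vin, n_bits, v_ref=1.0, offsets=None):
    """Idealized N-bit flash conversion of vin in [0, v_ref).

    offsets is an optional list of per-comparator input offsets in volts
    (an illustrative parameter, not something defined in the text).
    Returns (thermometer_code, binary_output_code).
    """
    n_comp = 2 ** n_bits - 1
    lsb = v_ref / 2 ** n_bits
    offsets = offsets or [0.0] * n_comp
    # Comparator i trips when vin exceeds its ladder tap (i + 1) * LSB,
    # shifted by that comparator's input offset.
    thermometer = [1 if vin > (i + 1) * lsb + offsets[i] else 0
                   for i in range(n_comp)]
    # Counting the ones converts the thermometer code to a binary code.
    return thermometer, sum(thermometer)


ideal_thermo, ideal_code = flash_adc(0.40, n_bits=3)

# An offset larger than 0.5 LSB (here 0.0625 V) on one comparator
# changes the output code for the same input.
bad_offsets = [0.0] * 7
bad_offsets[3] = -0.11
_, shifted_code = flash_adc(0.40, n_bits=3, offsets=bad_offsets)
print(ideal_code, shifted_code)
```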

Figure 2.4: Basic Flash ADC Architecture

The main advantage of flash ADCs is their speed: since the analog input is converted to digital in parallel, they are among the fastest types of data converters. Unfortunately, the number of comparators rises exponentially with the resolution for a flash ADC; i.e., a 1-bit increase in resolution requires doubling the number of comparators. This leads to doubling of the power and area consumed by the comparators. Moreover, a one-bit increase in resolution tightens the comparator matching requirements by $2 \times$, since the LSB is halved. To achieve this requirement by simple sizing based on the well-known Pelgrom model [18], we would need to increase the comparator area by roughly $4 \times$, further adding to the power/area penalty. Moreover, the increased input capacitance of the comparators puts significant pressure on keeping the same sampling rate. In many cases, power hungry input sampling switches and preamplifiers might be required in order to keep the same sampling rate.
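The scaling argument above can be put into numbers with a short sketch. It assumes that the required offset standard deviation tracks the LSB and that matching follows the $1/\sqrt{\text{Area}}$ relationship of [18]; the function name `flash_scaling` and its return layout are hypothetical.

```python
def flash_scaling(n_bits, ref_bits):
    """Relative cost of an n_bits flash ADC versus a ref_bits design.

    Assumes the offset sigma must shrink with the LSB and that
    sigma is proportional to 1 / sqrt(area) for each comparator.
    Returns (comparator_ratio, sigma_tightening, area_per_comparator,
    total_comparator_area_ratio).
    """
    comparator_ratio = (2 ** n_bits - 1) / (2 ** ref_bits - 1)
    sigma_tightening = 2 ** (n_bits - ref_bits)   # LSB halves per added bit
    area_per_comparator = sigma_tightening ** 2   # sizing needed for matching
    total_area = comparator_ratio * area_per_comparator
    return comparator_ratio, sigma_tightening, area_per_comparator, total_area


# Going from 6 to 7 bits: about 2x the comparators, each about 4x larger.
print(flash_scaling(7, 6))
```

Under these assumptions each extra bit costs roughly $8\times$ in total comparator area when matching is obtained by sizing alone.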

Various offset calibration methods have been proposed to alleviate the prohibitive increase in the power/area requirements with increasing resolution of flash ADCs. In [19], the input is first sampled using a track-and-hold amplifier (THA). Then, a differential amplifier with resistive loads amplifies the difference between the input and the reference voltage. This preamplifier is followed by a comparator. For offset calibration, the reference ladder has been modified to generate voltages at each LSB/3 interval (as opposed to one full LSB), and the input to the preamplifiers can be connected to a window of $\pm 15$ of these references. Hence, offsets of up to $\pm 5$ LSB can be tuned out by connecting to a voltage within this window. Each preamplifier-comparator combination is calibrated during startup, and the fully digital calibration circuitry is integrated on the die. The calibration circuit sets the analog input to a known reference, and then finds the correct offset by adjusting the reference voltage to the preamplifier until the output of the preamplifier-comparator combination toggles between digital 0 and 1.
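The arithmetic of this reference-tap scheme can be sketched as follows. This is only an illustration of the LSB/3 step and the $\pm 15$ tap window stated above; it assumes the offset is known directly, whereas the circuit in [19] finds the tap by observing the comparator output toggle. The function name `calibrate_tap` is hypothetical.

```python
def calibrate_tap(offset_lsb, taps_per_lsb=3, window=15):
    """Pick the reference tap shift that best cancels a known offset.

    offset_lsb is the comparator offset in LSB. The shift is an integer
    number of taps, each worth 1 / taps_per_lsb LSB, limited to
    +/- window taps. Returns (tap_shift, residual_offset_in_lsb).
    """
    step = 1.0 / taps_per_lsb
    # Sweep the whole window and keep the tap with the smallest residual.
    best = min(range(-window, window + 1),
               key=lambda t: abs(offset_lsb + t * step))
    return best, offset_lsb + best * step


tap, residual = calibrate_tap(1.2)        # offset inside the +/-5 LSB range
far_tap, far_residual = calibrate_tap(7)  # offset beyond the tuning range
print(tap, residual, far_tap, far_residual)
```

Inside the window the residual is bounded by half a tap (LSB/6); beyond $\pm 5$ LSB the sweep saturates at the last tap and the remaining offset cannot be removed.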

Comparator redundancy has been employed in the past to improve the parametric yield of the flash ADCs. Adding redundant comparators increases the probability that the offset of at least one of them will be low enough to be used in the ADC. Both redundancy and reordering of comparators (assigning comparators with high offsets to other reference levels) have been applied in [20]. Up to four redundant elements are needed to ensure negligible decrease in signal to noise and distortion ratio.
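The benefit of redundancy can be estimated with a short calculation, assuming independent, zero-mean Gaussian offsets (an assumption of this sketch, not a statement about [20]). The function names and the example numbers are illustrative.

```python
import math


def p_within_spec(spec, sigma):
    """P(|offset| < spec) for a zero-mean Gaussian offset with std sigma."""
    return math.erf(spec / (sigma * math.sqrt(2)))


def redundancy_yield(spec, sigma, copies):
    """Probability that at least one of `copies` independent comparators
    has an offset within the spec."""
    return 1.0 - (1.0 - p_within_spec(spec, sigma)) ** copies


# Example: spec equal to half of the offset sigma.
for copies in (1, 2, 4, 8):
    print(copies, round(redundancy_yield(0.5, 1.0, copies), 3))
```

The probability rises quickly for the first few copies but approaches one only gradually, which matches the observation that a significant amount of redundancy can be needed for tight specifications.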

Digital to analog converter (DAC) based calibration has also been proposed. In [21], digitally controlled calibration currents are added to the outputs of the comparators to compensate for mismatches in transistors. Redundancy is employed to eliminate comparators with very high offsets that cannot be calibrated with the DAC.

In a number of flash ADCs, the reference ladder has been eliminated and the references are built into the comparators. In [2], comparator thresholds have been adjusted by adding extra capacitors to one of the internal nodes (Fig. 2.5a). The size of the added MOS capacitor ($P4$) is adjusted for each comparator so that the offset is systematically shifted to the desired trip-point. Furthermore, mismatched currents are added to the output branches (via $N2$ devices) to cancel the undesired random offset resulting from device mismatches. Control knobs cal+ and cal- are connected to discrete levels in a 12-level reference ladder. The outputs of the comparators are passed through digital logic to correct for bubble errors in thermometer code to binary conversion.

Fig. 2.5b shows another built-in offset method that makes use of stacked PMOS devices with different widths to introduce systematic offset [3]. Redundant comparators are added to compensate for the random offset component. For a 6-bit single-ended flash ADC, 127 comparators are used; but the achieved effective number of bits (ENOB) is only 5.05. Thus, the amount of required redundancy is significant.

Figure 2.5: Configurable Comparators in [2, 3]

Van der Plas describes another self-referenced comparator with systematic trip-point shift and random offset calibration [4]. Systematic shift is achieved by adding an intentional mismatch in the width of the input devices, which introduces a slight imbalance in the discharge currents of the differential branches (Fig. 2.6). Random offsets are then canceled by selectively adding extra MOS capacitors to these branches.


Figure 2.6: Configurable Comparator in [4]

Weaver's flash ADC depends much more heavily on random offsets, generated by inherent mismatch, for the built-in references [5]. This "stochastic" flash ADC uses the linear region in the cumulative distribution function (CDF) of the random offset distribution as the full scale input range. By dividing the comparators in two groups and using different reference voltages for each group, two CDFs are attained with a systematic shift in between (Fig. 2.7). The input full scale range is then used as the midpoint of this transfer function, where linearity is greatest. The outputs of the comparators are added in the digital domain. In this 6-bit ADC, a total of 1152 comparators have been used to get the desired linearity (ENOB is a little less than 5.5), indicating that a significant amount of redundancy is required.
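The idea of using the offset CDF as a transfer function can be sketched with a simple behavioral model. The Gaussian offset model, the normalized units, the group shift of one sigma, and the helper names are assumptions of this sketch and are not taken from [5]; only the total of 1152 comparators comes from the text.

```python
import random


def make_thresholds(n_per_group, sigma, shift, seed=0):
    """Two comparator groups referenced to -shift and +shift, each
    threshold perturbed by an independent Gaussian random offset."""
    rng = random.Random(seed)
    return ([rng.gauss(-shift, sigma) for _ in range(n_per_group)] +
            [rng.gauss(+shift, sigma) for _ in range(n_per_group)])


def stochastic_adc_code(vin, thresholds):
    """Digital sum of comparator decisions: the number of thresholds
    that lie below vin. This traces the combined offset CDF."""
    return sum(1 for t in thresholds if vin > t)


thresholds = make_thresholds(n_per_group=576, sigma=1.0, shift=1.0)
sweep = [stochastic_adc_code(v / 10, thresholds) for v in range(-30, 31)]
print(sweep[0], sweep[30], sweep[-1])
```

The output count is monotonic in the input and is close to half of the comparator count at the midpoint, where the two shifted CDFs overlap and the combined curve is most linear.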


Figure 2.7: Linearity Enhancement by Combining CDFs in [5]

A summary of results for these recent flash ADCs is given in Table 2.1.

Table 2.1: Summary of Results for Recent Flash ADCs

| Reference | Technology | Resolution (bits) | Sampling Rate (MS/s) | Power (mW) | ENOB | FoM (pJ/step) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [19] | 65 nm CMOS | 6 | 800 | 12 | 5.63 | 0.303 |
| [20] | 250 nm CMOS | 6 | 400 | 150 | 5 | 11.72 |
| [2] | 90 nm CMOS | 5 | 1750 | 2.2 | 4.7 | 0.05 |
| [3] | 180 nm CMOS | 6 | 0.4 | $1.66 \times 10^{-3}$ | 5.05 | 0.125 |
| [4] | 90 nm CMOS | 4 | 1250 | 2.5 | 3.7 | 0.16 |
| [5] | 180 nm CMOS | 6 | 8 | 0.631 | 5.29 | 2.02 |
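The FoM column can be cross-checked from the other columns. The sketch below assumes the commonly used definition $\mathrm{FoM} = P / (2^{\mathrm{ENOB}} \cdot f_s)$; the text does not state which definition was used, so this is an assumption, and the names `TABLE_2_1` and `fom_pj_per_step` are illustrative.

```python
# (reference, sampling rate in MS/s, power in mW, ENOB, reported FoM in pJ/step)
TABLE_2_1 = [
    ("[19]", 800, 12, 5.63, 0.303),
    ("[20]", 400, 150, 5, 11.72),
    ("[2]", 1750, 2.2, 4.7, 0.05),
    ("[3]", 0.4, 1.66e-3, 5.05, 0.125),
    ("[4]", 1250, 2.5, 3.7, 0.16),
    ("[5]", 8, 0.631, 5.29, 2.02),
]


def fom_pj_per_step(power_mw, fs_msps, enob):
    """Energy per conversion step in pJ, assuming P / (2**ENOB * fs)."""
    joules = (power_mw * 1e-3) / (2 ** enob * fs_msps * 1e6)
    return joules * 1e12


for ref, fs, power, enob, reported in TABLE_2_1:
    print(ref, round(fom_pj_per_step(power, fs, enob), 3), reported)
```

With this definition the computed values agree with the reported column to within rounding.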

### 2.3 Summary

This chapter introduced the basic sources of variation in modern CMOS processes. Systematic variations can be mostly overcome by careful layout, but analog circuits will still need to conform to the restricted design rules being introduced in the leading edge processes. Random variations are much more difficult to overcome and can adversely affect matching. Many critical analog circuits, such as comparators, are susceptible to mismatches that cause undesired input offsets.

Flash ADCs are widely used in low to medium resolution applications. Despite their high speed, they suffer from mismatches since many parallel comparators with tight matching requirements are needed. Several recently proposed digital calibration techniques to address this issue were discussed in this chapter.

## Chapter 3

## Statistical Element Selection

### 3.1 Basics

Statistical Element Selection (SES) was recently proposed to alleviate the problems caused by extreme variability in advanced CMOS processing nodes [1]. The basic concept is to use $N$ identically laid-out elements for a given circuit block (e.g., branches of input transistors in a comparator) and use one subset among the $2^{N}-1$ available subsets such that the chosen subset will satisfy the desired specification (e.g., input offset voltage).

In order to understand the main difference between scaling, redundancy and SES, consider the differential amplifier in Fig. 3.1. $N$ pairs of input NMOS transistors are labeled as $M1a$/$M1b$ through $MNa$/$MNb$. Each pair has its own tail NMOS transistor with gates tied to digital control signals Sel1 through SelN. Each pair can be turned on or off as desired by $\mathrm{Sel}\langle 1{:}N\rangle$. Each transistor has different characteristics due to the manufacturing variations, and the mismatches between the pair transistors result in non-ideal effects such as input offset voltage.

Figure 3.1: SES Based Differential Amplifier

In scaling, all branches from 1 to $N$ are selected. All signals $\mathrm{Sel}\langle 1{:}N\rangle$ are connected to the same line during the design phase. The averaging effect introduced by the selection of all mismatched branches results in a lower amount of effective variation, and yields an improvement of $1 / \sqrt{N}$ in matching standard deviation [18]. No calibration is done after manufacturing.

For previous work that has been referred to as redundancy, branches are grouped into predetermined identical blocks during the design phase. Only one block can be selected at a time during post-manufacturing calibration. For example, assume that each pair in Fig. 3.1 forms one block, for a total of $N$ blocks. Among the available $N$ combinations, the one with the best offset specification is selected. If $N / 2$ branches form one block, there are only 2 combinations to select from during calibration.

SES is an extension of redundancy. Rather than grouping the branches into predetermined blocks, each pair is allowed to be individually selected. This is essentially a finer grain redundancy that must be carefully designed based on the statistical parameter models and the different methods that can be used for efficient digital selection of the "elements." If a total of $N / 2$ pairs is desired, the selection can be made among the $\binom{N}{N / 2}$ subsets that can be formed using the control signals. This is a significantly larger search space than redundancy. If $N=16$ and 8 pairs form one block, only two blocks are available for selection in redundancy. If any subset of size 8 can be selected (SES), 12870 combinations are available. If the subset size is not constrained to $N / 2$, any subset among $2^{N}-1$ can be selected. This exponential increase in the number of combinations results in a significant improvement in finding a low offset combination.
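The search-space sizes quoted above can be reproduced with a few lines. The function name `search_space` is illustrative; the counts follow directly from the definitions in the text.

```python
import math


def search_space(n, block_size):
    """Number of selectable configurations for n pairs.

    Returns (redundancy, ses_fixed_size, ses_any_size):
    - redundancy: one predetermined block of block_size pairs at a time
    - ses_fixed_size: any subset of exactly block_size pairs
    - ses_any_size: any non-empty subset of the n pairs
    """
    redundancy = n // block_size
    ses_fixed_size = math.comb(n, block_size)
    ses_any_size = 2 ** n - 1
    return redundancy, ses_fixed_size, ses_any_size


print(search_space(16, 8))  # (2, 12870, 65535)
```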

Input offset voltage $\left(V_{os}\right)$ of the differential amplifier in Fig. 3.1 is defined as the $V_{in}=V_{in+}-V_{in-}$ at which the output branch currents $\left(I_{out+}, I_{out-}\right)$ are equal. If all the transistors in the circuit are perfectly matched, $V_{os}=0$. In practice, mismatches in threshold voltage $\left(V_{th}\right)$ and conductance constant $\left(\beta=\mu C_{ox} \frac{W}{L}\right)$ of transistors result in unequal currents through the branches of the pairs when an equal voltage is applied to both $V_{in}$ terminals. If the input offset of the $i^{\text{th}}$ input pair is $V_{os,i}$ and the transconductances of all pairs are the same, then the input offset voltage of the differential amplifier is [22]:

$$
\begin{equation*}
V_{o s}=\frac{1}{N} \times \sum_{i=1}^{N} V_{o s, i} \tag{3.1}
\end{equation*}
$$

If we consider the case that only a subset of the $N$ pairs is chosen, the resulting input offset voltage is:

$$
\begin{equation*}
V_{o s}=\frac{1}{\sum_{i=1}^{N} k_{i}} \times \sum_{i=1}^{N} k_{i} V_{o s, i} \tag{3.2}
\end{equation*}
$$

where $k_{i}=1$ if the $i^{\text{th}}$ pair is chosen, and $k_{i}=0$ otherwise.

$V_{th}$ mismatch generally dominates $\beta$ mismatch and we can write $V_{os,i}=\Delta V_{th,i}$ [23]. Systematic sources of variation and gradient variations can be minimized by using a fully symmetrical layout and closely spaced transistors. In addition, if the input transistors are the dominant source of mismatch, the $V_{os,i}$ distribution is centered at 0 and can be approximated by a Gaussian distribution $\mathcal{N}\left(0, \sigma_{os,i}^{2}\right)$. Using Eq. 3.2, we can determine that, for a fixed choice of the $k_{i}$, the input offset voltage of the amplifier is distributed as $\mathcal{N}\left(0, \sigma_{os}^{2}\right)$ where $\sigma_{os}=\frac{\sigma_{os,i}}{\sqrt{\sum_{i=1}^{N} k_{i}}}$. This closely resembles the result found in [18], where matching of MOS devices in close proximity has been shown to improve by $1 / \sqrt{\text{Area}}$.

$\sigma_{os,i}$ can be determined from Monte Carlo SPICE simulations for the given circuit. The easiest method to improve matching, and hence achieve a desired input offset specification with an arbitrarily large probability, is to increase the size of input transistors. This could be done by increasing width and/or length of each device or by adding more branches in parallel. Unfortunately, the $1 / \sqrt{\text{Area}}$ relationship makes the "select all" method (sizing) very costly in terms of area and power.
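Eq. 3.2 and the resulting $\sigma_{os}$ can be checked numerically with a minimal sketch. It assumes independent Gaussian pair offsets and equal transconductances, as in the text; the function names, trial count, and seed are illustrative choices.

```python
import random
import statistics


def subset_offset(pair_offsets, selected):
    """Eq. 3.2: average offset of the selected pairs.
    selected[i] plays the role of k_i (1 if pair i is enabled)."""
    k = sum(selected)
    if k == 0:
        raise ValueError("at least one pair must be selected")
    return sum(v for v, s in zip(pair_offsets, selected) if s) / k


def sigma_of_fixed_subset(n, k, sigma_i=1.0, trials=20000, seed=1):
    """Monte Carlo estimate of sigma_os when the same k of n pairs are
    always enabled, i.e. no post-manufacturing selection is made."""
    rng = random.Random(seed)
    mask = [1] * k + [0] * (n - k)
    samples = [
        subset_offset([rng.gauss(0.0, sigma_i) for _ in range(n)], mask)
        for _ in range(trials)
    ]
    return statistics.pstdev(samples)


# Expected value from the formula: sigma_i / sqrt(k) = 0.5 for k = 4.
print(sigma_of_fixed_subset(n=8, k=4))
```

The estimate only matches $\sigma_{os,i}/\sqrt{k}$ because the mask is fixed in advance; once the subset is chosen based on the measured offsets, the distribution of the result is no longer described by this formula.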

In redundancy, the selectable element is duplicated a given number of times and the best is chosen among them. As an example, consider the circuit in Fig. 3.1 where only one element is selected at a time so that the best selection among the $N$ available can be made after manufacturing ($N$ times redundancy). As will be seen in the following sections, this method significantly increases the probability that at least one redundant element will satisfy the given input offset specification. Statistical Element Selection (SES) takes the redundancy concept one step further: if any subset among all available elements in the circuit is allowed to be selected, a large space of $2^{N}-1$ combinations can be used. This exponential increase in search space is part of the strength of SES over both sizing and redundancy.
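The three approaches can be compared on the same simulated circuits with a brute-force sketch. It assumes the averaging model of Eq. 3.2 with independent Gaussian pair offsets and an exhaustive search over all non-empty subsets; the function names, $N=8$, and the number of simulated circuits are illustrative choices, not values from the thesis.

```python
import itertools
import math
import random


def candidate_offsets(pair_offsets):
    """Offset obtained for one circuit with each approach:
    scaling (all pairs on), redundancy (best single pair),
    and SES (best of all non-empty subsets, per Eq. 3.2)."""
    n = len(pair_offsets)
    scaling = sum(pair_offsets) / n
    redundancy = min(pair_offsets, key=abs)
    ses = min(
        (sum(subset) / len(subset)
         for k in range(1, n + 1)
         for subset in itertools.combinations(pair_offsets, k)),
        key=abs,
    )
    return scaling, redundancy, ses


def rms_offsets(n=8, sigma_i=1.0, circuits=300, seed=2):
    """RMS offset of (scaling, redundancy, SES) over simulated circuits."""
    rng = random.Random(seed)
    rows = [candidate_offsets([rng.gauss(0.0, sigma_i) for _ in range(n)])
            for _ in range(circuits)]
    return tuple(math.sqrt(sum(v * v for v in col) / circuits)
                 for col in zip(*rows))


print(rms_offsets())
```

Because the SES search includes every single pair and the full set, its result for any given circuit is never worse than redundancy or scaling in this model. The sketch does not account for measurement noise, calibration time, or other mismatch sources.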

SES can be applied to a variety of circuits such as current sources, differential amplifiers and comparators where matching is critical. It can also be applied to passive element matching in capacitors and resistors. One specific application is flash analog-to-digital converters, where a large number of comparators with tight input offset specification is required.

As in all circuit designs, the main goal in SES is to achieve a target specification such as input offset voltage with arbitrarily high probability (e.g., 99.5\%) with lowest possible power and area. The basic parameters to determine are:

- Total number of selectable elements $(N)$
- The number of elements selected $(k)$
- Size of each element
- Total number of sets among $\binom{N}{k}$ that will be tried, determining calibration time

Since different circuits and applications require different trade-offs among these parameters, a methodology to determine the values of the basic parameters is needed. In this chapter, a Monte Carlo based methodology to determine the values of these parameters will be described for three different applications:

1. A comparator array in 65 nm bulk CMOS, intended for use in an 8-bit Flash ADC design
2. A self-referenced 6-bit Flash ADC in 45 nm SOI CMOS
3. A MEMS resonator array intended to be used as a filter

### 3.2 Methodology

### 3.2.1 Comparator Array in 65 nm Bulk CMOS

### 3.2.1.1 Modeling

Flash ADCs are fast, low-to-medium resolution data converters specifically suited for high-speed applications. The basic structure of a Flash ADC is shown in Fig. 3.2. For an $N$-bit converter, $2^{N}-1$ comparators are connected in parallel to the same analog input voltage. Each comparator is also connected to a unique reference voltage against which the analog input is compared. Ideally, the input offset voltage of each comparator is zero, and the immediate output of the array is a thermometer code. A digital computation block converts the thermometer code to an $N$-bit digital output.

Figure 3.2: Basic Flash ADC Architecture

Since high speed operation is desired, latch type comparators with positive feedback (resembling sense amplifiers [24]) are commonly used in Flash ADCs. Better noise immunity can be achieved by using a fully differential analog input and reference voltage as shown in Fig. 3.3. Cross-coupled inverter pairs (transistors 1 through 4) provide positive feedback, whereas pre-discharge transistors (labeled 5 through 8) reset the comparator before the evaluate phase. An SR-latch typically follows the latch-type comparator to preserve the outputs through the reset phase, so that the outputs are kept stable through the full clock cycle [22, 25]. Inverters or buffers can be inserted after the comparator to reduce kickback charge from the SR-latch. One or more preamplification stages might be inserted directly following the analog inputs, since the comparators generally have high input offset voltage and excessive kickback [26]. Track and hold amplifiers (THA) can be used before the preamplifiers to ease clock timing requirements [19, 27]. In the following discussion, no preamplifiers are assumed to be used before the comparators, but the methodology is not affected by this assumption.


Figure 3.3: Latch Type Comparator

Fig. 3.4 shows a latch type SES-based comparator where the dark sections are replicated $N$ times. Assume that each selectable element in Fig. 3.4 has an offset that follows the normal distribution $\mathcal{N}\left(0, \sigma_{os,i}^{2}\right)$ and that only one element among the $N$ is selected. The probability that this element has an absolute offset smaller than a given specification spec is:

$$
\begin{equation*}
p_{\text {success }}=\operatorname{erf}\left(\frac{s p e c}{\sigma_{o s, i} \times \sqrt{2}}\right)=1-p_{\text {fail }} \tag{3.3}
\end{equation*}
$$

$p_{\text{fail}}$ denotes the probability that this element will fall outside the given offset specification (spec).

Figure 3.4: SES-based Latch Type Comparator

To ensure good linearity of the ADC, spec should be less than $\pm 0.5$ LSB. Since the offset of each element is independent, one can calculate the probability that each and every one of the available $N$ elements will fall outside the desired offset specification as:

$$
\begin{equation*}
p_{\text {fail,total }}=\left(p_{\text {fail }}\right)^{N} \tag{3.4}
\end{equation*}
$$

This is a classical example of redundancy. Let us now consider that all $N$ elements are chosen. In this case, the offset distribution follows $\mathcal{N}\left(0, \sigma_{os,i}^{2}/N\right)$. The probability that the offset falls outside spec (denoted by $p_{\text{fail,N}}$) can be calculated simply by substituting $\sigma_{os,i}$ in Eq. 3.3 with $\sigma_{os}=\sigma_{os,i}/\sqrt{N}$. This is a classical example of Pelgrom-type sizing to reduce variability, and it results in a lower failure probability than using only one element.
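The two extremes can be evaluated in closed form from Eq. 3.3 and Eq. 3.4. The following is a minimal Python sketch, not the MATLAB code used for this work; the function names are illustrative, and the numbers ($N=20$, $\sigma_{os,i}=1$, spec $=10^{-2}$) are the ones used for Fig. 3.5.

```python
import math

def p_fail_single(spec, sigma):
    """Eq. 3.3: probability that a zero-mean Gaussian offset lies outside +/-spec."""
    return 1.0 - math.erf(spec / (sigma * math.sqrt(2.0)))

def p_fail_redundancy(spec, sigma, n):
    """Eq. 3.4: all n independent single-element choices miss the spec."""
    return p_fail_single(spec, sigma) ** n

def p_fail_sizing(spec, sigma, n):
    """Select all n elements: one configuration with sigma reduced by sqrt(n)."""
    return p_fail_single(spec, sigma / math.sqrt(n))

spec, sigma, n = 1e-2, 1.0, 20
print("single element :", p_fail_single(spec, sigma))
print("redundancy     :", p_fail_redundancy(spec, sigma, n))
print("sizing         :", p_fail_sizing(spec, sigma, n))
```

With such a tight specification, both extremes still fail most of the time (roughly 0.85 for redundancy and 0.96 for sizing), which is the gap that selecting intermediate subset sizes is meant to close.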

Redundancy and Pelgrom-type sizing are the two extremes of SES. Rather than selecting one element at a time (redundancy) or all at once (sizing), $k$ elements among $N$ are selected at a time $(1 \leq k \leq N)$. Fig. 3.5, generated using $1 \times 10^{6}$ Monte Carlo samples in MATLAB, shows the failure probability $\left(p_{\text{fail,total}}\right)$ as $k$ is varied when $N=20$, $\sigma_{os,i}=1$ and the offset specification (spec) is $10^{-2}$. In other words, we are trying to achieve an absolute offset less than $1/100^{\text{th}}$ of the standard deviation of each element, a very ambitious target. $p_{\text{fail}}$ for each element can be calculated from Eq. 3.3 using $\sigma_{os,i}=1$ and spec $=10^{-2}$. The leftmost point in the contour shows the case of redundancy, where we have 20 independent subsets of only one element each $(k=1)$. The failure probability at this point can be calculated simply as $p_{\text{fail,total}}=\left(p_{\text{fail}}\right)^{20}$. The rightmost point corresponds to the case where we have only one subset of 20 elements (select all elements, $k=N=20$). The probability of failure for this subset, $p_{\text{fail}}$, can again be calculated from Eq. 3.3 with spec $=10^{-2}$ and $\sigma_{os,i}$ replaced by $1/\sqrt{20}$, because the standard deviation decreases by $1/\sqrt{\text{Area}}$. The failure probability at the right end of the contour is simply $p_{\text{fail,total}}=p_{\text{fail}}$, since there is only one subset of size 20. This point corresponds to Pelgrom-type sizing. Clearly, orders of magnitude of improvement in failure probability are achievable compared to both redundancy and Pelgrom-type sizing if we allow $k$ to lie between these two extremes, i.e. $1<k<20$.

Figure 3.5: Failure Probability for $N=20$, $\sigma_{os,i}=1$, spec $=10^{-2}$

Minimum failure probability is observed when $k=4$; however, this may not be the optimum point when one considers that 16 unused elements are contributing to the parasitics. In the comparator example, this would slow down the circuit. In most cases it is desirable to minimize the number of unused elements, or simply maximize the $k/N$ ratio while achieving the required offset specs.
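The kind of Monte Carlo experiment behind such a curve can be sketched in a few lines of Python. This is an illustrative reimplementation, not the original MATLAB script. It assumes that the offset of a selected subset is the arithmetic mean of its element offsets (consistent with $\sigma_{os}=\sigma_{os,i}/\sqrt{k}$), and it uses a smaller $N$, a looser spec and far fewer samples than Fig. 3.5 so that the exhaustive subset search stays fast.

```python
import itertools
import random

def ses_fail_probability(n, k, spec, sigma=1.0, trials=500, seed=0):
    """Estimate the probability that no k-subset of n elements meets |offset| < spec."""
    rng = random.Random(seed)
    limit = k * spec  # |mean| < spec is equivalent to |sum| < k * spec
    failures = 0
    for _ in range(trials):
        offsets = [rng.gauss(0.0, sigma) for _ in range(n)]
        # Exhaustively check every k-subset; stop at the first one within spec.
        found = any(abs(sum(subset)) < limit
                    for subset in itertools.combinations(offsets, k))
        if not found:
            failures += 1
    return failures / trials

# Sweep k for a small array (hypothetical demo values, not those of Fig. 3.5).
for k in range(1, 9):
    print(k, ses_fail_probability(n=8, k=k, spec=0.05))
```

In this sketch, $k=1$ corresponds to redundancy and $k=N$ to sizing; intermediate values of $k$ benefit from the much larger number of available subsets.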

Fig. 3.6 shows the plots when both $N$ and $k$ are varied. Each blue contour corresponds to a different $N$ value $(1 \leq N \leq 20)$, and the x-axis shows how many elements $(k)$ are selected among $N$ $(k \leq N)$. As in the previous case, each selectable element follows $\mathcal{N}(0,1)$ and spec $=10^{-2}$. A good way to visualize the improvement in failure probability is to look at a vertical line at a given $k$ (shown for $k=10$) and determine the intersection points between this line and each contour. We increase $N$ until we reach the failure probability target $p_{\text{fail,total}}$ (shown for $p_{\text{fail,total}}=10^{-2}$). In this example, the target is reached when $N=18$. One can select any $N$ above 18, but at the expense of increasing the number of unused elements. Fig. 3.6 helps us answer the following question: given a fixed element size ($\mu_{\text{element}}=0$ and $\sigma_{\text{element}}=1$) and an offset specification (spec $=10^{-2}$ for Fig. 3.6), what are the possible $(N, k)$ pairs that will satisfy a given failure probability specification $p_{\text{fail,total}}$? A MATLAB script can search through the data, find the appropriate $(N, k)$ pairs, and produce the highest $k/N$ ratio for each $k$. In Fig. 3.6, these points have been marked with red circles for each $k$ where the $p_{\text{fail,total}}$ specification can be met. Although not fully monotonic due to the discrete nature of the problem, we observe higher $k/N$ ratios as $k$ increases. In other words, red circles to the right have, in general, better utilization of elements compared to the ones on the left. It should be noted that any deviation from monotonicity with increasing $k$ is small.

Figure 3.6: Failure Probability for $N=1$ to $20$, $\sigma_{os,i}=1$, spec $=10^{-2}$
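The search over $(N, k)$ pairs can be sketched as follows in Python. This is a hedged illustration rather than the original MATLAB script: `best_pairs` is a hypothetical helper, and the table passed to it below contains made-up placeholder numbers, not simulation results.

```python
def best_pairs(p_fail_table, target):
    """For each k, find the smallest N (i.e. the highest k/N) that meets the target.

    p_fail_table maps (N, k) -> estimated failure probability.
    Returns {k: (N, k / N)} for every k where some N meets the target.
    """
    smallest_n = {}
    for (n, k), p in p_fail_table.items():
        if p <= target and (k not in smallest_n or n < smallest_n[k]):
            smallest_n[k] = n
    return {k: (n, k / n) for k, n in sorted(smallest_n.items())}

# Placeholder table for illustration only (NOT data from Fig. 3.6).
toy_table = {
    (4, 2): 0.5, (6, 2): 0.01, (8, 2): 0.001,
    (6, 3): 0.02, (8, 3): 0.005,
}
for k, (n, ratio) in best_pairs(toy_table, target=0.01).items():
    print(f"k={k}: N={n}, k/N={ratio:.3f}")
```

In practice the table would be filled from Monte Carlo estimates of $p_{\text{fail,total}}$ for each $(N, k)$ pair.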

Fig. 3.7 shows a comparison of the three methods as $N$ is varied. Each selectable element offset is assumed to follow a unit normal distribution $\mathcal{N}(0,1)$ with spec $=2 \times 10^{-2}$. Only half of all elements are allowed to be selected for SES ($k=N/2$). For redundancy, each element forms one block ($N$ redundant blocks). A dramatic improvement in success probability can be seen with SES compared to both redundancy and scaling.


Figure 3.7: Comparison of SES, Redundancy and Scaling

Although the previous scenario is informative, it might not be completely realistic. In most cases, designers are not restricted to choose a fixed element size; they can choose among fewer but larger elements (e.g., $\mu_{\text {element }}=0, \sigma_{\text {element }}=\sigma_{o s, i}<1$ ). For the comparator example, assume that all transistors in the replicated section have a minimum length $(L)$. Consider the following two cases:

- Case 1: $N_{1}$ total elements; in each element, all the transistors have width $W_{1}$, giving a standard deviation of $\sigma_{1}$. We are selecting $k_{1}$ elements among $N_{1}$.
- Case 2: $N_{2}$ total elements; in each element, all the transistors have width $W_{2}$, giving a standard deviation of $\sigma_{2}$. We are selecting $k_{2}$ elements among $N_{2}$.

For a fair comparison, assume that the total area in the two cases is the same, i.e. $N_{1} \times W_{1}=N_{2} \times W_{2}$, ignoring routing area and the storage for configuration bits. We want to determine which case has better resource utilization (a higher $k/N$ ratio). To achieve this goal, we first regenerate the plot in Fig. 3.6 for different $\sigma_{\text{element}}/$spec ratios to normalize it to spec. Fig. 3.8 shows these individual plots forming the slices of a "decision cube." Using the decision cube, the designer can evaluate trade-offs between differing element sizes for a given spec. Each slice of the cube corresponds to a different element size.


Figure 3.8: Decision Cube

The decision cube is built only once for a predetermined range of $\sigma_{\text{element}}/$spec ratios (where spec is the offset specification), and each ratio forms one slice of the cube. Because the cube is built on normalized values, the same cube can be reused for different designs with different resolutions or process technologies. In most practical applications, the desired spec$/\sigma_{\text{element}}$ ratios would range from $10^{-1}$ to $10^{-3}$ (that is, $\sigma_{\text{element}}/$spec from $10$ to $10^{3}$). An arbitrary number of slices can be formed between these points, but 100 slices are generally enough to converge on a decision of $(N, k, \sigma)$ triplets that will satisfy the failure probability $\left(p_{\text{fail,total}}\right)$ target. A simple design recipe is:

1. Specify the offset specification spec.
2. Specify the failure probability target $p_{\text {fail,total }}$ for each comparator. For example, if we would like to find a configuration that will satisfy the spec $99.5 \%$ of the time, $p_{\text {fail }, \text { total }}=5 \times 10^{-3}$.
3. Specify the offset standard deviation $\left(\sigma_{\text {element }, i}\right)$ for each type of selectable element. For example, assume that the basic selectable element is a single transistor. The first selectable element type could be a transistor with width $W_{1}$ with standard deviation $\sigma_{\text {element } 1}$, and the second selectable element type could be $W_{2}$ with standard deviation $\sigma_{\text {element } 2}$. These values can be determined by running circuit simulations for the design in the given process technology.
4. Calculate the ratio $\sigma_{\text {element }, i} /$ spec for each selectable element type.
5. Input the results in steps 2 and 4 to a MATLAB script. For each selectable element type $\left(\sigma_{\text {element }, i}\right)$, the script will produce all the $(N, k)$ pairs that will satisfy the requirements in steps 1 and 2 using the cube in Fig. 3.8. Since the decision cube is pre-built, this is an efficient process step.
6. Now choose between the ( $N, k, \sigma_{\text {element }, i}$ ) triplets that satisfy the requirements in steps 1 and 2 for the specific application. We have observed that in many cases, selecting half the total available elements ( $k=N / 2$ ) results in a good trade-off between resource utilization and the number of configuration bits.

The choice between $(N, k, \sigma)$ triplets that satisfy the target $p_{\text{fail,total}}$ is highly dependent on the requirements of the specific application. Consider the differential amplifier in Fig. 3.9, where the input transistors are replicated $N$ times. Assume that each input transistor has transconductance $g_{m}$ and parasitic drain to bulk capacitance $C_{p}$, and that $C_{p}$ is the dominant capacitance at the output nodes. If $k$ pairs are selected and the transistor output impedance is assumed to be infinite, the low frequency gain of the amplifier can be calculated as $k g_{m} R_{L}$. The bandwidth is $\frac{1}{R_{L} N C_{p}}$. The gain bandwidth product (GBW) is then given as:

$$
\begin{equation*}
G B W=\text { Gain } \times \text { Bandwidth }=k g_{m} R_{L} \times \frac{1}{R_{L} N C_{p}}=\frac{k}{N} \times \frac{g_{m}}{C_{p}} \tag{3.5}
\end{equation*}
$$



Figure 3.9: Differential Amplifier

Static power consumption can be approximated as $k V_{DD} I_{b}$, where $I_{b}$ is the current consumption of each selected branch. $(N, k, \sigma)$ triplets satisfying the $p_{\text{fail,total}}$ target for offset can be obtained by following the previously described design recipe. The choice among these triplets can be made by considering the requirements of the target application:

- For an application where GBW is the important factor, highest $k / N$ ratio among the valid triplets should be chosen. Increasing device width $W$ (decreasing $\sigma$ ) does not yield much benefit since both $g_{m} \approx \mu_{n} C_{o x} \frac{W}{L} \Delta V_{g s}$ and $C_{p} \approx W C_{j u n c}$ (ignoring drain junction sidewall capacitance) are linearly dependent on $W$ [28].
- For high gain, triplets with high $k$ values can be chosen. Another option is to increase device width $W$ to increase $g_{m}$, and hence look at slices with lower $\sigma /$ spec.
- $N$ can be decreased for the highest bandwidth. Reducing $W$ (high $\sigma /$ spec slices) will also help.
- To minimize the static power consumption, $k$ should be decreased.
- If the storage area of the configuration bits needs to be small, triplets with smaller $N$ should be chosen. A design with larger elements (smaller $\sigma /$ spec) will require smaller $N$ to achieve the same $p_{\text {fail,total }}$, and will have smaller storage penalty.
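The trade-offs listed above can be explored numerically with the first-order expressions of this section (gain $k g_{m} R_{L}$, bandwidth $1/(R_{L} N C_{p})$, power $k V_{DD} I_{b}$). The Python sketch below is illustrative only: the `Triplet` class, the normalized unit parameters, and the assumption that $g_{m}$ and $C_{p}$ both scale linearly with width are simplifications taken from the discussion above, not a circuit model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triplet:
    n: int        # total selectable elements
    k: int        # selected elements
    width: float  # relative device width W (larger W means smaller sigma)

def metrics(t, gm_per_w=1.0, cp_per_w=1.0, r_load=1.0, vdd=1.0, i_branch=1.0):
    """First-order figures of merit in normalized units (all scale factors assumed)."""
    gm = gm_per_w * t.width          # g_m taken as proportional to W
    cp = cp_per_w * t.width          # C_p taken as proportional to W
    gain = t.k * gm * r_load
    bandwidth = 1.0 / (r_load * t.n * cp)
    return {
        "gain": gain,
        "bandwidth": bandwidth,
        "gbw": gain * bandwidth,     # Eq. 3.5: (k / N) * (g_m / C_p)
        "power": t.k * vdd * i_branch,
    }

# Hypothetical candidate triplets that are assumed to already meet p_fail,total.
candidates = [Triplet(16, 8, 1.0), Triplet(16, 8, 2.0), Triplet(12, 8, 2.0)]
for t in sorted(candidates, key=lambda t: metrics(t)["gbw"], reverse=True):
    print(t, metrics(t))
```

Under these assumptions, doubling the width leaves GBW unchanged while doubling the gain, and GBW improves only when $k/N$ increases.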

The decision cube in Fig. 3.8 assumes that all available subsets are searched for a given set of $N$ elements. If there is enough processing power available to perform an intelligent search, it is possible to search through all $2^{N}-1$ available combinations. An easier but less optimal option is a greedy search, where random combinations are uploaded to the differential amplifier until a successful combination is found. We can limit the number of trials to ensure that calibration time is not very long, at the expense of a lower probability of finding a good combination. The maximum number of allowable trials can be added as a fourth dimension to the decision cube in Fig. 3.8, allowing the designer to evaluate the calibration time trade-off in addition to the $\left(N, k, \sigma_{\text{element}}\right)$ triplets.
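A trial-limited random search of this kind might look like the following Python sketch. It is an assumption-laden illustration: in hardware the offset of each uploaded combination would be measured, whereas here a list of known element offsets stands in for that measurement, and the subset offset is modeled as the mean of the selected elements.

```python
import random

def random_search(offsets, k, spec, max_trials, rng):
    """Try random k-subsets until one meets |mean offset| < spec or trials run out.

    Returns (sorted subset indices or None, number of trials used).
    """
    indices = range(len(offsets))
    for trial in range(1, max_trials + 1):
        subset = rng.sample(indices, k)
        # Stand-in for an on-chip offset measurement of this configuration.
        if abs(sum(offsets[i] for i in subset)) < k * spec:
            return sorted(subset), trial
    return None, max_trials

rng = random.Random(0)
elements = [rng.gauss(0.0, 1.0) for _ in range(12)]  # hypothetical element offsets
print(random_search(elements, k=6, spec=0.05, max_trials=100, rng=rng))
```

Lowering `max_trials` shortens calibration but increases the chance that the function returns `None` even though a valid subset exists.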

### 3.2.2 6-Bit Flash ADC in 45nm SOI CMOS

### 3.2.2.1 Basics

In the 65 nm comparator array example, the comparators have reference inputs ($V_{\text{ref}}\pm$) that are generated by a precise on-die resistor ladder. The input offset of each comparator needs to be as close to 0 LSB as possible to accurately compare the inputs to the reference inputs, and the modeling in the previous section considered this fact. SES was used to select a good combination among the subsets of each comparator.

Several recent Flash ADCs in the literature eliminate the reference resistor network completely and use built-in offsets of the comparators as the reference voltages [2-5, 29]. It is indeed possible to use SES to calibrate the input offset of each comparator to the desired reference voltages. In this case, mismatch variations actually help in the design of the ADC. Consider the $N$-bit Flash ADC architecture in Fig. 3.10. Each clocked comparator in the array is only connected to the input differential voltage, and the outputs of the comparators are connected to a digital backend consisting of a 1's counter (Wallace Tree Adder). The offsets of the comparators are calibrated to the desired $2^{N}-1$ reference voltages, rather than to the 0 LSB target of the previous case.


Figure 3.10: N Bit Flash ADC with Built-in Reference

Due to the structure of the digital backend, the comparators need not be ordered. As shown in Fig. 3.11 for a 6-bit Flash ADC, any comparator can fill any bin (reference voltage) in the full scale range. A few redundant comparators can be added for better coverage of the bins in the range. The bins have their centers at the desired reference voltages, with a width of 1 LSB ($\pm 0.5$ LSB from the center of the bin).

Figure 3.11: Bins in a 6-bit Flash ADC

The full scale range (FSR) of the ADC with built-in references would ultimately be limited by the amount of variation in the comparator design: the higher the mismatch variations, the wider the FSR. To widen the FSR, the comparator schematic in Fig. 3.12 is used. In this schematic, 12 identical selectable elements (controlled by Sel<1:12>) are used for "fine tuning" of the offset, similar to the case in the 8-bit ADC design. There are two additional "coarse tuning" branches (controlled by Sel<13:14>) that can sink current to systematically shift the mean of the offset distribution. Assuming that 6 elements are chosen among 12, each comparator has 924 different offsets to select from. Considering that the coarse knobs can also be turned on, a total of 2772 ($924 \times 3$) different offsets can be achieved per comparator using SES. Since any comparator is allowed to fill any bin in the FSR, many combinations are available to build a self-referenced flash ADC.
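The subset counts quoted above follow directly from the binomial coefficient. A short Python check is shown below; the factor of 3 for the coarse knob settings is taken from the text, and how the two coarse branches map onto three settings is not modeled here.

```python
import math

fine_elements = 12
selected = 6
coarse_settings = 3  # factor quoted in the text for the coarse tuning knobs

fine_subsets = math.comb(fine_elements, selected)  # ways to choose 6 of 12 elements
total_offsets = fine_subsets * coarse_settings

print(fine_subsets, total_offsets)
```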

The effect of the coarse tuning knobs can be seen in Fig. 3.13. The input offset distribution of the comparator with both coarse knobs turned off is centered at 0. The minimum and maximum achievable offsets (approximately $\pm 3 \sigma_{\text{offset}}$) limit the full scale range. Using the coarse knobs, the mean of the distribution can be shifted and hence a wider FSR can be achieved.

### 3.2.2.2 Modeling

Consider the self-referenced flash ADC architecture in Fig. 3.10. Our goal in this section is to determine the number of elements $(N)$ in each comparator in Fig. 3.12, the amount of shift the coarse tuning knobs introduce ($\Lambda$) and the size (offset) of each element ($\sigma_{\text{element}}$) so that all the bins in the ADC can be filled with $99.5\%$ probability. In the analysis below, a 6-bit Flash ADC implementation is chosen as a demonstration.

Figure 3.12: Self-referenced Flash ADC Comparator

SES-based self-referenced ADC modeling is significantly different from the more conventional 8-bit ADC design described previously. For the 8-bit design, the post-manufacturing calibration of each comparator is independent of the others and all the offsets are targeted to 0 LSB. This independence allows a decision cube to be built for each comparator, and the success probability of the chip can then be calculated easily by assuming independence among the comparators. For the self-referenced design, each comparator needs to be assigned to a bin with the other comparators in mind, so the allocation must be done at the global level. Considering this dependence among the comparators, a new method is applied for the 6-bit Flash ADC:

1. Determine the number of comparators in the ADC design. For a 6-bit ADC, $2^{6}-1=63$ comparators are required.
2. Determine a value for the number of redundant comparators $(R)$ that are allowed in the design.
3. Determine a value for $N$, the number of selectable elements in each comparator. The offset standard deviation of each element is $\sigma_{\text{element}}$.
4. Determine a value for $\Lambda$, the amount of shift that the coarse knobs introduce.
5. Determine a value for the full scale range (FSR), normalized to $\sigma_{\text{element}}$.
6. Run a Monte Carlo analysis in MATLAB for the ADC with $63+R$ comparators, each element having an offset distribution of $\mathcal{N}\left(0, \sigma_{\text{element}}^{2}\right)$. Determine if all the bins in the ADC can be filled by using these comparators. From the results, calculate the probability that at least one successful configuration will be found with the given $R$, $N$, $\Lambda$ and $FSR$.
7. Repeat the Monte Carlo run for a set of different values for $R$, $N$, $\Lambda$ and $FSR$.

Figure 3.13: Coarse Tuning to Increase Full Scale Range

The pseudo code for the described method is:

```
R   = [r_1 ... r_i]
N   = [n_1 ... n_j]
Λ   = [λ_1 ... λ_k]
FSR = [fsr_1 ... fsr_t]
for all r in R do
    for all n in N do
        for all λ in Λ do
            for all fsr in FSR do
                Run Monte Carlo analysis for the ADC, calculate the success probability.
            end for
        end for
    end for
end for
```
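The nested loops above amount to a full grid sweep. A minimal Python sketch of such a sweep is given below; `sweep` and `placeholder_evaluate` are hypothetical names, the evaluation function is a stub that a real Monte Carlo run would replace, and the grid steps are assumptions chosen only for illustration (shifts and FSR are in units of $\sigma_{\text{element}}$).

```python
import itertools

def sweep(evaluate, r_values, n_values, shift_values, fsr_values):
    """Evaluate every (R, N, shift, FSR) combination and collect the results."""
    results = {}
    for r, n, shift, fsr in itertools.product(r_values, n_values,
                                              shift_values, fsr_values):
        results[(r, n, shift, fsr)] = evaluate(r, n, shift, fsr)
    return results

def placeholder_evaluate(r, n, shift, fsr):
    """Stub standing in for one Monte Carlo success-probability estimate."""
    return 0.0

# Assumed grids (step sizes are illustrative, not taken from the original study).
r_grid = range(0, 6)                          # redundant comparators
n_grid = range(8, 17, 2)                      # even N so that k = N / 2 is an integer
shift_grid = [0.0, 0.5, 1.0, 1.5, 2.0]        # coarse shift
fsr_grid = [1.5 + 0.5 * i for i in range(11)] # 1.5 to 6.5

table = sweep(placeholder_evaluate, r_grid, n_grid, shift_grid, fsr_grid)
print(len(table), "grid points")
```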

For the 6-bit ADC under consideration, only half of the total available elements are allowed to be chosen $(k=N/2)$. The methodology can be extended to other $k$ values $(1 \leq k \leq N)$, similar to the 8-bit ADC case. For the 6-bit ADC, the chosen ranges of values are:

$$
\begin{gathered}
0 \leq R \leq 5 \\
8 \leq N \leq 16 \\
0 \leq \Lambda \leq 2 \sigma_{\text {element }} \\
1.5 \sigma_{\text {element }} \leq F S R \leq 6.5 \sigma_{\text {element }}
\end{gathered}
$$

For each of the cases above, $10^{4}$ Monte Carlo runs have been performed. In each run, all the available subset offsets for each comparator are computed. A heuristic algorithm is used to check if all the bins in the FSR can be filled (each bin extends $\pm 0.5$ LSB from its center). The algorithm runs through all the comparators (with all their subsets) to fill Bin 1 in Fig. 3.11. The first comparator found to fill the bin is assigned to Bin 1, and that comparator is marked "no longer available". The algorithm then skips to Bin 63 and repeats the process. Bins are filled from the edges to the center, so the order of bin filling in this example is $[1, 63, 2, 62, \cdots, 32]$. If at any time during the process a bin cannot be filled with any remaining comparator, that Monte Carlo run is marked as a failure.
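The bin-filling heuristic can be sketched in Python as below. This is a simplified illustration, not the original MATLAB code. It assumes that a subset offset is the mean of the selected element offsets, that the coarse knobs offer shifts of $-\Lambda$, $0$ and $+\Lambda$, that bin centers are spaced 1 LSB $= FSR/2^{\text{bits}}$ apart and symmetric around zero, and it works in units of $\sigma_{\text{element}}$. The demo uses a small 3-bit case to keep the exhaustive subset enumeration fast.

```python
import itertools
import random

def bin_order(num_bins):
    """Edge-to-center order, e.g. [1, 63, 2, 62, ..., 32] for 63 bins."""
    order, lo, hi = [], 1, num_bins
    while lo <= hi:
        order.append(lo)
        if hi != lo:
            order.append(hi)
        lo, hi = lo + 1, hi - 1
    return order

def reachable_bins(elements, k, shift, lsb, num_bins):
    """Bins (1-based) that one comparator can fill with some k-subset and coarse setting."""
    center_index = (num_bins + 1) / 2  # index of the bin centered at zero offset
    bins = set()
    for subset in itertools.combinations(elements, k):
        base = sum(subset) / k
        for coarse in (-shift, 0.0, shift):
            # Nearest bin center; any in-range offset is within 0.5 LSB of it.
            b = round((base + coarse) / lsb + center_index)
            if 1 <= b <= num_bins:
                bins.add(b)
    return bins

def adc_success_probability(bits, n, redundant, shift, fsr, runs=100, seed=0):
    """Fraction of Monte Carlo runs in which every bin gets its own comparator."""
    rng = random.Random(seed)
    num_bins = 2 ** bits - 1
    lsb = fsr / 2 ** bits
    order = bin_order(num_bins)
    successes = 0
    for _ in range(runs):
        comps = [reachable_bins([rng.gauss(0.0, 1.0) for _ in range(n)],
                                n // 2, shift, lsb, num_bins)
                 for _ in range(num_bins + redundant)]
        available = list(range(len(comps)))
        filled = True
        for b in order:
            # First still-available comparator that can reach this bin.
            pick = next((c for c in available if b in comps[c]), None)
            if pick is None:
                filled = False
                break
            available.remove(pick)
        successes += filled
    return successes / runs

print(adc_success_probability(bits=3, n=8, redundant=1, shift=0.0, fsr=2.0))
```

Because the first matching comparator is taken for each bin, this heuristic can declare a failure in cases where a different assignment would have succeeded, so the estimate is conservative with respect to an exhaustive assignment.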

The simulated probability of finding a good configuration (i.e. filling all the bins) for the 6-bit ADC is given on the z-axis in Fig. 3.14. In this figure, no coarse tuning is applied and no redundant comparators are included ($\Lambda=0$, $R=0$).

From Fig. 3.14, it is clear that the maximum achievable FSR is approximately $2.5 \sigma_{\text{element}}$ to $3 \sigma_{\text{element}}$ for a reasonable product yield. As the number of selectable elements increases, a significant increase in success probability is observed, but the FSR is still limited. This is expected, since the minimum/maximum achievable offsets (and hence the bounds of the FSR) are limited by the amount of variation.

Figure 3.14: Simulated Success Probability for 6-Bit ADC, $\Lambda=0$, $R=0$, $8 \leq N \leq 16$

Fig. 3.15 shows the same plot when 2 redundant comparators are added to the design. The addition of even a small number of redundant comparators increases the success probability significantly, but does not have a big impact on FSR. The fundamental limit set by variation still exists. This observation is in fact unconventional: more mismatch variation is actually desired by this application to increase the FSR.

Figure 3.15: Simulated Success Probability for 6-Bit ADC, $\Lambda=0$, $R=2$, $8 \leq N \leq 16$

Fig. 3.16 shows a comparison of the success probability when the coarse knobs are added, for different amounts of shift. A clear increase in achievable FSR is noted when the shift amount is increased. It is important to note that when the shift amount is increased beyond a certain level, the success probability for the 6-bit ADC decreases for low FSR values. This is mainly due to the fact that the distributions in Fig. 3.13 overlap less, making it harder to fill the tight bins at low FSR values. However, this is not a big concern since higher FSR values are generally desired.

Finally, Fig. 3.17 shows the plots when 5 redundant comparators are added to the design. Success probability increases significantly with the relatively few additional redundant comparators. It should be noted that any unused comparators in the design can be turned off easily by turning off all the elements shown in Fig. 3.12, hence they have minimal impact on the total power consumption. Based on the modeling results, the comparators in the actual silicon implementation are chosen to have 12 elements with $\Lambda \approx 1.5 \sigma_{\text {element }}$.

The results of the described methodology allow the designer to determine the most suitable design choice among the various available trade-offs ($N$, $R$, $\Lambda$, $FSR$). The process can be applied to different self-referenced ADC designs as the target application requires, such as a different ADC resolution. The available subset space can be limited if an application imposes a maximum calibration time constraint.


Figure 3.16: Simulated Success Probability for 6-Bit ADC, $\Lambda=0$ to $2 \sigma_{\text{element}}$, $R=0$, $8 \leq N \leq 16$

Figure 3.17: Simulated Success Probability for 6-Bit ADC, $\Lambda=0$ to $2 \sigma_{\text{element}}$, $R=5$, $8 \leq N \leq 16$ (panels: yield with $0.00$, $1.00$, $1.50$ and $2.00 \times \sigma_{\text{element}}$ systematic shift, each with 5 redundant comparators)

### 3.2.3 SES Based MEMS Resonator Array

### 3.2.3.1 Basics

Microelectromechanical systems (MEMS) based resonators can achieve very high quality factors ($Q>1000$), a characteristic highly desirable for radio frequency (RF) systems [30]. The ultimate goal is to integrate these resonators in a standard CMOS process to be used as on-chip channel select or band-pass filters, or mixer-filters. Unfortunately, these high-Q systems are sensitive to manufacturing tolerances, and even a few tenths of a percent variation in center frequency can throw the resonator out of the desired filter band.

A simple block diagram of a superheterodyne RF receiver is shown in Fig. 3.18, where a MEMS resonator is used as a mixer-filter. As an example, consider a Personal Communications Service (PCS) signal at $f_{PCS}=1.9$ GHz received through the antenna and filtered using a band-pass filter (BPF). The signal is then amplified by the low noise amplifier (LNA). The MEMS resonator mixes the LNA output with the near-1.9 GHz local oscillator (LO) signal $f_{LO}$, resulting in a low frequency output at $\left|f_{PCS}-f_{LO}\right|$. The resonant frequency $\left(f_{RES}\right)$ of the MEMS device must be at $\left|f_{PCS}-f_{LO}\right|$: the sharp frequency response ($Q=\frac{2 \pi \times f_{RES}}{\text{Bandwidth}}$) of the resonator rejects all frequencies outside a narrow band around $f_{RES}$, hence the name mixer-filter. The output is then converted to the digital domain using an ADC, and the baseband output is sent to a digital signal processor (DSP). In today's commercial communication systems, transistors and LC filters are used for on-chip mixing and filtering operations instead of MEMS devices. This is in part due to the large variation in $f_{RES}$ with manufacturing. In addition, the antenna and the BPF are generally off-chip components because of the difficulty in integrating them on the CMOS die.
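To see why a small fractional frequency error matters for a high-Q device, the arithmetic can be worked through in a short Python sketch. The LO frequency below is a hypothetical value chosen only to give a round resonant frequency, and the bandwidth is taken in Hz so that $Q = f_{RES}/\text{bandwidth}$, which is assumed here to be the same ratio as the angular-frequency form quoted above.

```python
def resonant_frequency(f_rf_hz, f_lo_hz):
    """Required resonator frequency for a mixer-filter: |f_RF - f_LO|."""
    return abs(f_rf_hz - f_lo_hz)

def bandwidth_hz(f_res_hz, q):
    """Bandwidth in Hz implied by a quality factor Q = f_res / bandwidth."""
    return f_res_hz / q

def detuning_in_bandwidths(fractional_shift, q):
    """How many bandwidths a fractional center-frequency error corresponds to."""
    return fractional_shift * q

f_pcs = 1.9e9    # PCS carrier from the text
f_lo = 1.89e9    # hypothetical LO frequency (assumption for illustration)
q = 1000         # lower bound on Q quoted in the text

f_res = resonant_frequency(f_pcs, f_lo)
print("f_RES (Hz):", f_res)
print("bandwidth (Hz):", bandwidth_hz(f_res, q))
print("0.2% shift in bandwidths:", detuning_in_bandwidths(0.002, q))
```

Under these assumptions, a center-frequency error of two tenths of a percent moves the resonance by about two full bandwidths, well outside the intended passband.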

Figure 3.18: Superheterodyne Receiver Architecture

As a further improvement of this concept, the MEMS device could potentially be placed right after the antenna, replacing the costly off-chip BPF. The desired band or channel would then be filtered by the resonator and amplified by an LNA on the die [30]. It is possible to use the MEMS resonator as a mixer-filter to downconvert the input signal (Fig. 3.19). However, excessive insertion loss of MEMS devices and tight signal to noise ratio (SNR) requirements might limit the feasibility of putting the resonator earlier in the receiver chain than the LNA.


Figure 3.19: RF Receiver Architecture with On-die Channel/Band Filtering

In most CMOS-compatible MEMS resonators, electrostatic drives are used to convert electrical inputs to a mechanical resonance. The mechanical resonance induces a change in capacitance at the output node, inducing an electrical output current. Due to the losses in the energy conversion, high insertion losses are prevalent. This is problematic for RF systems where the received signal power is generally very low. Electrically parallel and mechanically coupled arrays of resonators have previously been proposed to increase the effective resonator gain and improve the SNR of the systems [30-34]. One advantage of mechanical coupling is that it pulls the frequencies of the individual resonators together to compensate for process variations. The net result is similar to Pelgrom-type sizing in transistors: the standard deviation of the resonant frequency of $N$ mechanically coupled resonators is $1/\sqrt{N}$ of the standard deviation of the individual resonators $\left(\sigma_{f_{RES}, \text{array}}=\frac{\sigma_{f_{RES}, \text{individual}}}{\sqrt{N}}\right)$ [34]. However, mechanical coupling can introduce other modes of resonance to the array, resulting in a frequency response with many unacceptable peaks [32].

Another way of compensating for variations in the resonant frequency is to tune the DC bias or polarization voltages $\left(V_{d c}\right)$ that many MEMS devices require [33]. However, sensitivity of $f_{R E S}$ to $V_{d c}$ is generally low and tens of volts of $V_{d c}$ change might be required to compensate for process variations. In the case of multiple resonators in an array, fine tuning of each resonator with $V_{d c}$ can quickly become infeasible.

Given the similarity of the problem in MEMS devices to the previously described transistor-based case studies, it is clear that SES can be very beneficial for an array of MEMS resonators with large-scale process variations. Consider the array of identical, electrically parallel MEMS resonators shown in Fig. 3.20. Assume that among the $N$ resonators in the array, at least $k$ need to be selected to have enough gain as dictated by the RF system design. Due to manufacturing variations, not all of the resonators will have the same center frequency. The selected resonators should fall within a small frequency band, close to the bandwidth of one resonator, so that the combined electrical response does not have large ripples (Fig. 3.21). Undesired resonators in the array can be turned off via $V_{dc}$ (not shown). The combined electrical output can then be amplified and further processed on the chip. The array can be used as a filter or mixer-filter as previously described.
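Selecting $k$ resonators that fall within one narrow band can be sketched as a sliding window over the sorted center frequencies. The Python example below is an illustration under simple assumptions: `select_resonators` is a hypothetical helper, the center frequencies are assumed to be known (for example from measurement), and the tightest qualifying group is preferred.

```python
def select_resonators(freqs, k, band):
    """Return indices of k resonators whose center frequencies span at most `band`.

    Returns None if no such group exists. Among valid groups, the one with
    the smallest frequency spread is chosen.
    """
    order = sorted(range(len(freqs)), key=lambda i: freqs[i])
    best = None
    for start in range(len(order) - k + 1):
        window = order[start:start + k]  # k neighbors in sorted order
        spread = freqs[window[-1]] - freqs[window[0]]
        if spread <= band and (best is None or spread < best[0]):
            best = (spread, window)
    return None if best is None else sorted(best[1])

# Hypothetical center frequencies in arbitrary units.
centers = [100.0, 100.4, 99.0, 100.1, 101.5]
print(select_resonators(centers, k=3, band=0.5))
print(select_resonators(centers, k=4, band=0.5))
```

Resonators not returned by the selection would be the ones turned off via $V_{dc}$.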

Figure 3.20: Electrically Parallel Resonator Array

Figure 3.21: Frequency Binning of Resonators

If a larger filter bandwidth is required, as in a band-pass filter, the approach of binning the resonators can be extended as shown in Fig. 3.22. The band of interest can be sliced into several bins, each bin covering a frequency range of approximately the bandwidth of one resonator. Among all the available resonators in the array, the bins can be filled one by one until each bin has enough resonators for the gain requirement as dictated by the RF architecture. This approach requires a global consideration of the available resonators, similar to the 6-bit ADC design problem. In the following section, an adaptation of the SES methodology will be described for a MEMS resonator array intended to be used as a filter/mixer-filter.

### 3.2.3.2 Modeling

In this section, our design process is described for a filter/mixer-filter consisting of an array of nonideal MEMS resonators. A basic square frame resonator described in [35] is used as the building block of the array; however, the methodology does not depend on the choice of resonator.


Figure 3.22: Frequency Binning of Resonators in a Large Band

Resonators are modeled in the Cadence Virtuoso design environment using existing Verilog-A NODAS modules from the CMU MEMS laboratory [36]. Minor modifications are made to the NODAS library modules to add process (global) and mismatch (local) variations. Due to the lack of rigorous characterization data for the resonators, estimated variation values are used for mechanical properties such as Young's modulus, width and thickness.

A block diagram describing the filter design flow is shown in Fig. 3.23. A square frame resonator is built in Cadence using NODAS mechanical beam Verilog-A modules. Estimated variation parameters are used to run a Cadence Monte Carlo simulation for one square frame resonator, and the resonant frequency $\left(f_{RES}\right)$, gain $\left(G_{RES}\right)$ and quality factor $(Q)$ are extracted for each run. Results are fed into MATLAB to build an RLC equivalent circuit of the resonator with a transfer function given as [37]:

$$
\begin{equation*}
T F(s)=\frac{s C}{L C s^{2}+s R C+1}=\frac{s / L}{s^{2}+s \frac{R}{L}+\frac{1}{L C}}=\frac{s / L}{s^{2}+2 \zeta \omega_{n} s+\omega_{n}^{2}} \tag{3.6}
\end{equation*}
$$

where $\omega_{n}=2 \pi f_{RES}=\frac{1}{\sqrt{L C}}$ and $\zeta=\frac{1}{2 Q}=\frac{R}{2} \sqrt{\frac{C}{L}}$. The gain of the system at resonance is $|TF|_{s=j 2 \pi f_{RES}}=1 / R$. Monte Carlo simulations in Cadence are run separately for process and mismatch variations to identify the relative importance of each type of variation. RLC values calculated in MATLAB are assumed to follow a joint (multivariate) normal distribution with a probability density function (pdf):

$$
\begin{equation*}
\mathcal{N}(\mu, \Sigma)=\frac{1}{(2 \pi)^{3 / 2}|\Sigma|^{1 / 2}} e^{-\frac{1}{2}(x-\mu)^{\prime} \Sigma^{-1}(x-\mu)} \tag{3.7}
\end{equation*}
$$

where $\mu$ is the vector of mean (nominal) values and $\Sigma$ is the covariance matrix for RLC values.
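The mapping from the extracted parameters to the equivalent circuit follows directly from Eq. 3.6: $R = 1/G_{RES}$, $L = Q R / \omega_n$ and $C = 1/(\omega_n^2 L)$. The following is a minimal Python sketch of that conversion, not the MATLAB code used in this work. The function names and the example gain value are hypothetical; only the 10 MHz resonant frequency and the mean $Q$ of 2000 come from this section.

```python
import math

def rlc_from_resonator(f_res, q, gain):
    """Map (f_RES, Q, G_RES) to the R, L, C values of Eq. 3.6."""
    w_n = 2.0 * math.pi * f_res
    r = 1.0 / gain              # |TF| at resonance equals 1/R
    l = q * r / w_n             # from zeta = 1/(2Q) = (R/2)*sqrt(C/L) and w_n = 1/sqrt(LC)
    c = 1.0 / (w_n ** 2 * l)    # from w_n = 1/sqrt(LC)
    return r, l, c

def tf(f, r, l, c):
    """Evaluate TF(s) = sC / (LC s^2 + sRC + 1) at s = j*2*pi*f."""
    s = 1j * 2.0 * math.pi * f
    return s * c / (l * c * s ** 2 + s * r * c + 1.0)

# Example values: f_RES and Q follow the text, the gain is a placeholder.
F_RES, Q, GAIN = 10e6, 2000.0, 1e-6
r, l, c = rlc_from_resonator(F_RES, Q, GAIN)
peak = abs(tf(F_RES, r, l, c))                            # close to GAIN
half_power = abs(tf(F_RES * (1 + 1 / (2 * Q)), r, l, c))  # close to GAIN / sqrt(2)
```

Evaluating the transfer function at $f_{RES}$ and at the half-bandwidth offset $f_{RES}/(2Q)$ gives a quick consistency check on the conversion.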


Figure 3.23: MEMS Filter Design Flow

Desired filter bandwidth $(BW)$, bin spacing $(\Delta)$ in the bandwidth, filter gain $\left(G_{\text{filter}}\right)$ and calculated nominal RLC values are used to find a good "bin profile" in MATLAB using Monte Carlo simulations. The bin profile is the number of resonators assigned to each bin in the bandwidth, as shown in Fig. 3.24. If all bins in the bandwidth have the same number of resonators, peaks are observed at the edges of the filter response. This is due to the abrupt cut-off of resonators at the filter edges, and the lack of the negating effect of resonators in nearby bins. In order to prevent these peaks, a staggered decrease of resonators from the center of the band to the edges is used (Fig. 3.24). If the output of the resonators in the red shaded edge bins is inverted, a sharper skirt roll-off can be achieved (shown in the red response curve in Fig. 3.24). In MATLAB, a good bin profile is found using the following methodology:

1. Given $B W$ and $\Delta$, find the center frequency for each bin. Calculate the RLC values for each bin using the mean $Q$ and $G_{R E S}$ data from the Cadence Monte Carlo simulations and Eq. 3.6. $f_{\text {RES }}$ for each bin is the center frequency of each bin.
2. Determine a set of values for the maximum number of resonators allowed in each bin $\left(N_{\max }\right) . N_{\max }$ directly affects filter gain $G_{\text {filter }}$.
3. Determine the number of bins at the edges that can have fewer than $N_{\max }$ resonators. The profile is constrained to be symmetrical around the midpoint of the filter bandwidth. Only a monotonic increase of resonators per bin is allowed from the edges to the center of the filter bandwidth. Bins at the farthest edge of the filter are allowed to have inverted responses for sharper stopband roll-off.
4. Build a random bin profile based on the given constraints. Find the frequency response of the filter with this bin profile based on the RLC values found in step 1. Calculate an error value for the profile based on the flatness of the frequency response in the passband and the attenuation in the stopband. Error value is lower if the response is flat in passband, and attenuation is high in the stopband.
5. Repeat step 4 $T$ times and select the "best bin profile" with the smallest error. $T$ can be increased to find more optimal designs, or a brute-force search through all possible bin profiles can be used.

Simulated annealing can also be used for the bin profile optimization. We have found empirically that $T \approx 1000$ yields acceptable results for this project. The best bin profile for a set of different $N$ values is found using the above approach.
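The random search in steps 1 to 5 can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the MATLAB implementation used in this work: the error function (passband ripple minus a weighted stopband attenuation), the frequency grid, the passband and stopband margins, the unit gain, the limit of six edge bins and the reduced trial count are all hypothetical choices. A negative entry in the profile denotes a bin whose output is inverted.

```python
import math
import random

F_LO, F_HI, DELTA = 10.0e6, 10.1e6, 3.0e3      # passband edges and bin spacing (Hz)
Q_MEAN, G_MEAN = 2000.0, 1.0                    # G_MEAN is a placeholder gain
N_BINS = math.ceil((F_HI - F_LO) / DELTA)
CENTERS = [F_LO + (i + 0.5) * DELTA for i in range(N_BINS)]
GRID = [F_LO - 50e3 + i * 1e3 for i in range(201)]   # 9.95 MHz to 10.15 MHz

def resonator_tf(f, f_res, q=Q_MEAN, gain=G_MEAN):
    """Eq. 3.6 rewritten with x = f / f_RES: gain * (jx/Q) / (1 - x^2 + jx/Q)."""
    x = f / f_res
    return gain * (1j * x / q) / (1.0 - x * x + 1j * x / q)

# Step 1: nominal response of one resonator per bin, evaluated once on the grid.
BASIS = [[resonator_tf(f, fc) for f in GRID] for fc in CENTERS]

def random_profile(n_max, max_edge_bins, rng):
    """Steps 2-4: symmetric profile, non-decreasing from the edges to the center."""
    n_edge = rng.randint(0, max_edge_bins)
    edge = sorted(rng.randint(1, n_max) for _ in range(n_edge))
    half = (edge + [n_max] * (N_BINS // 2))[: N_BINS // 2]
    profile = half + ([n_max] if N_BINS % 2 else []) + half[::-1]
    if n_edge and rng.random() < 0.5:           # optionally invert the outermost bins
        profile[0], profile[-1] = -profile[0], -profile[-1]
    return profile

def error_of(profile):
    """Assumed error metric: low for a flat passband and a well-attenuated stopband."""
    mags = []
    for k in range(len(GRID)):
        h = sum(n * BASIS[b][k] for b, n in enumerate(profile))
        mags.append(20.0 * math.log10(max(abs(h), 1e-30)))
    passband = [m for f, m in zip(GRID, mags) if F_LO + 2 * DELTA <= f <= F_HI - 2 * DELTA]
    stopband = [m for f, m in zip(GRID, mags) if f < F_LO - 20e3 or f > F_HI + 20e3]
    ripple = max(passband) - min(passband)
    attenuation = min(passband) - max(stopband)
    return ripple - 0.1 * attenuation

def search(n_max=8, max_edge_bins=6, trials=200, seed=0):
    """Step 5: keep the random profile with the smallest error."""
    rng = random.Random(seed)
    profiles = [random_profile(n_max, max_edge_bins, rng) for _ in range(trials)]
    best = min(profiles, key=error_of)
    return best, error_of(best)

best_profile, best_error = search()
```

The per-bin responses are precomputed so that each trial only needs a weighted sum, which keeps a search with a large number of trials inexpensive.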

The final step in completing the filter design is the Monte Carlo simulations in MATLAB. The following steps are executed for each Monte Carlo run:


Figure 3.24: MEMS Filter Bin Profile

1. Create $N_{i} \leq N_{\max }$ resonators for each bin in the bandwidth. $N_{i}$ is the number of resonators for bin $i$, as dictated by the best bin profile. The resonators are defined by the equivalent RLC model in Eq. 3.6. Variations are added using Eq. 3.7. Mean values for L and C are adjusted for each bin, since they define the resonant frequency $f_{R E S}$. Each resonator among $N_{i}$ has different $f_{R E S}, Q, G_{R E S}$ parameters due to the added variations.
2. Create $N_{\text {extra }}$ extra resonators for each bin. These are redundant resonators to ensure good statistical success probability that all of the bins in the bandwidth will be filled. They are generated in the same way as the resonators in the previous step (Fig. 3.25).
3. Determine if all the bins in the bandwidth can be filled as the best bin profile dictates. Note that some of the resonators will not be used. If all the bins can be filled, the frequency response of the filter is computed. If the filter gain $\left(G_{\text{filter}}\right)$ has less than $\pm 1 \mathrm{~dB}$ ripple in the passband (to ensure good passband characteristics), the Monte Carlo run is counted as a success.
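The bin-filling part of these steps can be sketched in Python as shown below. The sketch is simplified and makes several assumptions: only $f_{RES}$ is varied (with a Gaussian spread around each bin center), the passband ripple check of step 3 is omitted, and the staggered profile is a hypothetical example rather than the best bin profile found in this work.

```python
import random

def fill_success_probability(profile, n_extra, sigma_f, delta, runs, seed=0):
    """Fraction of Monte Carlo runs in which every bin receives at least N_i resonators."""
    rng = random.Random(seed)
    n_bins = len(profile)
    successes = 0
    for _run in range(runs):
        counts = [0] * n_bins
        for i, n_i in enumerate(profile):
            center = (i + 0.5) * delta             # bin center relative to the lower band edge
            for _res in range(n_i + n_extra):      # N_i resonators plus N_extra spares
                f = rng.gauss(center, sigma_f)     # mismatch moves f_RES away from its target
                b = int(f // delta)                # bin in which the resonator actually lands
                if 0 <= b < n_bins:
                    counts[b] += 1
        if all(c >= n_i for c, n_i in zip(counts, profile)):
            successes += 1
    return successes / runs

# Hypothetical staggered profile over 34 bins with N_max = 8.
profile = [1, 2, 3, 4, 5, 6, 7] + [8] * 20 + [7, 6, 5, 4, 3, 2, 1]
p_none = fill_success_probability(profile, n_extra=0, sigma_f=25e3, delta=3e3, runs=200)
p_extra = fill_success_probability(profile, n_extra=12, sigma_f=25e3, delta=3e3, runs=200)
```

Because each resonator lands in exactly one bin, checking whether the profile can be filled reduces to comparing the per-bin counts with the required $N_i$.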

Figure 3.25: Bin Profile with Extra Added Resonators

Fig. 3.26 shows the results of $10^{4}$ MATLAB Monte Carlo simulations for a MEMS filter/mixer-filter designed for a $10 \mathrm{MHz}$ to $10.1 \mathrm{MHz}$ passband. Bin spacing $(\Delta)$ is chosen as $3 \mathrm{kHz}$ for a square frame resonator with a mean $Q$ of 2000. Several curves are generated for different $N_{\max}$ values.

Results in Fig. 3.26 are obtained with only mismatch (local) variations turned on. Estimated process (global) variations result in a resonant frequency standard deviation ($\sigma_{f_{RES}}$) of more than 350 kHz, a value larger than the entire passband. This would require a large number of backup bins filled with resonators extending far out from the edges of the passband and would make SES infeasible. Fortunately, process variations can be alleviated in mixer-filter operation by shifting the LO frequency $\left(f_{LO}\right)$ to compensate for the global $f_{RES}$ shift. Phase-locked loops with a wide tuning range make this trade-off possible [38]. $\sigma_{f_{RES}}$ due to mismatch is 25 kHz.

Based on the results in Fig. 3.26, $90 \%$ success probability can be achieved for $N_{\max}=8$ and $N_{\text{extra}}=12$. Based on the given $BW$ and $\Delta$ values, a total of 618 resonators are required to reach the $90 \%$ success probability, among which 210 are used. The designed square frame resonator has an area of approximately $200 \mathrm{~\mu m} \times 200 \mathrm{~\mu m}$, and a total die area of approximately $25 \mathrm{~mm}^{2}$ is required to accommodate all resonators. Although the required area is large, several important points need to be considered:


Figure 3.26: Expected Success Probability for the MEMS Filter (Only Mismatch Variations Turned On)

- The analysis assumes a basic square frame resonator with $f_{R E S} \approx 10 \mathrm{MHz}$. Other resonator designs with smaller area (with generally higher $f_{R E S}$ ) can reduce the area penalty significantly.
- The value of $N_{\max }$ is highly dependent on $G_{\text {filter }}$ and the system SNR requirements. Based on a specific application and a different resonator design, $N_{\max }$ can be lower.
- Mismatch variation values used in the analysis are estimated values and can be pessimistic. A test run in the TSMC $0.35 \mathrm{~\mu m}$ process has been completed by colleagues in the CMU MEMS laboratory to collect statistical variation data. Results from this run will ultimately be used to better estimate the scale of variations that can be expected from a commercial-scale CMOS process.
- Even with the large area, potential cost savings from removing the off-chip BPF in Fig. 3.19 can be significant if resonators are integrated on the same die with baseband RF circuits. A stacked two-chip solution can also be considered with MEMS mixer-filters and the rest of the RF baseband integrated on the same package.
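The resonator count and die area quoted above can be cross-checked with a short calculation, shown here as a Python sketch. The number of bins (34) is not stated in the text; it is inferred here as the bandwidth divided by the bin spacing, rounded up, and it is consistent with the quoted totals.

```python
import math

bw_hz, delta_hz = 100e3, 3e3                    # 10 MHz to 10.1 MHz passband, 3 kHz bins
n_extra = 12                                    # spare resonators per bin
total_resonators, used_resonators = 618, 210    # values quoted in the text
side_mm = 0.2                                   # 200 um resonator side length

n_bins = math.ceil(bw_hz / delta_hz)            # 34 bins (inferred)
spares = n_bins * n_extra                       # 408 spare resonators
check_total = used_resonators + spares          # 618, matching the quoted total
area_mm2 = total_resonators * side_mm ** 2      # about 24.7 mm^2 of resonator area
```

The computed area covers the resonators only, so routing and amplifier area would add to the figure of approximately $25 \mathrm{~mm}^{2}$.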


### 3.2.3.3 Simulation Results

A Cadence Monte Carlo simulation has been performed to observe the improvement in the MEMS filter array with and without SES. A block diagram of the simulation steps is shown in Fig. 3.27.


Figure 3.27: Simulation Setup for the MEMS Filter

An array of 618 resonators, modeled in Verilog-A, forms the schematic design based on the results obtained in the previous section. Ten resonators are connected to one transimpedance amplifier to amplify the output of the resonators. The LO input is set at 1 V DC, and the RF input is set at 1 V AC. $V_{p}$ is set to 20 V DC for all resonators (all on) and a Monte Carlo analysis is run with 1 sample. The resonant frequency $\left(f_{RES}\right)$ of each resonator is stored in a data file. The file is read into MATLAB and a selection based on the best bin profile is performed. The selection data is then used to modify the $V_{p}$ of each resonator and the same Monte Carlo analysis is run in Cadence. For comparison to SES, the same simulation is repeated when there are 8 resonators in each bin and all of them are turned on (no selection). Results are shown in Fig. 3.28. For the SES array, magnitude variation in the passband is lower and the attenuation in the stopband is higher. The phase response is also considerably smoother.

### 3.3 Summary

The basics of the Statistical Element Selection methodology were introduced in this chapter. SES relies on selecting a subset of identically laid-out, small elements to achieve a desired specification. Its application to three different circuits was described in detail and simulation results were presented.

Simulation results show that SES can achieve orders of magnitude matching improvement compared to both redundancy and sizing. This is partially due to the vast number of subsets available even with a small number of elements. A decision cube was built to aid in the design of SES-based flash ADCs with a traditional reference ladder based architecture.

SES also allowed us to benefit from the random variations to build a 6-bit, self-referenced flash ADC where input offsets were calibrated to the desired reference levels. Design steps to estimate the product yield, determine the number of selectable elements and the element size were described.

As a further extension of the concept, a filter consisting of an array of MEMS resonators was described. SES was used to find the number of required resonators to achieve a given product yield. Circuit simulations were performed to measure the MEMS filter response with and without the application of SES.


Figure 3.28: Comparison of Magnitude and Phase for the MEMS Filter with SES and without SES

## Chapter 4

## Design Details and Silicon Results

### 4.1 Comparator Array in 65nm Bulk CMOS

### 4.1.1 Test Chip Architecture

A test chip consisting of comparators in 65 nm bulk CMOS was designed and fabricated in order to verify the modeling results. The comparator in Fig. 3.4 with 32 selectable elements, out of which 16 are chosen, has been used as the basic building block. Each die includes 255 comparators, intended to be used for an 8-bit ADC (Fig. 4.1, [22]). The architecture of the test chip and the timing diagram for calibration are given in Fig. 4.2.

Figure 4.1: Die Photo of 65 nm Test Chip

The number of available selectable elements, the subset size and the size of each element have been determined by using the methodology in the previous section. The maximum number of calibration steps allowed per comparator is chosen as 10,000. The full scale range (FSR) of the intended 8-bit ADC is 1 V, giving a least significant bit (LSB) of 3.9 mV. A comparator is defined as "within the specification" if at least one combination among the 10,000 steps results in an input offset voltage magnitude smaller than 0.5 LSB. The design point is chosen so that all 255 comparators will be within the specification with $99.5 \%$ probability. During the design of the comparator, transistors in the shared block are sized such that their effect on the overall offset is much smaller than that of the replicated transistors. The offset distribution of the comparator was obtained by running a Monte Carlo simulation during the design phase. Threshold voltage mismatch in the input transistors ($\Delta V_{th,in}$) and the shared transistors ($\Delta V_{th,shared}$) was noted for each run. Using an approach similar to [39], a linear offset model was built from the Monte Carlo simulation results:

$$
\begin{equation*}
\text { Offset }=\left(a \times \Delta V_{t h, i n}\right)+\left(b \times \Delta V_{\text {th,shared }}\right) \tag{4.1}
\end{equation*}
$$

The sizes of the transistors in the shared section are increased until their effect on the offset is much smaller than that of the input transistors $(b \ll a)$. The total width of the shared transistors is comparable to the sum of the widths of the replicated transistors.
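The coefficients $a$ and $b$ of Eq. 4.1 can be obtained from the Monte Carlo samples with an ordinary two-variable least-squares fit. The following Python sketch illustrates the idea on synthetic data; the "true" coefficients, the mismatch standard deviations and the residual noise are hypothetical values chosen only for the example, and the fitting procedure in [39] may differ.

```python
import random

def fit_offset_model(dvt_in, dvt_shared, offset):
    """Least-squares fit of offset = a*dVth_in + b*dVth_shared (no intercept)."""
    sxx = sum(x * x for x in dvt_in)
    syy = sum(y * y for y in dvt_shared)
    sxy = sum(x * y for x, y in zip(dvt_in, dvt_shared))
    sxo = sum(x * o for x, o in zip(dvt_in, offset))
    syo = sum(y * o for y, o in zip(dvt_shared, offset))
    det = sxx * syy - sxy * sxy                 # determinant of the 2x2 normal equations
    a = (sxo * syy - syo * sxy) / det
    b = (syo * sxx - sxo * sxy) / det
    return a, b

# Synthetic stand-in for the Monte Carlo data (all values are assumptions).
rng = random.Random(1)
A_TRUE, B_TRUE = 1.0, 0.1
dvt_in = [rng.gauss(0.0, 10e-3) for _ in range(500)]
dvt_shared = [rng.gauss(0.0, 3e-3) for _ in range(500)]
offset = [A_TRUE * x + B_TRUE * y + rng.gauss(0.0, 0.1e-3)
          for x, y in zip(dvt_in, dvt_shared)]

a_hat, b_hat = fit_offset_model(dvt_in, dvt_shared, offset)
ratio = b_hat / a_hat      # a small ratio indicates the shared block contributes little
```

Tracking the ratio $b/a$ while resizing the shared transistors gives a direct measure of when the condition $b \ll a$ is met.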

Figure 4.2: Comparator Array Test Chip Architecture

There are $8160(=255 \times 32)$ select flip-flops that store the configuration bits for the 255 comparators. The differential output (2 bits) of each comparator is stored in 2 scan flip-flops, yielding a total of 510 output scan flip-flops. In order to find the input offset of each comparator, the timing diagram in Fig. 4.2 needs to be examined. In region 1, configuration bits are scanned into the select flip-flops by using the Scan In input and running Select Clk. Scan Enable is held low during this period. After all the selection bits are scanned in, Core Clk for the latch type comparator is run a few times to allow the outputs of the comparators to settle and clear any metastability in the latches (region 2). Comparator outputs are then loaded to the output scan flip-flops, which are subsequently put into scan mode by raising Scan Enable (region 3). The differential output for each comparator is then read from Scan Out by toggling Scan Clk. The inputs ($V_{\text{in+/-}}$) are swept in small steps and the outputs of the comparators are read about 50 times through the output scan chain. At each input step, the number of times that each comparator outputs a value of 1 is noted. The input voltage vs. number of 1's curve is then fitted to a Gaussian cumulative distribution function, whose mean is used as the input offset voltage of the comparator for the given configuration.
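The offset extraction described above can be sketched in Python as follows. The sketch generates synthetic comparator decisions and fits a Gaussian CDF by minimizing the squared error with a simple pattern search. The sweep range, noise level, true offset and the choice of optimizer are assumptions made for the example; the fitting routine actually used for the measurements is not specified here.

```python
import math
import random

def gauss_cdf(v, mu, sigma):
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))

def fit_offset(voltages, ones, reads):
    """Fit the fraction of 1's vs. input voltage to a Gaussian CDF; returns (mean, sigma)."""
    frac = [n / reads for n in ones]

    def sse(params):
        mu, sigma = params
        return sum((gauss_cdf(v, mu, sigma) - p) ** 2 for v, p in zip(voltages, frac))

    span = voltages[-1] - voltages[0]
    best = (0.5 * (voltages[0] + voltages[-1]), span / 4.0)
    step = span / 4.0
    for _ in range(200):                       # pattern search, halving the step when stuck
        mu, sigma = best
        trial = min(((mu + dm * step, max(sigma + ds * step, 1e-9))
                     for dm in (-1, 0, 1) for ds in (-1, 0, 1)), key=sse)
        if sse(trial) < sse(best):
            best = trial
        else:
            step *= 0.5
    return best

# Synthetic measurement: 41 input steps, 50 reads per step (hypothetical values).
rng = random.Random(0)
TRUE_OFFSET, NOISE_SIGMA, READS = 2.0e-3, 1.0e-3, 50
voltages = [(-10.0 + 0.5 * i) * 1e-3 for i in range(41)]
ones = [sum(v + rng.gauss(0.0, NOISE_SIGMA) > TRUE_OFFSET for _ in range(READS))
        for v in voltages]

offset_hat, sigma_hat = fit_offset(voltages, ones, READS)
```

The fitted mean serves as the input offset estimate, while the fitted standard deviation reflects the input-referred noise in this simplified model.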

### 4.1.2 Test Setup

The test setup for the measurements is shown in Fig. 4.3 [22]. The setup is automated using built-in MATLAB toolboxes on the PC. High-precision Keithley 2400 sourcemeters are used for the input voltages, and Agilent E3648A DC sources are used to supply power to the core, the I/Os, and the voltage references for the resistor ladder on the die. The core power supply is set at 0.8 V and both ends of the resistor ladder are set at 0.4 V. The chip is bonded in a QFN package and connected to a PC board using a compatible socket. Using the test socket, the packaged die can be changed easily for statistical data collection.


Figure 4.3: Measurement Setup

Only a maximum of 10,000 calibration steps per comparator is allowed among more than $600 \times 10^{6}$ available combinations for each comparator. Since it is not possible to go through each of the 10,000 combinations per comparator due to measurement time constraints, the following method is applied to find the best sets:

1. Randomly determine 10,000 subsets of size $k=16$ that each comparator can be configured to. These are the same for all comparators.
2. Determine a number of these subsets ( $X$ among 10,000 ) to be loaded to each comparator. Store these subsets in a selection matrix:

$$
\operatorname{Sel}_{X \times N}=\left(\begin{array}{cccc}
S_{1,1} & S_{1,2} & \cdots & S_{1, N} \\
S_{2,1} & S_{2,2} & \cdots & S_{2, N} \\
\vdots & \vdots & \ddots & \vdots \\
S_{X, 1} & S_{X, 2} & \cdots & S_{X, N}
\end{array}\right)
$$

Each row of the matrix contains the configuration bits for each of the $N=32$ elements. If element $e$ of subset $s$ is selected, $S_{s, e}=1$, and 0 otherwise. The sum of each row is $k=16$.
3. Measure the offset of each subset in the selection matrix and store it in a measured offset vector:

$$
M O_{X \times 1}=\left(\begin{array}{c}
m o_{1} \\
m o_{2} \\
\vdots \\
m o_{X}
\end{array}\right)
$$

4. Find the estimated offset of each element in each comparator using the least squares solution in MATLAB:

$$
\begin{equation*}
I O_{N \times 1}=\frac{1}{k} \cdot\left(\text { Sel }_{X \times N}\right)^{-1} \cdot M O_{X \times 1} \tag{4.2}
\end{equation*}
$$

5. Using the estimated offsets of each element, in MATLAB find the $T$ best subsets among the 10,000 subsets that are predicted to have less than 0.5 LSB offset.
6. Upload the $T$ subsets for each comparator to the test chip, and record the measured offset for each trial. For each comparator, select the subset that gives the lowest measured offset.
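Steps 1 to 6 can be illustrated with the Python sketch below. It assumes that the measured offset of a subset is the sum of per-element contributions plus measurement noise, and solves the resulting least-squares problem through the normal equations, which is one way to interpret the inverse of the non-square selection matrix in Eq. 4.2. The constant scale factor of Eq. 4.2 is omitted because it rescales all elements equally and does not change the ranking of the subsets. The values of $X$, $T$, the element spread and the noise level are hypothetical.

```python
import math
import random

N, K = 32, 16
N_COMBINATIONS = math.comb(N, K)     # 601,080,390 possible subsets per comparator

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def estimate_contributions(subsets, measured):
    """Least-squares solution of Sel * c = MO using the normal equations."""
    ata = [[0.0] * N for _ in range(N)]
    atb = [0.0] * N
    for sel, mo in zip(subsets, measured):    # sel lists the indices where S_{s,e} = 1
        for i in sel:
            atb[i] += mo
            for j in sel:
                ata[i][j] += 1.0
    return solve(ata, atb)

rng = random.Random(0)
true_c = [rng.gauss(0.0, 3e-3) for _ in range(N)]                  # hypothetical elements (V)
candidates = [rng.sample(range(N), K) for _ in range(10_000)]     # step 1
X, T = 200, 20
measured = [sum(true_c[e] for e in s) + rng.gauss(0.0, 0.1e-3)    # steps 2-3 (simulated)
            for s in candidates[:X]]
c_hat = estimate_contributions(candidates[:X], measured)          # step 4
predicted = [abs(sum(c_hat[e] for e in s)) for s in candidates]   # step 5
best_t = sorted(range(len(candidates)), key=predicted.__getitem__)[:T]
# Step 6: "re-measure" the T candidates and keep the one with the lowest offset.
final = min(best_t, key=lambda i: abs(sum(true_c[e] for e in candidates[i])))
final_offset = abs(sum(true_c[e] for e in candidates[final]))
```

In this model only $X$ measurements are needed to rank all 10,000 candidate subsets, which is the reason the method reduces measurement time.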

### 4.1.3 Measurement Results

A 65 nm bulk CMOS test chip consisting of 255 comparators was fabricated. Comparator offsets from 13 different die (3315 comparators) were measured and calibrated using the methodology described above.

Figure 4.4 shows the histograms before and after SES has been applied to the comparators. The top plot shows the offset histogram when all 32 laid-out elements are turned on (no selection). The bottom plot shows the resulting histogram after SES has been applied to find the best subset among the 10,000 allowed for each comparator. Subset size is $k=16$. An improvement of close to two orders of magnitude in $\sigma_{\text{offset}}$ is observed, from 11.21 mV to 0.35 mV (a factor of about 32).


Figure 4.4: Measured Offset Histograms Before and After SES (3315 comparators)

Figure 4.5 shows the histograms for 2X and 4X redundancy as applied to the comparators. For 2X redundancy, the 32 elements in each comparator are divided into two blocks of 16 (the first 16 and the last 16). The block with the lower offset is selected. For 4X redundancy, the 32 elements are divided into four blocks of 8 elements, and the lowest offset combination among the 4 is selected. Although improvement is observed compared to select all (32/32), redundancy lags significantly behind SES in terms of performance. Redundancy results are collected from a smaller test sample of 5 die (1275 comparators).
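The difference between redundancy and SES can be illustrated with a toy Monte Carlo model, sketched in Python below. The model assumes that element offsets are independent, zero-mean Gaussian values and that the offset of a configuration is the average of its active elements; it is not a circuit-level model and does not reproduce the measured numbers. The number of comparators and candidate subsets are reduced, hypothetical values.

```python
import random
import statistics

def offset_sigmas(n_comparators=200, n_elem=32, k=16, n_subsets=1000, seed=0):
    """Standard deviation of the selected offset for each selection strategy."""
    rng = random.Random(seed)
    subsets = [rng.sample(range(n_elem), k) for _ in range(n_subsets)]  # same for all comparators
    half, quarter = n_elem // 2, n_elem // 4
    results = {"all": [], "2x": [], "4x": [], "ses": []}
    for _ in range(n_comparators):
        e = [rng.gauss(0.0, 1.0) for _ in range(n_elem)]   # element offsets, arbitrary units

        def mean(idx):
            return sum(e[i] for i in idx) / len(idx)

        results["all"].append(mean(range(n_elem)))                       # select all
        results["2x"].append(min((mean(range(b, b + half))               # best of 2 blocks
                                  for b in range(0, n_elem, half)), key=abs))
        results["4x"].append(min((mean(range(b, b + quarter))            # best of 4 blocks
                                  for b in range(0, n_elem, quarter)), key=abs))
        results["ses"].append(min((mean(s) for s in subsets), key=abs))  # best random subset
    return {name: statistics.pstdev(values) for name, values in results.items()}

sigmas = offset_sigmas()
```

Even in this simple model, choosing among many overlapping subsets gives a much tighter distribution than choosing among a few disjoint blocks, because the number of available combinations grows far faster than the number of blocks.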


Figure 4.5: Measured Offset Histograms for Redundancy (1275 comparators)

Success probability, defined as the fraction of comparators that have less than $\pm 0.5$ LSB offset, is $15 \%$ for "select all" (32/32). Success probability for 2X and 4X redundancy is $25 \%$ and $28 \%$, respectively. SES achieves $99.5 \%$ success when selecting 16 of 32 elements, as predicted by the modeling.

Fig. 4.6 shows the SES success probability and $95 \%$ confidence intervals as $N$ is varied from 16 to 32, with $k=N / 2$ for each $N$. The number of tested comparators is 3315. Success probability increases almost monotonically with increasing $N$, and above $98 \%$ success is observed for all cases.
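A confidence interval for a measured success fraction can be computed as in the Python sketch below. The Wilson score interval is used here as an assumption, since the interval method behind Fig. 4.6 is not stated; the success count is an illustrative value corresponding to a $99.5 \%$ success rate over 3315 comparators.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95 % confidence)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

n = 3315
successes = round(0.995 * n)          # illustrative count, not a measured value
p_hat = successes / n
low, high = wilson_interval(successes, n)
```

With this sample size the interval is about half a percentage point wide, which indicates the resolution with which differences between neighboring $N$ values can be distinguished.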


Figure 4.6: Measured Success Probability for SES, $N=16$ to 32 (3315 Comparators)

Fig. 4.7 shows the standard deviation of the minimum measured offset data from one die (255 comparators) for $N=14$ and $1 \leq k \leq 14$. In this test, two branches in Fig. 3.4 form one selectable element: branches controlled by Sel$<1{:}2>$ form $\text{element}_{1}$, Sel$<3{:}4>$ form $\text{element}_{2}$, and so on. Branches Sel$<27{:}28>$ form the last selectable element, $\text{element}_{14}$, and the last four branches are always turned off. For each $k$ value, all the subsets of size $k$ (a total of $\binom{N}{k}$ subsets) are loaded consecutively to the chip and the offset is measured. The subset with the minimum offset among the $\binom{N}{k}$ is chosen and its value noted for each of the 255 comparators. The plot shows the standard deviation of the minimum offset from the 255 points for each $k$ value. The measured results validate the overall shape of the expected curve in Fig. 3.5, with a minimum occurring at $k=4$.
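The exhaustive search over all subsets of each size can be sketched in Python as follows. The offset model (the average of the selected element offsets, with random hypothetical element values) is an assumption made for the example, so the sketch illustrates the enumeration only and is not expected to reproduce the location of the measured minimum.

```python
import itertools
import math
import random

N = 14
rng = random.Random(0)
elements = [rng.gauss(0.0, 1.0) for _ in range(N)]   # hypothetical element offsets

def best_subset(elements, k):
    """Evaluate all C(N, k) subsets of size k and return the one with minimum |offset|."""
    return min(itertools.combinations(range(len(elements)), k),
               key=lambda s: abs(sum(elements[i] for i in s) / k))

# Number of configurations loaded per comparator over all k: 2**14 - 1.
total_loaded = sum(math.comb(N, k) for k in range(1, N + 1))
best = {k: best_subset(elements, k) for k in range(1, N + 1)}
```

With $N=14$ the total of 16,383 configurations per comparator is small enough to measure exhaustively, which is not the case for the 32-element configuration.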


Figure 4.7: Measured Offset Standard Deviation Contour, $N=14$ (255 comparators)

### 4.2 6-Bit Flash ADC in 45nm SOI CMOS

### 4.2.1 Test Chip Architecture

The architecture of the 6-bit ADC closely follows the 8-bit design shown in Fig. 4.2. The major differences are the number of comparators (68 comparators instead of 255) and the lack of $V_{\text{ref}}$ signals in the 6-bit design. Total active die size, excluding decoupling, is $0.2 \mathrm{~mm} \times 0.2 \mathrm{~mm}$. Fig. 4.8 shows a die photo of the fabricated ADC. In addition to the coarse knobs, only 6 of the 12 identical elements are allowed to be chosen.


Figure 4.8: 6-bit ADC Die Photo

A basic layout diagram of the ADC is given in Fig. 4.9. Comparators are split into two equal halves, and the input and clock signals are routed in the center using H-tree routing in the top metal layers. The directions of the Scan In and Scan Clk signals are shown in the diagram. Each comparator block consists of:

- SES based comparator in Fig. 3.12. Standard floating body transistors in SOI technology are used. Input NMOS transistors (M1-M2) are near minimum size and have a higher threshold than the latch, precharge and clock transistors (M3-M14) to increase their effect on the input offset.
- An SR-latch following the comparator.
- An AND gate following the SR-latch to disable the comparator output (5 unused comparators are turned off).
- 15 D-type flip flops forming a section of the select scan chain in Fig. 4.2: 12 for branch selection, 2 for coarse tuning knobs, and 1 for disabling comparator output.
- 2 scan flip-flops to store the outputs of the comparator (output flip-flops in Fig. 4.2).


Figure 4.9: Layout Diagram of the 6 Bit ADC

Comparator outputs are added using a Wallace tree adder, followed by a ripple carry adder. A subsampler follows the final 6-bit output because the I/O pad speed is limited to approximately 250 MHz. A simple schematic of the subsampler (digital clock divider) is given in Fig. 4.10, where a division ratio of 2, 4 or 8 can be used to subsample the outputs. The full-speed output can also be sent to the pads.


Figure 4.10: Digital Clock Divider (Subsampler)

As process scaling continues, restricted layout rules become more common in CMOS circuits to improve the printing of critical dimensions. Dense memory circuits, such as SRAMs, were among the first to adopt regular layouts. Fig. 4.11 shows die images from Intel Corporation for different process nodes [6]. Single poly orientation was introduced in the move from 90 nm to 65 nm, and fixed poly pitch for logic was introduced at 45 nm. Logic layout rules became tighter and started to resemble those of SRAMs at 45 nm. Regularity was also introduced in less critical layers such as Metal1. For future nodes, analog circuits on the same CMOS die will also be required to follow similar restricted rules to control systematic variability.


Figure 4.11: Process Scaling for Intel Corporation [6]. Panels include (b) 65 nm SRAM, (c) 45 nm SRAM and (d) 45 nm Logic.

SES allowed us to build an almost digital ADC with near minimum size devices. No resistors or capacitors were used, except for decoupling. Fig. 4.12 shows a section (two selectable elements) of the extremely regular layout style used in the comparator. The layout fabric constrains the active (RX), poly (PC) and Metal1 through Metal3 (M1 to M3) layer widths and spacings. The same layout fabric was used for SRAM memory and digital standard cells in the same 45 nm run by colleagues at CMU. The design is compatible with future digital CMOS processes where tight layout design rules are expected for extreme regularity.


Figure 4.12: Section of Regular Comparator Layout for 6-bit ADC

### 4.2.2 Test Setup

The test setup for the self-referenced design is similar to that of the comparator array test chip, and a diagram is given in Fig. 4.13. An Agilent N5181A source is used for the analog input. The analog input is converted to differential mode by using a $180^{\circ}$ phase shift splitter (Mini Circuits ZFSCJ-2-1-S). Bias-T's are used to add the common mode voltage ($V_{cm}$) and the inputs are terminated on the board using $50 \Omega$ resistors. An Agilent 81134A pulse pattern generator is used as the clock source and the digital outputs are measured using a logic analyzer. The clock source and the analog input source are locked to the same 10 MHz reference (not shown).

During calibration, the Keithley 2400 sources are swept around $V_{c m}$ to find the input offset voltage of the comparators while the analog inputs (N5181A) are set to 0 . During dynamic testing, Keithley sources are set to $V_{c m}$ as the analog input is applied. The logic analyzer is triggered by the clock source (81134A) to capture the 6-bit digital output from the chip. The trigger frequency is adjusted according to the subsampling ratio used in testing.

The chip is packaged in a QFN 44 pin package but not soldered to the test board. A high speed compression mount test socket from Emulation Technologies (ET2300C) is used. Inputs are delivered via SMA connectors.

### 4.2.3 Measurement Results

Differential and integral nonlinearity (DNL and INL) plots for the ADC are shown in Fig. 4.14. There are no missing codes due to the inherent architecture of the ADC. All the measurements are performed at $V_{d d}=0.85 \mathrm{~V}$ and $V_{c m}=0.45 \mathrm{~V}$. Full scale range $\left(V_{F S}\right)$ is 300 mV peak to peak.

Fig. 4.15 shows the FFT magnitude plot of the ADC output for $f_{S}=500 \mathrm{MS} / \mathrm{s}$ and $f_{\text{in}}=243.1 \mathrm{MHz}$. The output is subsampled on the die with a ratio of 4, and the input signal folds into $\approx 7 \mathrm{MHz}$. The highest harmonic $\left(3^{rd}\right)$ is $35 \mathrm{~dB}$ below the carrier.
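The folded input frequency follows from the effective output sample rate after subsampling. A minimal Python sketch of the calculation is given below; the function name is hypothetical.

```python
def folded_frequency(f_in, f_s_out):
    """Frequency at which a tone at f_in appears after sampling at f_s_out."""
    r = f_in % f_s_out
    return min(r, f_s_out - r)

f_s, ratio, f_in = 500e6, 4, 243.1e6
f_s_out = f_s / ratio                         # 125 MS/s at the output pads
f_alias = folded_frequency(f_in, f_s_out)     # about 6.9 MHz
```

The 243.1 MHz input lies 6.9 MHz below twice the 125 MS/s output rate, which is why the tone appears near 7 MHz in the FFT.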

Figure 4.13: Test Setup for 6-bit ADC

Fig. 4.16 shows the Spurious Free Dynamic Range (SFDR), Signal to Noise and Distortion Ratio (SINAD) and Signal to Noise Ratio (SNR) plots of the ADC output for $f_{S}=500 \mathrm{MS} / \mathrm{s}$. The ADC achieves a maximum SINAD of $29.3 \mathrm{~dB}$, yielding an ENOB of 4.6 bits. SFDR is significantly higher, remaining above $35 \mathrm{~dB}$ throughout the input range. Static power consumption is 0.085 mW, and total power consumption at $f_{\text{in}}=243 \mathrm{MHz}$ is 1.9 mW. Using $FoM=\frac{\text{Power}}{2^{ENOB} \times f_{S}}$, a figure of merit of $157 \mathrm{fJ} /$step is achieved.
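The ENOB and figure of merit values can be reproduced with the short Python sketch below. It assumes the standard relation $ENOB = (SINAD - 1.76) / 6.02$, which is not stated explicitly in the text, and uses the rounded ENOB of 4.6 bits for the 500 MS/s figure of merit. The 1 GS/s values used in the last lines are the ones quoted later in this section.

```python
def enob(sinad_db):
    """Effective number of bits from SINAD (standard sine-wave relation)."""
    return (sinad_db - 1.76) / 6.02

def fom_fj_per_step(power_w, enob_bits, f_s):
    """FoM = Power / (2^ENOB * f_S), expressed in fJ per conversion step."""
    return power_w / (2.0 ** enob_bits * f_s) * 1e15

enob_500 = enob(29.3)                                   # about 4.57 bits, quoted as 4.6
fom_500 = fom_fj_per_step(1.9e-3, 4.6, 500e6)           # about 157 fJ/step

enob_1000 = enob(28.5)                                  # about 4.44 bits
fom_1000 = fom_fj_per_step(3.5e-3, enob_1000, 1000e6)   # about 161 fJ/step, quoted as 160
```

Because the ENOB appears in the exponent, rounding it to one decimal place changes the figure of merit by a few fJ/step.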

Figure 4.14: DNL and INL plot for the ADC

A MATLAB simulation using the DNL and INL data in Fig. 4.14 yields an ENOB of 5.3 bits. Loss of another 0.7 bits in measurement can be attributed to other noise sources in the ADC including comparator noise, power supply noise and clock jitter. The main component among these is thought to be the power supply noise. In order to measure the sensitivity of input offset voltage $\left(V_{os}\right)$ to $V_{dd}$, 100 Monte Carlo Spectre simulations are performed for $V_{dd}=0.85 \mathrm{~V}$ and the input offset voltage is noted for each run. The simulation is repeated for $V_{dd}=0.84 \mathrm{~V}$ and $V_{dd}=0.86 \mathrm{~V}$ with the same seed for the random number generator. The difference in the offsets for each run is calculated, and the means of the differences are noted in Table 4.1.

Table 4.1: Mean difference of input offset voltage with respect to $V_{dd}$

|  | $V_{dd}: 0.85 \mathrm{~V} \rightarrow 0.84 \mathrm{~V}$ | $V_{dd}: 0.85 \mathrm{~V} \rightarrow 0.86 \mathrm{~V}$ |
| :---: | :---: | :---: |
| Both coarse knobs OFF | $<0.5 \mathrm{~mV}$ | $<0.5 \mathrm{~mV}$ |
| Positive coarse knob ON | $-3.7 \mathrm{~mV}$ | $3.9 \mathrm{~mV}$ |
| Negative coarse knob ON | $3.9 \mathrm{~mV}$ | $-3.9 \mathrm{~mV}$ |

Results are shown when both of the coarse knobs are turned off, and when either one is turned on. When the coarse knob that introduces a positive shift is turned on, a 10 mV decrease in $V_{dd}$ results in a mean decrease of 3.7 mV in offsets (i.e., the amount of introduced systematic shift decreases, moving the mean of the distribution closer to zero). Similarly, a 10 mV increase in $V_{dd}$ increases the introduced systematic offset by 3.9 mV (offsets move further away from zero). Thus, a peak-to-peak offset change of close to 8 mV is observed when $V_{dd}$ changes by 20 mV. Given that $LSB \approx 5 \mathrm{~mV}$, significant degradation in ENOB can be expected. Measured offsets, as $V_{dd}$ is changed from 0.84 V to 0.86 V, closely track the values in Table 4.1.
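The impact of the supply sensitivity can be put in terms of LSBs with the arithmetic below, written as a short Python sketch. The LSB is computed here as the 300 mV full scale range divided by $2^{6}$, which is an assumption about how the approximate 5 mV value is obtained.

```python
v_fs_mv, bits = 300.0, 6
lsb_mv = v_fs_mv / 2 ** bits                 # 4.69 mV, quoted as about 5 mV

shift_down_mv, shift_up_mv = -3.7, 3.9       # positive coarse knob ON, from Table 4.1
shift_pp_mv = shift_up_mv - shift_down_mv    # 7.6 mV for a 20 mV change in Vdd
shift_in_lsb = shift_pp_mv / lsb_mv          # about 1.6 LSB
sensitivity = shift_pp_mv / 20.0             # about 0.38 mV of offset per mV of Vdd
```

A supply disturbance of 20 mV peak to peak therefore moves the affected comparator thresholds by more than one LSB, which is consistent with the observed loss in ENOB.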

Fig. 4.17 shows the FFT plot of the ADC output at $f_{S}=1000 \mathrm{MS} / \mathrm{s}$ and $f_{\text{in}}=497.1 \mathrm{MHz}$. Fig. 4.18 shows the dynamic test results for $f_{S}=1000 \mathrm{MS} / \mathrm{s}$. Low frequency SINAD and SFDR are $28.5 \mathrm{~dB}$ and $38 \mathrm{~dB}$, respectively. Power at $f_{\text{in}}=243 \mathrm{MHz}$ is 3.5 mW, and $FoM=160 \mathrm{fJ} /$step.


Figure 4.15: FFT Magnitude Plot for $f_{S}=500 \mathrm{MS} / \mathrm{s}$ and $f_{\text {in }}=243.1 \mathrm{MHz}$


Figure 4.16: Dynamic Test Results for $f_{S}=500 \mathrm{MS} / \mathrm{s}$


Figure 4.17: FFT Magnitude Plot for $f_{S}=1000 \mathrm{MS} / \mathrm{s}$ and $f_{\text {in }}=497.1 \mathrm{MHz}$


Figure 4.18: Dynamic Test Results for $f_{S}=1000 \mathrm{MS} / \mathrm{s}$

### 4.3 Summary

In this chapter, measurement results from two separate CMOS designs were presented. Measurement results clearly demonstrate the feasibility of SES.

The first design was manufactured in 65 nm bulk CMOS and consists of a comparator array intended for an 8-bit flash ADC with a traditional reference ladder architecture. Data showed that with sizing, only $15 \%$ of the comparators met the desired offset specification $( \pm 0.5 L S B)$. Redundancy improved this to $28 \%$ when 4 redundant blocks were used, while keeping the total area equivalent to sizing. SES achieved better than $99.5 \%$ success probability with the same die area. For redundancy and SES, area required for the storage bits have not been considered since this is dependent on the chosen storage technology.

The second design is a self-referenced 6-bit flash ADC that was manufactured in 45 nm SOI CMOS. The comparators were laid out in a restricted pattern to be compatible with the tight design rules expected in future CMOS processes. The ADC achieved a $1 \mathrm{GS} / \mathrm{s}$ sampling rate with up to $28.5 \mathrm{~dB}$ SINAD and $38.5 \mathrm{~dB}$ SFDR. Power consumption was almost fully dynamic at $3.5 \mathrm{~mW}\left(f_{\text {in }}=243 \mathrm{MHz}\right)$, resulting in an FoM of $160 \mathrm{fJ} /$step. High sensitivity of the comparator to power supply noise, especially when systematic offset knobs were turned on, caused degradation in SINAD.

## Chapter 5

## Conclusions and Future Work

Process variations in advanced CMOS process nodes limit the benefits of scaling for analog designs. In the presence of increasing random intra-die variations, mismatch becomes a significant design challenge. Many critical analog circuits, such as comparators in flash ADCs, are susceptible to mismatches that cause undesired input offsets.

In this dissertation, the Statistical Element Selection (SES) methodology was introduced in detail. SES is a post-manufacturing calibration step that relies on selecting a subset of identically laid-out, small elements to achieve a desired specification, such as input offset voltage. Its application to three different circuits was described, and simulation and measurement results were presented.
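
The selection idea can be illustrated with a toy Monte Carlo model. This is a minimal sketch under loud assumptions, not the circuit model used in this work: each element is assumed to contribute an independent Gaussian offset, the offset of a comparator built from a subset is assumed to be the mean of the selected elements' offsets, and the values of `n`, `k`, `sigma_mv` and `spec_mv` are hypothetical. Sizing is modeled as using all `n` elements at once.

```python
import itertools
import random


def best_subset_offset(elements, k):
    """Return the smallest-magnitude mean offset over all k-element subsets."""
    return min(
        (sum(subset) / k for subset in itertools.combinations(elements, k)),
        key=abs,
    )


def estimate_yields(n=8, k=4, sigma_mv=10.0, spec_mv=1.0, trials=2000, seed=0):
    """Estimate the fraction of comparators meeting |offset| <= spec_mv.

    Returns (sizing_yield, ses_yield) for the toy model described above.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sizing_pass = 0
    ses_pass = 0
    for _ in range(trials):
        elements = [rng.gauss(0.0, sigma_mv) for _ in range(n)]
        # Sizing: all n elements are always connected.
        sizing_offset = sum(elements) / n
        # SES: exhaustively search the C(n, k) subsets for the best one.
        ses_offset = best_subset_offset(elements, k)
        sizing_pass += abs(sizing_offset) <= spec_mv
        ses_pass += abs(ses_offset) <= spec_mv
    return sizing_pass / trials, ses_pass / trials


sizing_yield, ses_yield = estimate_yields()
print(f"sizing yield: {sizing_yield:.3f}")
print(f"SES yield:    {ses_yield:.3f}")
```

Even in this simplified setting, searching the $\binom{8}{4}=70$ subsets gives many more chances to land inside the specification than the single fixed combination available with sizing, which is the qualitative effect the measurements demonstrate.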

The first circuit is a comparator array in 65 nm bulk CMOS intended to be used in a traditional reference ladder based flash ADC architecture. Steps in the SES methodology were described and a decision cube was built to aid in the design of the ADC. Measurement results showed that only $15 \%$ of the comparators met the desired offset specification $(\pm 0.5 \mathrm{~LSB})$ with sizing. Redundancy improved this to $28 \%$ when 4 redundant blocks were used, while keeping the total area equivalent to sizing. SES achieved better than $99.5 \%$ success probability with the same die area. For redundancy and SES, the area required for the storage bits was not considered, since it depends on the chosen storage technology.

An 8-bit ADC was designed by a colleague using this methodology and achieved a $1.5 \mathrm{GS} / \mathrm{s}$ sampling rate with an FoM of $0.42 \mathrm{pJ} /$step [22].

SES also allowed us to benefit from the random variations to build a 6-bit, self-referenced flash ADC where input offsets were calibrated to the desired reference levels. Design steps to estimate the product yield, determine the number of selectable elements and the element size were described. The comparators were laid out in a restricted pattern to be compatible with the tight design rules expected in future CMOS processes. The ADC was manufactured in 45 nm SOI CMOS and achieved a $1 \mathrm{GS} / \mathrm{s}$ sampling rate with up to $28.5 \mathrm{~dB}$ SINAD and $38.5 \mathrm{~dB}$ SFDR. Power consumption was almost fully dynamic at $3.5 \mathrm{~mW}\left(f_{\text {in }}=243 \mathrm{MHz}\right)$, resulting in an FoM of $160 \mathrm{fJ} /$step. High sensitivity of the comparator to power supply noise, especially when systematic offset knobs were turned on, caused degradation in SINAD.

This work would benefit from a comparator design that is less sensitive to power supply noise (PSN). Coarse tuning knobs can be replaced with MOS capacitors with select switches, similar to [2]. The test socket can be eliminated and the package soldered directly to the printed circuit board to reduce parasitic inductance and PSN. Assuming the PSN issue is resolved, the architecture has the potential to be scaled to higher sampling frequencies. Since power consumption is almost fully dynamic and therefore scales with the sampling frequency, similar FoM results can be expected.

The 6-bit ADC measurement data was collected from only one die. Measurement results from other dies would yield valuable information about the average ENOB that can be expected using SES. A future version of this chip can integrate calibration circuitry on the die to speed up the calibration process.

As a further extension of the concept, a filter consisting of an array of MEMS resonators was described. SES was used to find the number of required resonators to achieve a given product yield. Circuit simulations were performed to measure the MEMS filter response with and without the application of SES. Results showed clear improvement of the filter response, both in magnitude and phase, when SES was used.

Simulations were performed on estimated variation models for the resonators. Measurement results from a test run to gather statistical data about resonators are crucial. This will help guide further research about the feasibility of SES for MEMS resonators. Other resonator styles with possibly higher gain and smaller area can also be investigated. A complete RF system can be built and the results can be compared to the expected simulation results to demonstrate the feasibility of the concept in silicon.

## Bibliography

[1] X. Li, B. Taylor, Y. Chien, and L. T. Pileggi, "Adaptive post-silicon tuning for analog circuits: Concept, analysis and optimization," in IEEE/ACM International Conference on Computer-Aided Design, Nov. 2007, pp. 450-457.
[2] B. Verbruggen, J. Craninckx, M. Kuijk, P. Wambacq, and G. V. der Plas, "A 2.2 mW 1.75 GS/s 5 Bit Folding Flash ADC in 90 nm Digital CMOS," Solid-State Circuits, IEEE Journal of, vol. 44, no. 3, pp. 874-882, March 2009.
[3] D. C. Daly and A. P. Chandrakasan, "A 6-bit, 0.2 V to 0.9 V Highly Digital Flash ADC With Comparator Redundancy," Solid-State Circuits, IEEE Journal of, vol. 44, pp. 3030-3038, November 2009.
[4] G. V. der Plas, S. Decoutere, and S. Donnay, "A 0.16pJ/Conversion-Step 2.5mW 1.25GS/s 4b ADC in a 90nm Digital CMOS Process," in IEEE International Solid-State Circuits Conference - Digest of Technical Papers, Feb. 2006, p. 2310.
[5] S. Weaver, B. Hershberg, D. Knierim, and U. Moon, "A 6b Stochastic Flash Analog-to-Digital Converter Without Calibration or Reference Ladder," in IEEE Asian Solid-State Circuits Conference, Nov. 2008, pp. 373-376.
[6] C. Webb, "45nm Design For Manufacturing," Intel Technology Journal, vol. 12, pp. 121-130, June 2008.
[7] K. Agarwal and S. Nassif, "Characterizing Process Variation in Nanometer CMOS," in ACM/IEEE Design Automation Conference, June 2007, pp. 396-399.
[8] P. Gupta and F.-L. Heng, "Toward a Systematic-Variation Aware Timing Methodology," in Design Automation Conference, June 2004, pp. 321-326.
[9] G. Scott, J. Lutze, M. Rubin, F. Nouri, and M. Manley, "NMOS Drive Current Reduction Caused by Transistor Layout and Trench Isolation Induced Stress," in IEEE International Electron Devices Meeting, December 1999, pp. 827-830.
[10] H. Fukutome, Y. Momiyama, A. Satoh, Y. Tamura, H. Minakata, K. Okabe, E. Mutoh, K. Suzuki, A. Usujima, H. Arimoto, and S. Satoh, "Carrier Profile Designing to Suppress Systematic $V_{t h}$ Variation Related with Device Layout by Controlling STI-enhanced Dopant Diffusions Correlated with Point Defects," in IEEE International Electron Devices Meeting, December 2009, pp. 53-56.
[11] T. Jhaveri, V. Rovner, L. Liebmann, L. Pileggi, A. J. Strojwas, and J. D. Hibbeler, "Co-Optimization of Circuits, Layout and Lithography for Predictive Technology Scaling Beyond Gratings," IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, vol. 29, pp. 509-527, April 2010.
[12] J. T. Watt and J. D. Plummer, "Dispersion of MOS Capacitance-Voltage Characteristics Resulting from the Random Channel Dopant Ion Distribution," IEEE Transactions on Electron Devices, vol. 41, pp. 2222-2232, November 1994.
[13] N. Sugii, R. Tsuchiya, T. Ishigaki, Y. Morita, H. Yoshimoto, and S. Kimura, "Local $V_{t h}$ Variability and Scalability in Silicon-on-Thin-BOX (SOTB) CMOS With Small Random-Dopant Fluctuation," IEEE Transactions on Electron Devices, vol. 57, pp. 835-845, April 2010.
[14] T. Matsukawa, S. O'uchi, K. Endo, Y. Ishikawa, H. Yamauchi, Y. X. Liu, J. Tsukada, K. Sakamoto, and M. Masahara, "Comprehensive Analysis of Variability Sources of FinFET Characteristics," in Symposium on VLSI Technology, June 2009, pp. 118-119.
[15] M.-H. Chiang, J.-N. Lin, K. Kim, and C.-T. Chuang, "Random Dopant Fluctuation in Limited-Width FinFET Technologies," IEEE Transactions on Electron Devices, vol. 54, pp. 2055-2060, August 2007.
[16] Y. Li, C.-H. Hwang, T.-Y. Li, and M.-H. Han, "Process-Variation Effect, Metal-Gate Work-Function Fluctuation, and Random-Dopant Fluctuation in Emerging CMOS Technologies," IEEE Transactions on Electron Devices, vol. 57, pp. 437-447, February 2010.
[17] K. Cheng, A. Khakifirooz, P. Kulkarni, S. Ponoth, J. Kuss, D. Shahrjerdi, L. F. Edge, A. Kimball, S. Kanakasabapathy, K. Xiu, S. Schmitz, A. Reznicek, T. Adam, H. He, N. Loubet, S. Holmes, S. Mehta, D. Yang, A. Upham, S.-C. Seo, J. L. Herman, R. Johnson, Y. Zhu, P. Jamison, B. S. Haran, Z. Zhu, L. H. Vanamurth, S. Fan, D. Horak, H. Bu, P. J. Oldiges, D. K. Sadana, P. Kozlowski, D. McHerron, J. O’Neill, and B. Doris, "Extremely thin SOI (ETSOI) CMOS with record low variability for low power system-on-chip applications," in IEEE International Electron Devices Meeting, December 2009, pp. 49-52.
[18] M. J. M. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers, "Matching properties of MOS transistors," Solid-State Circuits, IEEE Journal of, vol. 24, pp. 1433-1439, October 1989.
[19] C.-Y. Chen, M. Q. Le, and K. Y. Kim, "A Low Power 6-bit Flash ADC With Reference Voltage and Common-Mode Calibration," Solid-State Circuits, IEEE Journal of, vol. 44, no. 4, pp. 1041-1046, April 2009.
[20] C. Donovan and M. Flynn, "A "Digital" 6-bit ADC in 0.25-$\mu$m CMOS," Solid-State Circuits, IEEE Journal of, vol. 37, no. 3, pp. 432-437, March 2002.
[21] S. Park, Y. Palaskas, and M. Flynn, "A 4-GS/s 4-bit Flash ADC in 0.18-$\mu$m CMOS," Solid-State Circuits, IEEE Journal of, vol. 42, no. 9, pp. 1865-1872, September 2007.
[22] J. Proesel, "Flash analog-to-digital converter design based on statistical post-silicon calibration," Ph.D. dissertation, Carnegie Mellon University, Pittsburgh, PA, May 2010.
[23] P. R. Kinget, "Device mismatch and tradeoffs in the design of analog circuits," Solid-State Circuits, IEEE Journal of, vol. 40, no. 6, pp. 1212-1224, June 2005.
[24] B. Wicht, T. Nirschl, and D. Schmitt-Landsiedel, "Yield and speed optimization of a latch-type voltage sense amplifier," Solid-State Circuits, IEEE Journal of, vol. 39, no. 7, pp. 1148-1158, July 2004.
[25] S. Weaver, B. Hershberg, P. Kurahashi, D. Knierim, and U.-K. Moon, "Stochastic flash analog-to-digital conversion," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. PP.
[26] A. Ismail and M. Elmasry, "A 6-Bit 1.6-GS/s Low-Power Wideband Flash ADC Converter in 0.13-$\mu$m CMOS Technology," Solid-State Circuits, IEEE Journal of, vol. 43, no. 9, pp. 1982-1990, September 2008.
[27] E. Alpman, H. Lakdawala, L. R. Carley, and K. Soumyanath, "A 1.1V 50mW 2.5GS/s Time-Interleaved C-2C SAR ADC in 45nm LP Digital CMOS," in IEEE International Solid-State Circuits Conference - Digest of Technical Papers, Feb. 2009, pp. 76-77.
[28] B. Razavi, Design of Analog CMOS Integrated Circuits. Tata McGraw-Hill, 2001.
[29] T. Sundstrom and A. Alvandpour, "Utilizing Process Variations for Reference Generation in a Flash ADC," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 56, pp. 364-368, May 2009.
[30] C. T.-C. Nguyen, "MEMS Technology for Timing and Frequency Control," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 54, pp. 251-270, February 2007.
[31] D. Weinstein, S. Bhave, M. Tada, S. Mitarai, and S. Morita, "Mechanical Coupling of 2D Resonator Arrays for MEMS Filter Applications," in IEEE International Frequency Control Symposium, May 2007, pp. 1362-1365.
[32] P. Stephanou, G. Piazza, C. White, M. Wijesundara, and A. Pisano, "Mechanically Coupled Contour Mode Piezoelectric Aluminum Nitride MEMS Filters," in IEEE International Conference on Micro Electro Mechanical Systems, April 2009, pp. 58-63.
[33] K. Wang and C. T.-C. Nguyen, "High-Order Medium Frequency Micromechanical Electronic Filters," Journal of Microelectromechanical Systems, vol. 8, pp. 534-557, December 1999.
[34] Y. Lin, W.-C. Li, B. Kim, Y.-W. Lin, Z. Ren, and C. T.-C. Nguyen, "Enhancement of Micromechanical Resonator Manufacturing Precision Via Mechanically-Coupled Arraying," in IEEE International Frequency Control Symposium, April 2009, pp. 58-63.
[35] C.-C. Lo, F. Chen, and G. Fedder, "Integrated HF CMOS-MEMS Square-Frame Resonators with On-Chip Electronics and Electrothermal Narrow Gap Mechanism," in International Conference on Solid-State Sensors, Actuators and Microsystems - Digest of Technical Papers, June 2005, pp. 2074-2077.
[36] J. E. Vandemeer, "Nodal design of actuators and sensors (NODAS)," Master's thesis, Carnegie Mellon University, Pittsburgh, PA, May 1998.
[37] G. K. Fedder, personal communication, 2010.
[38] V. Giannini, P. Nuzzo, C. Soens, K. Vengattaramane, J. Ryckaert, M. Goffioul, B. Debaillie, J. Borremans, J. V. Driessche, J. Craninckx, and M. Ingels, "A 2-mm² 0.1-5 GHz Software-Defined Radio Receiver in 45-nm Digital CMOS," Solid-State Circuits, IEEE Journal of, vol. 44, pp. 3486-3498, December 2009.
[39] L. Pileggi, G. Keskin, X. Li, K. Mai, and J. Proesel, "Mismatch analysis and statistical design at 65 nm and below," September 2008, pp. 9-12.

