Optimizing IC Testing for Diagnosability, Effectiveness and Efficiency

2016-02-04T00:00:00Z (GMT) by Cheng Xue
Chip testing is an important step of integrated circuits (“chip”) manufacturing. It involves applying tests to each manufactured chip using expensive testers (automatic test equipment) to identify and reject bad (malfunctioning) chips. Various types of manufacturing defects (shorts, disconnects, missing vias, etc.) can occur during fabrication and cause a chip to malfunction. Testing not only needs to verify every gate, cell, interconnect, etc. are operational as expected, but also needs to help identify and analyze existing manufacturing defects so that improvements in fabrication, design and even test can be made in a timely manner.
The cost of developing high-quality tests is being driven up by the increasing complexity of defect behaviors. New processing technologies introduce new types of defects, some of which only occur in certain circuit/layout configurations. It is no longer possible to detect all types of defects using only conventional stuck-at tests. More sophisticated fault models and test metrics have been developed to guide the test development toward better defect detection, but they also require a significantly larger volume of tests to achieve acceptable coverage. Test engineers need to reduce test volume in order to save test cost (i.e., achieving high test efficiency), and at the same time prevent most bad chips from escaping test (i.e., achieving high test effectiveness). The ability to diagnose a failing chip precisely and accurately (diagnosability) also depends on the tests applied. This important characteristic of test is often downplayed in production testing, but could be very important during yield ramp-up for quickly discovering major yield-loss contributors. In this dissertation, four new methods are developed to improve the state of the art for test development, either in terms of diagnosability, test effectiveness or test efficiency. These methods can be used in conjunction, or individually for achieving a specific prioritized, goal in test development.
First, a test-reordering method is developed to improve the diagnosability of production tests. To our knowledge, this is the first-ever work that examine the impact of test order on logic diagnosis. Due to constraints such as limited test time or tester memory, a commonly-used practice during production testing is to only record the first few failing tests or pins for a failing chip. This recording of an incomplete tester response adversely affects the outcome of diagnosis because less information is provided for diagnosis. The proposed test-reordering method tries to find an optimal test ordering that can better distinguish stuck-at faults when recorded tester response is incomplete. Since the set of candidate defect sites is typically obtained based on stuck-at faults, faults that are distinguished from each other are unlikely to become the candidate defect site at the same time, which leads to better outcome for diagnosis.
Second, a fault-model evaluation method (DELAY-METER) is developed to improve test effectiveness. Various delay fault models are proposed in previous work to capture defects that escape slow-speed testing, but which models should be used to guide the generation of tests for at-speed testing remain an open question. The conventional method is to evaluate fault-model effectiveness using test experiments involving actual fabricated chips, in other words, tests are developed using various fault models and applied to a population of chips to determine which tests are best at detecting defects. Alternatively, DELAY-METER evaluates the effectiveness of delay fault models using readily-available fail data from production testing, so that an optimal mix of delay fault models can be chosen for at-speed testing.
Third, a defect-level prediction model (the DDP model) is developed to balance test effectiveness and test efficiency. Defect level (DL) represents the fraction of defective chips among all chips that pass tests. However DL is difficult to measure directly and be able to predict during test development is of critical importance. Conventional DL prediction models become insufficient when tests are generated from multiple fault models. The DDP model learns the defect detection probability (DDP) of multiple fault models from diagnosis, and combines it with the coverages of multiple fault models to provide a more accurate prediction. The more accurate prediction of DL by the DDP model thus enables a better trade-off analysis between test effectiveness and test efficiency.
Finally, a test-selection method is developed to improve test efficiency. Test time reduction (TTR) is a focus of research in test development to save test cost and improve test efficiency. One method for TTR involves identifying a subset of tests from a large baseline test set. Test selection can be performed based on actual tester data measured from tested chips, or data taken from the simulation of the circuit design that has faults/defects injected. Previous work that uses simulation for test selection are only applied to archaic benchmark circuits that are too small to be meaningful. A one-pass test-selection method is developed in this dissertation that identifies a subset of tests that maximize fault-model coverage while requiring relatively limited CPU time and memory.
To demonstrate the practical utility of the four methods developed in this dissertation, several real designs from industry are used in various experiment. Specifically, an ASIC and GPU designs and test data taken from a large population of actual fabricated chips are used to experiment the proposed test-reordering method and test-selection method. Experiments results demonstrate the improvement in diagnosability and test efficiency, respectively. The same ASIC design and test data is not only used by DELAY-METER to evaluate the effectiveness of different delay fault models, but also used by the DDP model as both a training data for building a DL-prediction model, and a verification data for verifying the prediction accuracy.