Posted on 2015-12-15, 00:00. Authored by Adam Hartman.
The International Technology Roadmap for Semiconductors has identified reliability as a growing challenge for users and designers of all types of integrated circuits. In particular, the occurrence of wearout faults is expected to increase exponentially as manufacturing processes scale below 65nm. By acknowledging the importance of these faults and the resulting failures, designers can take steps to improve the expected lifetime of the system. Several system-level techniques, such as communication architecture design and slack allocation, are capable of mitigating the effects of wearout faults and improving system lifetime. Task mapping optimization is another system-level technique that can be applied at both design time and runtime to enhance system lifetime and has several advantages over other lifetime optimization techniques.
The first advantage that task mapping has over other system-level techniques is that it is more flexible. We show that task mapping can positively impact system lifetime in a number of scenarios and does not rely on redundancy or complex reconfiguration mechanisms, although both of those provide additional benefit. The second advantage of using task mapping to improve system lifetime is a lower cost compared to other techniques. Other lifetime improvement techniques seek to augment systems in a cost-effective way to mitigate the effects of wearout faults while task mapping does not necessarily require additional investment in hardware to achieve similar effects. The final, and perhaps most significant, advantage of task mapping is its ability to dynamically manage lifetime as the system is running. While decisions made by other system- and circuit-level techniques must be finalized before the system is manufactured, the task mapping can continue to change to account for the actual state of the system in real time.
We propose two distinct task mapping techniques to be used at the two different times during which optimization can occur.
At design time, we take advantage of abundant computational resources to perform an intelligent search of the initial task mapping solution space using ant colony optimization. At runtime, we leverage information from hardware sensors to quickly select good task mappings using a metaheuristic. Our two techniques can be used together or in isolation depending on the use case and design requirements of the system.
This thesis makes the following intellectual contributions:
Lifetime-aware design-time task mapping - Ours is the first approach to search for initial task mappings that directly optimize system lifetime rather than optimizing other metrics which only influence system lifetime, like temperature and power. Because this technique is meant for use at design time, we employ a powerful search algorithm called ant colony optimization, which takes advantage of a designer’s computational resources to find a near-optimal task mapping. Our lifetime-aware design-time task mapping improves system lifetime by an average of 32.3% compared to a lifetime-agnostic approach across a range of real-world benchmarks.
Lifetime-aware runtime task mapping - Ours is the first approach to dynamically manage the lifetime of embedded chip multiprocessors at runtime through the use of task mapping. By leveraging data from hardware sensors and information about the system state, our metaheuristic approach is able to find high-quality task mappings which extend system lifetime without performing a costly search of the solution space. Our lifetime-aware runtime task mapping improves system lifetime by an average of 7.1% compared to a runtime temperature-aware task mapping approach, and in the best case, system lifetime was improved by 17.4%. Our approach also improved the amount of time until the first component failure by 14.6% on average and 33.9% in the best case.
Evaluation of lifetime-aware task mapping - We measure the improvement in system lifetime resulting from our task mapping techniques across a range of benchmarks. We also compare our lifetime-aware techniques to others that attempt to indirectly optimize system lifetime to show that direct optimization is the only way to achieve maximum lifetime. For example, we show that task mappings that are near optimal in terms of average initial component temperature can result in a range of system lifetimes that is up to 53.2% of the optimal lifetime; clearly, low temperature does not imply long lifetime.
Co-optimization of competing lifetime metrics - The wide range of use cases for embedded chip multiprocessors means that different systems will have different design goals. We consider how the pertinent measure of lifetime changes in different use cases, and analyze the degree to which these competing lifetime metrics can be co-optimized.
Best practices for a system lifetime simulator - We created a simulator that estimates the lifetime of an embedded chip multiprocessor executing one or more applications. The simulator is detailed enough to capture the effects of various system-level design techniques on lifetime, and thus, it is valuable to the field of lifetime optimization research even outside the context of task mapping.
In summary, lifetime optimization for embedded chip multiprocessors is required so that cutting-edge manufacturing processes can continue to be used for a wide range of systems. Our research mitigates the problem of increasingly common wearout faults by proposing and evaluating a pair of design- and runtime task mapping techniques that enhance system lifetime across a broad range of use cases.
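To make the design-time idea concrete, the following is a minimal sketch of how ant colony optimization can search a task mapping space for a lifetime objective. It is not the thesis's implementation. The names `toy_lifetime` and `aco_task_mapping`, the parameters, and above all the lifetime proxy are assumptions made for illustration: the proxy treats the system as failing when its first core fails and ignores the wearout, thermal, and redundancy modeling a real lifetime simulator would need.

```python
import random


def toy_lifetime(mapping, task_load, core_strength):
    """Hypothetical lifetime proxy (an assumption, not the thesis's model).

    Each core's time to failure shrinks as more load is mapped onto it,
    and the system is taken to fail when its first core fails.
    """
    load = [0.0] * len(core_strength)
    for task, core in enumerate(mapping):
        load[core] += task_load[task]
    return min(s / (1.0 + l) for s, l in zip(core_strength, load))


def aco_task_mapping(task_load, core_strength, ants=20, iterations=50,
                     evaporation=0.1, seed=0):
    """Search task-to-core mappings with a basic ant colony scheme."""
    rng = random.Random(seed)
    n_tasks, n_cores = len(task_load), len(core_strength)
    # pheromone[t][c]: learned desirability of placing task t on core c.
    pheromone = [[1.0] * n_cores for _ in range(n_tasks)]
    best_mapping, best_score = None, float("-inf")

    for _ in range(iterations):
        for _ in range(ants):
            # Each ant builds a full mapping, picking a core for every task
            # with probability proportional to the pheromone level.
            mapping = [rng.choices(range(n_cores), weights=pheromone[t])[0]
                       for t in range(n_tasks)]
            score = toy_lifetime(mapping, task_load, core_strength)
            if score > best_score:
                best_mapping, best_score = mapping, score

        # Evaporate all trails, then reinforce the best mapping found so far.
        for t in range(n_tasks):
            for c in range(n_cores):
                pheromone[t][c] *= 1.0 - evaporation
            pheromone[t][best_mapping[t]] += best_score

    return best_mapping, best_score


if __name__ == "__main__":
    loads = [1.0, 2.0, 3.0, 4.0]   # illustrative per-task loads
    strengths = [10.0, 10.0]       # illustrative per-core wear budgets
    print(aco_task_mapping(loads, strengths))
```

The sketch's objective is the lifetime estimate itself rather than temperature or power, which mirrors the direct optimization argued for above. Swapping `toy_lifetime` for a detailed simulator would leave the search loop unchanged, at the cost of a much more expensive evaluation per ant.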