Lifetime-Aware Task Mapping for Embedded Chip Multiprocessors

2015-12-15T00:00:00Z (GMT) by Adam Hartman
The International Technology Roadmap for Semiconductors has identified reliability as a growing
challenge for users and designers of all types of integrated circuits. In particular, the occurrence
of wearout faults is expected to increase exponentially as manufacturing processes scale below
65nm. By acknowledging the importance of these faults and the resulting failures, designers can
take steps to improve the expected lifetime of the system. Several system-level techniques, such
as communication architecture design and slack allocation, are capable of mitigating the effects of
wearout faults and improving system lifetime. Task mapping optimization is another system-level
technique that can be applied at both design time and runtime to enhance system lifetime and has
several advantages over other lifetime optimization techniques.
The first advantage that task mapping has over other system-level techniques is that it is more
flexible. We show that task mapping can positively impact system lifetime in a number of scenarios
and does not rely on redundancy or complex reconfiguration mechanisms, although both of those
provide additional benefit. The second advantage of using task mapping to improve system lifetime
is a lower cost compared to other techniques. Other lifetime improvement techniques seek to augment
systems in a cost-effective way to mitigate the effects of wearout faults, whereas task mapping
does not necessarily require additional investment in hardware to achieve similar effects. The final,
and perhaps most significant, advantage of task mapping is its ability to dynamically manage lifetime
as the system is running. While decisions made by other system- and circuit-level techniques
must be finalized before the system is manufactured, the task mapping can continue to change to
account for the actual state of the system in real time.
We propose two distinct task mapping techniques, one for each of the two times at which optimization
can occur. At design time, we take advantage of abundant computational resources to perform an
intelligent search of the initial task mapping solution space using ant colony
optimization. At runtime, we leverage information from hardware sensors to quickly select good
task mappings using a meta-heuristic. Our two techniques can be used together or in isolation
depending on the use case and design requirements of the system.
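As an illustration of the design-time idea only, the following is a minimal sketch of ant colony optimization applied to task mapping. It is not the implementation evaluated in this thesis. The names `lifetime_proxy` and `aco_task_mapping`, the parameters `ants`, `iterations`, and `rho`, and the toy stress model (the most stressed core fails first) are all assumptions made for the example; a real flow would replace `lifetime_proxy` with a system lifetime simulator.

```python
import random

def lifetime_proxy(mapping, loads, wear_factor):
    """Toy stand-in for a lifetime simulator (an assumption, not the thesis model).

    mapping[t] is the core assigned to task t. The system is assumed to fail
    when its most stressed core fails.
    """
    stress = [0.0] * len(wear_factor)
    for task, core in enumerate(mapping):
        stress[core] += loads[task]
    # Higher accumulated load and a higher wear factor both shorten a core's life.
    return min(1.0 / (w * (1.0 + s)) for w, s in zip(wear_factor, stress))

def aco_task_mapping(loads, wear_factor, ants=20, iterations=50, rho=0.1, seed=0):
    """Search for a task-to-core mapping that maximizes lifetime_proxy."""
    rng = random.Random(seed)
    n_tasks, n_cores = len(loads), len(wear_factor)
    # tau[t][c] is the pheromone level for placing task t on core c.
    tau = [[1.0] * n_cores for _ in range(n_tasks)]
    best_map, best_life = None, float("-inf")
    for _ in range(iterations):
        for _ in range(ants):
            # Each ant builds a full mapping, sampling cores in proportion to pheromone.
            mapping = [rng.choices(range(n_cores), weights=tau[t])[0]
                       for t in range(n_tasks)]
            life = lifetime_proxy(mapping, loads, wear_factor)
            if life > best_life:
                best_map, best_life = mapping, life
        # Evaporate all pheromone, then reinforce the best mapping found so far.
        for t in range(n_tasks):
            for c in range(n_cores):
                tau[t][c] *= (1.0 - rho)
            tau[t][best_map[t]] += best_life
    return best_map, best_life

if __name__ == "__main__":
    # Four tasks with hypothetical utilizations, two identical cores.
    print(aco_task_mapping([0.5, 0.5, 0.2, 0.2], [1.0, 1.0]))
```

The key design choice in this sketch is that the pheromone update is driven directly by the lifetime estimate rather than by a proxy such as temperature or power, which mirrors the direct-optimization argument made above.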
This thesis makes the following intellectual contributions:
 Lifetime-aware design-time task mapping - Ours is the first approach to search for initial task
mappings that directly optimize system lifetime rather than optimizing other metrics which
only influence system lifetime, like temperature and power. Because this technique is meant
for use at design time, we employ a powerful search algorithm called ant colony optimization,
which takes advantage of a designer’s computational resources to find a near-optimal
task mapping. Our lifetime-aware design-time task mapping improves system lifetime by
an average of 32.3% compared to a lifetime-agnostic approach across a range of real-world
benchmarks.
 Lifetime-aware runtime task mapping - Ours is the first approach to dynamically manage
the lifetime of embedded chip multiprocessors at runtime through the use of task mapping.
By leveraging data from hardware sensors and information about the system state, our meta-heuristic
approach is able to find high-quality task mappings which extend system lifetime
without performing a costly search of the solution space. Our lifetime-aware runtime task
mapping improves system lifetime by an average of 7.1% compared to a runtime temperature-aware
task mapping approach, and in the best case, system lifetime was improved by 17.4%.
Our approach also increased the time until the first component failure by 14.6% on
average and 33.9% in the best case.
 Evaluation of lifetime-aware task mapping - We measure the improvement in system lifetime
resulting from our task mapping techniques across a range of benchmarks. We also compare
our lifetime-aware techniques to others which attempt to indirectly optimize system lifetime
to show that direct optimization is the only way to achieve maximum lifetime. For example,
we show that task mappings that are near optimal in terms of average initial component temperature
can result in a range of system lifetimes that is up to 53.2% of the optimal lifetime;
clearly, low temperature does not imply long lifetime.
 Co-optimization of competing lifetime metrics - The wide range of use cases for embedded
chip multiprocessors means that different systems will have different design goals. We consider
how the pertinent measure of lifetime changes in different use cases, and analyze the
degree to which these competing lifetime metrics can be co-optimized.
 Best practices for a system lifetime simulator - We created a simulator which estimates the
lifetime of an embedded chip multiprocessor executing one or more applications. The simulator
is detailed enough to capture the effects of various system-level design techniques on
lifetime, and thus, it is valuable to the field of lifetime optimization research even outside the
context of task mapping.
In summary, lifetime optimization for embedded chip multiprocessors is required so that cutting-edge
manufacturing processes can continue to be used for a wide range of systems. Our research
mitigates the problem of increasingly common wearout faults by proposing and evaluating a pair of
task mapping techniques, one for design time and one for runtime, that enhance system lifetime across
a broad range of use cases.