Carnegie Mellon University
Atli_cmu_0041E_11194.pdf (25.9 MB)

Soft Error Analysis and Design Space Exploration of Radiation-Hardened System-on-a-Chip Platforms

thesis
posted on 2024-05-31, 19:28 authored by Ahmet Oguz Atli

Systems-on-chip (SoCs) with specialized accelerators have become prevalent as compute-intensive applications such as deep neural networks (DNNs) have become more widespread. SoCs with DNN accelerators are increasingly deployed in security-critical fields like autonomous driving and space exploration, which makes the reliability of such platforms very important. Particle radiation present in Earth’s atmosphere and in space poses a reliability threat to safety-critical applications, because the free charge collected by the storage nodes of an integrated circuit (IC) can flip stored bits and cause system-level failure. IC designers, especially those addressing the compute requirements of space missions, employ structured redundancy techniques such as error correcting codes (ECC) and double/triple modular redundancy (DMR/TMR) to detect and correct soft errors in their designs. However, such methods have high power, performance, and area (PPA) overheads and require long RTL design and verification cycles. Because each hardening method requires manual RTL edits, and because the fault injection (FI) experiments needed to assess the reliability of the final system are long and slow, designers often cannot iterate on their design enough times to minimize the incurred PPA costs.
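
As a software-level illustration of the redundancy techniques named above, the following minimal Python sketch models a soft error as a single bit flip in a stored word and shows how a TMR majority vote masks it, while a DMR comparison only detects it. The function names and the example word are hypothetical and are not taken from the thesis, which applies these techniques in hardware at the RTL level.

```python
def flip_bit(word: int, bit: int) -> int:
    """Model a single-event upset by inverting one bit of a stored word."""
    return word ^ (1 << bit)


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three redundant copies (TMR)."""
    return (a & b) | (a & c) | (b & c)


def dmr_mismatch(a: int, b: int) -> bool:
    """DMR can detect that two copies disagree, but cannot tell which is right."""
    return a != b


# Hypothetical 8-bit stored value, replicated three times.
stored = 0b10110010
copies = [stored, stored, stored]

# Inject a soft error into one copy only.
copies[1] = flip_bit(copies[1], 4)

voted = tmr_vote(*copies)            # the two clean copies outvote the upset one
detected = dmr_mismatch(copies[0], copies[1])

print(f"stored={stored:#010b} corrupted={copies[1]:#010b} voted={voted:#010b}")
print("DMR detected mismatch:", detected)
```

The vote only masks errors that hit a given bit position in at most one copy; two copies corrupted in the same bit would outvote the correct one.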

In this work, we introduce RadTool, a toolset that helps designers architect complex radiation-hardened designs such as SoCs with multiple clock domains and accelerators. RadTool contains FIERA, an FPGA-accelerated soft error injection tool, and RIERA, an automated RTL redundancy insertion tool. FIERA uses FPGA-based emulation for error injection, which makes it 4 to 5 orders of magnitude faster than RTL simulation-based approaches. FIERA instruments the FIRRTL representation of the input RTL and employs stop-clock fault injection, which makes the platform flexible and extensible and enables 47% more efficient LUT mapping of the instrumentation logic compared to previous work. RIERA similarly operates on FIRRTL and can insert radiation-hardening methods such as DMR, TMR, and scrubber-backed ECC into any level of the design hierarchy with the desired granularity. The Bayesian optimization engine in RIERA can also search across different radiation-hardening methods and produce designs that are Pareto-optimal in terms of radiation hardness and PPA overhead. Together, these tools allow designers to rapidly prototype different radiation-hardened designs and evaluate each of them with accelerated FI experiments, greatly tightening the design and verification loop.
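
To make the notion of Pareto-optimal designs concrete, here is a minimal Python sketch of Pareto filtering over two objectives that are both minimized: a residual error rate and an area overhead. It is not RIERA's Bayesian optimization engine and does not reflect its interface; the `DesignPoint` type, the candidate names, and all numbers are made-up placeholders chosen only to illustrate dominance.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    name: str
    error_rate: float     # residual failure rate, lower is better (placeholder units)
    area_overhead: float  # extra area relative to baseline, lower is better


def dominates(p: DesignPoint, q: DesignPoint) -> bool:
    """p dominates q if it is no worse in both objectives and better in at least one."""
    no_worse = p.error_rate <= q.error_rate and p.area_overhead <= q.area_overhead
    better = p.error_rate < q.error_rate or p.area_overhead < q.area_overhead
    return no_worse and better


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative candidates only; these values are NOT results from the thesis.
candidates = [
    DesignPoint("baseline", 1.00, 0.0),
    DesignPoint("ecc_mem", 0.30, 0.4),
    DesignPoint("dmr_core", 0.40, 1.1),
    DesignPoint("tmr_all", 0.05, 2.2),
    DesignPoint("tmr_plus_ecc_coarse", 0.50, 2.5),
]

front = pareto_front(candidates)
print([p.name for p in front])
```

In this toy data set, `dmr_core` is dominated by `ecc_mem` and `tmr_plus_ecc_coarse` is dominated by `tmr_all`, so neither appears on the front. A real flow would obtain each point's objectives from fault injection experiments and synthesis reports rather than from fixed constants.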

History

Date

2024-05-10

Degree Type

  • Dissertation

Department

  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Ken Mai
