Dynamically Managing FPGAs for Efficient Computing

thesis
posted on 04.12.2020, 20:48 by Marie Nguyen
Field-programmable gate arrays (FPGAs) have undergone a dramatic transformation from a logic technology to a computing technology. This transformation is driven by the computing industry's need for more power/energy efficiency than software can achieve and, at the same time, more flexibility than ASICs. Nonetheless, FPGA designers still share a similar design methodology with ASIC designers. Most notably, at design time, FPGA designers commit to a fixed allocation of logic resources to modules in a design. In other words, FPGAs are mostly still used like an "ASIC" despite being runtime reprogrammable.

Through partial reconfiguration (PR), parts of an FPGA design can be reconfigured at runtime while the remainder continues to operate without disruption. PR enables what has been possible on general-purpose processors for decades. For instance, multiple tasks can be time-multiplexed on a smaller FPGA, which can reduce area/device cost, power and energy compared to statically mapping the tasks on a larger FPGA. PR can become a relevant technology for an emerging class of AI-driven applications that (1) need to support many compute-intensive tasks with real-time requirements and (2) are often deployed on a small, low-end FPGA due to area, cost, power or energy concerns (e.g., smart cars/robots/cameras at the edge). For such applications, using a large, expensive FPGA is typically not a viable option.

Though PR is a promising technology and has been supported by FPGA tools for over a decade, its commercial value is still waiting to be proven. The reconfiguration time (from a few to tens of milliseconds on today's FPGAs), also referred to as PR time, is often considered one of the major hurdles preventing more widespread use of PR. While the non-trivial PR time represents a technical challenge, we believe that a more important question to address is "When, how and why should an FPGA designer consider using PR?". Addressing this question requires (1) identifying applications that can tolerate PR time and still benefit from a PR approach, (2) designing good architectural and runtime management strategies to build efficient designs leveraging PR, and (3) evaluating whether the area/device cost, power or energy benefits are important enough to justify a transition from a statically mapped design.

This thesis seeks to advance the state of the art in the dynamism of computing FPGAs by tackling the aforementioned challenges. Specifically, we demonstrate that a design exploiting PR can be more area/device cost, power or energy efficient than a statically mapped design (ASIC-style design) with slack. Slack occurs when the resources occupied by an ASIC-style design are not all active all the time. Using PR, a designer can attempt to reduce slack by changing the allocation of resources over time. In this work, we identify slack reduction as the most important opportunity for improvement available to PR-style designs. We refer to a PR-style design as a design in which logic resources are allocated to different modules of one design over time using PR. We develop efficient PR allocation and execution strategies to reduce slack, and show through analytical modeling and implemented designs that a PR-style design can outperform an ASIC-style design in challenging scenarios that have to deliver required performance under strict area, cost, power, and energy constraints.

Further, we leverage the findings and analysis from our theoretical investigation to develop a soft-logic-realized framework for accelerating computer vision with real-time requirements (30+ fps). This framework includes the necessary architectural and runtime management strategies to support spatial and temporal sharing of the FPGA fabric at a very fine grain (i.e., the time interval between reconfigurations is in the millisecond range) while meeting performance requirements. Using the framework, we design and implement efficient PR-style designs to quantify the performance, area/device cost, power and energy benefits of PR-style designs relative to ASIC-style designs and to software implementations. Notably, we show that, under specific conditions, a PR-style design can be more power and energy efficient than an ASIC-style design even when the fabric is reconfigured frequently (i.e., when more than half of the execution time is spent reconfiguring the fabric).

We also make projections on the impact of higher PR speed on the costs and benefits of using PR at a very fine grain. Through our study, we find that, while higher reconfiguration speed can make a PR-style design more area/device cost efficient, the power/energy overhead incurred in a PR-style design due to, for instance, fabric reconfigurations and additional data movement can make a PR approach less power/energy efficient than an ASIC-style design.
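
The quantities mentioned in the abstract can be made concrete with a little arithmetic. The sketch below is not taken from the thesis: the function names, the area-weighted definition of slack, and all numeric values are assumptions chosen only for illustration. It computes the per-frame time budget at a given frame rate, the fraction of that budget spent reconfiguring, and the idle fraction of a statically mapped design.

```python
def frame_period_ms(fps):
    """Time budget per frame, in milliseconds."""
    return 1000.0 / fps


def reconfig_fraction(pr_time_ms, reconfigs_per_frame, fps):
    """Fraction of each frame period spent reconfiguring the fabric."""
    return pr_time_ms * reconfigs_per_frame / frame_period_ms(fps)


def static_slack(modules, period_ms):
    """Area-weighted idle fraction of a statically mapped design.

    This definition is an assumption made for this sketch.
    modules: list of (area, busy_ms) pairs, one per module, where busy_ms
    is how long that module is active within one period.
    """
    total_area = sum(area for area, _ in modules)
    busy_area_time = sum(area * busy_ms for area, busy_ms in modules)
    return 1.0 - busy_area_time / (total_area * period_ms)


# Hypothetical numbers, not measurements from the thesis.
print(round(frame_period_ms(30), 2))                          # 33.33
print(round(reconfig_fraction(5.0, 4, 30), 2))                # 0.6
print(round(static_slack([(40, 10.0), (60, 5.0)], 20.0), 2))  # 0.65
```

With these assumed values, four reconfigurations of 5 ms each consume about 60% of a 33.3 ms frame, which falls in the "more than half of the execution time" regime the abstract refers to, and the static design leaves about 65% of its area-time idle.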

History

Date

14/09/2020

Degree Type

Dissertation

Department

Electrical and Computer Engineering

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

James C. Hoe
