Pigasus: Efficient Handling of Input-Dependent Streaming on FPGAs
Field-programmable gate arrays (FPGAs) have achieved well-demonstrated success in a variety of real-time streaming applications, such as signal processing, machine learning inference, and video processing. In these applications, the processing accelerated by FPGAs is regular and "input-independent," and thus the designs' behavior and performance are fixed. Accelerating real-time "input-dependent" streaming applications on FPGAs presents many interesting new challenges. For example, network intrusion detection and prevention systems (IDS/IPS) need to identify malicious network traffic, meaning different traffic will trigger different operations, thus activating different resource and performance bottlenecks and making a static, fixed-performance FPGA design infeasible. Specifically, to achieve fixed performance, the design must allocate resources for handling worst-case scenarios, even if they happen rarely, thus losing the opportunity to use the same resources to improve common-case performance. For instance, since regular expression patterns are rarely triggered, the limited on-chip SRAM space could instead be used to make the string pattern matching component, which is exercised by almost every packet, larger and faster.
This thesis investigates novel design solutions to enable efficient handling of stream processing with input dependence in the problem context of IDS/IPS. The first part of this thesis focuses on achieving high performance under resource constraints for given inputs, by accelerating common cases while handling uncommon cases efficiently at different levels of the system. Specifically, we propose three ideas: (1) an FPGA-first architecture that allows the common datapath to sit entirely on the FPGA fabric, offloading only the complex, rarely triggered last pipeline stage to the CPU; (2) a fast-slow path design for TCP reassembly, where the rarely triggered slow path can use a memory-saving data structure with non-deterministic performance without interfering with the performance of the fast path; and (3) hierarchical filters that place compact filters in front to keep up with the line rate and thus reduce the resource consumption of the later, more expensive stages, which only need to keep up with the hit rate of the previous filter.
A key concern with this design approach is that it is vulnerable to overfitting if the workload changes. The second part of this thesis tackles this problem, allowing for efficient and easy adaptation of the design to changing inputs at both compile time and runtime. In particular, we introduce two techniques: (1) a disaggregated architecture that enables easy scaling up, down, or out of particular components at compile time to cater to different expected traffic profiles; and (2) a dynamic spillover mechanism that routes the spillover traffic to backup streaming kernels, which can be brought up on demand to absorb the increase in workload at runtime.
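The dynamic spillover idea can be illustrated with a small software model. This is only a sketch under stated assumptions, not the mechanism implemented in Pigasus: it assumes that spillover is triggered when the primary kernel's bounded input queue is full, and the names `Kernel`, `SpilloverRouter`, `capacity`, and `max_backups` are hypothetical, introduced purely for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Kernel:
    """Hypothetical streaming kernel modeled as a bounded input queue."""
    name: str
    capacity: int
    queue: deque = field(default_factory=deque)

    def try_push(self, item) -> bool:
        """Accept the item if there is room; otherwise report back-pressure."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(item)
        return True


class SpilloverRouter:
    """Send items to the primary kernel; spill the excess to on-demand backups.

    Assumption: a full primary queue is the spillover trigger. The thesis
    abstract does not specify the actual trigger condition.
    """

    def __init__(self, capacity: int, max_backups: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.max_backups = max_backups
        self.primary = Kernel("primary", capacity)
        self.backups = []  # backup kernels, created only when needed
        self.dropped = 0

    def route(self, item) -> str:
        """Return the name of the kernel that accepted the item."""
        # Common case: the primary kernel has room.
        if self.primary.try_push(item):
            return self.primary.name
        # Spillover: reuse a backup that is already running, if one has room.
        for kernel in self.backups:
            if kernel.try_push(item):
                return kernel.name
        # Bring up a new backup on demand, within the resource budget.
        if len(self.backups) < self.max_backups:
            kernel = Kernel(f"backup{len(self.backups)}", self.capacity)
            self.backups.append(kernel)
            kernel.try_push(item)
            return kernel.name
        # No spare resources left: the item cannot be absorbed.
        self.dropped += 1
        return "dropped"


router = SpilloverRouter(capacity=2, max_backups=1)
placements = [router.route(i) for i in range(6)]
print(placements)
```

In this model no backup exists until the primary kernel is saturated, so spare resources stay free in the common case and are claimed only when the workload exceeds what the primary kernel was sized for.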
Pigasus, the 100 Gbps IPS embodying the ideas in this thesis, has been open-sourced at https://github.com/cmu-snap/pigasus. End-to-end benchmarking with a variety of traces shows that the Pigasus IPS can operate at 100 Gbps using just one Intel Stratix 10 MX FPGA and an average of 5 cores of an Intel i9 processor, making it 50x more efficient than fixed-performance designs. The disaggregated architecture shows better scalability, reusability, and portability, with negligible performance and resource overhead relative to the static design. Finally, the dynamic spillover mechanism can prevent the performance degradation or resource wastage caused by a mismatch between the traffic profile predicted at compile time and the actual traffic profile at runtime.
History
Date
2021-08-16
Degree Type
- Dissertation
Department
- Electrical and Computer Engineering
Degree Name
- Doctor of Philosophy (PhD)