Kim_CMU-CS-21-143.pdf (4.34 MB)
Towards Elastic and Resilient In-Network Computing

thesis
posted on 04.05.2022, 18:58 authored by Daehyeok Kim

Recent advances in programmable networking hardware, such as programmable switches and smart network interface cards, have created a new computing paradigm called in-network computing. This paradigm allows functionality that has traditionally been provided by servers or proprietary hardware devices, ranging from network middleboxes to components of distributed systems, to be performed in the network. The demand for higher performance and the commercial availability of programmable hardware have driven the popularity of in-network computing. While many recent efforts have demonstrated the performance benefits of in-network computing, we observe a significant gap between what it offers today and evolving application demands. In particular, we argue that in-network computing lacks resource elasticity and fault resiliency, which are essential building blocks for practical computing platforms, and that this limits its potential. Elasticity can address the shortcoming that today’s in-network computing supports only a simple deployment model in which a single application runs on a single device equipped with fixed and limited resources. Similarly, fault resiliency is critical for managing prevalent device failures and preserving the correctness and performance of applications, but it has received little attention. Although resource elasticity and fault resiliency have been extensively studied for traditional CPU server-based computing, we find that enabling them on programmable networking devices is challenging, especially due to their low-level abstractions, hardware constraints, heterogeneity, and workload characteristics. In this thesis, we argue that by designing high-level abstractions and runtime environments that help leverage compute and memory resources available outside of one type of device, we can make in-network computing more elastic and resilient without any hardware modifications. This concept, which we call device resource augmentation, is a key enabler of resource elasticity and fault resiliency for stateful in-network applications written for programmable switches. In particular, we design three systems, named TEA, ExoPlane, and RedPlane, that use this concept to support elastic memory, elastic compute and memory, and fault resiliency, respectively. Each of these systems consists of a key abstraction, programming APIs, and a runtime environment. We demonstrate their feasibility and effectiveness with prototype implementations and evaluations using various in-network applications. Putting all the pieces together, developers can easily enable resource elasticity and fault resiliency for their applications without worrying about underlying complexities.

History

Date

23/11/2021

Degree Type

Dissertation

Department

Computer Science

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

  • Vyas Sekar
  • Srinivasan Seshan