Towards a More Resilient Web Infrastructure
The modern web ecosystem is complex: web infrastructure providers (e.g., DNS and CDN providers) rely on each other for certain services, giving rise to a mesh of dependencies. For example, Netflix uses Symantec as its CA, which in turn uses Verisign for DNS. These dependencies can lead to consolidation when many websites rely on the same provider, creating single points of failure. Moreover, a web infrastructure provider's own network is complex because it contains network functions (software or hardware boxes that perform some processing task on packets, e.g., load balancers and firewalls). These network functions often become bottlenecks in the face of an attack, which can result in an outage of the provider. Such an outage can, in turn, affect the web ecosystem as a whole because of the dependencies among web infrastructure providers. For example, a network function bottleneck in a DNS provider's network (say DNS-A) may cause an outage of DNS-A, which may in turn cause outages for the websites and service providers that use DNS-A.
With these risks to the resilience of the web in mind, in this thesis we first measure the prevalence and impact of dependencies among web infrastructure providers. We focus on three critical infrastructure services: DNS, CDN, and certificate revocation checking by a Certificate Authority (CA). We analyze both direct dependencies (e.g., Twitter uses Dyn) and indirect dependencies (e.g., Netflix uses Symantec as its CA, which itself uses Verisign for DNS). Dependencies may also vary across regions, because service providers and popular websites differ from region to region. Hence, as a first step in understanding region-specific trends in dependencies, we study dependencies in an African context and offer Africa-specific insights. Besides studying the third-party service dependencies that affect web infrastructure resilience, we also propose a technique to identify network function bottlenecks within a web infrastructure provider's network. Identifying these bottlenecks can help the provider's network operators minimize them.
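The distinction between direct and indirect dependencies can be illustrated with a small graph traversal. The following is a minimal sketch and not the measurement method used in the thesis: the graph contains only the examples named in the abstract, and the names `DEPENDS_ON`, `direct_dependencies`, `indirect_dependencies`, and `affected_by_outage` are hypothetical helpers introduced for illustration.

```python
from collections import deque

# Hypothetical dependency graph built only from the examples in the abstract.
# An entry "A": {"B"} means that A relies on B for some service.
DEPENDS_ON = {
    "twitter.com": {"Dyn"},        # direct DNS dependency
    "netflix.com": {"Symantec"},   # direct CA dependency
    "Symantec": {"Verisign"},      # the CA itself depends on a DNS provider
}


def direct_dependencies(entity, graph=DEPENDS_ON):
    """Providers that the entity uses itself."""
    return set(graph.get(entity, set()))


def all_dependencies(entity, graph=DEPENDS_ON):
    """Every provider reachable from the entity (breadth-first search)."""
    seen = set()
    queue = deque(graph.get(entity, ()))
    while queue:
        provider = queue.popleft()
        if provider in seen:
            continue  # already visited; also guards against cycles
        seen.add(provider)
        queue.extend(graph.get(provider, ()))
    return seen


def indirect_dependencies(entity, graph=DEPENDS_ON):
    """Providers reached only through another provider."""
    return all_dependencies(entity, graph) - direct_dependencies(entity, graph)


def affected_by_outage(provider, graph=DEPENDS_ON):
    """Entities that depend on the provider, directly or indirectly."""
    return {e for e in graph if provider in all_dependencies(e, graph)}


if __name__ == "__main__":
    print(indirect_dependencies("netflix.com"))
    print(affected_by_outage("Verisign"))
```

In this toy graph, Verisign appears only as an indirect dependency of netflix.com, yet an outage at Verisign still reaches netflix.com through Symantec, which is the kind of hidden exposure that counting direct dependencies alone would miss.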
Our work is an important first step towards establishing actionable metrics that can help web infrastructure providers understand their outage risk and make informed decisions about their resilience. This, in turn, may help to mitigate the effects of large-scale incidents, improve resilience to outages, and minimize overall exposure to risk.
History
Date
- 2023-06-04
Degree Type
- Dissertation
Department
- Electrical and Computer Engineering
Degree Name
- Doctor of Philosophy (PhD)