Posted on 2009-03-01, 00:00. Authored by David G. Andersen.
The end-to-end availability of Internet services is between two and three orders of magnitude worse
than other important engineered systems, including the US airline system, the 911 emergency response
system, and the US public telephone system. This dissertation explores three systems designed
to mask Internet failures, and, through a study of three years of data collected on a 31-site
testbed, why these failures happen and how effectively they can be masked.
A core aspect of many of the failures that interrupt end-to-end communication is that they fall
outside the expected domain of well-behaved network failures. Many traditional techniques cope
with link and router failures; as a result, the remaining failures are those caused by software and
hardware bugs, misconfiguration, malice, or the inability of current routing systems to cope with
persistent congestion. The effects of these failures are exacerbated because Internet services depend
upon the proper functioning of many components—wide-area routing, access links, the domain
name system, and the servers themselves—and a failure in any of them can prove disastrous to the
proper functioning of the service.
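
As a rough illustration of why depending on many components hurts availability, the sketch below multiplies per-component availabilities for a service that needs every component working. It assumes independent failures, which is a simplification, and the component values are hypothetical placeholders rather than measurements from the dissertation.

```python
from math import prod

def serial_availability(component_availabilities):
    """Availability of a service that needs every component to be up.

    Assumes (as a simplification) that components fail independently.
    """
    return prod(component_availabilities)

# Hypothetical per-component availabilities, not values from the dissertation.
components = {
    "wide_area_routing": 0.999,
    "access_link": 0.995,
    "dns": 0.999,
    "server": 0.998,
}

end_to_end = serial_availability(components.values())
print(f"end-to-end availability: {end_to_end:.4f}")
```

Under these assumptions the end-to-end figure is always lower than that of the weakest single component, which is the effect the paragraph above describes.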
This dissertation describes three complementary systems to increase Internet availability in the
face of such failures. Each system builds upon the idea of an overlay network, a network created
dynamically between a group of cooperating Internet hosts. The first two systems, Resilient Overlay
Networks (RON) and Multi-homed Overlay Networks (MONET), determine whether the Internet
path between two hosts is working on an end-to-end basis. Both systems exploit the considerable
redundancy available in the underlying Internet to find failure-disjoint paths between nodes, and
forward traffic along a working path. RON avoids 50% of the Internet outages that interrupt
communication between a small group of communicating nodes. MONET is more aggressive,
combining an overlay network of Web proxies with explicitly engineered redundant links to the
Internet to also mask client access link failures. Eighteen months of measurements from a six-site
deployment of MONET show that it increases a client’s ability to access working Web sites by
nearly an order of magnitude.
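
The general idea of routing around a failed path through an overlay can be sketched as follows. This is a minimal illustration, not the RON or MONET implementation: the function `choose_path`, the boolean probe callback `link_up`, and the restriction to a single intermediate relay are all assumptions made for the example.

```python
def choose_path(src, dst, nodes, link_up):
    """Return a list of hops from src to dst, or None if none is found.

    link_up(a, b) is a hypothetical probe result: True if the Internet
    path from a to b is currently working end to end.
    """
    # Prefer the direct Internet path when it works.
    if link_up(src, dst):
        return [src, dst]
    # Otherwise try to detour through one other overlay node.
    for relay in nodes:
        if relay in (src, dst):
            continue
        if link_up(src, relay) and link_up(relay, dst):
            return [src, relay, dst]
    return None

# Hypothetical probe results: the direct path A -> B is down,
# but both legs through C are working.
probes = {("A", "B"): False, ("A", "C"): True, ("C", "B"): True}
path = choose_path("A", "B", ["A", "B", "C"],
                   lambda a, b: probes.get((a, b), False))
print(path)
```

In this example the direct path fails its probe, so traffic would be forwarded through the relay node instead.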
Where RON and MONET combat accidental failures, the Mayday system guards against denial-of-service
attacks by surrounding a vulnerable Internet server with a ring of filtering routers. Mayday
then uses a set of overlay nodes to act as mediators between the service and its clients, permitting
only properly authenticated traffic to reach the server.