Posted on 2007-01-01, 00:00; authored by David G. Andersen
This thesis describes the design, implementation, and evaluation of a Resilient Overlay Network
(RON), an architecture that allows end-to-end communication across the wide-area Internet to detect
and recover from path outages and periods of degraded performance within several seconds. A
RON is an application-layer overlay on top of the existing Internet routing substrate. The overlay
nodes monitor the liveness and quality of the Internet paths among themselves, and they use this
information to decide whether to route packets directly over the Internet or by way of other RON
nodes, optimizing application-specific routing metrics.
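The routing decision described above can be sketched in a few lines. This is a minimal Python illustration, not the RON implementation. It assumes each node already holds measured latency and loss estimates for every virtual link. The dictionaries `latency` and `loss` and the functions `best_route` and `path_loss` are hypothetical names. It also assumes that only the direct path or a single intermediate RON node is considered, and that per-link losses are independent.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: matrix[a][b] is the measured metric from node a to node b.
Matrix = Dict[str, Dict[str, float]]


def path_loss(*link_losses: float) -> float:
    """End-to-end loss rate, assuming losses on each link are independent."""
    delivered = 1.0
    for p in link_losses:
        delivered *= 1.0 - p
    return 1.0 - delivered


def best_route(src: str, dst: str, latency: Matrix, loss: Matrix,
               metric: str = "latency") -> Tuple[Optional[str], float]:
    """Choose the direct Internet path (None) or one intermediate overlay node.

    Returns (via, score), where `via` is None for the direct path and
    `score` is the chosen path's value under `metric` (lower is better).
    """
    def score(via: Optional[str]) -> float:
        hops = [(src, dst)] if via is None else [(src, via), (via, dst)]
        if metric == "latency":
            # Latency adds up along the hops.
            return sum(latency[a][b] for a, b in hops)
        if metric == "loss":
            # Loss compounds multiplicatively through the delivery probability.
            return path_loss(*(loss[a][b] for a, b in hops))
        raise ValueError(f"unknown metric: {metric}")

    # The direct path is listed first, so min() keeps it when scores tie.
    candidates = [None] + [n for n in latency if n not in (src, dst)]
    best = min(candidates, key=score)
    return best, score(best)
```

An outage on the direct path can be modeled by setting that link's latency to `float("inf")` or its loss to 1.0. The sketch then picks whichever intermediate node gives the best score for the application's chosen metric. Preferring the direct path on ties avoids needless indirection when the overlay offers no improvement.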
We demonstrate the potential benefits of RON by deploying and measuring a working RON
with nodes at thirteen sites scattered widely over the Internet. Over a 71-hour sampling period
in March 2001, the 156 communicating pairs of RON nodes (13 × 12 ordered pairs) experienced
32 significant outages, each lasting over thirty minutes. RON’s routing mechanism was able to detect
and route around all of them, showing that there is, in fact, physical path redundancy in the underlying Internet
in many cases. RONs are also able to improve the loss rate, latency, or throughput perceived by data
transfers; for example, about 1% of the transfers doubled their TCP throughput and 5% of our
transfers saw their loss rate reduced by 5% in absolute terms. These improvements, particularly in
the area of fault detection and recovery, demonstrate the benefits of moving some of the control over
routing into the hands of end systems.
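The fault detection credited above can be illustrated with a consecutive-loss rule. Under this rule, a node declares a virtual link dead once several probes in a row go unanswered, and it marks the link alive again on the next reply. The sketch below is a hypothetical Python helper. The class name `LinkMonitor` and the threshold of four consecutive losses are assumptions chosen for illustration. They are not parameters stated in this abstract.

```python
from dataclasses import dataclass


@dataclass
class LinkMonitor:
    """Liveness state for one virtual link between two overlay nodes (hypothetical)."""
    loss_threshold: int = 4       # assumed: consecutive lost probes before declaring an outage
    consecutive_losses: int = 0
    alive: bool = True

    def record_probe(self, answered: bool) -> bool:
        """Update state with one probe outcome; return whether the link is considered alive."""
        if answered:
            # Any reply resets the loss streak and restores the link.
            self.consecutive_losses = 0
            self.alive = True
        else:
            self.consecutive_losses += 1
            if self.consecutive_losses >= self.loss_threshold:
                self.alive = False
        return self.alive
```

Requiring several consecutive losses keeps a single dropped probe from triggering a reroute. Sending the follow-up probes at short intervals keeps detection time on the order of seconds. The threshold and the probe spacing together trade false alarms against reaction time.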