Posted on 2006-07-20, authored by Priya Narasimhan and Aaron M Paulos
While distributed applications need replication for fault tolerance, realistic and feasible deployments cannot afford to replicate every component in the system. Over the lifecycle of such deployments, consistency and fault-tolerance properties may be compromised when replicated and unreplicated components interact. We describe some of the challenges in providing end-to-end fault tolerance under these mixed semantics. Our approach enables communication between the unreplicated and replicated components of a distributed client-server application without compromising the consistency of the replicated servers and without restricting the concurrent TCP semantics that unreplicated clients expect. We describe the resulting architectural and implementation enhancements to the MEAD system and provide an empirical evaluation of our new mechanisms.