Parachute Queries in the Presence of Unavailable Data Sources
Mediator systems are used today in a wide variety of unreliable environments. When processing a query, a mediator may try to access a data source that is unavailable. In this situation, existing systems either silently ignore the unavailable data sources or generate an error. This behavior is inefficient in environments where the probability that a data source is unavailable is non-negligible (e.g., the Internet). When some data sources are unavailable, the complete answer to a query cannot be obtained; however, useful work can still be done with the available data sources. In this paper, we describe a novel approach to mediator query processing in which, in the presence of unavailable data sources, the answer to a query is computed incrementally, and the data obtained at intermediate steps of the computation can be accessed. We define two new evaluation models and, for each, analytically model the probability of obtaining the answer to a query in the presence of unavailable data sources. The analysis shows that complete answers are more likely in our two evaluation models than in a classical system. We measure the performance of our evaluation models via simulations and show that, when all data sources are available, the performance penalty of our approach is negligible.
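To give some intuition for why incremental evaluation can make complete answers more likely, the following minimal sketch compares two simplified strategies. It is not the analytical model of this paper: it assumes, purely for illustration, that a query needs n data sources, that each source is available independently with the same probability p in each attempt, and that the mediator may retry a fixed number of times. The function names and parameters are hypothetical.

```python
def classical_success(p: float, n: int, attempts: int = 1) -> float:
    """Chance that at least one attempt finds all n sources up at once.

    Illustrative assumption: nothing is kept between attempts, so an
    attempt succeeds only if every source is available at the same time.
    """
    all_up_once = p ** n
    return 1.0 - (1.0 - all_up_once) ** attempts


def incremental_success(p: float, n: int, attempts: int = 1) -> float:
    """Chance that every source was up in at least one of the attempts.

    Illustrative assumption: data fetched from a source during any
    attempt is kept, so each source only has to be reachable once.
    """
    source_seen = 1.0 - (1.0 - p) ** attempts
    return source_seen ** n


if __name__ == "__main__":
    # Hypothetical setting: 5 sources, each up 90% of the time.
    for k in (1, 2, 3):
        c = classical_success(0.9, 5, k)
        i = incremental_success(0.9, 5, k)
        print(f"attempts={k}  classical={c:.4f}  incremental={i:.4f}")
```

Under these assumptions the two strategies coincide for a single attempt, and the incremental one is never worse for more attempts, because finding all sources available in the same attempt implies that each source was available in some attempt.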