Posted on 2009-10-01, 00:00. Authored by Shimin Chen, Anastassia Ailamaki, Phillip B. Gibbons, Todd C. Mowry
The key idea behind Inspector Joins is that during
the I/O partitioning phase of a hash-based join,
we have the opportunity to look at the actual data
itself and then use this knowledge in two ways:
(1) to create specialized indexes, specific to the
given query on the given data, for optimizing the
CPU cache performance of the subsequent join
phase of the algorithm, and (2) to decide which
join phase algorithm best suits this specific query.
We show how inspector joins, employing novel
statistics and specialized indexes, match or exceed
the performance of state-of-the-art cache-friendly
hash join algorithms. For example, when run on
eight or more processors, our experiments show
that inspector joins offer 1.1-1.4X speedups over
these previous algorithms, with the speedup increasing
as the number of processors increases.
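
The two uses of the partitioning pass described above can be illustrated with a toy sketch. This is not the authors' implementation: the per-partition key counts stand in for the paper's specialized indexes and statistics, and the function names, the `dup_threshold` parameter, and the rule for choosing between a hash probe and a sort-merge join are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def partition_and_inspect(rows, key, num_partitions):
    """Hash-partition rows and, in the same pass, count keys per partition."""
    partitions = defaultdict(list)
    key_counts = defaultdict(Counter)
    for row in rows:
        p = hash(row[key]) % num_partitions
        partitions[p].append(row)
        key_counts[p][row[key]] += 1  # the "inspection" step (assumed statistic)
    return partitions, key_counts


def choose_join_phase(build_counts, dup_threshold=2.0):
    """Hypothetical rule: choose a join-phase algorithm from inspected statistics."""
    distinct = len(build_counts)
    if distinct == 0:
        return "skip"  # empty build partition, nothing can match
    avg_dups = sum(build_counts.values()) / distinct
    return "sort_merge" if avg_dups > dup_threshold else "hash_probe"


def hash_join(build, probe, key):
    """Build an in-memory hash table on the build side, then probe it."""
    table = defaultdict(list)
    for b in build:
        table[b[key]].append(b)
    return [(b, r) for r in probe for b in table.get(r[key], [])]


def sort_merge_join(build, probe, key):
    """Sort both sides and merge, rescanning build duplicates per probe row."""
    b = sorted(build, key=lambda r: r[key])
    p = sorted(probe, key=lambda r: r[key])
    out, i = [], 0
    for r in p:
        while i < len(b) and b[i][key] < r[key]:
            i += 1
        j = i
        while j < len(b) and b[j][key] == r[key]:
            out.append((b[j], r))
            j += 1
    return out


def inspector_style_join(build_rows, probe_rows, key, num_partitions=4):
    """Partition both inputs, then join each partition with the chosen plan."""
    build_parts, build_counts = partition_and_inspect(build_rows, key, num_partitions)
    probe_parts, _ = partition_and_inspect(probe_rows, key, num_partitions)
    pairs, plans = [], {}
    for p in range(num_partitions):
        plans[p] = choose_join_phase(build_counts[p])
        if plans[p] == "skip":
            continue
        join = sort_merge_join if plans[p] == "sort_merge" else hash_join
        pairs.extend(join(build_parts[p], probe_parts[p], key))
    return pairs, plans
```

Calling `inspector_style_join(build_rows, probe_rows, "k")` on two lists of dictionaries returns the matching pairs together with the plan chosen for each partition. The point of the sketch is only that the statistics cost no extra pass over the data, since they are gathered while the tuples are already being partitioned.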