Posted on 1973-01-01, 00:00. Authored by Erik Riedel, Christos Faloutsos, Gregory R Ganger, David Nagle.
This paper proposes a scheme for scheduling disk requests that takes advantage of the ability of
high-level functions to operate directly at individual disk drives. We show that such a scheme makes
it possible to support a Data Mining workload on an OLTP system almost for free: there is only a
small impact on the throughput and response time of the existing workload. Specifically, we show
that an OLTP system has the disk resources to provide a consistent one third of its sequential bandwidth
to a background Data Mining task with close to zero impact on OLTP throughput and
response time at high transaction loads. At low transaction loads, we show much lower impact than
observed in previous work. This means that a production OLTP system can be used for Data Mining
tasks without the expense of a second dedicated system. Our scheme takes advantage of close
interaction with the on-disk scheduler by reading blocks for the Data Mining workload as the disk
head “passes over” them while satisfying demand blocks from the OLTP request stream. We show
that this scheme provides a consistent level of throughput for the background workload even at very
high foreground loads. Such a scheme is of most benefit in combination with an Active Disk environment
that allows the background Data Mining application to also take advantage of the processing
power and memory available directly on the disk drives.
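
The "passes over" idea can be illustrated with a toy model. The sketch below is not the authors' scheduler. It assumes a single track with a fixed number of sectors, ignores seek time and any rotation that occurs during a seek, and uses hypothetical names (`sectors_passed`, `service_request`, `wanted`). It only shows how background sectors that rotate under the head while the drive waits for a demand sector could be collected without extra positioning cost.

```python
def sectors_passed(head_pos, target, sectors_per_track):
    """Sectors that rotate under the head before `target` arrives.

    Assumption: the head is already on the right track and the platter
    rotates so that sector numbers increase under the head.
    """
    gap = (target - head_pos) % sectors_per_track
    return [(head_pos + i) % sectors_per_track for i in range(gap)]


def service_request(track, target, head_pos, wanted, sectors_per_track):
    """Serve one foreground (OLTP) request for sector `target` on `track`.

    `wanted` is a set of (track, sector) pairs the background scan still
    needs. Any wanted sector that passes under the head during the
    rotational wait is read and removed from `wanted`.

    Returns (free_reads, new_head_pos).
    """
    free_reads = []
    for sector in sectors_passed(head_pos, target, sectors_per_track):
        if (track, sector) in wanted:
            wanted.discard((track, sector))
            free_reads.append((track, sector))
    # The demand sector is read at `target`; the head ends just past it.
    return free_reads, (target + 1) % sectors_per_track


# Usage: the background scan wants every sector of track 0 (8 sectors).
wanted = {(0, s) for s in range(8)}
free, pos = service_request(track=0, target=5, head_pos=2,
                            wanted=wanted, sectors_per_track=8)
print(free)         # [(0, 2), (0, 3), (0, 4)]
print(pos)          # 6
print(len(wanted))  # 5
```

In this simplified model the background scan makes progress only as a side effect of foreground requests, which is one way to see why the background throughput reported above would hold up as the foreground load rises.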