file.pdf (2.16 MB)
Download file

AutoPart : automating schema design for large scientific databases using data partitioning

Download (2.16 MB)
journal contribution
posted on 01.07.2008, 00:00 authored by Efstratios Papadomanolakis, Anastassia Ailamaki
Abstract: "Database applications that use multi-terabyte datasets are becoming increasingly important for scientific fields such as astronomy and biology. To improve query execution performance, modern DBMS build indexes and materialized views on the wide tables that store experimental data. The replication of data in indexes and views, however, implies large amounts of additional storage space, and incurs high update costs as new experiments add or change large volumes of data. In this paper we explore automatic data partitioning as a tool to redesign the relational tables in a database for faster sequential access before creating indexes and views. We present AutoPart, a vertical partitioning tool that uses the optimizer's hints to determine optimal partitions for a given workload. According to our experiments, the schemas recommended by AutoPart (a) improve query execution by 12%-44% when compared to the original schema without indexes, (b) execute queries up to 37% faster and updates almost twice as fast compared to the original after indexing. Furthermore, they require half the space for indexes. Finally, we show that a form of categorical partitioning can further improve query performance up to 29%-62% without any indexes and up to 54% with indexes, while update performance improves up to 60%. AutoPart can be used with any commercial database system. Our experimental results are based on the Sloan Digital Sky Survey (SDSS) database, a real-world astronomical database, running on Microsoft's SQL Server 2000."


Publisher Statement

© ACM, 2008. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 25th international Conference on Machine Learning (Helsinki, Finland, July 05 - 09, 2008). ICML '08, vol. 307. ACM, New York, NY, 248-255



Usage metrics