DiskReduce: RAID for Data-Intensive Scalable Computing (CMU-PDL-09-112)

Fan, Bin; Tantisiriroj, Wittawat; Xiao, Lin; Gibson, Garth

doi:10.1184/R1/6619538.v1

journal contribution

posted on 2009-11-01, 00:00 authored by Bin Fan, Wittawat Tantisiriroj, Lin Xiao, Garth Gibson

Data-intensive file systems, developed for Internet services and popular in cloud computing, provide high reliability and availability by replicating data, typically keeping three copies of everything. Alternatively, high-performance computing, which has comparable scale, and smaller-scale enterprise storage systems get similar tolerance for multiple failures from lower-overhead erasure encoding, or RAID, organizations. DiskReduce is a modification of the Hadoop distributed file system (HDFS) enabling asynchronous compression of initially triplicated data down to RAID-class redundancy overheads. In addition to increasing a cluster's storage capacity as seen by its users by up to a factor of three, DiskReduce can delay encoding long enough to deliver the performance benefits of multiple data copies.
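
The capacity arithmetic behind the "up to a factor of three" claim can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the stripe shapes (8 data blocks plus 2 parity blocks, and a much wider 100 plus 2) are chosen for the example rather than taken from the report.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data under n-way replication."""
    return float(copies)


def parity_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw bytes stored per byte of user data for a parity-protected stripe."""
    if data_blocks <= 0 or parity_blocks < 0:
        raise ValueError("need at least one data block and non-negative parity")
    return (data_blocks + parity_blocks) / data_blocks


def capacity_gain(copies: int, data_blocks: int, parity_blocks: int) -> float:
    """How much more user data fits after re-encoding replicated data."""
    return replication_overhead(copies) / parity_overhead(data_blocks, parity_blocks)


# Assumed example: triplicated data re-encoded into stripes with two parity
# blocks (RAID 6 style), first with 8 data blocks, then with 100.
print(f"{capacity_gain(3, 8, 2):.2f}")
print(f"{capacity_gain(3, 100, 2):.2f}")
```

Under these assumptions the gain stays below three and approaches it only as stripes get wider, which is consistent with the abstract's "up to" wording.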

Keywords

Data Storage

Licence

In Copyright
