Carnegie Mellon University

Automatic Generation of the HPC Challenge’s Global FFT Benchmark for BlueGene/P

journal contribution
posted on 2012-07-01, authored by Franz Franchetti, Yevgen Voronenko, Gheorghe Almasi

We present the automatic synthesis of the HPC Challenge’s Global FFT, a large 1D FFT computed across an entire supercomputer. We extend the Spiral system to synthesize specialized single-node FFT libraries that fuse a data layout transformation with the on-node FFT computation, which improves network performance by enabling all-to-all collectives. We run our optimized Global FFT benchmark on up to 128k cores (32 racks) of ANL’s BlueGene/P “Intrepid” and achieve 6.4 Tflop/s, outperforming ANL’s 2008 HPC Challenge Class I Global FFT run (5 Tflop/s). Our code was part of IBM’s winning 2010 HPC Challenge Class II submission. Finally, we show first single-thread results on BlueGene/Q.
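A large 1D FFT spread over many nodes is commonly built on the four-step (Cooley-Tukey) factorization N = n1·n2: local FFTs, a twiddle-factor scaling, a global transpose, and a second round of local FFTs. The sketch below shows that textbook factorization in pure Python to illustrate where the transpose, and hence an all-to-all exchange in a distributed run, arises. It is not the Spiral-generated code from the paper: the function names `dft` and `four_step_fft` are hypothetical, and a naive O(n²) DFT stands in for optimized on-node FFT kernels.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT using the forward convention exp(-2*pi*i*j*k/n).

    Stands in for an optimized on-node FFT kernel (illustration only).
    """
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def four_step_fft(x, n1, n2):
    """Hypothetical sketch of the four-step FFT for len(x) == n1 * n2.

    Input index:  j = n2 * j1 + j2   (j1 < n1, j2 < n2)
    Output index: k = k1 + n1 * k2   (k1 < n1, k2 < n2)
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Step 1: n2 independent size-n1 DFTs on the stride-n2 subsequences.
    # y[j2][k1] holds the result for subsequence j2.
    y = [dft([x[n2 * j1 + j2] for j1 in range(n1)]) for j2 in range(n2)]

    # Step 2: multiply by the twiddle factors w_N^(j2 * k1).
    for j2 in range(n2):
        for k1 in range(n1):
            y[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)

    # Step 3: transpose. If each y[j2] lived on a different node, this is
    # the step that would require an all-to-all exchange.
    z = [[y[j2][k1] for j2 in range(n2)] for k1 in range(n1)]

    # Step 4: n1 independent size-n2 DFTs, scattered to natural output order.
    out = [0j] * n
    for k1 in range(n1):
        col = dft(z[k1])
        for k2 in range(n2):
            out[k1 + n1 * k2] = col[k2]
    return out


if __name__ == "__main__":
    x = [complex(i % 5, (i * 7) % 3) for i in range(12)]
    ref = dft(x)
    got = four_step_fft(x, 3, 4)
    print(max(abs(a - b) for a, b in zip(ref, got)))  # tiny rounding error
```

Steps 1 and 4 touch only data that a node could hold locally; the abstract's approach of folding a data layout transformation into the local FFTs targets the data movement around step 3, so that it can be carried out with all-to-all collectives.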

History

Publisher Statement

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-38718-0_20

Date

2012-07-01
