Carnegie Mellon University

An approach to generating customized load-store architectures

thesis
posted on 2023-06-27, 17:56, authored by Guanglin Xu

Automated design generation is increasingly used for hardware accelerators to navigate the large trade-off space between performance and resource utilization. Typically, a generator targets a specialized parallel architecture that fits a particular set of algorithms, which makes it difficult to extend the generator's algorithmic support. In contrast, a processor-like architecture, where the datapath is connected to memory ports and all computations are sequenced by a controller, can provide much better flexibility. In this dissertation, I call such an architecture the "load-store architecture" and present an approach to generating hardware designs from high-level specifications. The approach generates customized load-store architecture designs across multiple abstraction levels: algorithm generation, loop optimization, and hardware interpretation.


The proposed approach is inspired by the Spiral code generation framework and is realized by extending Spiral. In generating algorithms for customized load-store architectures at the data flow level, I show the importance of providing sufficient independent iterations to accommodate the long latency of customized pipelines. The generated algorithms are then translated into loop programs captured in an extensible domain-specific language (DSL) for hardware-oriented loop optimizations. I identify a computational pattern of imperfect loop nests and provide optimizations that reduce execution cycle counts, memory buffer utilization, and the number of arithmetic operations in address calculation. Finally, the optimized loops are interpreted into register-transfer level designs in another hardware-extended DSL, where local optimizations are employed.
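As a rough software illustration of what "independent iterations" means for one of the transforms targeted in this work, the sketch below computes an unnormalized Walsh-Hadamard transform as a loop nest. It is not taken from the dissertation or from Spiral; the function name `wht` and the loop structure are assumptions chosen for illustration. Within each stage, every butterfly touches a disjoint pair of elements, so a deeply pipelined add/subtract unit could, in principle, be issued a new butterfly every cycle without waiting for earlier results.

```python
def wht(x):
    """Iterative fast Walsh-Hadamard transform (unnormalized, Hadamard order).

    Illustrative sketch only; not the dissertation's generated design.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        # One stage: the n/2 butterflies below read and write disjoint
        # element pairs, so they do not depend on one another.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        # The next stage depends on this one, so stages run in sequence.
        h *= 2
    return y
```

Applying `wht` twice returns the input scaled by its length, which gives a quick sanity check of the loop structure.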


I implemented the approach by extending the open-source Spiral system and demonstrated its flexibility by generating designs for signal transforms, including Walsh-Hadamard transforms and discrete Fourier transforms, and for data sorting. Experimental results support the benefit of hardware-oriented optimizations. In particular, the FFT IP cores generated with my approach are comparable to state-of-the-art designs. Although further parallelization and hardware compilation efforts remain to be pursued, this dissertation has paved the way for generating competitive hardware designs with Spiral in a flexible manner.


History

Date

2023-05-05

Degree Type

  • Dissertation

Department

  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Franz Franchetti, James C. Hoe
