Systolic algorithms for the CMU Warp processor
Journal contribution posted on 1980-01-01, 00:00, authored by H. T. Kung, Carnegie Mellon University. Design Research Center.
Abstract: "CMU is building a 32-bit floating-point systolic array that can efficiently perform many essential computations in signal processing like the FFT and convolution. This is a one-dimensional systolic array that in general takes inputs from one end cell and produces outputs at the other end, with data and control all flowing in one direction. We call this particular systolic array the Warp processor, suggesting that it can perform various transformations at a very high speed. We expect to have wide applications for the Warp processor, especially for the CMU prototype, which has high degrees of flexibility at the expense of a relatively high chip count for each cell. The prototype has 10 cells, each of which is capable of performing 10 million floating-point operations per second (10 MFLOPS) and is built on a single board using only off-the-shelf components. This 10-cell processor, for example, can process 1024-point complex FFTs at a rate of one FFT every 600 µs. Under program control, the same processor can perform many other primitive computations in signal, image and vision processing, including two-dimensional convolution and complex matrix multiplication, at a rate of 100 MFLOPS. Together with another processor capable of performing divisions and square roots, the processor can also efficiently carry out a number of difficult matrix operations such as solving covariant linear systems, a crucial computation in real-time adaptive signal processing. This paper outlines the architecture of the Warp processor and describes how the signal processing tasks are implemented on the processor."
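The one-directional data flow described in the abstract can be illustrated with a minimal Python sketch of a linear systolic array computing a one-dimensional convolution. This is not the Warp implementation from the paper: the names `cell`, `systolic_convolve` and `fft_time_us` are hypothetical, and the cell design (one stored weight per cell, with the input stream delayed one extra step per cell relative to the partial sums) is an assumed textbook arrangement. The last function is a back-of-envelope check of the quoted FFT rate, assuming a radix-2 FFT costing 5 N log2 N real floating-point operations.

```python
import math

def cell(weight, xs, ys):
    """One cell: add weight * x to the passing partial sum, delay x by one step."""
    x_reg = 0.0                      # extra register, so x moves slower than y
    xs_out, ys_out = [], []
    for x, y in zip(xs, ys):
        ys_out.append(y + weight * x)
        xs_out.append(x_reg)
        x_reg = x
    return xs_out, ys_out

def systolic_convolve(weights, signal):
    """Full convolution; data enter at the first cell and leave at the last."""
    n_out = len(signal) + len(weights) - 1
    xs = list(signal) + [0.0] * (len(weights) - 1)   # zero-pad the input stream
    ys = [0.0] * n_out                               # partial sums start at zero
    # Because all data flow in one direction, each cell's output stream depends
    # only on its input stream, so the cells can be simulated one after another.
    for w in weights:
        xs, ys = cell(w, xs, ys)
    return ys

def fft_time_us(n_points, n_cells=10, mflops_per_cell=10.0):
    """Estimated FFT time at peak rate (assumption: 5 * N * log2(N) flops)."""
    flops = 5 * n_points * math.log2(n_points)
    return flops / (n_cells * mflops_per_cell)       # flops / MFLOPS = microseconds

print(systolic_convolve([1, 2, 3], [1, 0, -1, 2]))
print(fft_time_us(1024))
```

Under these assumptions the estimate for a 1024-point FFT is 512 µs at the 100 MFLOPS peak, which is consistent with the quoted rate of one FFT every 600 µs once some overhead is allowed for.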