Posted on 2005-11-01, 00:00. Authored by Ellen Spertus, Seth C. Goldstein, Klaus Erik Schauser, Thorsten von Eicken, David E. Culler, William J. Dally
This paper uses an abstract machine approach to
compare the mechanisms of two parallel machines: the
J-Machine and the CM-5. High-level parallel programs
are translated by a single optimizing compiler to a fine-grained
abstract parallel machine, TAM. A final compilation
step is unique to each machine and optimizes for
specifics of the architecture. By determining the cost of the
primitives and weighting them by their dynamic frequency
in parallel programs, we quantify the effectiveness of the
following mechanisms individually and in combination. Efficient
processor/network coupling proves valuable. Message
dispatch is found to be less valuable without atomic
operations that allow the scheduling levels to cooperate.
Multiple hardware contexts are of small value when the
contexts cooperate and the compiler can partition the register
set. Tagged memory provides little gain. Finally, the
performance of the overall system is strongly influenced by
the performance of the memory system and the frequency
of control operations.
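
The cost model described in the abstract, in which each primitive's cost is weighted by its dynamic frequency, can be sketched roughly as follows. This is a minimal illustration and not the authors' tooling: the primitive names, cycle counts, and frequencies are made-up placeholders, and the function name `weighted_cost` is hypothetical.

```python
def weighted_cost(cycles_per_primitive, dynamic_counts):
    """Sum of (cycles per primitive * number of times it executes)."""
    return sum(cycles_per_primitive[p] * n for p, n in dynamic_counts.items())


# Hypothetical dynamic frequencies of abstract-machine primitives in one program.
# These numbers are placeholders, not measurements from the paper.
counts = {"send": 100, "dispatch": 100, "thread_switch": 400}

# Hypothetical per-primitive costs (in cycles) on two target machines.
machine_a = {"send": 4, "dispatch": 3, "thread_switch": 5}
machine_b = {"send": 10, "dispatch": 12, "thread_switch": 6}

total_a = weighted_cost(machine_a, counts)
total_b = weighted_cost(machine_b, counts)

# The value of one mechanism in isolation can be estimated by swapping in
# a single primitive's cost from the other machine and re-evaluating.
b_with_fast_send = dict(machine_b, send=machine_a["send"])
total_b_fast_send = weighted_cost(b_with_fast_send, counts)

print(total_a, total_b, total_b_fast_send)
```

Under these assumed numbers, comparing `total_b` with `total_b_fast_send` isolates the contribution of cheaper sends, which mirrors the idea of evaluating mechanisms individually and in combination.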