A Comparison of Parallel Sorting Algorithms on Different Architectures
NANCY M. A MATO
R AVISHANKAR I YER
S HARAD S UNDARESAN
YAN W U
Department of Computer Science, Texas A&M University, College Station, TX 77843-3112
Technical Report 98-029 Department of Computer Science Texas A&M University January1996
In this paper, we present a comparative performance evaluation of three different parallel sorting algorithms: bitonic sort, sample sort, and parallel radix sort. In order to study the interaction between the algorithms and the architecture, we implemented all the algorithms on three different architectures: a MasPar MP1202, a mesh-connected computer with 2048 processing elements;an nCUBE 2, a message-passing hypercube with 32 processors; and a Sequent Balance, a distributed shared-memory machine with 10 processors. For each machine, we found that the choice of algorithm depends upon the number of elements to be sorted. In addition, as expected, our results show that the relative performance of the algorithms differed on the various machines. It is our hope that our resultscan be extrapolated to help select appropriate candidates for implementation on machines with architectures similar to those that we have studied. As evidence for this, our ﬁndings on the nCUBE 2, a 32 node hypercube, are in accordance with the results obtained by Blelloch et al.  on the CM-2, a hypercube with 1024 processors. In addition, preliminary results we have obtained on the SGI PowerChallenge, a distributed shared-memory machine, are in accordance with our ﬁndings on the Sequent Balance.
This research supported in part by NCSA.
Sorting is one of the fundamental problems of computer science, and parallel algorithms for sorting have been studied since the beginning of parallel computing. Batcher’s log2 n-depth bitonic sorting network  was one ofthe ﬁrst methods proposed. Since then many different parallel sorting algorithms have been proposed (see, e.g., [4, 11, 17]), of which we mention only a few here. The ﬁrst log n-depth sorting circuit was discovered by Ajtai, Komlos, and Szemeredi ; this result is mainly of theoretical importance however, since the constants in their construction are quite large. Around that same time, Reifand Valiant proposed a more practical Olog n time randomized algorithm called ﬂashsort . Some other notable parallel sorting algorithms are parallel versions of radix sort and quicksort [4, 17], column sort , and Cole’s parallel merge sort . Given the large number of parallel sorting algorithms and the wide variety of parallel architectures, it is a difﬁcult task to select the bestalgorithm for a particular machine and problem instance. The main reason that the choice is more difﬁcult than in sequential machines is because there is no known theoretical model that can be applied to accurately predict an algorithm’s performance on different architectures. Thus, experimental studies take on an increased importance for the evaluation and selection of appropriate algorithms formultiprocessors. There have been a number of implementation studies reported in the literature in the past few years (see, e.g., [5, 12]). However, more studies are needed before we can approach the point where a certain algorithm can be recommended for a particular machine with any degree of conﬁdence. In this paper, we present an experimental study of three different parallel sorting algorithms:bitonic sort, sample sort, and parallel radix sort. Since parallel algorithms are generally only beneﬁcial when the problem size is large enough, we concentrated on the case when n, the number of elements to be sorted, is much larger than p, the number of processors. In order to study the interaction between the algorithms and the architecture, we implemented all the algorithms on three different...
Ler documento completo
Por favor, assinar para o acesso.