A comparison of parallel sorting algorithms on different architectures

Disponível somente no TrabalhosFeitos
  • Páginas : 38 (9346 palavras )
  • Download(s) : 0
  • Publicado : 13 de março de 2012
Ler documento completo
Amostra do texto
A Comparison of Parallel Sorting Algorithms on Different Architectures
NANCY M. A MATO
amato@cs.tamu.edu

R AVISHANKAR I YER
ravi@cs.tamu.edu

S HARAD S UNDARESAN
sharad@cs.tamu.edu

YAN W U
yanwu@cs.tamu.edu

Department of Computer Science, Texas A&M University, College Station, TX 77843-3112

Technical Report 98-029 Department of Computer Science Texas A&M University January1996

Abstract
In this paper, we present a comparative performance evaluation of three different parallel sorting algorithms: bitonic sort, sample sort, and parallel radix sort. In order to study the interaction between the algorithms and the architecture, we implemented all the algorithms on three different architectures: a MasPar MP1202, a mesh-connected computer with 2048 processing elements;an nCUBE 2, a message-passing hypercube with 32 processors; and a Sequent Balance, a distributed shared-memory machine with 10 processors. For each machine, we found that the choice of algorithm depends upon the number of elements to be sorted. In addition, as expected, our results show that the relative performance of the algorithms differed on the various machines. It is our hope that our resultscan be extrapolated to help select appropriate candidates for implementation on machines with architectures similar to those that we have studied. As evidence for this, our findings on the nCUBE 2, a 32 node hypercube, are in accordance with the results obtained by Blelloch et al. [5] on the CM-2, a hypercube with 1024 processors. In addition, preliminary results we have obtained on the SGI PowerChallenge, a distributed shared-memory machine, are in accordance with our findings on the Sequent Balance.

 This research supported in part by NCSA.

1 Introduction
Sorting is one of the fundamental problems of computer science, and parallel algorithms for sorting have been studied since the beginning of parallel computing. Batcher’s log2 n-depth bitonic sorting network [2] was one ofthe first methods proposed. Since then many different parallel sorting algorithms have been proposed (see, e.g., [4, 11, 17]), of which we mention only a few here. The first log n-depth sorting circuit was discovered by Ajtai, Komlos, and Szemeredi [1]; this result is mainly of theoretical importance however, since the constants in their construction are quite large. Around that same time, Reifand Valiant proposed a more practical Olog n time randomized algorithm called flashsort [16]. Some other notable parallel sorting algorithms are parallel versions of radix sort and quicksort [4, 17], column sort [10], and Cole’s parallel merge sort [6]. Given the large number of parallel sorting algorithms and the wide variety of parallel architectures, it is a difficult task to select the bestalgorithm for a particular machine and problem instance. The main reason that the choice is more difficult than in sequential machines is because there is no known theoretical model that can be applied to accurately predict an algorithm’s performance on different architectures. Thus, experimental studies take on an increased importance for the evaluation and selection of appropriate algorithms formultiprocessors. There have been a number of implementation studies reported in the literature in the past few years (see, e.g., [5, 12]). However, more studies are needed before we can approach the point where a certain algorithm can be recommended for a particular machine with any degree of confidence. In this paper, we present an experimental study of three different parallel sorting algorithms:bitonic sort, sample sort, and parallel radix sort. Since parallel algorithms are generally only beneficial when the problem size is large enough, we concentrated on the case when n, the number of elements to be sorted, is much larger than p, the number of processors. In order to study the interaction between the algorithms and the architecture, we implemented all the algorithms on three different...
tracking img