Many parallel external sorting algorithms have been reported, such as NOW-Sort, SPsort, and hill sort. They all sort large-scale data stored on disk, but they differ in speed, throughput, and cost-effectiveness. Most of them assume data that are uniformly distributed over their value range. Few results have been reported on parallel external sorting of data with arbitrary distributions. In this paper, we present two distribution-insensitive parallel external sorting algorithms that use sampling and histogram counts to distribute data evenly among processors, which in turn improves performance. Experimental results on a cluster of Linux workstations show up to a 63% reduction in execution time compared with the previous NOW-Sort.
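The general idea of sampling-based partitioning can be illustrated with a minimal in-memory sketch. It is not the paper's algorithm: it ignores disk I/O and inter-processor communication, does not show the histogram-count variant, and all function names and parameters (`choose_splitters`, `partition`, `sample_sort`, `sample_size`) are hypothetical. It assumes that splitter keys are taken at evenly spaced ranks of a sorted random sample, so that each processor receives a similar number of keys even when the key distribution is skewed.

```python
import bisect
import random


def choose_splitters(sample, num_procs):
    """Pick num_procs - 1 splitter keys at evenly spaced ranks of the sorted sample."""
    ordered = sorted(sample)
    return [ordered[(i * len(ordered)) // num_procs] for i in range(1, num_procs)]


def partition(keys, splitters):
    """Assign each key to the bucket (simulated processor) whose key range contains it."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        # bisect_right returns how many splitters are <= key, which is the bucket index.
        buckets[bisect.bisect_right(splitters, key)].append(key)
    return buckets


def sample_sort(keys, num_procs, sample_size, seed=0):
    """Return one sorted run per simulated processor; concatenated, the runs are sorted."""
    rng = random.Random(seed)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    splitters = choose_splitters(sample, num_procs)
    # Each bucket would be sorted locally by its own processor in a real system.
    return [sorted(bucket) for bucket in partition(keys, splitters)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Exponentially distributed keys: heavily skewed toward small values.
    skewed = [rng.expovariate(1.0) for _ in range(20000)]
    runs = sample_sort(skewed, num_procs=4, sample_size=1000)
    print([len(run) for run in runs])
```

Because the splitters follow the sample's quantiles rather than fixed subdivisions of the value range, the bucket sizes track the actual data distribution. A larger sample generally gives a more even split, at the cost of more sampling work.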
|Number of pages||12|
|Journal||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Publication status||Published - 2003 Dec 1|
ASJC Scopus subject areas
- Computer Science(all)
- Biochemistry, Genetics and Molecular Biology(all)
- Theoretical Computer Science