Partitioned parallel radix sort

Shin Jae Lee, Minsoo Jeon, Andrew Sohn, Dong Seung Kim

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Load-balanced parallel radix sort solved the load imbalance problem present in parallel radix sort. By redistributing the keys in each radix round, it ensures that every processor holds exactly the same number of keys, thereby reducing the overall sorting time. Load-balanced radix sort is currently known as the fastest internal sorting method for distributed-memory multiprocessors. However, once computation is balanced, the communication time spent on key redistribution emerges as the bottleneck of overall sorting performance. In this report we present partitioned parallel radix sort, a new parallel radix sorter that solves the communication problem of balanced radix sort. The new method reduces communication time by eliminating the redistribution steps. The keys are first sorted in a top-down fashion (left-to-right, as opposed to right-to-left) using some of the most significant bits. Once each key has been placed on its destination processor, the rest of the sort is confined within that processor, eliminating the need for global redistribution of keys. The method keeps both communication and computation well balanced across processors. It has been implemented on three distributed-memory platforms: IBM SP2, Cray T3E, and a PC cluster. Experimental results with various key distributions indicate that partitioned parallel radix sort shows significant improvements over balanced radix sort. The IBM SP2 shows a 13% to 30% improvement in execution time and the Cray/SGI T3E a 20% to 100% improvement, while the PC cluster shows an improvement of over 2.5-fold.
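The two-phase idea described in the abstract (partition once by the most significant bits, then sort locally) can be sketched as a single-process Python simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the 32-bit key width, `TOP_BITS = 8` partitioning bits, the 8-bit digit width, the rule that assigns contiguous runs of whole buckets to processors so each receives roughly N/P keys, and all function names are hypothetical choices made for illustration. Python lists stand in for per-processor memory, and the global histogram and the single key exchange are simulated with ordinary loops where a real distributed-memory program would use message-passing collectives (an all-reduce and an all-to-all). For simplicity, the local phase here re-sorts on all 32 bits, including the top digit, so that keys arriving from different senders are regrouped; the paper's local phase may differ.

```python
import random
from typing import List

KEY_BITS = 32    # assumption: unsigned 32-bit keys
TOP_BITS = 8     # assumption: number of most significant bits used to partition
DIGIT_BITS = 8   # assumption: digit width for the local LSD radix passes


def top_bucket(key: int) -> int:
    """Bucket index given by the TOP_BITS most significant bits of the key."""
    return key >> (KEY_BITS - TOP_BITS)


def assign_buckets(global_hist: List[int], num_procs: int) -> List[int]:
    """Hypothetical rule: give each processor a contiguous run of whole
    buckets holding about N/P keys. Owners never decrease with bucket index,
    so processor p holds only keys smaller than those on processor p + 1."""
    total = sum(global_hist)
    owner, proc, running = [], 0, 0
    for count in global_hist:
        # Move on once the current processor already holds its share of keys.
        while proc < num_procs - 1 and running * num_procs >= total * (proc + 1):
            proc += 1
        owner.append(proc)
        running += count
    return owner


def lsd_radix_sort(keys: List[int]) -> List[int]:
    """Sequential LSD radix sort: one stable bucketing pass per digit."""
    mask = (1 << DIGIT_BITS) - 1
    for shift in range(0, KEY_BITS, DIGIT_BITS):
        buckets = [[] for _ in range(1 << DIGIT_BITS)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        keys = [k for b in buckets for k in b]
    return keys


def partitioned_radix_sort(local_keys: List[List[int]]) -> List[List[int]]:
    """Simulate P processors; local_keys[p] is the data initially on processor p.
    Returns the sorted keys held by each processor after the sort."""
    num_procs = len(local_keys)
    # 1. Each processor counts its keys per top-bit bucket.
    local_hists = [[0] * (1 << TOP_BITS) for _ in range(num_procs)]
    for p, keys in enumerate(local_keys):
        for k in keys:
            local_hists[p][top_bucket(k)] += 1
    # 2. Global histogram (an all-reduce on a real machine), then bucket owners.
    global_hist = [sum(col) for col in zip(*local_hists)]
    owner = assign_buckets(global_hist, num_procs)
    # 3. A single exchange: every key goes straight to its final processor.
    received: List[List[int]] = [[] for _ in range(num_procs)]
    for keys in local_keys:
        for k in keys:
            received[owner[top_bucket(k)]].append(k)
    # 4. The rest of the sort is purely local; no further key redistribution.
    return [lsd_radix_sort(keys) for keys in received]


if __name__ == "__main__":
    random.seed(0)
    procs = [[random.getrandbits(KEY_BITS) for _ in range(1000)] for _ in range(4)]
    parts = partitioned_radix_sort(procs)
    merged = [k for part in parts for k in part]
    assert merged == sorted(k for part in procs for k in part)
    print("keys per processor:", [len(part) for part in parts])
```

Because each processor owns a non-overlapping, ordered range of top-bit buckets, concatenating the local results in processor order yields the globally sorted sequence after only one round of key movement. In this sketch, balance is only as fine as the bucket granularity: if a skewed distribution puts most keys into one bucket, that bucket's processor receives most of the work, so the number of most significant bits used for partitioning matters.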

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 160-171
Number of pages: 12
Volume: 1940
ISBN (Print): 9783540411284
Publication status: Published - 2000
Event: 3rd International Symposium on High Performance Computing, ISHPC 2000 - Tokyo, Japan
Duration: 2000 Oct 16 to 2000 Oct 18

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1940
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Conference: 3rd International Symposium on High Performance Computing, ISHPC 2000
Country: Japan
City: Tokyo
Period: 00/10/16 to 00/10/18

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Lee, S. J., Jeon, M., Sohn, A., & Kim, D. S. (2000). Partitioned parallel radix sort. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1940, pp. 160-171). Springer Verlag.