Partitioned parallel radix sort

Shin Jae Lee, Minsoo Jeon, Dong Seung Kim, Andrew Sohn

Research output: Contribution to journal › Article

27 Citations (Scopus)

Abstract

Load-balanced parallel radix sort solved the load imbalance problem of parallel radix sort: by redistributing the keys in each radix round, it gives every processor exactly the same number of keys, thereby reducing the overall sorting time. It is currently known as the fastest internal sorting method for distributed-memory multiprocessors. However, once the computation time is balanced, the communication time spent on key redistribution emerges as the bottleneck of overall sorting performance. We present in this report a new parallel radix sorter, called partitioned parallel radix sort, that solves the communication problem of balanced radix sort. The new method reduces the communication time by eliminating the redistribution steps. The keys are first sorted in a top-down fashion (left to right, as opposed to right to left) using some of the most significant bits. Once the keys are localized to each processor, the rest of the sorting is confined within each processor, eliminating the need for global redistribution of keys. This enables well-balanced communication and computation across processors. The proposed method has been implemented on three distributed-memory platforms: IBM SP2, Cray T3E, and a PC cluster. Experimental results with various key distributions indicate that partitioned parallel radix sort shows significant improvements over balanced radix sort: 13% to 30% in execution time on the IBM SP2, 20% to 100% on the Cray/SGI T3E, and over 2.4-fold on the PC cluster.
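
The two phases described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the function names, the parameters (`num_procs`, `key_bits`, `msb_bits`, `radix_bits`), and the greedy rule that maps most-significant-bit buckets to processors are all assumptions made for illustration, and each "processor" is simulated by a Python list rather than a real node. Because one simulated processor may own several buckets, the local pass here simply sorts on all key bits.

```python
def local_radix_sort(keys, key_bits, radix_bits):
    """Stable LSD radix sort of non-negative ints below 2**key_bits."""
    mask = (1 << radix_bits) - 1
    for shift in range(0, key_bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        keys = [k for bucket in buckets for k in bucket]
    return keys


def partitioned_radix_sort(keys, num_procs=4, key_bits=16, msb_bits=4, radix_bits=4):
    """Simulate the idea: partition once by the top bits, then sort locally.

    Returns one sorted list per simulated processor; concatenating them
    in processor order yields the fully sorted sequence.
    """
    low_bits = key_bits - msb_bits

    # Phase 1a: histogram of the msb_bits most significant bits.
    counts = [0] * (1 << msb_bits)
    for k in keys:
        counts[k >> low_bits] += 1

    # Phase 1b (assumed rule): give each processor a contiguous range of
    # buckets so that it receives roughly len(keys) / num_procs keys.
    total = len(keys) or 1
    owner, seen = [], 0
    for c in counts:
        owner.append(min(seen * num_procs // total, num_procs - 1))
        seen += c

    # Phase 1c: the single key exchange, simulated by appending to lists.
    local = [[] for _ in range(num_procs)]
    for k in keys:
        local[owner[k >> low_bits]].append(k)

    # Phase 2: purely local sorting, with no further exchange of keys.
    return [local_radix_sort(part, key_bits, radix_bits) for part in local]


parts = partitioned_radix_sort([9, 60000, 512, 3, 40000, 7, 65535, 1024], num_procs=2)
flat = [k for part in parts for k in part]
```

In this sketch the keys cross processor boundaries only once, in phase 1c. How evenly they are spread depends on `msb_bits` and on the key distribution, so a heavily skewed input can still leave the simulated processors unevenly loaded.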

Original language: English
Pages (from-to): 656-668
Number of pages: 13
Journal: Journal of Parallel and Distributed Computing
Volume: 62
Issue number: 4
DOIs: 10.1006/jpdc.2001.1808
Publication status: Published - 2002 Jun 1

Keywords

  • Distributed-memory machines
  • Load balancing
  • Parallel sorting
  • Radix sort

ASJC Scopus subject areas

  • Computer Science Applications
  • Hardware and Architecture
  • Control and Systems Engineering

Cite this

Partitioned parallel radix sort. / Lee, Shin Jae; Jeon, Minsoo; Kim, Dong Seung; Sohn, Andrew.

In: Journal of Parallel and Distributed Computing, Vol. 62, No. 4, 01.06.2002, p. 656-668.

@article{f7857da18ab048df85815d7b75605ed4,
title = "Partitioned parallel radix sort",
keywords = "Distributed-memory machines, Load balancing, Parallel sorting, Radix sort",
author = "Lee, {Shin Jae} and Minsoo Jeon and Kim, {Dong Seung} and Andrew Sohn",
year = "2002",
month = "6",
day = "1",
doi = "10.1006/jpdc.2001.1808",
language = "English",
volume = "62",
pages = "656--668",
journal = "Journal of Parallel and Distributed Computing",
issn = "0743-7315",
publisher = "Academic Press Inc.",
number = "4",

}

TY - JOUR

T1 - Partitioned parallel radix sort

AU - Lee, Shin Jae

AU - Jeon, Minsoo

AU - Kim, Dong Seung

AU - Sohn, Andrew

PY - 2002/6/1

Y1 - 2002/6/1

KW - Distributed-memory machines

KW - Load balancing

KW - Parallel sorting

KW - Radix sort

UR - http://www.scopus.com/inward/record.url?scp=0036112596&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036112596&partnerID=8YFLogxK

U2 - 10.1006/jpdc.2001.1808

DO - 10.1006/jpdc.2001.1808

M3 - Article

AN - SCOPUS:0036112596

VL - 62

SP - 656

EP - 668

JO - Journal of Parallel and Distributed Computing

JF - Journal of Parallel and Distributed Computing

SN - 0743-7315

IS - 4

ER -