Distribution-insensitive parallel external sorting on PC clusters

Minsoo Jeon, Dong Seung Kim

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Many parallel external sorting algorithms have been reported, such as NOW-Sort, SPsort, and hill sort. They all sort large-scale data stored on disk, but they differ in speed, throughput, and cost-effectiveness. Most of them handle data that are uniformly distributed over their value range. Few results have yet been reported on parallel external sorting of data with arbitrary distribution. In this paper, we present two distribution-insensitive parallel external sorting algorithms that use sampling and histogram counts to distribute data evenly among processors, which in turn contributes to high performance. Experimental results on a cluster of Linux workstations show up to a 63% reduction in execution time compared to the previous NOW-Sort.
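The abstract gives no algorithmic detail, so the following is only a minimal in-memory sketch of the general sampling idea it mentions: draw a random sample of keys, take evenly spaced sample quantiles as splitters, and route each key to the processor whose range contains it. It is not the authors' algorithm; it does no disk I/O or inter-node communication and omits the histogram-count variant. The names `choose_splitters`, `partition`, and `sample_sort`, and the parameters `num_procs` and `sample_size`, are hypothetical and chosen for illustration.

```python
import bisect
import random


def choose_splitters(keys, num_procs, sample_size, seed=0):
    """Pick num_procs - 1 splitters from a random sample of the keys.

    Assumption: keys is a non-empty list; this is generic sample-based
    splitter selection, not the method from the paper.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced sample quantiles approximate the quantiles of the
    # full data set, whatever its distribution.
    return [sample[(i * len(sample)) // num_procs] for i in range(1, num_procs)]


def partition(keys, splitters):
    """Route each key to the bucket (stand-in for a processor) that owns its range."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for k in keys:
        buckets[bisect.bisect_right(splitters, k)].append(k)
    return buckets


def sample_sort(keys, num_procs=4, sample_size=256):
    """Return one sorted bucket per simulated processor."""
    splitters = choose_splitters(keys, num_procs, sample_size)
    buckets = partition(keys, splitters)
    # Each bucket is sorted independently; concatenating the buckets in
    # order yields the globally sorted sequence.
    return [sorted(b) for b in buckets]
```

Because the splitters come from the data rather than from a fixed division of the value range, skewed inputs (for example, exponentially distributed keys) still produce buckets of roughly equal size, which is the load-balancing property the abstract refers to.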

Original language: English
Pages (from-to): 202-213
Number of pages: 12
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2858
Publication status: Published - 2003 Dec 1

Fingerprint

PC Cluster
Sorting
Sort
Sorting algorithm
Cost-Benefit Analysis
Cost effectiveness
Linux
Research
Throughput
Histogram
Execution Time
Sampling
Count
Experimental Results
Arbitrary
Range of data

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science
  • Engineering (all)

Cite this

@article{5e80b71fc3ee453b9000f9b8cc7314bc,
title = "Distribution-insensitive parallel external sorting on PC clusters",
abstract = "There have been many parallel external sorting algorithms reported such as NOW-Sort, SPsort, and hill sort, etc. They are for sorting large-scale data stored in the disk, but they differ in the speed, throughput, and cost-effectiveness. Mostly they deal with data that are uniformly distributed in their value range. Few research results have been yet reported for parallel external sort for data with arbitrary distribution. In this paper, we present two distribution-insensitive parallel external sorting algorithms that use sampling technique and histogram counts to achieve even distribution of data among processors, which eventually contribute to achieve superb performance. Experimental results on a cluster of Linux workstations show up to 63{\%} reduction in the execution time compared to previous NOW-sort.",
author = "Jeon, Minsoo and Kim, {Dong Seung}",
year = "2003",
month = "12",
day = "1",
language = "English",
volume = "2858",
pages = "202--213",
journal = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
issn = "0302-9743",
publisher = "Springer Verlag",
}

TY - JOUR

T1 - Distribution-insensitive parallel external sorting on PC clusters

AU - Jeon, Minsoo

AU - Kim, Dong Seung

PY - 2003/12/1

Y1 - 2003/12/1

N2 - There have been many parallel external sorting algorithms reported such as NOW-Sort, SPsort, and hill sort, etc. They are for sorting large-scale data stored in the disk, but they differ in the speed, throughput, and cost-effectiveness. Mostly they deal with data that are uniformly distributed in their value range. Few research results have been yet reported for parallel external sort for data with arbitrary distribution. In this paper, we present two distribution-insensitive parallel external sorting algorithms that use sampling technique and histogram counts to achieve even distribution of data among processors, which eventually contribute to achieve superb performance. Experimental results on a cluster of Linux workstations show up to 63% reduction in the execution time compared to previous NOW-sort.

UR - http://www.scopus.com/inward/record.url?scp=0242307961&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0242307961&partnerID=8YFLogxK

M3 - Article

VL - 2858

SP - 202

EP - 213

JO - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

JF - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SN - 0302-9743

ER -