Replicated process allocation for load distribution in fault-tolerant multicomputers

Jong Kim, Heejo Lee, Sunggu Lee

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

In this paper, we consider a load-balancing process allocation method for fault-tolerant multicomputer systems that balances the load before as well as after faults start to degrade the performance of the system. In order to be able to tolerate a single fault, each process (primary process) is duplicated (i.e., has a backup process). The backup process executes on a different processor from the primary, checkpointing the primary process and recovering the process if the primary process fails. In this paper, we formalize the problem of load-balancing process allocation and propose a new process allocation method and analyze the performance of the proposed method. Simulations are used to compare the proposed method with a process allocation method that does not take into account the different load characteristics of the primary and backup processes. While both methods perform well before the occurrence of a fault, only the proposed method maintains a balanced load after the occurrence of such a fault.
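The primary/backup arrangement described in the abstract can be illustrated with a small sketch. This is not the paper's allocation method. It only shows, under assumed numbers, why two placements that look equally balanced before a fault can differ afterwards. The `Process` class, the function names, the load units (10 for an active copy, 1 for a checkpointing backup), and the two example placements are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    name: str
    primary_load: int  # load of the active copy (illustrative units, assumed)
    backup_load: int   # assumed smaller load of the checkpointing backup copy


def loads_before_fault(procs, primary_on, backup_on, n_nodes):
    """Per-node load while every node is healthy."""
    load = {node: 0 for node in range(n_nodes)}
    for p in procs:
        load[primary_on[p.name]] += p.primary_load
        load[backup_on[p.name]] += p.backup_load
    return load


def loads_after_fault(procs, primary_on, backup_on, n_nodes, failed):
    """Per-node load on the surviving nodes after node `failed` goes down.

    Assumption: a backup that takes over runs at the full primary load,
    and no replacement backup is created after the fault.
    """
    load = {node: 0 for node in range(n_nodes) if node != failed}
    for p in procs:
        pri, bak = primary_on[p.name], backup_on[p.name]
        if pri == bak:
            raise ValueError(f"{p.name}: primary and backup share node {pri}")
        if pri == failed:
            load[bak] += p.primary_load      # backup becomes the active copy
        elif bak == failed:
            load[pri] += p.primary_load      # primary continues, backup is lost
        else:
            load[pri] += p.primary_load
            load[bak] += p.backup_load
    return load


# Six identical processes on three nodes, two primaries per node.
procs = [Process(name, primary_load=10, backup_load=1) for name in "abcdef"]
primary_on = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2}

# Placement 1: both backups of a node's primaries sit on one neighbour.
clustered = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 0, "f": 0}
# Placement 2: the backups of a node's primaries are spread over the others.
spread = {"a": 1, "b": 2, "c": 2, "d": 0, "e": 0, "f": 1}

for label, placement in (("clustered", clustered), ("spread", spread)):
    before = loads_before_fault(procs, primary_on, placement, 3)
    after = loads_after_fault(procs, primary_on, placement, 3, failed=0)
    print(label, "before:", before, "after node 0 fails:", after)
```

With these assumed loads, both placements give every node a load of 22 before the fault. After node 0 fails, the clustered placement leaves the survivors at 40 and 22, while the spread placement leaves them at 31 each.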

Original language: English
Pages (from-to): 499-505
Number of pages: 7
Journal: IEEE Transactions on Computers
Volume: 46
Issue number: 4
DOIs: 10.1109/12.588067
Publication status: Published - 1997 Dec 1
Externally published: Yes

Fingerprint

Multicomputers
Load Distribution
Fault-tolerant
Resource allocation
Fault
Load Balancing
Checkpointing

Keywords

  • Backup process
  • Checkpointing
  • Fault-tolerant multicomputer
  • Load balancing
  • Process allocation

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Hardware and Architecture

Cite this

Replicated process allocation for load distribution in fault-tolerant multicomputers. / Kim, Jong; Lee, Heejo; Lee, Sunggu.

In: IEEE Transactions on Computers, Vol. 46, No. 4, 01.12.1997, p. 499-505.

@article{b9d5ba32565a4fb2bbbd2eb4749c1584,
title = "Replicated process allocation for load distribution in fault-tolerant multicomputers",
abstract = "In this paper, we consider a load-balancing process allocation method for fault-tolerant multicomputer systems that balances the load before as well as after faults start to degrade the performance of the system. In order to be able to tolerate a single fault, each process (primary process) is duplicated (i.e., has a backup process). The backup process executes on a different processor from the primary, checkpointing the primary process and recovering the process if the primary process fails. In this paper, we formalize the problem of load-balancing process allocation and propose a new process allocation method and analyze the performance of the proposed method. Simulations are used to compare the proposed method with a process allocation method that does not take into account the different load characteristics of the primary and backup processes. While both methods perform well before the occurrence of a fault, only the proposed method maintains a balanced load after the occurrence of such a fault.",
keywords = "Backup process, Checkpointing, Fault-tolerant multicomputer, Load balancing, Process allocation",
author = "Jong Kim and Heejo Lee and Sunggu Lee",
year = "1997",
month = "12",
day = "1",
doi = "10.1109/12.588067",
language = "English",
volume = "46",
pages = "499--505",
journal = "IEEE Transactions on Computers",
issn = "0018-9340",
publisher = "IEEE Computer Society",
number = "4",
}

TY - JOUR

T1 - Replicated process allocation for load distribution in fault-tolerant multicomputers

AU - Kim, Jong

AU - Lee, Heejo

AU - Lee, Sunggu

PY - 1997/12/1

Y1 - 1997/12/1

AB - In this paper, we consider a load-balancing process allocation method for fault-tolerant multicomputer systems that balances the load before as well as after faults start to degrade the performance of the system. In order to be able to tolerate a single fault, each process (primary process) is duplicated (i.e., has a backup process). The backup process executes on a different processor from the primary, checkpointing the primary process and recovering the process if the primary process fails. In this paper, we formalize the problem of load-balancing process allocation and propose a new process allocation method and analyze the performance of the proposed method. Simulations are used to compare the proposed method with a process allocation method that does not take into account the different load characteristics of the primary and backup processes. While both methods perform well before the occurrence of a fault, only the proposed method maintains a balanced load after the occurrence of such a fault.

KW - Backup process

KW - Checkpointing

KW - Fault-tolerant multicomputer

KW - Load balancing

KW - Process allocation

UR - http://www.scopus.com/inward/record.url?scp=0031125704&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0031125704&partnerID=8YFLogxK

U2 - 10.1109/12.588067

DO - 10.1109/12.588067

M3 - Article

AN - SCOPUS:0031125704

VL - 46

SP - 499

EP - 505

JO - IEEE Transactions on Computers

JF - IEEE Transactions on Computers

SN - 0018-9340

IS - 4

ER -