Low-cost fault-tolerance protocol for large-scale network monitoring

JinHo Ahn, Sung-Gi Min, YoungIl Choi, ByungSun Lee

Research output: Contribution to journal (Article)

Abstract

A distributed hierarchical network monitoring model has been proposed to solve the scalability problem of the centralized model. In this distributed model, a top-level monitoring manager, called the main manager, obtains aggregate management information from mid-level managers, called domain managers, forming a hierarchical structure. However, if some of the monitoring managers crash, network elements cannot be continuously and correctly monitored until the managers are repaired. To address this important but previously unresolved issue, this paper presents a new fault-tolerance protocol for domain managers, named DMFTP, that allows the managers to efficiently utilize their organizational structure. As a result, the protocol can minimize failure detection overhead and the number of live managers affected by each manager node crash. It also tolerates concurrent manager failures and, after the failed managers have been repaired, ensures their immediate and consistent recovery.

Original language: English
Pages (from-to): 504-513
Number of pages: 10
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2659
Publication status: Published - 2003 Dec 1

Fingerprint

Information Management
Network Monitoring
Fault tolerance
Managers
Organizations
Crash
Costs and Cost Analysis
Monitoring
Failure Detection
Costs
Hierarchical Networks
Distributed Networks
Hierarchical Structure
Concurrent
Scalability
Recovery
Model
Minimise
Vertex of a graph

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

@article{7abbeb304be14c02bc6c6003cf206270,
title = "Low-cost fault-tolerance protocol for large-scale network monitoring",
abstract = "A distributed hierarchical network monitoring model has been proposed to solve the scalability problem of the centralized model. In this distributed model, a top-level monitoring manager, called the main manager, obtains aggregate management information from mid-level managers, called domain managers, forming a hierarchical structure. However, if some of the monitoring managers crash, network elements cannot be continuously and correctly monitored until the managers are repaired. To address this important but previously unresolved issue, this paper presents a new fault-tolerance protocol for domain managers, named DMFTP, that allows the managers to efficiently utilize their organizational structure. As a result, the protocol can minimize failure detection overhead and the number of live managers affected by each manager node crash. It also tolerates concurrent manager failures and, after the failed managers have been repaired, ensures their immediate and consistent recovery.",
author = "JinHo Ahn and Sung-Gi Min and YoungIl Choi and ByungSun Lee",
year = "2003",
month = "12",
day = "1",
language = "English",
volume = "2659",
pages = "504--513",
journal = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
issn = "0302-9743",
publisher = "Springer Verlag"
}

TY - JOUR

T1 - Low-cost fault-tolerance protocol for large-scale network monitoring

AU - Ahn, JinHo

AU - Min, Sung-Gi

AU - Choi, YoungIl

AU - Lee, ByungSun

PY - 2003/12/1

Y1 - 2003/12/1

AB - A distributed hierarchical network monitoring model has been proposed to solve the scalability problem of the centralized model. In this distributed model, a top-level monitoring manager, called the main manager, obtains aggregate management information from mid-level managers, called domain managers, forming a hierarchical structure. However, if some of the monitoring managers crash, network elements cannot be continuously and correctly monitored until the managers are repaired. To address this important but previously unresolved issue, this paper presents a new fault-tolerance protocol for domain managers, named DMFTP, that allows the managers to efficiently utilize their organizational structure. As a result, the protocol can minimize failure detection overhead and the number of live managers affected by each manager node crash. It also tolerates concurrent manager failures and, after the failed managers have been repaired, ensures their immediate and consistent recovery.

UR - http://www.scopus.com/inward/record.url?scp=35248877563&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=35248877563&partnerID=8YFLogxK

M3 - Article

VL - 2659

SP - 504

EP - 513

JO - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

JF - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SN - 0302-9743

ER -