Data deduplication using dynamic chunking algorithm

Young Chan Moon, Ho Min Jung, Hyuck Yoo, Young Woong Ko

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

3 Citations (Scopus)

Abstract

Data deduplication is widely used in storage systems to avoid storing duplicate data blocks. In this paper, we propose a dynamic chunking approach that combines fixed-length chunking with a file similarity technique. Fixed-length chunking suffers from the boundary-shift problem and performs poorly when handling files that contain duplicated data. The key idea of this work is to exploit the duplicate-data information contained in the file similarity information. By comparing hash key values and file offsets within the file similarity information, we can easily find several duplicated points. We treat these duplicated points as hints for the starting positions of chunking. With this approach, we can significantly improve the performance of a data deduplication system based on fixed-length chunking. In our experiments, the proposed dynamic chunking significantly improves deduplication capability while keeping processing time comparable to that of fixed-length chunking.
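
The boundary-shift problem and the hint idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the file similarity information is built, so the `similarity_info` record (hashes of short windows sampled at chunk-aligned offsets), the 8-byte chunk size, the 4-byte window, and all function names below are assumptions made purely for illustration.

```python
import hashlib

CHUNK = 8  # hypothetical, deliberately tiny chunk size for the demo


def digest(block):
    """Hash key of a block of bytes (SHA-1 chosen only for illustration)."""
    return hashlib.sha1(block).hexdigest()


def fixed_chunks(data, start=0, size=CHUNK):
    """Fixed-length chunking of data[start:], as (offset, bytes) pairs."""
    return [(i, data[i:i + size]) for i in range(start, len(data), size)]


def similarity_info(data, window=4, step=CHUNK):
    """Assumed similarity record: window hash -> offset.

    Windows are sampled every `step` bytes, so each stored offset
    coincides with a chunk boundary of the stored file.
    """
    return {digest(data[i:i + window]): i
            for i in range(0, len(data) - window + 1, step)}


def find_hint(new, stored_info, window=4):
    """First offset in `new` whose window hash appears in `stored_info`.

    Because stored offsets are chunk-aligned, this offset is a candidate
    starting position for chunking. Returns None if nothing matches.
    """
    for off in range(len(new) - window + 1):
        if digest(new[off:off + window]) in stored_info:
            return off
    return None


def dynamic_chunks(data, hint, size=CHUNK):
    """Chunk the bytes before `hint`, then restart fixed-length chunking at `hint`."""
    if hint is None:
        return fixed_chunks(data, size=size)
    return fixed_chunks(data[:hint], size=size) + fixed_chunks(data, start=hint, size=size)


def duplicate_bytes(chunks, index):
    """Number of bytes in chunks whose hash key is already in the index."""
    return sum(len(c) for _, c in chunks if digest(c) in index)


if __name__ == "__main__":
    stored = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ012345"  # file already in the store
    new = b"xx" + stored                          # same data, shifted by 2 bytes
    index = {digest(c) for _, c in fixed_chunks(stored)}
    hint = find_hint(new, similarity_info(stored))
    print("hint offset:", hint)
    print("fixed-length duplicate bytes:", duplicate_bytes(fixed_chunks(new), index))
    print("hinted duplicate bytes:", duplicate_bytes(dynamic_chunks(new, hint), index))
```

On this toy input, inserting two bytes at the front shifts every fixed-length boundary, so chunking from offset 0 finds no duplicate bytes, whereas restarting the chunking at the hinted offset recovers all 32 bytes of the stored file. The byte-by-byte scan in `find_hint` is kept for clarity only; a real system would need a cheaper way to locate matching points.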

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 59-68
Number of pages: 10
Volume: 7654 LNAI
Edition: PART 2
DOI: https://doi.org/10.1007/978-3-642-34707-8_7
Publication status: Published - 2012 Dec 17
Event: 4th International Conference on Computational Collective Intelligence, ICCCI 2012 - Ho Chi Minh City, Viet Nam
Duration: 2012 Nov 28 to 2012 Nov 30

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 7654 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 4th International Conference on Computational Collective Intelligence, ICCCI 2012
Country: Viet Nam
City: Ho Chi Minh City
Period: 12/11/28 to 12/11/30

Fingerprint

Dynamic Algorithms
Data handling
Processing
Storage System
Experiments
Similarity

Keywords

  • Chunking algorithm
  • Deduplication
  • File similarity
  • Metadata

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Moon, Y. C., Jung, H. M., Yoo, H., & Ko, Y. W. (2012). Data deduplication using dynamic chunking algorithm. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (PART 2 ed., Vol. 7654 LNAI, pp. 59-68). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7654 LNAI, No. PART 2). https://doi.org/10.1007/978-3-642-34707-8_7

@inproceedings{ff053d8ec877475984097851408bc98d,
title = "Data deduplication using dynamic chunking algorithm",
abstract = "Data deduplication is widely used in storage systems to avoid storing duplicate data blocks. In this paper, we propose a dynamic chunking approach that combines fixed-length chunking with a file similarity technique. Fixed-length chunking suffers from the boundary-shift problem and performs poorly when handling files that contain duplicated data. The key idea of this work is to exploit the duplicate-data information contained in the file similarity information. By comparing hash key values and file offsets within the file similarity information, we can easily find several duplicated points. We treat these duplicated points as hints for the starting positions of chunking. With this approach, we can significantly improve the performance of a data deduplication system based on fixed-length chunking. In our experiments, the proposed dynamic chunking significantly improves deduplication capability while keeping processing time comparable to that of fixed-length chunking.",
keywords = "Chunking algorithm, Deduplication, File similarity, Metadata",
author = "Moon, {Young Chan} and Jung, {Ho Min} and Hyuck Yoo and Ko, {Young Woong}",
year = "2012",
month = "12",
day = "17",
doi = "10.1007/978-3-642-34707-8_7",
language = "English",
isbn = "9783642347061",
volume = "7654 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
number = "PART 2",
pages = "59--68",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
edition = "PART 2",

}

TY - GEN
T1 - Data deduplication using dynamic chunking algorithm
AU - Moon, Young Chan
AU - Jung, Ho Min
AU - Yoo, Hyuck
AU - Ko, Young Woong
PY - 2012/12/17
Y1 - 2012/12/17
AB - Data deduplication is widely used in storage systems to avoid storing duplicate data blocks. In this paper, we propose a dynamic chunking approach that combines fixed-length chunking with a file similarity technique. Fixed-length chunking suffers from the boundary-shift problem and performs poorly when handling files that contain duplicated data. The key idea of this work is to exploit the duplicate-data information contained in the file similarity information. By comparing hash key values and file offsets within the file similarity information, we can easily find several duplicated points. We treat these duplicated points as hints for the starting positions of chunking. With this approach, we can significantly improve the performance of a data deduplication system based on fixed-length chunking. In our experiments, the proposed dynamic chunking significantly improves deduplication capability while keeping processing time comparable to that of fixed-length chunking.
KW - Chunking algorithm
KW - Deduplication
KW - File similarity
KW - Metadata
UR - http://www.scopus.com/inward/record.url?scp=84870860734&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84870860734&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-34707-8_7
DO - 10.1007/978-3-642-34707-8_7
M3 - Conference contribution
AN - SCOPUS:84870860734
SN - 9783642347061
VL - 7654 LNAI
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 59
EP - 68
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER -