Abstract
Data deduplication is widely used in storage systems to eliminate duplicate data blocks. In this paper, we propose a dynamic chunking approach that combines fixed-length chunking with a file similarity technique. Fixed-length chunking suffers from the boundary-shift problem and performs poorly on duplicated data files. The key idea of this work is to exploit the duplicated-data information contained in the file similarity information. Duplicated points can easily be found by comparing hash key values and file offsets within the file similarity information, and these points serve as hints for the starting positions of chunks. With this approach, the performance of a data deduplication system based on fixed-length chunking can be improved significantly. Experimental results show that the proposed dynamic chunking yields a significant improvement in deduplication processing capability while retaining a processing time comparable to that of fixed-length chunking.
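The core idea in the abstract — restarting fixed-length chunk boundaries at duplicated points found through file similarity — can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the chunk size, the hash function, and the way hint offsets are supplied are all our assumptions.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # illustrative fixed chunk length

def fixed_chunks(data: bytes, hint_offsets=None) -> list:
    """Split data into fixed-length chunks, but restart the chunk
    boundary at each hinted offset (a suspected duplicated point)."""
    hints = sorted(set(hint_offsets or []))
    chunks, pos = [], 0
    while pos < len(data):
        # if a hint falls inside the next fixed window, cut there instead
        next_hint = next((h for h in hints if pos < h < pos + CHUNK_SIZE), None)
        end = next_hint if next_hint is not None else pos + CHUNK_SIZE
        chunks.append(data[pos:end])
        pos = end
    return chunks

def chunk_hashes(chunks) -> set:
    """Hash each chunk; shared hashes indicate deduplicatable blocks."""
    return {hashlib.sha1(c).hexdigest() for c in chunks}

# Demonstrate the boundary-shift problem and the hint-based fix.
random.seed(42)
original = bytes(random.getrandbits(8) for _ in range(4 * CHUNK_SIZE))
shifted = b"\x00" * 100 + original  # 100 inserted bytes shift everything

h_orig = chunk_hashes(fixed_chunks(original))
h_plain = chunk_hashes(fixed_chunks(shifted))            # boundaries misaligned
h_hinted = chunk_hashes(fixed_chunks(shifted, [100]))    # realign at offset 100
```

With plain fixed-length chunking, the 100-byte insertion shifts every boundary, so no chunk of `shifted` matches a chunk of `original`; with a single hint at the duplicated point (offset 100), all four original chunks are recovered and deduplicated.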
Original language | English |
---|---|
Title of host publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Pages | 59-68 |
Number of pages | 10 |
Volume | 7654 LNAI |
Edition | PART 2 |
DOIs | |
Publication status | Published - 2012 Dec 17 |
Event | 4th International Conference on Computational Collective Intelligence, ICCCI 2012 - Ho Chi Minh City, Viet Nam Duration: 2012 Nov 28 → 2012 Nov 30 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Number | PART 2 |
Volume | 7654 LNAI |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Other
Other | 4th International Conference on Computational Collective Intelligence, ICCCI 2012 |
---|---|
Country | Viet Nam |
City | Ho Chi Minh City |
Period | 12/11/28 → 12/11/30 |
Keywords
- Chunking algorithm
- Deduplication
- File similarity
- Metadata
ASJC Scopus subject areas
- Computer Science (all)
- Theoretical Computer Science
Cite this
Data deduplication using dynamic chunking algorithm. / Moon, Young Chan; Jung, Ho Min; Yoo, Hyuck; Ko, Young Woong.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 7654 LNAI PART 2. ed. 2012. p. 59-68 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7654 LNAI, No. PART 2). Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Data deduplication using dynamic chunking algorithm
AU - Moon, Young Chan
AU - Jung, Ho Min
AU - Yoo, Hyuck
AU - Ko, Young Woong
PY - 2012/12/17
Y1 - 2012/12/17
N2 - Data deduplication is widely used in storage systems to eliminate duplicate data blocks. In this paper, we propose a dynamic chunking approach that combines fixed-length chunking with a file similarity technique. Fixed-length chunking suffers from the boundary-shift problem and performs poorly on duplicated data files. The key idea of this work is to exploit the duplicated-data information contained in the file similarity information. Duplicated points can easily be found by comparing hash key values and file offsets within the file similarity information, and these points serve as hints for the starting positions of chunks. With this approach, the performance of a data deduplication system based on fixed-length chunking can be improved significantly. Experimental results show that the proposed dynamic chunking yields a significant improvement in deduplication processing capability while retaining a processing time comparable to that of fixed-length chunking.
AB - Data deduplication is widely used in storage systems to eliminate duplicate data blocks. In this paper, we propose a dynamic chunking approach that combines fixed-length chunking with a file similarity technique. Fixed-length chunking suffers from the boundary-shift problem and performs poorly on duplicated data files. The key idea of this work is to exploit the duplicated-data information contained in the file similarity information. Duplicated points can easily be found by comparing hash key values and file offsets within the file similarity information, and these points serve as hints for the starting positions of chunks. With this approach, the performance of a data deduplication system based on fixed-length chunking can be improved significantly. Experimental results show that the proposed dynamic chunking yields a significant improvement in deduplication processing capability while retaining a processing time comparable to that of fixed-length chunking.
KW - Chunking algorithm
KW - Deduplication
KW - File similarity
KW - Metadata
UR - http://www.scopus.com/inward/record.url?scp=84870860734&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84870860734&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-34707-8_7
DO - 10.1007/978-3-642-34707-8_7
M3 - Conference contribution
AN - SCOPUS:84870860734
SN - 9783642347061
VL - 7654 LNAI
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 59
EP - 68
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER -