Stealth ECC: A Data-Width Aware Adaptive ECC Scheme for DRAM Error Resilience

Young Seo Lee, Gunjae Koo, Young Ho Gong, Sung Woo Chung

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

As DRAM process technology scales down and DRAM density continues to grow, DRAM errors have become a primary concern in modern data centers. Typically, data centers have adopted memory systems with a single error correction double error detection (SECDED) code. However, the SECDED code is not sufficient to satisfy DRAM reliability demands as memory systems get more vulnerable. Though the servers in data centers employ strong ECC schemes, such ECC schemes lead to substantial performance and/or storage overhead. In this paper, we propose Stealth ECC, a cost-effective memory protection scheme providing stronger error correctability than the conventional SECDED code, with negligible performance overhead and without storage overhead. Depending on the data-width (either narrow-width or full-width), Stealth ECC adaptively selects ECC schemes. For narrow-width values, Stealth ECC provides multi-bit error correctability by storing more parity bits in MSB side, instead of zeros. Furthermore, with bitwise interleaved data placement between x4 DRAM chips, Stealth ECC is robust to a single DRAM chip error for narrow-width values. On the other hand, for full-width values, Stealth ECC adopts the SECDED code, which maintains DRAM reliability comparable to the conventional SECDED code. As a result, thanks to the reliability improvement of narrow-width values, Stealth ECC enhances overall DRAM reliability, while incurring negligible performance overhead as well as no storage overhead. Our simulation results show that Stealth ECC reduces the probability of system failure (caused by DRAM errors) by 47.9%, on average, with only 0.9% performance overhead compared to the conventional SECDED code.

Original languageEnglish
Title of host publicationProceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022
EditorsCristiana Bolchini, Ingrid Verbauwhede, Ioana Vatajelu
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages382-387
Number of pages6
ISBN (Electronic)9783981926361
DOIs
Publication statusPublished - 2022
Event2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022 - Virtual, Online, Belgium
Duration: 2022 Mar 142022 Mar 23

Publication series

NameProceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022

Conference

Conference2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022
Country/TerritoryBelgium
CityVirtual, Online
Period22/3/1422/3/23

Keywords

  • chip error resilience
  • DRAM reliability
  • error correction code
  • narrow-width value

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Software
  • Safety, Risk, Reliability and Quality
  • Control and Optimization

Fingerprint

Dive into the research topics of 'Stealth ECC: A Data-Width Aware Adaptive ECC Scheme for DRAM Error Resilience'. Together they form a unique fingerprint.

Cite this