Extending the ONNX Runtime Framework for the Processing-in-Memory Execution

Seok Young Kim, Jaewook Lee, Chang Hyun Kim, Won Jun Lee, Seon Wook Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Attention-based models provide sufficiently accurate results for NLP tasks. As model size grows, memory usage increases exponentially, and the large amount of data with low locality causes an excessive increase in the power consumed by data movement. Processing-in-Memory (PIM), which places computing logic in or near memory, is therefore becoming an attractive way to relieve the memory bottleneck that limits system performance. Various PIM architecture designs have been explored, but efficient software frameworks for them have rarely been studied. This paper extends the ONNX Runtime framework to a PIM-based platform. The framework provides function abstractions for various PIM operations and makes the platform easy to program. Using the framework, we executed the BERT workload, which is dominant among attention-based models, on the GLUE dataset. By exploiting data/bank-level parallelism and performing vector execution in each bank, our baseline PIM platform achieved average speedups of 1.64x and 1.71x over x86 and ARM CPUs, respectively.

Original language: English
Title of host publication: 2022 International Conference on Electronics, Information, and Communication, ICEIC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665409346
Publication status: Published - 2022
Event: 2022 International Conference on Electronics, Information, and Communication, ICEIC 2022 - Jeju, Korea, Republic of
Duration: 2022 Feb 6 to 2022 Feb 9

Keywords

  • Attention-based Model
  • Processing-in-Memory

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
  • Energy Engineering and Power Technology
  • Electrical and Electronic Engineering
