Attention-based models achieve sufficiently accurate performance on NLP tasks. However, as model size grows, memory usage increases exponentially, and moving large volumes of data with low locality consumes excessive power. Processing-in-Memory (PIM), which places computing logic in or near memory, has therefore become an attractive solution to the memory bottleneck that limits system performance. While various PIM architecture designs have been explored, efficient software frameworks for them have received little attention. This paper extends the ONNX Runtime framework to support a PIM-based platform. The framework provides function abstractions for various PIM operations and offers easy programmability to users. Using the framework, we executed BERT, the dominant workload among attention-based models, on the GLUE dataset. By exploiting data- and bank-level parallelism and performing vector execution in each bank, our baseline PIM platform achieved average speedups of 1.64x and 1.71x over x86 and ARM CPUs, respectively.
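To make the intended programmability concrete, the following is a minimal sketch of how a PIM backend might be selected through ONNX Runtime's standard execution-provider interface. The provider name `PIMExecutionProvider`, the model file, and the input names and shapes are assumptions for illustration, not details confirmed by the paper; only the surrounding `onnxruntime` API calls are standard.

```python
# Hedged sketch: running a BERT ONNX model on an assumed PIM backend.
# "PIMExecutionProvider" is a hypothetical provider name that a PIM
# extension of ONNX Runtime would register; this snippet assumes such
# an extended build is installed.
import numpy as np
import onnxruntime as ort

# Prefer the (hypothetical) PIM backend, falling back to the CPU
# provider for operators the PIM extension does not cover.
sess = ort.InferenceSession(
    "bert.onnx",
    providers=["PIMExecutionProvider", "CPUExecutionProvider"],
)

# Dummy GLUE-style input: batch of 1, sequence length 128. The input
# names match a typical exported BERT model and may differ in practice.
seq_len = 128
inputs = {
    "input_ids": np.random.randint(0, 30522, (1, seq_len), dtype=np.int64),
    "attention_mask": np.ones((1, seq_len), dtype=np.int64),
    "token_type_ids": np.zeros((1, seq_len), dtype=np.int64),
}

logits = sess.run(None, inputs)
```

Under this design, user code stays identical to ordinary ONNX Runtime usage; the PIM-specific function abstractions are hidden behind the execution provider, which decides which operators to offload to the in-memory compute units.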