Pre-trained Language Model for Biomedical Question Answering

Wonjin Yoon, Jinhyuk Lee, Donghyeon Kim, Minbyul Jeong, Jaewoo Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The recent success of question answering systems is largely attributed to pre-trained language models. However, as language models are mostly pre-trained on general domain corpora such as Wikipedia, they often have difficulty in understanding biomedical questions. In this paper, we investigate the performance of BioBERT, a pre-trained biomedical language model, in answering biomedical questions including factoid, list, and yes/no type questions. BioBERT uses almost the same structure across various question types and achieved the best performance in the 7th BioASQ Challenge (Task 7b, Phase B). BioBERT pre-trained on SQuAD or SQuAD 2.0 easily outperformed previous state-of-the-art models. BioBERT obtains the best performance when it uses the appropriate pre-/post-processing strategies for questions, passages, and answers.

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - International Workshops of ECML PKDD 2019, Proceedings
Editors: Peggy Cellier, Kurt Driessens
Publisher: Springer
Pages: 727-740
Number of pages: 14
ISBN (Print): 9783030438869
DOIs: https://doi.org/10.1007/978-3-030-43887-6_64
Publication status: Published - 2020 Jan 1
Event: 19th Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019 - Wurzburg, Germany
Duration: 2019 Sep 16 → 2019 Sep 20

Publication series

Name: Communications in Computer and Information Science
Volume: 1168 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 19th Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019
Country: Germany
City: Wurzburg
Period: 19/9/16 → 19/9/20

Keywords

  • Biomedical question answering
  • Pre-trained language model
  • Transfer learning

ASJC Scopus subject areas

  • Computer Science(all)
  • Mathematics(all)

Cite this

Yoon, W., Lee, J., Kim, D., Jeong, M., & Kang, J. (2020). Pre-trained Language Model for Biomedical Question Answering. In P. Cellier, & K. Driessens (Eds.), Machine Learning and Knowledge Discovery in Databases - International Workshops of ECML PKDD 2019, Proceedings (pp. 727-740). (Communications in Computer and Information Science; Vol. 1168 CCIS). Springer. https://doi.org/10.1007/978-3-030-43887-6_64