Pre-trained Language Model for Biomedical Question Answering

Wonjin Yoon, Jinhyuk Lee, Donghyeon Kim, Minbyul Jeong, Jaewoo Kang

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    4 Citations (Scopus)

    Abstract

    The recent success of question answering systems is largely attributed to pre-trained language models. However, as language models are mostly pre-trained on general domain corpora such as Wikipedia, they often have difficulty in understanding biomedical questions. In this paper, we investigate the performance of BioBERT, a pre-trained biomedical language model, in answering biomedical questions including factoid, list, and yes/no type questions. BioBERT uses almost the same structure across various question types and achieved the best performance in the 7th BioASQ Challenge (Task 7b, Phase B). BioBERT pre-trained on SQuAD or SQuAD 2.0 easily outperformed previous state-of-the-art models. BioBERT obtains the best performance when it uses the appropriate pre-/post-processing strategies for questions, passages, and answers.
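    The factoid setting described in the abstract is typically handled as extractive question answering: a BERT-style model such as BioBERT scores each passage token as a potential answer start or end, and a post-processing step selects the best valid span. The sketch below illustrates that standard span-selection step only; the `best_span` function, token list, and scores are made-up illustrative values, not the authors' actual implementation.

    ```python
    # Illustrative post-processing for extractive QA with a BERT-style model:
    # given per-token start/end scores, pick the highest-scoring valid span
    # (start <= end, span length bounded). Scores here are invented examples.

    def best_span(start_scores, end_scores, max_len=10):
        """Return (start, end) token indices of the best-scoring valid span."""
        best = (0, 0)
        best_score = float("-inf")
        for i, s in enumerate(start_scores):
            # Only consider ends at or after the start, within max_len tokens.
            for j in range(i, min(i + max_len, len(end_scores))):
                score = s + end_scores[j]
                if score > best_score:
                    best_score = score
                    best = (i, j)
        return best

    tokens = ["The", "BRCA1", "gene", "is", "located", "on", "chromosome", "17"]
    start = [0.1, 0.2, 0.1, 0.0, 0.1, 0.2, 2.5, 0.3]
    end   = [0.0, 0.1, 0.2, 0.1, 0.0, 0.1, 0.4, 3.0]

    s, e = best_span(start, end)
    answer = " ".join(tokens[s:e + 1])  # "chromosome 17"
    ```

    In practice the same underlying architecture is reused across question types, with list questions taking multiple high-scoring spans and yes/no questions using a classification head instead of span scores.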

    Original language: English
    Title of host publication: Machine Learning and Knowledge Discovery in Databases - International Workshops of ECML PKDD 2019, Proceedings
    Editors: Peggy Cellier, Kurt Driessens
    Publisher: Springer
    Pages: 727-740
    Number of pages: 14
    ISBN (Print): 9783030438869
    DOIs
    Publication status: Published - 2020
    Event: 19th Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019 - Wurzburg, Germany
    Duration: 2019 Sep 16 - 2019 Sep 20

    Publication series

    Name: Communications in Computer and Information Science
    Volume: 1168 CCIS
    ISSN (Print): 1865-0929
    ISSN (Electronic): 1865-0937

    Conference

    Conference: 19th Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019
    Country/Territory: Germany
    City: Wurzburg
    Period: 19/9/16 - 19/9/20

    Keywords

    • Biomedical question answering
    • Pre-trained language model
    • Transfer learning

    ASJC Scopus subject areas

    • Computer Science (all)
    • Mathematics (all)
