A Dog Is Passing over the Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation

Jaehyung Seo, Seounghoon Lee, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Recent natural language understanding (NLU) research on the Korean language has been vigorously maturing with the advancements of pretrained language models and datasets. However, Korean pretrained language models still struggle to generate a short sentence with a given condition based on compositionality and commonsense reasoning (i.e., generative commonsense reasoning). The two major challenges are inadequate data resources to develop generative commonsense reasoning regarding Korean linguistic features and to evaluate language models which are necessary for natural language generation (NLG). To solve these problems, we propose a text-generation dataset for Korean generative commonsense reasoning and language model evaluation. In this work, a semi-automatic dataset construction approach filters out contents inexplicable to commonsense, ascertains quality, and reduces the cost of building the dataset. We also present an in-depth analysis of the generation results of language models with various evaluation metrics along with human-annotated scores. The whole dataset is publicly available at https://aihub.or.kr/opendata/korea-university.

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: NAACL 2022 - Findings
Publisher: Association for Computational Linguistics (ACL)
Pages: 2233-2249
Number of pages: 17
ISBN (Electronic): 9781955917766
Publication status: Published - 2022
Event: 2022 Findings of the Association for Computational Linguistics: NAACL 2022 - Seattle, United States
Duration: 2022 Jul 10 to 2022 Jul 15

Publication series

Name: Findings of the Association for Computational Linguistics: NAACL 2022 - Findings

Conference

Conference: 2022 Findings of the Association for Computational Linguistics: NAACL 2022
Country/Territory: United States
City: Seattle
Period: 22/7/10 to 22/7/15

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
