An Automatic Post Editing With Efficient and Simple Data Generation Method

Hyeonseok Moon, Chanjun Park, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim

Research output: Contribution to journal › Article › peer-review

Abstract

Automatic post-editing (APE) research considers methods for correcting the output of machine translation systems. Training APE models generally requires triplets consisting of a source sentence (src), a machine translation sentence (mt), and a post-edited sentence (pe). Because creating pe requires considerable expert-level human labor, APE researchers have had difficulty constructing suitable datasets. As a result, APE data is absent for most language pairs, such as Korean-English, which limits sustained APE research. Motivated by this problem, we propose a method that generates APE triplets using only a parallel corpus, without human labor. Our proposal comprises three noise generation techniques: random noise, part-of-speech (POS) tagging-based noise, and semantic-level noise. The effectiveness of these methods is verified by quantitative and qualitative experiments on Korean-English APE tasks. Our experiments show that POS-based noise yields the best APE performance. The proposed method is significant in that it obviates the expert human labor generally required for APE data construction, and enables sustained APE research for the many language pairs where human-edited APE triplets are unavailable.
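
The idea of building triplets from a parallel corpus can be illustrated with a minimal sketch of the random-noise variant only. The abstract does not specify the noise operations, so everything here is an assumption made for illustration: the names `add_random_noise` and `make_triplet`, the `rate` parameter, the choice of delete/replace/insert operations, whitespace tokenization, and the setup in which the reference translation serves as pe and its corrupted copy serves as a pseudo mt. It is not the authors' implementation.

```python
import random


def add_random_noise(tokens, rate=0.15, vocab=None, rng=None):
    """Corrupt a token list with random delete/replace/insert operations.

    Hypothetical helper: the operations and the default rate are
    illustrative assumptions, not taken from the paper.
    """
    rng = rng or random.Random()
    vocab = vocab or tokens  # fall back to the sentence's own tokens
    noisy = []
    for tok in tokens:
        if rng.random() < rate:
            op = rng.choice(["delete", "replace", "insert"])
            if op == "delete":
                continue  # drop the token
            if op == "replace":
                noisy.append(rng.choice(vocab))  # substitute a random token
                continue
            # insert: keep the token and add a random one after it
            noisy.append(tok)
            noisy.append(rng.choice(vocab))
        else:
            noisy.append(tok)
    # Never return an empty sentence; fall back to the clean tokens.
    return noisy or list(tokens)


def make_triplet(src, tgt, rate=0.15, rng=None):
    """Build one (src, mt, pe) triplet from a parallel sentence pair.

    Assumption: the reference translation `tgt` plays the role of pe,
    and a noised copy of it stands in for the machine translation mt.
    """
    pe_tokens = tgt.split()
    mt_tokens = add_random_noise(pe_tokens, rate=rate, rng=rng)
    return {"src": src, "mt": " ".join(mt_tokens), "pe": tgt}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    triplet = make_triplet("나는 학교에 간다", "I go to school", rate=0.3, rng=rng)
    print(triplet)
```

A POS-based or semantic-level variant would replace `add_random_noise` with a function that selects tokens by tag or by meaning; those variants need a tagger or a semantic resource and are not sketched here.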

Original language: English
Pages (from-to): 21032-21040
Number of pages: 9
Journal: IEEE Access
Volume: 10
Publication status: Published - 2022

Keywords

  • Automatic post editing
  • data generation
  • machine translation
  • neural machine translation
  • post editing

ASJC Scopus subject areas

  • Computer Science (all)
  • Materials Science (all)
  • Engineering (all)
