TY - CONF
T1 - Dealing with the Paradox of Quality Estimation
AU - Eo, Sugyeong
AU - Park, Chanjun
AU - Moon, Hyeonseok
AU - Seo, Jaehyung
AU - Lim, Heuiseok
N1 - Funding Information:
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-0-01405) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation) and under the ICT Creative Consilience program (IITP-2021-2020-0-01819) supervised by the IITP. Additionally, this work was supported by an Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00368, A Neural-Symbolic Model for Knowledge Acquisition and Inference Techniques).
Publisher Copyright:
© 2021 Proceedings of the 4th Workshop on Technologies for Machine Translation of Low-Resource Languages, LoResMT 2021. All rights reserved.
PY - 2021
Y1 - 2021
N2 - In quality estimation (QE), the quality of a translation can be predicted by referencing the source sentence and the machine translation (MT) output, without access to a reference sentence. However, there exists a paradox: constructing a dataset for creating a QE model requires non-trivial human labor and time, and may even require more effort than constructing a parallel corpus. In this study, to resolve this paradox and enable the various applications of QE even in low-resource languages (LRLs), we propose a method for automatically constructing a pseudo-QE dataset without human labor. We perform a comparative analysis on the pseudo-QE dataset using multilingual pre-trained language models. When generating the pseudo dataset, we conduct experiments using various external machine translators as test sets to verify the results objectively. The experimental results show that multilingual BART demonstrates the best performance, and we confirm the applicability of QE in LRLs using the pseudo-QE dataset construction method.
AB - In quality estimation (QE), the quality of a translation can be predicted by referencing the source sentence and the machine translation (MT) output, without access to a reference sentence. However, there exists a paradox: constructing a dataset for creating a QE model requires non-trivial human labor and time, and may even require more effort than constructing a parallel corpus. In this study, to resolve this paradox and enable the various applications of QE even in low-resource languages (LRLs), we propose a method for automatically constructing a pseudo-QE dataset without human labor. We perform a comparative analysis on the pseudo-QE dataset using multilingual pre-trained language models. When generating the pseudo dataset, we conduct experiments using various external machine translators as test sets to verify the results objectively. The experimental results show that multilingual BART demonstrates the best performance, and we confirm the applicability of QE in LRLs using the pseudo-QE dataset construction method.
UR - http://www.scopus.com/inward/record.url?scp=85119152534&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85119152534
SP - 1
EP - 10
T2 - 4th Workshop on Technologies for Machine Translation of Low-Resource Languages, LoResMT 2021
Y2 - 16 August 2021 through 20 August 2021
ER -