TY - JOUR
T1 - Validation of Text Data Preprocessing Using a Neural Network Model
AU - Woo, Ho Sung
AU - Kim, Ja Mee
AU - Lee, Won Gyu
N1 - Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIP) (no. 2019R1H1A1079885).
Publisher Copyright:
© 2020 HoSung Woo et al.
PY - 2020
Y1 - 2020
N2 - Many artificial intelligence studies focus on designing new neural network models or optimizing hyperparameters to improve model accuracy. To develop a reliable model, appropriate data are required, and data preprocessing is an essential part of acquiring the data. Although various studies regard data preprocessing as part of the data exploration process, those studies lack awareness about the need for separate technologies and solutions for preprocessing. Therefore, this study evaluated combinations of preprocessing types in a text-processing neural network model. Better performance was observed when two preprocessing types were used than when three or more preprocessing types were used for data purification. More specifically, using lemmatization and punctuation splitting together, lemmatization and lowering together, and lowering and punctuation splitting together showed positive effects on accuracy. This study is significant because the results allow better decisions to be made about the selection of the preprocessing types in various research fields, including neural network research.
UR - http://www.scopus.com/inward/record.url?scp=85085580252&partnerID=8YFLogxK
U2 - 10.1155/2020/1958149
DO - 10.1155/2020/1958149
M3 - Article
AN - SCOPUS:85085580252
VL - 2020
JO - Mathematical Problems in Engineering
JF - Mathematical Problems in Engineering
SN - 1024-123X
M1 - 1958149
ER -