TY - JOUR
T1 - Learning-Free Unsupervised Extractive Summarization Model
AU - Jang, Myeongjun
AU - Kang, Pilsung
N1 - Funding Information:
This work was supported in part by the National Research Foundation of Korea (NRF) Grants Funded by the Korean Government (MSIT) under Grant NRF-2019R1F1A1060338 and in part by the Korea Institute for Advancement of Technology (KIAT) Grant Funded by the Korean Government (MOTIE) (The Competency Development Program for Industry Specialist) under Grant P0008691.
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Text summarization is an information condensation technique that abbreviates a source document into a few representative sentences, with the aim of creating a coherent summary that contains the relevant information of the source text. This promising field has developed rapidly since the advent of deep learning. However, summarization models based on deep neural networks have several critical shortcomings. First, a large amount of labeled training data is necessary; this problem is especially acute for low-resource languages, for which publicly available labeled data do not exist. In addition, substantial computational power is required to train neural models with enormous numbers of network parameters. In this study, we propose the Learning-Free Integer Programming Summarizer (LFIP-SUM), an unsupervised extractive summarization model. The advantage of our approach is that parameter training is unnecessary because the model does not require any labeled training data. To achieve this, we formulate an integer programming problem based on pre-trained sentence embedding vectors. We also use principal component analysis to automatically determine the number of sentences to be extracted and to evaluate the importance of each sentence. Experimental results demonstrate that the proposed model achieves generally acceptable performance compared with deep learning summarization models, although it learns no parameters during model construction.
AB - Text summarization is an information condensation technique that abbreviates a source document into a few representative sentences, with the aim of creating a coherent summary that contains the relevant information of the source text. This promising field has developed rapidly since the advent of deep learning. However, summarization models based on deep neural networks have several critical shortcomings. First, a large amount of labeled training data is necessary; this problem is especially acute for low-resource languages, for which publicly available labeled data do not exist. In addition, substantial computational power is required to train neural models with enormous numbers of network parameters. In this study, we propose the Learning-Free Integer Programming Summarizer (LFIP-SUM), an unsupervised extractive summarization model. The advantage of our approach is that parameter training is unnecessary because the model does not require any labeled training data. To achieve this, we formulate an integer programming problem based on pre-trained sentence embedding vectors. We also use principal component analysis to automatically determine the number of sentences to be extracted and to evaluate the importance of each sentence. Experimental results demonstrate that the proposed model achieves generally acceptable performance compared with deep learning summarization models, although it learns no parameters during model construction.
KW - Text summarization
KW - integer linear programming
KW - natural language processing
KW - sentence representation vector
UR - http://www.scopus.com/inward/record.url?scp=85099607440&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2021.3051237
DO - 10.1109/ACCESS.2021.3051237
M3 - Article
AN - SCOPUS:85099607440
VL - 9
SP - 14358
EP - 14368
JO - IEEE Access
JF - IEEE Access
SN - 2169-3536
M1 - 9321308
ER -