TY - GEN
T1 - From Text Classification to Keyphrase Extraction for Short Text
AU - Lee, Song Eun
AU - Kim, Kang Min
AU - Ryu, Woo Jong
AU - Park, Jemin
AU - Lee, Sangkeun
N1 - Publisher Copyright:
© 2019 IEEE.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2019/12
Y1 - 2019/12
N2 - Existing keyphrase extraction approaches often suffer from issues such as the sparsity and brevity of short text (e.g., headlines, queries, and tweets). In this paper, we propose a novel keyphrase extraction method for short text by utilizing recurrent neural networks. The main idea behind our approach is to classify short text into a relevant class or category and extract keyphrases from important words in the class or category. Unlike previous supervised approaches that need the information of annotated keyphrases, our approach requires only a text classification dataset (i.e., DBpedia), which is easier to use and requires less human effort. In our approach, we first feed short text into the attention-based neural network for text classification. We then compute attention weights of each word in input short text. Subsequently, we detect keyphrase candidates by chunking phrases and summing the attention weights of compositional words in the chunked phrase. The experimental results clearly show the efficacy of our approach on real-world datasets, such as headlines, queries, and tweets. The proposed method outperforms the Microsoft Cognitive Services and IBM Watson Natural Language Understanding service for keyphrase extraction in terms of F1-score and acceptable percentage on the NYT and Question datasets. Further, we confirm that the proposed method is comparable to supervised methods for keyphrase extraction from short text in the Tweet dataset.
AB - Existing keyphrase extraction approaches often suffer from issues such as the sparsity and brevity of short text (e.g., headlines, queries, and tweets). In this paper, we propose a novel keyphrase extraction method for short text by utilizing recurrent neural networks. The main idea behind our approach is to classify short text into a relevant class or category and extract keyphrases from important words in the class or category. Unlike previous supervised approaches that need the information of annotated keyphrases, our approach requires only a text classification dataset (i.e., DBpedia), which is easier to use and requires less human effort. In our approach, we first feed short text into the attention-based neural network for text classification. We then compute attention weights of each word in input short text. Subsequently, we detect keyphrase candidates by chunking phrases and summing the attention weights of compositional words in the chunked phrase. The experimental results clearly show the efficacy of our approach on real-world datasets, such as headlines, queries, and tweets. The proposed method outperforms the Microsoft Cognitive Services and IBM Watson Natural Language Understanding service for keyphrase extraction in terms of F1-score and acceptable percentage on the NYT and Question datasets. Further, we confirm that the proposed method is comparable to supervised methods for keyphrase extraction from short text in the Tweet dataset.
KW - Attention mechanism
KW - Deep neural network
KW - Keyphrase extraction
KW - Knowledge base
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=85081301325&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081301325&partnerID=8YFLogxK
U2 - 10.1109/BigData47090.2019.9006409
DO - 10.1109/BigData47090.2019.9006409
M3 - Conference contribution
AN - SCOPUS:85081301325
T3 - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
SP - 1137
EP - 1142
BT - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
A2 - Baru, Chaitanya
A2 - Huan, Jun
A2 - Khan, Latifur
A2 - Hu, Xiaohua Tony
A2 - Ak, Ronay
A2 - Tian, Yuanyuan
A2 - Barga, Roger
A2 - Zaniolo, Carlo
A2 - Lee, Kisung
A2 - Ye, Yanfang Fanny
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Big Data, Big Data 2019
Y2 - 9 December 2019 through 12 December 2019
ER -