From Text Classification to Keyphrase Extraction for Short Text

Song Eun Lee, Kang Min Kim, Woo Jong Ryu, Jemin Park, Sangkeun Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Existing keyphrase extraction approaches often perform poorly on short text (e.g., headlines, queries, and tweets) because such text is sparse and brief. In this paper, we propose a novel keyphrase extraction method for short text that uses recurrent neural networks. The main idea behind our approach is to classify short text into a relevant class or category and to extract keyphrases from the words that are important for that class or category. Unlike previous supervised approaches, which need annotated keyphrases, our approach requires only a text classification dataset (i.e., DBpedia), which is easier to use and requires less human effort. In our approach, we first feed short text into an attention-based neural network for text classification. We then compute the attention weight of each word in the input short text. Subsequently, we detect keyphrase candidates by chunking phrases and summing the attention weights of the constituent words in each chunked phrase. The experimental results show the efficacy of our approach on real-world datasets of headlines, queries, and tweets. The proposed method outperforms Microsoft Cognitive Services and the IBM Watson Natural Language Understanding service for keyphrase extraction in terms of F1-score and acceptable percentage on the NYT and Question datasets. Further, we confirm that the proposed method is comparable to supervised methods for keyphrase extraction from short text on the Tweet dataset.
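
The candidate-scoring step described in the abstract (chunk the text into phrases, then sum the attention weights of the words in each phrase) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rank_keyphrase_candidates`, the toy tokens, the attention values, and the chunk spans are all assumptions made for illustration. In the described method, the weights would come from the attention-based classifier and the spans from a phrase chunker, neither of which is specified in this record.

```python
from typing import List, Tuple


def rank_keyphrase_candidates(
    tokens: List[str],
    attention: List[float],
    chunks: List[Tuple[int, int]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score each chunk by the sum of its words' attention weights.

    `chunks` holds half-open token spans [start, end). Both the spans and
    the attention weights are assumed to be produced elsewhere.
    """
    if len(tokens) != len(attention):
        raise ValueError("tokens and attention must have the same length")

    scored = []
    for start, end in chunks:
        phrase = " ".join(tokens[start:end])
        score = sum(attention[start:end])
        scored.append((phrase, score))

    # Candidates with the highest total attention come first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy input: a headline, made-up attention weights, and made-up noun-phrase spans.
tokens = ["apple", "unveils", "new", "iphone", "in", "california"]
attention = [0.30, 0.05, 0.10, 0.40, 0.02, 0.13]
chunks = [(0, 1), (2, 4), (5, 6)]

ranked = rank_keyphrase_candidates(tokens, attention, chunks)
for phrase, score in ranked:
    print(f"{phrase}: {score:.2f}")
```

Summing rather than averaging favors multi-word phrases whose words each attract attention, which matches the abstract's wording. Whether any normalization is applied on top of the sum is not stated in this record.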

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
Editors: Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1137-1142
Number of pages: 6
ISBN (Electronic): 9781728108582
DOIs: https://doi.org/10.1109/BigData47090.2019.9006409
Publication status: Published - 2019 Dec
Event: 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States
Duration: 2019 Dec 9 → 2019 Dec 12

Publication series

Name: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

Conference

Conference: 2019 IEEE International Conference on Big Data, Big Data 2019
Country: United States
City: Los Angeles
Period: 19/12/9 → 19/12/12

Keywords

  • Attention mechanism
  • Deep neural network
  • Keyphrase extraction
  • Knowledge base
  • Text classification

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management


Cite this

Lee, S. E., Kim, K. M., Ryu, W. J., Park, J., & Lee, S. (2019). From Text Classification to Keyphrase Extraction for Short Text. In C. Baru, J. Huan, L. Khan, X. T. Hu, R. Ak, Y. Tian, R. Barga, C. Zaniolo, K. Lee, & Y. F. Ye (Eds.), Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 (pp. 1137-1142). [9006409] (Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData47090.2019.9006409