TY - GEN
T1 - Incorporating word embeddings into Open Directory Project based large-scale classification
AU - Kim, Kang Min
AU - Dinara, Aliyeva
AU - Choi, Byung Ju
AU - Lee, Sang-Geun
N1 - Funding Information:
Acknowledgment: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT (No. 2015R1A2A1A10052665).
Publisher Copyright:
© 2018, Springer International Publishing AG, part of Springer Nature.
PY - 2018
Y1 - 2018
N2 - Recently, implicit representation models, such as embeddings or deep learning, have been successfully adopted for text classification tasks due to their outstanding performance. However, these approaches are limited to small- or moderate-scale text classification. Explicit representation models are often used in large-scale text classification, such as Open Directory Project (ODP)-based text classification. However, the performance of these models is limited by the associated knowledge bases. In this paper, we incorporate word embeddings into ODP-based large-scale classification. To this end, we first generate category vectors, which represent the semantics of ODP categories, by jointly modeling word embeddings and ODP-based text classification. We then propose a novel semantic similarity measure that utilizes the category and word vectors obtained from the joint model. The evaluation results clearly show the efficacy of our methodology in large-scale text classification. The proposed scheme exhibits significant improvements of 10% and 28% in macro-averaging F1-score and precision at k, respectively, over state-of-the-art techniques.
AB - Recently, implicit representation models, such as embeddings or deep learning, have been successfully adopted for text classification tasks due to their outstanding performance. However, these approaches are limited to small- or moderate-scale text classification. Explicit representation models are often used in large-scale text classification, such as Open Directory Project (ODP)-based text classification. However, the performance of these models is limited by the associated knowledge bases. In this paper, we incorporate word embeddings into ODP-based large-scale classification. To this end, we first generate category vectors, which represent the semantics of ODP categories, by jointly modeling word embeddings and ODP-based text classification. We then propose a novel semantic similarity measure that utilizes the category and word vectors obtained from the joint model. The evaluation results clearly show the efficacy of our methodology in large-scale text classification. The proposed scheme exhibits significant improvements of 10% and 28% in macro-averaging F1-score and precision at k, respectively, over state-of-the-art techniques.
KW - Text classification
KW - Word embeddings
UR - http://www.scopus.com/inward/record.url?scp=85049372495&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-93037-4_30
DO - 10.1007/978-3-319-93037-4_30
M3 - Conference contribution
AN - SCOPUS:85049372495
SN - 9783319930367
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 376
EP - 388
BT - Advances in Knowledge Discovery and Data Mining - 22nd Pacific-Asia Conference, PAKDD 2018, Proceedings
A2 - Ho, Bao
A2 - Phung, Dinh
A2 - Webb, Geoffrey I.
A2 - Tseng, Vincent S.
A2 - Ganji, Mohadeseh
A2 - Rashidi, Lida
PB - Springer Verlag
T2 - 22nd Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2018
Y2 - 3 June 2018 through 6 June 2018
ER -