Combining Dual Word Embeddings with Open Directory Project Based Text Classification

Dinara Aliyeva, Kang Min Kim, Byung Ju Choi, Sang-Geun Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Traditional Open Directory Project (ODP)-based text classification methods effectively capture the topics of texts by utilizing the hierarchical structure of an explicitly human-built knowledge base. However, they only consider term weighting approaches, ignoring the important semantic similarity between words. In this paper, we consider the semantics of words by incorporating an implicit text representation, such as word2vec word embeddings, into ODP-based text classification. In contrast to the common usage of word2vec, we utilize both the input and output vectors. This allows us to calculate a combined typical and topical similarity between the words of a category and a document, which is more effective for text classification. To this end, we first incorporate the dual word embeddings of word2vec into ODP-based text classification to obtain semantically richer category and document representations. Subsequently, we use the combination of the input and output vectors to improve the semantic similarity between category and document. Our evaluation results using a real-world dataset show the efficacy of our proposed approach, exhibiting a significant improvement of 9% and 37% in terms of F1-score and precision at k, over the state-of-the-art techniques.
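
The abstract does not give the scoring formulas, so the following is only a minimal sketch of one way a combined similarity over dual embeddings could be computed. It assumes that the input-input cosine stands for the "typical" similarity and the input-output cosine for the "topical" similarity, that a category and a document are each represented by the centroid of their word vectors, and that the two scores are mixed by a weight `alpha`. The function names, the weight, and the toy two-dimensional vectors are all hypothetical and are not taken from the paper.

```python
import math


def centroid(words, table):
    """Average the vectors of the words that are present in `table`."""
    vecs = [table[w] for w in words if w in table]
    if not vecs:
        raise ValueError("none of the words has an embedding")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combined_similarity(doc_words, cat_words, in_vecs, out_vecs, alpha=0.5):
    """Mix IN-IN and IN-OUT cosine scores; `alpha` is an assumed weight."""
    doc_in = centroid(doc_words, in_vecs)
    cat_in = centroid(cat_words, in_vecs)
    cat_out = centroid(cat_words, out_vecs)
    in_in = cosine(doc_in, cat_in)    # assumed "typical" similarity
    in_out = cosine(doc_in, cat_out)  # assumed "topical" similarity
    return alpha * in_in + (1 - alpha) * in_out


# Toy input (IN) and output (OUT) embeddings, invented for illustration only.
IN = {
    "goal": [0.9, 0.1], "match": [0.8, 0.2], "league": [0.7, 0.3],
    "stock": [0.1, 0.9], "market": [0.2, 0.8],
}
OUT = {
    "goal": [0.8, 0.3], "match": [0.9, 0.2], "league": [0.6, 0.1],
    "stock": [0.2, 0.7], "market": [0.1, 0.9],
}

document = ["goal", "match"]
categories = {"Sports": ["league", "match"], "Business": ["stock", "market"]}

# Score the document against every category and keep the best match.
scores = {
    name: combined_similarity(document, words, IN, OUT)
    for name, words in categories.items()
}
best_category = max(scores, key=scores.get)
print(best_category, scores)
```

With a trained word2vec model, the two lookup tables would be filled from the model's input and output weight matrices instead of the toy dictionaries used here.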

Original language: English
Title of host publication: Proceedings of 2018 IEEE 17th International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2018
Editors: Newton Howard, Sam Kwong, Yingxu Wang, Jerome Feldman, Bernard Widrow, Phillip Sheu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 179-186
Number of pages: 8
ISBN (Electronic): 9781538633601
DOI: 10.1109/ICCI-CC.2018.8482044
Publication status: Published - 2018 Oct 4
Event: 17th IEEE International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2018 - Berkeley, United States
Duration: 2018 Jul 16 to 2018 Jul 18

Fingerprint

Directories
Semantics
Knowledge Bases

Keywords

  • Machine Learning
  • Text Classification
  • Word embeddings

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems
  • Cognitive Neuroscience

Cite this

Aliyeva, D., Kim, K. M., Choi, B. J., & Lee, S-G. (2018). Combining Dual Word Embeddings with Open Directory Project Based Text Classification. In N. Howard, S. Kwong, Y. Wang, J. Feldman, B. Widrow, & P. Sheu (Eds.), Proceedings of 2018 IEEE 17th International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2018 (pp. 179-186). [8482044] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCI-CC.2018.8482044

@inproceedings{7a8e4a564f4b4f33a46cc150f694469e,
title = "Combining Dual Word Embeddings with Open Directory Project Based Text Classification",
abstract = "Traditional Open Directory Project (ODP)-based text classification methods effectively capture the topics of texts by utilizing the hierarchical structure of an explicitly human-built knowledge base. However, they only consider term weighting approaches, ignoring the important semantic similarity between words. In this paper, we consider the semantics of words by incorporating an implicit text representation, such as word2vec word embeddings, into ODP-based text classification. In contrast to the common usage of word2vec, we utilize both the input and output vectors. This allows us to calculate a combined typical and topical similarity between the words of a category and a document, which is more effective for text classification. To this end, we first incorporate the dual word embeddings of word2vec into ODP-based text classification to obtain semantically richer category and document representations. Subsequently, we use the combination of the input and output vectors to improve the semantic similarity between category and document. Our evaluation results using a real-world dataset show the efficacy of our proposed approach, exhibiting a significant improvement of 9{\%} and 37{\%} in terms of F1-score and precision at k, over the state-of-the-art techniques.",
keywords = "Machine Learning, Text Classification, Word embeddings",
author = "Dinara Aliyeva and Kim, {Kang Min} and Choi, {Byung Ju} and Sang-Geun Lee",
year = "2018",
month = "10",
day = "4",
doi = "10.1109/ICCI-CC.2018.8482044",
language = "English",
pages = "179--186",
editor = "Newton Howard and Sam Kwong and Yingxu Wang and Jerome Feldman and Bernard Widrow and Phillip Sheu",
booktitle = "Proceedings of 2018 IEEE 17th International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Combining Dual Word Embeddings with Open Directory Project Based Text Classification

AU - Aliyeva, Dinara

AU - Kim, Kang Min

AU - Choi, Byung Ju

AU - Lee, Sang-Geun

PY - 2018/10/4

Y1 - 2018/10/4

AB - Traditional Open Directory Project (ODP)-based text classification methods effectively capture the topics of texts by utilizing the hierarchical structure of an explicitly human-built knowledge base. However, they only consider term weighting approaches, ignoring the important semantic similarity between words. In this paper, we consider the semantics of words by incorporating an implicit text representation, such as word2vec word embeddings, into ODP-based text classification. In contrast to the common usage of word2vec, we utilize both the input and output vectors. This allows us to calculate a combined typical and topical similarity between the words of a category and a document, which is more effective for text classification. To this end, we first incorporate the dual word embeddings of word2vec into ODP-based text classification to obtain semantically richer category and document representations. Subsequently, we use the combination of the input and output vectors to improve the semantic similarity between category and document. Our evaluation results using a real-world dataset show the efficacy of our proposed approach, exhibiting a significant improvement of 9% and 37% in terms of F1-score and precision at k, over the state-of-the-art techniques.

KW - Machine Learning

KW - Text Classification

KW - Word embeddings

UR - http://www.scopus.com/inward/record.url?scp=85056464881&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056464881&partnerID=8YFLogxK

U2 - 10.1109/ICCI-CC.2018.8482044

DO - 10.1109/ICCI-CC.2018.8482044

M3 - Conference contribution

SP - 179

EP - 186

BT - Proceedings of 2018 IEEE 17th International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2018

A2 - Howard, Newton

A2 - Kwong, Sam

A2 - Wang, Yingxu

A2 - Feldman, Jerome

A2 - Widrow, Bernard

A2 - Sheu, Phillip

PB - Institute of Electrical and Electronics Engineers Inc.

ER -