Utilizing Wikipedia knowledge in open directory project-based text classification

Hae Yong Shin, Geun Jae Lee, Woo Jong Ryu, Sang-Geun Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Traditional Open Directory Project (ODP)-based text classification methods use a bag-of-words approach, which uses only the single words in ODP documents and ignores important kinds of semantic information such as phrases and related terms. In this paper, we propose a method for enriching the semantic information in ODP documents with Wikipedia knowledge. First, we construct a phrase dictionary from Wikipedia and search for Wikipedia phrases in ODP documents. Second, we select the most likely relevant Wikipedia articles and relevant hyperlinks for the Wikipedia phrases found in ODP documents. Finally, we add the Wikipedia phrases and relevant hyperlinks to the ODP documents to enrich their semantic information. Our evaluation results verify the efficacy of the proposed method.
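The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy `WIKI` dictionary, the greedy longest-match phrase lookup, the context-overlap rule for choosing an article, and the `phrase:` / `link:` feature prefixes are all assumptions made for illustration, since the abstract does not specify how phrases are matched or how relevance is scored.

```python
from collections import Counter

# Hypothetical toy stand-in for Wikipedia: phrase -> candidate articles,
# each with a few context words and outgoing hyperlinks (all invented).
WIKI = {
    "machine learning": [
        {"title": "Machine learning",
         "context": {"algorithm", "data", "model"},
         "links": ["Artificial intelligence", "Statistics"]},
    ],
    "java": [
        {"title": "Java (programming language)",
         "context": {"code", "software", "compiler"},
         "links": ["Object-oriented programming"]},
        {"title": "Java (island)",
         "context": {"indonesia", "island", "volcano"},
         "links": ["Indonesia"]},
    ],
}
MAX_PHRASE_LEN = max(len(p.split()) for p in WIKI)


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (t.strip(".,;:!?()").lower() for t in text.split())
    return [t for t in stripped if t]


def find_phrases(tokens):
    """Step 1: greedy longest-match lookup of dictionary phrases."""
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in WIKI:
                found.append(candidate)
                i += n
                break
        else:
            i += 1  # no phrase starts at this token
    return found


def select_article(phrase, tokens):
    """Step 2 (assumed rule): pick the candidate article whose context
    words overlap the document's words the most."""
    doc_words = set(tokens)
    return max(WIKI[phrase], key=lambda a: len(a["context"] & doc_words))


def enrich(text):
    """Step 3: add phrase and hyperlink features to the bag of words."""
    tokens = tokenize(text)
    features = Counter(tokens)
    for phrase in find_phrases(tokens):
        article = select_article(phrase, tokens)
        features["phrase:" + phrase] += 1
        for link in article["links"]:
            features["link:" + link.lower()] += 1
    return features


doc = "Java is used to write machine learning software and code."
enriched = enrich(doc)
```

In this toy example the words "software" and "code" steer the ambiguous phrase "java" toward the programming-language article, so the enriched feature set gains that article's hyperlinks rather than those of the island. A real system would need a far larger dictionary and a more careful relevance measure.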

Original language: English
Title of host publication: 32nd Annual ACM Symposium on Applied Computing, SAC 2017
Publisher: Association for Computing Machinery
Pages: 309-314
Number of pages: 6
Volume: Part F128005
ISBN (Electronic): 9781450344869
Publication status: Published - 2017 Apr 3
Event: 32nd Annual ACM Symposium on Applied Computing, SAC 2017 - Marrakesh, Morocco
Duration: 2017 Apr 4 → 2017 Apr 6

Other

Other: 32nd Annual ACM Symposium on Applied Computing, SAC 2017
Country: Morocco
City: Marrakesh
Period: 17/4/4 → 17/4/6

Fingerprint

Semantics
Glossaries

Keywords

  • Open directory project
  • Text classification
  • Wikipedia

ASJC Scopus subject areas

  • Software

Cite this

Shin, H. Y., Lee, G. J., Ryu, W. J., & Lee, S-G. (2017). Utilizing Wikipedia knowledge in open directory project-based text classification. In 32nd Annual ACM Symposium on Applied Computing, SAC 2017 (Vol. Part F128005, pp. 309-314). Association for Computing Machinery. https://doi.org/10.1145/3019612.3019

@inproceedings{5e9ba8697c964fe5a901ddcdb8fa5c24,
title = "Utilizing Wikipedia knowledge in open directory project-based text classification",
abstract = "Traditional Open Directory Project (ODP)-based text classification methods use a bag-of-words approach, which uses only the single words in ODP documents and ignores important kinds of semantic information such as phrases and related terms. In this paper, we propose a method for enriching the semantic information in ODP documents with Wikipedia knowledge. First, we construct a phrase dictionary from Wikipedia and search for Wikipedia phrases in ODP documents. Second, we select the most likely relevant Wikipedia articles and relevant hyperlinks for the Wikipedia phrases found in ODP documents. Finally, we add the Wikipedia phrases and relevant hyperlinks to the ODP documents to enrich their semantic information. Our evaluation results verify the efficacy of the proposed method.",
keywords = "Open directory project, Text classification, Wikipedia",
author = "Shin, {Hae Yong} and Lee, {Geun Jae} and Ryu, {Woo Jong} and Lee, Sang-Geun",
year = "2017",
month = "4",
day = "3",
doi = "10.1145/3019612.3019",
language = "English",
volume = "Part F128005",
pages = "309--314",
booktitle = "32nd Annual ACM Symposium on Applied Computing, SAC 2017",
publisher = "Association for Computing Machinery",

}

TY - GEN
T1 - Utilizing Wikipedia knowledge in open directory project-based text classification
AU - Shin, Hae Yong
AU - Lee, Geun Jae
AU - Ryu, Woo Jong
AU - Lee, Sang-Geun
PY - 2017/4/3
Y1 - 2017/4/3
AB - Traditional Open Directory Project (ODP)-based text classification methods use a bag-of-words approach, which uses only the single words in ODP documents and ignores important kinds of semantic information such as phrases and related terms. In this paper, we propose a method for enriching the semantic information in ODP documents with Wikipedia knowledge. First, we construct a phrase dictionary from Wikipedia and search for Wikipedia phrases in ODP documents. Second, we select the most likely relevant Wikipedia articles and relevant hyperlinks for the Wikipedia phrases found in ODP documents. Finally, we add the Wikipedia phrases and relevant hyperlinks to the ODP documents to enrich their semantic information. Our evaluation results verify the efficacy of the proposed method.
KW - Open directory project
KW - Text classification
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=85020899791&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020899791&partnerID=8YFLogxK
U2 - 10.1145/3019612.3019
DO - 10.1145/3019612.3019
M3 - Conference contribution
AN - SCOPUS:85020899791
VL - Part F128005
SP - 309
EP - 314
BT - 32nd Annual ACM Symposium on Applied Computing, SAC 2017
PB - Association for Computing Machinery
ER -