Utilizing Wikipedia knowledge in open directory project-based text classification

Hae Yong Shin, Geun Jae Lee, Woo Jong Ryu, Sang-Geun Lee

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Citations (Scopus)

Abstract

Traditional Open Directory Project (ODP)-based text classification methods use a bag-of-words approach, which utilizes only single words in ODP documents and ignores important types of semantic information such as phrases and related terms. In this paper, we propose a method for enriching the semantic information in ODP documents by utilizing Wikipedia knowledge. First, we construct a phrase dictionary based on Wikipedia and search for Wikipedia phrases in ODP documents. Second, we select the most likely relevant Wikipedia articles and relevant hyperlinks for the Wikipedia phrases found in ODP documents. Finally, we add the Wikipedia phrases and relevant hyperlinks to the ODP documents to enrich their semantic information. Our evaluation results verify the efficacy of the proposed methodology.
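The three steps in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`find_phrases`, `enrich`), the greedy longest-match lookup, the toy phrase dictionary, and the toy hyperlink map are all assumptions made for illustration, and the article-selection (disambiguation) step is reduced to a plain dictionary lookup.

```python
from typing import Dict, List, Set, Tuple


def find_phrases(tokens: List[str],
                 phrase_dict: Set[Tuple[str, ...]],
                 max_len: int = 4) -> List[str]:
    """Greedy longest-match scan for multi-word dictionary phrases.

    Hypothetical helper: the paper does not specify its matching strategy.
    """
    found = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate first; only phrases of 2+ words are checked.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrase_dict:
                match_len = n
                break
        if match_len:
            found.append(" ".join(tokens[i:i + match_len]))
            i += match_len
        else:
            i += 1
    return found


def enrich(text: str,
           phrase_dict: Set[Tuple[str, ...]],
           links: Dict[str, List[str]]) -> List[str]:
    """Return the bag-of-words tokens plus phrase and hyperlink features.

    `links` stands in for the "relevant hyperlinks" step; here it is an
    assumed, precomputed phrase -> related-title map.
    """
    tokens = text.lower().split()
    extra = []
    for phrase in find_phrases(tokens, phrase_dict):
        extra.append(phrase.replace(" ", "_"))   # phrase as a single feature
        extra.extend(links.get(phrase, []))      # related article titles
    return tokens + extra


# Toy data, invented for this example only.
phrase_dict = {("machine", "learning"), ("neural", "network")}
links = {"machine learning": ["artificial_intelligence"]}
features = enrich("Machine learning with a neural network", phrase_dict, links)
```

In this toy run, `features` holds the six original tokens followed by `machine_learning`, `artificial_intelligence`, and `neural_network`, so a downstream classifier sees phrase-level and related-term features alongside the single words.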

Original language: English
Title of host publication: 32nd Annual ACM Symposium on Applied Computing, SAC 2017
Publisher: Association for Computing Machinery
Pages: 309-314
Number of pages: 6
ISBN (Electronic): 9781450344869
Publication status: Published - 2017 Apr 3
Event: 32nd Annual ACM Symposium on Applied Computing, SAC 2017 - Marrakesh, Morocco
Duration: 2017 Apr 4 to 2017 Apr 6

Publication series

Name: Proceedings of the ACM Symposium on Applied Computing
Volume: Part F128005

Other

Other: 32nd Annual ACM Symposium on Applied Computing, SAC 2017
Country: Morocco
City: Marrakesh
Period: 17/4/4 to 17/4/6

Keywords

  • Open directory project
  • Text classification
  • Wikipedia

ASJC Scopus subject areas

  • Software


  • Cite this

    Shin, H. Y., Lee, G. J., Ryu, W. J., & Lee, S-G. (2017). Utilizing Wikipedia knowledge in open directory project-based text classification. In 32nd Annual ACM Symposium on Applied Computing, SAC 2017 (pp. 309-314). (Proceedings of the ACM Symposium on Applied Computing; Vol. Part F128005). Association for Computing Machinery. https://doi.org/10.1145/3019612.3019