Bringing bag-of-phrases to ODP-based text classification

Haeyong Shin, Byung Gul Ryu, Woo Jong Ryu, Geunjae Lee, Sang-Geun Lee

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

The Open Directory Project (ODP) is a large-scale, high-quality, publicly available web directory. Many studies and real-world applications build on ODP-based classifiers. However, existing approaches develop such classifiers using the traditional bag-of-words representation of text, and words alone do not always provide atomic units of semantic meaning. In this paper, we propose a novel framework that better captures the semantic meaning of text by bringing bag-of-phrases to ODP-based text classification. The proposed method employs a syntactic tree to extract phrases from ODP and applies a phrase selection method to alleviate the high dimensionality of bag-of-phrases. Evaluation results demonstrate that our approach outperforms state-of-the-art methods in classification performance.
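As a rough illustration of the idea in the abstract, the following minimal sketch builds bag-of-phrases features and prunes the phrase vocabulary. It is not the authors' method: contiguous word n-grams stand in for syntactic-tree phrase extraction, a document-frequency threshold stands in for their phrase selection step, and the function names, the `min_df` parameter, and the toy documents are all assumptions made for this example.

```python
from collections import Counter

def extract_phrases(text, max_len=2):
    """Return contiguous word n-grams (n = 1..max_len) as candidate phrases.

    Assumption: n-grams are a simple stand-in for phrases taken from a syntactic tree.
    """
    tokens = text.lower().split()
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrases.append(" ".join(tokens[i:i + n]))
    return phrases

def select_phrases(docs, min_df=2, max_len=2):
    """Keep phrases that occur in at least min_df documents.

    Assumption: a document-frequency cutoff is one simple way to reduce dimensionality.
    """
    df = Counter()
    for doc in docs:
        # Use a set so each phrase is counted once per document.
        df.update(set(extract_phrases(doc, max_len)))
    return {phrase for phrase, count in df.items() if count >= min_df}

def bag_of_phrases(text, vocabulary, max_len=2):
    """Count only the phrases that survived selection."""
    return Counter(p for p in extract_phrases(text, max_len) if p in vocabulary)

# Hypothetical toy corpus, used only to exercise the functions.
docs = [
    "open directory project web directory",
    "open directory project text classification",
    "text classification with phrases",
]
vocab = select_phrases(docs)
features = bag_of_phrases("open directory text classification", vocab)
```

Here `features` counts multi-word units such as "text classification" alongside single words, while phrases seen in only one toy document are dropped from the vocabulary.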

Original language: English
Title of host publication: 2016 International Conference on Big Data and Smart Computing, BigComp 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 485-488
Number of pages: 4
ISBN (Electronic): 9781467387965
Publication status: Published - 2016 Mar 3
Event: International Conference on Big Data and Smart Computing, BigComp 2016 - Hong Kong, China
Duration: 2016 Jan 18 to 2016 Jan 20

Publication series

Name: 2016 International Conference on Big Data and Smart Computing, BigComp 2016

Other

Other: International Conference on Big Data and Smart Computing, BigComp 2016
Country: China
City: Hong Kong
Period: 16/1/18 to 16/1/20

Keywords

  • open directory project
  • syntactic structure
  • text classification
  • text mining

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
