Bringing bag-of-phrases to ODP-based text classification

Haeyong Shin, Byung Gul Ryu, Woo Jong Ryu, Geunjae Lee, Sang-Geun Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

The Open Directory Project (ODP) is a large-scale, high-quality, publicly available web directory. Many studies and real-world applications build on an ODP-based classifier. However, existing approaches build ODP-based classifiers on the traditional bag-of-words representation of text, and words alone do not always provide atomic units of semantic meaning. In this paper, we propose a novel framework that better captures the semantic meaning of text by bringing bag-of-phrases to ODP-based text classification. The proposed method employs a syntactic tree to extract phrases from ODP and applies a phrase selection method to alleviate the high dimensionality of the bag-of-phrases representation. Evaluation results demonstrate that our approach outperforms state-of-the-art methods in classification performance.
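The abstract does not give implementation details, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes Penn-Treebank-style bracketed parse trees are already available for each document, treats every multi-word NP constituent as a phrase, uses a simple document-frequency cutoff as a stand-in for the paper's (unspecified) phrase selection method, and classifies with a cosine-similarity centroid classifier. All function names, the `min_len`/`min_df` parameters, and the toy "Computers"/"Sports" training data are invented for illustration.

```python
import math
from collections import Counter


def parse_tree(s):
    """Parse a bracketed tree such as '(S (NP (DT the) (NN cat)))' into
    nested (label, children) tuples; leaf words are plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def leaves(t):
    """Return the words under a subtree, left to right."""
    if isinstance(t, str):
        return [t]
    return [w for child in t[1] for w in leaves(child)]


def noun_phrases(t, min_len=2):
    """Collect every NP constituent with at least min_len words (hypothetical rule)."""
    if isinstance(t, str):
        return []
    label, children = t
    found = []
    if label == "NP":
        words = leaves(t)
        if len(words) >= min_len:
            found.append(" ".join(w.lower() for w in words))
    for child in children:
        found.extend(noun_phrases(child, min_len))
    return found


def bag_of_phrases(tree_str):
    return Counter(noun_phrases(parse_tree(tree_str)))


def select_phrases(bags, min_df=2):
    """Stand-in phrase selection: keep phrases seen in >= min_df documents."""
    df = Counter()
    for bag in bags:
        df.update(set(bag))
    return {p for p, n in df.items() if n >= min_df}


def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(labeled_bags, vocab):
    """Sum the selected-phrase counts of each category's documents."""
    centroids = {}
    for label, bag in labeled_bags:
        c = centroids.setdefault(label, Counter())
        c.update({p: n for p, n in bag.items() if p in vocab})
    return centroids


def classify(bag, centroids):
    return max(centroids, key=lambda label: cosine(bag, centroids[label]))


# Toy training data (invented), standing in for ODP category documents.
TRAIN = [
    ("Computers", "(S (NP (JJ open) (NN source) (NN software)) (VP (VBZ runs) (NP (JJ web) (NNS servers))))"),
    ("Computers", "(S (NP (NN web) (NNS servers)) (VP (VBP host) (NP (JJ open) (NN source) (NN software))))"),
    ("Sports", "(S (NP (DT the) (NN football) (NN team)) (VP (VBD won) (NP (DT the) (NN world) (NN cup))))"),
    ("Sports", "(S (NP (DT the) (NN world) (NN cup)) (VP (VBD drew) (NP (NN football) (NNS fans))))"),
]
labeled = [(label, bag_of_phrases(tree)) for label, tree in TRAIN]
vocab = select_phrases([bag for _, bag in labeled], min_df=2)
centroids = train_centroids(labeled, vocab)

print(classify(bag_of_phrases("(S (NP (DT the) (NN world) (NN cup)) (VP (VBD was) (VP (VBN televised))))"), centroids))  # Sports
print(classify(bag_of_phrases("(S (NP (NNS developers)) (VP (VBP maintain) (NP (JJ open) (NN source) (NN software))))"), centroids))  # Computers
```

The frequency cutoff illustrates why phrase selection matters: every distinct multi-word constituent becomes a feature, so the phrase vocabulary grows much faster than a word vocabulary, and pruning rare phrases is one simple way to keep it manageable.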

Original language: English
Title of host publication: 2016 International Conference on Big Data and Smart Computing, BigComp 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 485-488
Number of pages: 4
ISBN (Electronic): 9781467387965
Publication status: Published - 2016 Mar 3
Event: International Conference on Big Data and Smart Computing, BigComp 2016 - Hong Kong, China
Duration: 2016 Jan 18 to 2016 Jan 20

Publication series

Name: 2016 International Conference on Big Data and Smart Computing, BigComp 2016

Other

Other: International Conference on Big Data and Smart Computing, BigComp 2016
Country/Territory: China
City: Hong Kong
Period: 16/1/18 to 16/1/20

Keywords

  • open directory project
  • syntactic structure
  • text classification
  • text mining

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
