Bringing bag-of-phrases to ODP-based text classification

Haeyong Shin, Byung Gul Ryu, Woo Jong Ryu, Geunjae Lee, Sang-Geun Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

The Open Directory Project (ODP) is a large-scale, high-quality, publicly available web directory. Many studies and real-world applications build on ODP-based classifiers. However, existing approaches develop these classifiers using the traditional bag-of-words representation of text, and words alone do not always provide atomic units of semantic meaning. In this paper, we propose a novel framework that better captures the semantic meaning of text by bringing bag-of-phrases to ODP-based text classification. The proposed method employs a syntactic tree to extract phrases from ODP and applies a phrase selection method to alleviate the high dimensionality of the bag-of-phrases representation. Evaluation results demonstrate that our approach outperforms state-of-the-art methods in classification performance.
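
The two steps named in the abstract (phrase extraction from a syntactic tree, then phrase selection) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which parser, which phrase types, or which selection criterion the paper uses. The hand-written parse trees, the restriction to noun phrases (NP), the determiner filter, and the document-frequency threshold `min_df` are all assumptions made for illustration only.

```python
from collections import Counter

# A parse tree is a nested tuple: (label, child, child, ...); leaves are strings.
# These trees are hand-written for illustration; a real pipeline would obtain
# them from a constituency parser (the abstract does not name one).
TREES = {
    "doc1": ("S",
             ("NP", ("JJ", "open"), ("NN", "directory"), ("NN", "project")),
             ("VP", ("VBZ", "is"),
              ("NP", ("DT", "a"), ("NN", "web"), ("NN", "directory")))),
    "doc2": ("S",
             ("NP", ("NN", "text"), ("NN", "classification")),
             ("VP", ("VBZ", "uses"),
              ("NP", ("DT", "a"), ("NN", "web"), ("NN", "directory")))),
}

SKIP_TAGS = {"DT"}  # assumption: drop determiners inside phrases


def leaves(tree):
    """Return the words under a subtree, skipping determiner leaves."""
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    if label in SKIP_TAGS:
        return []
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


def noun_phrases(tree):
    """Collect the word sequence of every NP subtree, depth-first."""
    if isinstance(tree, str):
        return []
    label, *children = tree
    found = []
    if label == "NP":
        found.append(" ".join(leaves(tree)))
    for child in children:
        found.extend(noun_phrases(child))
    return found


def select_phrases(doc_phrases, min_df=2):
    """Keep phrases seen in at least min_df documents (hypothetical criterion)."""
    df = Counter()
    for phrases in doc_phrases.values():
        df.update(set(phrases))
    return {phrase for phrase, count in df.items() if count >= min_df}


# Step 1: extract candidate phrases from each document's parse tree.
doc_phrases = {doc: noun_phrases(tree) for doc, tree in TREES.items()}
# Step 2: prune the phrase vocabulary to limit dimensionality.
vocabulary = select_phrases(doc_phrases, min_df=2)
# Step 3: build one bag-of-phrases (phrase -> count) per document.
bags = {doc: Counter(p for p in phrases if p in vocabulary)
        for doc, phrases in doc_phrases.items()}
```

In this toy corpus only "web directory" occurs in both documents, so it is the only feature that survives selection. A bag-of-words model would instead count "web" and "directory" separately and lose the fact that they form one unit.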

Original language: English
Title of host publication: 2016 International Conference on Big Data and Smart Computing, BigComp 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 485-488
Number of pages: 4
ISBN (Print): 9781467387965
DOI: https://doi.org/10.1109/BIGCOMP.2016.7425975
Publication status: Published - 2016 Mar 3
Event: International Conference on Big Data and Smart Computing, BigComp 2016 - Hong Kong, China
Duration: 2016 Jan 18 to 2016 Jan 20

Other

Other: International Conference on Big Data and Smart Computing, BigComp 2016
Country: China
City: Hong Kong
Period: 16/1/18 to 16/1/20

Fingerprint

Classifiers
Semantics
Syntactics
Text classification
Classifier
World Wide Web
Dimensionality
Evaluation

Keywords

  • open directory project
  • syntactic structure
  • text classification
  • text mining

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management

Cite this

Shin, H., Ryu, B. G., Ryu, W. J., Lee, G., & Lee, S-G. (2016). Bringing bag-of-phrases to ODP-based text classification. In 2016 International Conference on Big Data and Smart Computing, BigComp 2016 (pp. 485-488). [7425975] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BIGCOMP.2016.7425975

@inproceedings{ff50ed9da667413ea25c78c798a0760d,
title = "Bringing bag-of-phrases to ODP-based text classification",
abstract = "The Open Directory Project (ODP) is a large scale, high quality and publicly available web directory. Many studies and real-world applications build on an ODP-based classifier. However, existing approaches use traditional bag-of-words representation of text to develop an ODP-based classifier and words alone do not always provide atomic units of semantic meaning. In this paper, we propose a novel framework to better understand the semantic meaning of text by bringing bag-of-phrases to ODP-based text classification. The proposed method employs a syntactic tree to extract phrases from ODP and applies a phrase selection method to alleviate the high dimensionality problem of bag-of-phrases. The conducted evaluation results demonstrate that our approach outperforms the state-of-the-art methods in classification performance.",
keywords = "open directory project, syntactic structure, text classification, text mining",
author = "Haeyong Shin and Ryu, {Byung Gul} and Ryu, {Woo Jong} and Geunjae Lee and Sang-Geun Lee",
year = "2016",
month = "3",
day = "3",
doi = "10.1109/BIGCOMP.2016.7425975",
language = "English",
isbn = "9781467387965",
pages = "485--488",
booktitle = "2016 International Conference on Big Data and Smart Computing, BigComp 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc."
}

TY  - GEN
T1  - Bringing bag-of-phrases to ODP-based text classification
AU  - Shin, Haeyong
AU  - Ryu, Byung Gul
AU  - Ryu, Woo Jong
AU  - Lee, Geunjae
AU  - Lee, Sang-Geun
PY  - 2016/3/3
Y1  - 2016/3/3
N2  - The Open Directory Project (ODP) is a large scale, high quality and publicly available web directory. Many studies and real-world applications build on an ODP-based classifier. However, existing approaches use traditional bag-of-words representation of text to develop an ODP-based classifier and words alone do not always provide atomic units of semantic meaning. In this paper, we propose a novel framework to better understand the semantic meaning of text by bringing bag-of-phrases to ODP-based text classification. The proposed method employs a syntactic tree to extract phrases from ODP and applies a phrase selection method to alleviate the high dimensionality problem of bag-of-phrases. The conducted evaluation results demonstrate that our approach outperforms the state-of-the-art methods in classification performance.
AB  - The Open Directory Project (ODP) is a large scale, high quality and publicly available web directory. Many studies and real-world applications build on an ODP-based classifier. However, existing approaches use traditional bag-of-words representation of text to develop an ODP-based classifier and words alone do not always provide atomic units of semantic meaning. In this paper, we propose a novel framework to better understand the semantic meaning of text by bringing bag-of-phrases to ODP-based text classification. The proposed method employs a syntactic tree to extract phrases from ODP and applies a phrase selection method to alleviate the high dimensionality problem of bag-of-phrases. The conducted evaluation results demonstrate that our approach outperforms the state-of-the-art methods in classification performance.
KW  - open directory project
KW  - syntactic structure
KW  - text classification
KW  - text mining
UR  - http://www.scopus.com/inward/record.url?scp=84964680330&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84964680330&partnerID=8YFLogxK
U2  - 10.1109/BIGCOMP.2016.7425975
DO  - 10.1109/BIGCOMP.2016.7425975
M3  - Conference contribution
AN  - SCOPUS:84964680330
SN  - 9781467387965
SP  - 485
EP  - 488
BT  - 2016 International Conference on Big Data and Smart Computing, BigComp 2016
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -