TY - GEN
T1 - Bringing bag-of-phrases to ODP-based text classification
AU - Shin, Haeyong
AU - Ryu, Byung Gul
AU - Ryu, Woo Jong
AU - Lee, Geunjae
AU - Lee, Sang-Geun
N1 - Funding Information:
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (2015R1A2A1A10052665).
PY - 2016/3/3
Y1 - 2016/3/3
N2 - The Open Directory Project (ODP) is a large-scale, high-quality, publicly available web directory. Many studies and real-world applications build on an ODP-based classifier. However, existing approaches use the traditional bag-of-words representation of text to develop an ODP-based classifier, and words alone do not always provide atomic units of semantic meaning. In this paper, we propose a novel framework to better understand the semantic meaning of text by bringing bag-of-phrases to ODP-based text classification. The proposed method employs a syntactic tree to extract phrases from ODP and applies a phrase selection method to alleviate the high dimensionality problem of bag-of-phrases. Evaluation results demonstrate that our approach outperforms state-of-the-art methods in classification performance.
AB - The Open Directory Project (ODP) is a large-scale, high-quality, publicly available web directory. Many studies and real-world applications build on an ODP-based classifier. However, existing approaches use the traditional bag-of-words representation of text to develop an ODP-based classifier, and words alone do not always provide atomic units of semantic meaning. In this paper, we propose a novel framework to better understand the semantic meaning of text by bringing bag-of-phrases to ODP-based text classification. The proposed method employs a syntactic tree to extract phrases from ODP and applies a phrase selection method to alleviate the high dimensionality problem of bag-of-phrases. Evaluation results demonstrate that our approach outperforms state-of-the-art methods in classification performance.
KW - open directory project
KW - syntactic structure
KW - text classification
KW - text mining
UR - http://www.scopus.com/inward/record.url?scp=84964680330&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84964680330&partnerID=8YFLogxK
U2 - 10.1109/BIGCOMP.2016.7425975
DO - 10.1109/BIGCOMP.2016.7425975
M3 - Conference contribution
AN - SCOPUS:84964680330
T3 - 2016 International Conference on Big Data and Smart Computing, BigComp 2016
SP - 485
EP - 488
BT - 2016 International Conference on Big Data and Smart Computing, BigComp 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - International Conference on Big Data and Smart Computing, BigComp 2016
Y2 - 18 January 2016 through 20 January 2016
ER -