TsPhraseRank for document clustering

Reweighting the weight of phrase

Yoon Ho Cho, Sang Hyun Park, Sang-Geun Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Given a document collection, a hierarchical clustering algorithm groups the documents into a hierarchy of clusters. Recent work has identified sets of overlap phrases as useful features for hierarchical document clustering. However, that work did not consider the relationships between overlap phrases that co-occur in a document, or the degree of opposition between overlap phrases. In this paper, we propose new algorithms that compute an effective similarity measure before the hierarchical clustering algorithm is run. The proposed methods have two important features: a ranked list of the top-k phrases for each overlap phrase, and the opposite significance of two overlap phrases relative to each other. Experimental results show that the proposed method improves the clustering results.
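
The abstract does not give the TsPhraseRank formulas, so the following is only a minimal sketch of the general idea it builds on: treating phrases shared by two documents as similarity features, and ranking the top-k phrases that co-occur with each phrase. Everything here is an assumption made for illustration: word bigrams stand in for "phrases", Jaccard overlap stands in for the similarity measure, and the function names `phrases`, `top_k_cooccurring`, and `overlap_similarity` are hypothetical. It is not the authors' reweighting method.

```python
from collections import Counter
from itertools import combinations


def phrases(text, n=2):
    """Return the set of word n-grams in a text (assumed stand-in for phrases)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def top_k_cooccurring(docs_phrases, k=2):
    """For each phrase, list up to k phrases that most often share a document with it."""
    counts = {}
    for phrase_set in docs_phrases:
        # Count every unordered pair of phrases appearing in the same document.
        for a, b in combinations(sorted(phrase_set), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return {p: [q for q, _ in c.most_common(k)] for p, c in counts.items()}


def overlap_similarity(phrases_a, phrases_b):
    """Jaccard similarity over phrase sets (illustrative measure, not the paper's)."""
    if not phrases_a or not phrases_b:
        return 0.0
    return len(phrases_a & phrases_b) / len(phrases_a | phrases_b)


if __name__ == "__main__":
    docs = [phrases(t) for t in ("a b c", "a b d", "x y z")]
    print(overlap_similarity(docs[0], docs[1]))
    print(top_k_cooccurring(docs).get("a b"))
```

A pairwise similarity matrix built this way could then be handed to any standard hierarchical clustering routine; how the paper reweights phrases before that step is not recoverable from this record.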

Original language: English
Title of host publication: ACM International Conference Proceeding Series
Pages: 168-174
Number of pages: 7
Volume: 403
DOI: https://doi.org/10.1145/1655925.1655956
Publication status: Published - 2009 Dec 1
Event: 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, ICIS 2009 - Seoul, Korea, Republic of
Duration: 2009 Nov 24 to 2009 Nov 26

Other

Other: 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, ICIS 2009
Country: Korea, Republic of
City: Seoul
Period: 09/11/24 to 09/11/26

Fingerprint

Clustering algorithms
Experiments

Keywords

  • Document model
  • Overlap phrases
  • Reweighting

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Cho, Y. H., Park, S. H., & Lee, S-G. (2009). TsPhraseRank for document clustering: Reweighting the weight of phrase. In ACM International Conference Proceeding Series (Vol. 403, pp. 168-174) https://doi.org/10.1145/1655925.1655956

@inproceedings{ea87903a458f4b87a623cd9299b55767,
title = "TsPhraseRank for document clustering: Reweighting the weight of phrase",
keywords = "Document model, Overlap phrases, Reweighting",
author = "Cho, {Yoon Ho} and Park, {Sang Hyun} and Lee, Sang-Geun",
year = "2009",
month = "12",
day = "1",
doi = "10.1145/1655925.1655956",
language = "English",
isbn = "9781605587103",
volume = "403",
pages = "168--174",
booktitle = "ACM International Conference Proceeding Series",

}

TY - GEN

T1 - TsPhraseRank for document clustering

T2 - Reweighting the weight of phrase

AU - Cho, Yoon Ho

AU - Park, Sang Hyun

AU - Lee, Sang-Geun

PY - 2009/12/1

Y1 - 2009/12/1

KW - Document model

KW - Overlap phrases

KW - Reweighting

UR - http://www.scopus.com/inward/record.url?scp=74949140434&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=74949140434&partnerID=8YFLogxK

U2 - 10.1145/1655925.1655956

DO - 10.1145/1655925.1655956

M3 - Conference contribution

SN - 9781605587103

VL - 403

SP - 168

EP - 174

BT - ACM International Conference Proceeding Series

ER -