TsPhraseRank for document clustering: Reweighting the weight of phrase

Yoon Ho Cho, Sang Hyun Park, Sang-Geun Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Given a document collection, a hierarchical clustering algorithm groups the documents into several clusters. Recent work has identified the set of overlap phrases as useful features for hierarchical document clustering. However, that work considered neither the relationships between overlap phrases that co-occur in a document nor the degree of opposite relationships between overlap phrases. In this paper, we propose new algorithms that compute an effective similarity measure before the hierarchical clustering algorithm is run. The proposed methods have two important features: a ranked list of the top-k phrases for each overlap phrase, and the opposite significances of two overlap phrases with respect to each other. Experimental results show that the proposed method improves clustering results.

Original language: English
Title of host publication: ACM International Conference Proceeding Series
Pages: 168-174
Number of pages: 7
Volume: 403
DOIs: https://doi.org/10.1145/1655925.1655956
Publication status: Published - 2009 Dec 1
Event: 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, ICIS 2009 - Seoul, Korea, Republic of
Duration: 2009 Nov 24 to 2009 Nov 26

Other

Other: 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, ICIS 2009
Country: Korea, Republic of
City: Seoul
Period: 09/11/24 to 09/11/26

Keywords

  • Document model
  • Overlap phrases
  • Reweighting

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Cho, Y. H., Park, S. H., & Lee, S-G. (2009). TsPhraseRank for document clustering: Reweighting the weight of phrase. In ACM International Conference Proceeding Series (Vol. 403, pp. 168-174). https://doi.org/10.1145/1655925.1655956