iVisClustering: An interactive visual document clustering via topic modeling

Hanseung Lee, Jaeyeon Kihm, Jaegul Choo, John Stasko, Haesun Park

Research output: Contribution to journal › Article

77 Citations (Scopus)

Abstract

Clustering plays an important role in many large-scale data analyses providing users with an overall understanding of their data. Nonetheless, clustering is not an easy task due to noisy features and outliers existing in the data, and thus the clustering results obtained from automatic algorithms often do not make clear sense. To remedy this problem, automatic clustering should be complemented with interactive visualization strategies. This paper proposes an interactive visual analytics system for document clustering, called iVisClustering, based on a widely-used topic modeling method, latent Dirichlet allocation (LDA). iVisClustering provides a summary of each cluster in terms of its most representative keywords and visualizes soft clustering results in parallel coordinates. The main view of the system provides a 2D plot that visualizes cluster similarities and the relation among data items with a graph-based representation. iVisClustering provides several other views, which contain useful interaction methods. With help of these visualization modules, we can interactively refine the clustering results in various ways. Keywords can be adjusted so that they characterize each cluster better. In addition, our system can filter out noisy data and re-cluster the data accordingly. Cluster hierarchy can be constructed using a tree structure and for this purpose, the system supports cluster-level interactions such as sub-clustering, removing unimportant clusters, merging the clusters that have similar meanings, and moving certain clusters to any other node in the tree structure. Furthermore, the system provides document-level interactions such as moving mis-clustered documents to another cluster and removing useless documents. Finally, we present how interactive clustering is performed via iVisClustering by using real-world document data sets.

Original language: English
Pages (from-to): 1155-1164
Number of pages: 10
Journal: Computer Graphics Forum
Volume: 31
Issue number: 3 PART 3
Publication status: Published - 2012 Jan 1
Externally published: Yes

Fingerprint

Visualization
Merging

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design

Cite this

Lee, H., Kihm, J., Choo, J., Stasko, J., & Park, H. (2012). iVisClustering: An interactive visual document clustering via topic modeling. Computer Graphics Forum, 31(3 PART 3), 1155-1164.

iVisClustering: An interactive visual document clustering via topic modeling. / Lee, Hanseung; Kihm, Jaeyeon; Choo, Jaegul; Stasko, John; Park, Haesun.

In: Computer Graphics Forum, Vol. 31, No. 3 PART 3, 01.01.2012, p. 1155-1164.

Lee, H, Kihm, J, Choo, J, Stasko, J & Park, H 2012, 'iVisClustering: An interactive visual document clustering via topic modeling', Computer Graphics Forum, vol. 31, no. 3 PART 3, pp. 1155-1164.
Lee, Hanseung; Kihm, Jaeyeon; Choo, Jaegul; Stasko, John; Park, Haesun. / iVisClustering: An interactive visual document clustering via topic modeling. In: Computer Graphics Forum. 2012; Vol. 31, No. 3 PART 3. pp. 1155-1164.
@article{18c3f3bb93e44c7088de6831ed50dd1a,
title = "{iVisClustering}: An interactive visual document clustering via topic modeling",
abstract = "Clustering plays an important role in many large-scale data analyses providing users with an overall understanding of their data. Nonetheless, clustering is not an easy task due to noisy features and outliers existing in the data, and thus the clustering results obtained from automatic algorithms often do not make clear sense. To remedy this problem, automatic clustering should be complemented with interactive visualization strategies. This paper proposes an interactive visual analytics system for document clustering, called iVisClustering, based on a widely-used topic modeling method, latent Dirichlet allocation (LDA). iVisClustering provides a summary of each cluster in terms of its most representative keywords and visualizes soft clustering results in parallel coordinates. The main view of the system provides a 2D plot that visualizes cluster similarities and the relation among data items with a graph-based representation. iVisClustering provides several other views, which contain useful interaction methods. With help of these visualization modules, we can interactively refine the clustering results in various ways. Keywords can be adjusted so that they characterize each cluster better. In addition, our system can filter out noisy data and re-cluster the data accordingly. Cluster hierarchy can be constructed using a tree structure and for this purpose, the system supports cluster-level interactions such as sub-clustering, removing unimportant clusters, merging the clusters that have similar meanings, and moving certain clusters to any other node in the tree structure. Furthermore, the system provides document-level interactions such as moving mis-clustered documents to another cluster and removing useless documents. Finally, we present how interactive clustering is performed via iVisClustering by using real-world document data sets.",
author = "Hanseung Lee and Jaeyeon Kihm and Jaegul Choo and John Stasko and Haesun Park",
year = "2012",
month = "1",
day = "1",
language = "English",
volume = "31",
pages = "1155--1164",
journal = "Computer Graphics Forum",
issn = "0167-7055",
publisher = "Wiley-Blackwell",
number = "3 PART 3"
}

TY - JOUR

T1 - iVisClustering

T2 - An interactive visual document clustering via topic modeling

AU - Lee, Hanseung

AU - Kihm, Jaeyeon

AU - Choo, Jaegul

AU - Stasko, John

AU - Park, Haesun

PY - 2012/1/1

Y1 - 2012/1/1

N2 - Clustering plays an important role in many large-scale data analyses providing users with an overall understanding of their data. Nonetheless, clustering is not an easy task due to noisy features and outliers existing in the data, and thus the clustering results obtained from automatic algorithms often do not make clear sense. To remedy this problem, automatic clustering should be complemented with interactive visualization strategies. This paper proposes an interactive visual analytics system for document clustering, called iVisClustering, based on a widely-used topic modeling method, latent Dirichlet allocation (LDA). iVisClustering provides a summary of each cluster in terms of its most representative keywords and visualizes soft clustering results in parallel coordinates. The main view of the system provides a 2D plot that visualizes cluster similarities and the relation among data items with a graph-based representation. iVisClustering provides several other views, which contain useful interaction methods. With help of these visualization modules, we can interactively refine the clustering results in various ways. Keywords can be adjusted so that they characterize each cluster better. In addition, our system can filter out noisy data and re-cluster the data accordingly. Cluster hierarchy can be constructed using a tree structure and for this purpose, the system supports cluster-level interactions such as sub-clustering, removing unimportant clusters, merging the clusters that have similar meanings, and moving certain clusters to any other node in the tree structure. Furthermore, the system provides document-level interactions such as moving mis-clustered documents to another cluster and removing useless documents. Finally, we present how interactive clustering is performed via iVisClustering by using real-world document data sets.

AB - Clustering plays an important role in many large-scale data analyses providing users with an overall understanding of their data. Nonetheless, clustering is not an easy task due to noisy features and outliers existing in the data, and thus the clustering results obtained from automatic algorithms often do not make clear sense. To remedy this problem, automatic clustering should be complemented with interactive visualization strategies. This paper proposes an interactive visual analytics system for document clustering, called iVisClustering, based on a widely-used topic modeling method, latent Dirichlet allocation (LDA). iVisClustering provides a summary of each cluster in terms of its most representative keywords and visualizes soft clustering results in parallel coordinates. The main view of the system provides a 2D plot that visualizes cluster similarities and the relation among data items with a graph-based representation. iVisClustering provides several other views, which contain useful interaction methods. With help of these visualization modules, we can interactively refine the clustering results in various ways. Keywords can be adjusted so that they characterize each cluster better. In addition, our system can filter out noisy data and re-cluster the data accordingly. Cluster hierarchy can be constructed using a tree structure and for this purpose, the system supports cluster-level interactions such as sub-clustering, removing unimportant clusters, merging the clusters that have similar meanings, and moving certain clusters to any other node in the tree structure. Furthermore, the system provides document-level interactions such as moving mis-clustered documents to another cluster and removing useless documents. Finally, we present how interactive clustering is performed via iVisClustering by using real-world document data sets.

UR - http://www.scopus.com/inward/record.url?scp=84875822190&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84875822190&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84875822190

VL - 31

SP - 1155

EP - 1164

JO - Computer Graphics Forum

JF - Computer Graphics Forum

SN - 0167-7055

IS - 3 PART 3

ER -