UTOPIAN: User-driven topic modeling based on interactive nonnegative matrix factorization

Jaegul Choo, Changhyun Lee, Chandan K. Reddy, Haesun Park

Research output: Contribution to journal › Article

121 Citations (Scopus)

Abstract

Topic modeling has been widely used for analyzing text document collections. Recently, there have been significant advances in topic modeling techniques, particularly in the form of probabilistic graphical models. State-of-the-art techniques such as Latent Dirichlet Allocation (LDA) have been successfully applied in visual text analytics. However, most widely used methods based on probabilistic modeling have drawbacks in terms of consistency across multiple runs and empirical convergence. Furthermore, due to the complexity of its formulation and algorithm, LDA cannot easily incorporate various types of user feedback. To tackle this problem, we propose a reliable and flexible visual analytics system for topic modeling called UTOPIAN (User-driven Topic modeling based on Interactive Nonnegative Matrix Factorization). Centered around its semi-supervised formulation, UTOPIAN enables users to interact with the topic modeling method and steer the result in a user-driven manner. We demonstrate the capabilities of UTOPIAN via several usage scenarios with real-world document corpora such as the InfoVis/VAST paper data set and product review data sets.
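The abstract does not spell out UTOPIAN's semi-supervised formulation, so the following is only a generic sketch of NMF-based topic modeling and is not the authors' algorithm. It factors a small, made-up term-document matrix `A` into a topic-term matrix `W` and a topic-document matrix `H` using the standard Lee-Seung multiplicative updates. It then adds a hypothetical penalty `lam * ||W - W_ref||_F^2` that pulls `W` toward a user-supplied reference `W_ref`, as one simple way to inject user feedback. The function names `nmf` and `top_terms`, the parameters `W_ref` and `lam`, and the toy vocabulary and counts are all illustrative assumptions. Seeding the initialization makes repeated runs of this sketch return identical factors.

```python
import numpy as np


def nmf(A, k, n_iter=200, W_ref=None, lam=0.0, seed=0, eps=1e-10):
    """Approximate a nonnegative term-document matrix A (m x n) as W @ H.

    W (m x k): weight of each term in each topic.
    H (k x n): weight of each topic in each document.
    Hypothetical extension (not UTOPIAN's published formulation): if W_ref is
    given and lam > 0, the objective gains lam * ||W - W_ref||_F^2, which pulls
    W toward a user-supplied reference as a stand-in for user feedback.
    """
    rng = np.random.default_rng(seed)  # fixed seed: repeated runs give identical factors
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Lee-Seung multiplicative update for H; keeps H nonnegative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        # Multiplicative update for W: the negative part of the gradient goes in
        # the numerator and the positive part in the denominator.
        num = A @ H.T
        den = W @ (H @ H.T)
        if W_ref is not None and lam > 0:
            num = num + lam * W_ref
            den = den + lam * W
        W *= num / (den + eps)
    return W, H


def top_terms(W, vocab, n=3):
    """Return the n highest-weighted terms of each topic (column of W)."""
    return [[vocab[i] for i in np.argsort(-W[:, t])[:n]] for t in range(W.shape[1])]


# Toy corpus (made up): 6 terms x 4 documents, two loosely separated themes.
vocab = ["matrix", "factorization", "nonnegative", "user", "interaction", "feedback"]
A = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [2, 2, 0, 1],
              [0, 0, 3, 2],
              [0, 1, 2, 3],
              [0, 0, 2, 2]], dtype=float)

W, H = nmf(A, k=2)
print(top_terms(W, vocab))
```

Splitting the gradient into its positive and negative parts keeps every update multiplicative, so `W` and `H` stay nonnegative without a projection step. A very large `lam` effectively pins `W` to `W_ref` and leaves only `H` free to adapt.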

Original language: English
Article number: 6634167
Pages (from-to): 1992-2001
Number of pages: 10
Journal: IEEE Transactions on Visualization and Computer Graphics
Volume: 19
Issue number: 12
DOI: 10.1109/TVCG.2013.212
Publication status: Published - 2013 Nov 4
Externally published: Yes

Keywords

  • interactive clustering
  • latent Dirichlet allocation
  • nonnegative matrix factorization
  • text analytics
  • topic modeling
  • visual analytics

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Computer Graphics and Computer-Aided Design

Cite this

UTOPIAN: User-driven topic modeling based on interactive nonnegative matrix factorization. / Choo, Jaegul; Lee, Changhyun; Reddy, Chandan K.; Park, Haesun.

In: IEEE Transactions on Visualization and Computer Graphics, Vol. 19, No. 12, 6634167, 04.11.2013, p. 1992-2001.

@article{ea8d2e37ad014458a76c2856e2b2f0ea,
title = "{UTOPIAN}: User-driven topic modeling based on interactive nonnegative matrix factorization",
abstract = "Topic modeling has been widely used for analyzing text document collections. Recently, there have been significant advances in topic modeling techniques, particularly in the form of probabilistic graphical models. State-of-the-art techniques such as Latent Dirichlet Allocation (LDA) have been successfully applied in visual text analytics. However, most widely used methods based on probabilistic modeling have drawbacks in terms of consistency across multiple runs and empirical convergence. Furthermore, due to the complexity of its formulation and algorithm, LDA cannot easily incorporate various types of user feedback. To tackle this problem, we propose a reliable and flexible visual analytics system for topic modeling called UTOPIAN (User-driven Topic modeling based on Interactive Nonnegative Matrix Factorization). Centered around its semi-supervised formulation, UTOPIAN enables users to interact with the topic modeling method and steer the result in a user-driven manner. We demonstrate the capabilities of UTOPIAN via several usage scenarios with real-world document corpora such as the InfoVis/VAST paper data set and product review data sets.",
keywords = "interactive clustering, latent Dirichlet allocation, nonnegative matrix factorization, text analytics, topic modeling, visual analytics",
author = "Choo, Jaegul and Lee, Changhyun and Reddy, Chandan K. and Park, Haesun",
year = "2013",
month = nov,
day = "4",
doi = "10.1109/TVCG.2013.212",
language = "English",
volume = "19",
pages = "1992--2001",
journal = "IEEE Transactions on Visualization and Computer Graphics",
issn = "1077-2626",
publisher = "IEEE Computer Society",
number = "12",

}

TY - JOUR

T1 - UTOPIAN: User-driven topic modeling based on interactive nonnegative matrix factorization

AU - Choo, Jaegul

AU - Lee, Changhyun

AU - Reddy, Chandan K.

AU - Park, Haesun

PY - 2013/11/4

Y1 - 2013/11/4

AB - Topic modeling has been widely used for analyzing text document collections. Recently, there have been significant advances in topic modeling techniques, particularly in the form of probabilistic graphical models. State-of-the-art techniques such as Latent Dirichlet Allocation (LDA) have been successfully applied in visual text analytics. However, most widely used methods based on probabilistic modeling have drawbacks in terms of consistency across multiple runs and empirical convergence. Furthermore, due to the complexity of its formulation and algorithm, LDA cannot easily incorporate various types of user feedback. To tackle this problem, we propose a reliable and flexible visual analytics system for topic modeling called UTOPIAN (User-driven Topic modeling based on Interactive Nonnegative Matrix Factorization). Centered around its semi-supervised formulation, UTOPIAN enables users to interact with the topic modeling method and steer the result in a user-driven manner. We demonstrate the capabilities of UTOPIAN via several usage scenarios with real-world document corpora such as the InfoVis/VAST paper data set and product review data sets.

KW - interactive clustering

KW - latent Dirichlet allocation

KW - nonnegative matrix factorization

KW - text analytics

KW - topic modeling

KW - visual analytics

UR - http://www.scopus.com/inward/record.url?scp=84886684025&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84886684025&partnerID=8YFLogxK

U2 - 10.1109/TVCG.2013.212

DO - 10.1109/TVCG.2013.212

M3 - Article

VL - 19

SP - 1992

EP - 2001

JO - IEEE Transactions on Visualization and Computer Graphics

JF - IEEE Transactions on Visualization and Computer Graphics

SN - 1077-2626

IS - 12

M1 - 6634167

ER -