TopicLens: Efficient Multi-Level Visual Topic Exploration of Large-Scale Document Collections

Minjeong Kim, Kyeongpil Kang, Deokgun Park, Jaegul Choo, Niklas Elmqvist

Research output: Contribution to journal › Article

31 Citations (Scopus)

Abstract

Topic modeling, which reveals underlying topics of a document corpus, has been actively adopted in visual analytics for large-scale document collections. However, due to its significant processing time and non-interactive nature, topic modeling has so far not been tightly integrated into a visual analytics workflow. Instead, most such systems are limited to utilizing a fixed, initial set of topics. Motivated by this gap in the literature, we propose a novel interaction technique called TopicLens that allows a user to dynamically explore data through a lens interface where topic modeling and the corresponding 2D embedding are efficiently computed on the fly. To support this interaction in real time while maintaining view consistency, we propose a novel efficient topic modeling method and a semi-supervised 2D embedding algorithm. Our work is based on improving state-of-the-art methods such as nonnegative matrix factorization and t-distributed stochastic neighbor embedding. Furthermore, we have built a web-based visual analytics system integrated with TopicLens. We use this system to measure the performance and the visualization quality of our proposed methods. We provide several scenarios showcasing the capability of TopicLens using real-world datasets.
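To make the lens idea in the abstract concrete, the following is a minimal sketch of nonnegative matrix factorization applied first to a whole collection and then again to only the documents under a lens. It is not the authors' efficient method: it uses the standard multiplicative-update rule, and the vocabulary, the term-document counts, the `lens_docs` selection, and the helper names `nmf` and `top_terms` are all hypothetical, chosen only for illustration. The 2D embedding step is omitted.

```python
import numpy as np

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix V (terms x docs) as W @ H.

    Uses plain multiplicative updates for the Frobenius objective;
    this is a textbook baseline, not the method proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    W = rng.random((n_terms, k)) + eps   # term-by-topic weights
    H = rng.random((k, n_docs)) + eps    # topic-by-document weights
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def top_terms(W, vocab, n=3):
    """Return the n highest-weighted vocabulary terms for each topic."""
    return [[vocab[i] for i in np.argsort(-W[:, t])[:n]]
            for t in range(W.shape[1])]

# Hypothetical toy data: 6 terms x 6 documents of raw term counts.
vocab = ["lens", "zoom", "view", "matrix", "factor", "rank"]
V = np.array([
    [3, 2, 3, 0, 0, 0],
    [2, 3, 2, 0, 0, 1],
    [3, 2, 2, 0, 1, 0],
    [0, 0, 0, 3, 2, 3],
    [0, 1, 0, 2, 3, 2],
    [0, 0, 0, 3, 2, 2],
], dtype=float)

# Global topics over the whole collection.
W, H = nmf(V, k=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

# "Lens" step: re-run the factorization on only the selected documents
# to obtain finer-grained topics for that region.
lens_docs = [0, 1, 2]
W_sub, H_sub = nmf(V[:, lens_docs], k=2)

print(top_terms(W, vocab))
print(top_terms(W_sub, vocab))
```

Re-running the factorization from scratch for every lens position, as this sketch does, is exactly the cost that makes naive topic modeling hard to use interactively, which is the gap the abstract describes.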

Original language: English
Article number: 7539597
Pages (from-to): 151-160
Number of pages: 10
Journal: IEEE Transactions on Visualization and Computer Graphics
Volume: 23
Issue number: 1
DOI: 10.1109/TVCG.2016.2598445
Publication status: Published - 2017 Jan 1

Fingerprint

Factorization
Lenses
Visualization
Processing
Workflow
Diptera

Keywords

  • magic lens
  • nonnegative matrix factorization
  • t-distributed stochastic neighbor embedding
  • text analytics
  • topic modeling

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Computer Graphics and Computer-Aided Design

Cite this

TopicLens: Efficient Multi-Level Visual Topic Exploration of Large-Scale Document Collections. / Kim, Minjeong; Kang, Kyeongpil; Park, Deokgun; Choo, Jaegul; Elmqvist, Niklas.

In: IEEE Transactions on Visualization and Computer Graphics, Vol. 23, No. 1, 7539597, 01.01.2017, p. 151-160.

@article{f3b8084b6e4844bf8bac44486727d919,
title = "TopicLens: Efficient Multi-Level Visual Topic Exploration of Large-Scale Document Collections",
keywords = "magic lens, nonnegative matrix factorization, t-distributed stochastic neighbor embedding, text analytics, topic modeling",
author = "Minjeong Kim and Kyeongpil Kang and Deokgun Park and Jaegul Choo and Niklas Elmqvist",
year = "2017",
month = "1",
day = "1",
doi = "10.1109/TVCG.2016.2598445",
language = "English",
volume = "23",
pages = "151--160",
journal = "IEEE Transactions on Visualization and Computer Graphics",
issn = "1077-2626",
publisher = "IEEE Computer Society",
number = "1"
}

TY - JOUR

T1 - TopicLens

T2 - Efficient Multi-Level Visual Topic Exploration of Large-Scale Document Collections

AU - Kim, Minjeong

AU - Kang, Kyeongpil

AU - Park, Deokgun

AU - Choo, Jaegul

AU - Elmqvist, Niklas

PY - 2017/1/1

Y1 - 2017/1/1

KW - magic lens

KW - nonnegative matrix factorization

KW - t-distributed stochastic neighbor embedding

KW - text analytics

KW - topic modeling

UR - http://www.scopus.com/inward/record.url?scp=84999233615&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84999233615&partnerID=8YFLogxK

U2 - 10.1109/TVCG.2016.2598445

DO - 10.1109/TVCG.2016.2598445

M3 - Article

C2 - 27875138

AN - SCOPUS:84999233615

VL - 23

SP - 151

EP - 160

JO - IEEE Transactions on Visualization and Computer Graphics

JF - IEEE Transactions on Visualization and Computer Graphics

SN - 1077-2626

IS - 1

M1 - 7539597

ER -