Localized user-driven topic discovery via boosted ensemble of nonnegative matrix factorization

Sangho Suh, Sungbok Shin, Joonseok Lee, Chandan K. Reddy, Jaegul Choo

Research output: Contribution to journal › Article

Abstract

Nonnegative matrix factorization (NMF) has been widely used in topic modeling of large-scale document corpora, where a set of underlying topics is extracted by a low-rank factor matrix from NMF. However, the resulting topics often convey only general, and thus redundant, information about the documents rather than information that might be minor but potentially meaningful to users. To address this problem, we present a novel ensemble method based on NMF that discovers meaningful local topics. Our method brings the idea of an ensemble model, which has shown advantages in supervised learning, into an unsupervised topic modeling context. That is, our model successively performs NMF on a residual matrix obtained from previous stages and generates a sequence of topic sets. The update algorithm we employ is novel in two aspects. The first lies in using a residual matrix, inspired by a state-of-the-art gradient boosting model; the second stems from applying a sophisticated local weighting scheme to the given matrix to enhance the locality of topics, which in turn delivers high-quality, focused topics of interest to users. We subsequently extend this ensemble model by adding keyword- and document-based user interaction to introduce user-driven topic discovery.
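
The stage-wise idea described in the abstract (run NMF, subtract what it explained, run NMF again on the residual) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `nmf` and `boosted_nmf`, the use of Lee-Seung multiplicative updates, and the clipping of the residual at zero are all assumptions made for illustration, and the local weighting scheme and user interaction are omitted because the abstract does not specify them.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Rank-k NMF, X ~= W @ H, via Lee-Seung multiplicative updates (assumed solver)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def boosted_nmf(X, k, n_stages):
    """Hypothetical helper: run NMF stage by stage on the residual of earlier stages."""
    R = X.astype(float).copy()
    stages = []
    for s in range(n_stages):
        W, H = nmf(R, k, seed=s)
        stages.append((W, H))
        # Assumption: remove what this stage explained and clip at zero
        # so that the next stage still receives a nonnegative matrix.
        R = np.maximum(R - W @ H, 0.0)
    return stages, R

# Toy term-document matrix (6 terms, 8 documents) with random nonnegative entries.
X = np.random.default_rng(42).random((6, 8))
stages, R = boosted_nmf(X, k=2, n_stages=3)
for i, (W, H) in enumerate(stages):
    print(f"stage {i}: W {W.shape}, H {H.shape}")
print("residual norm:", np.linalg.norm(R), "of original", np.linalg.norm(X))
```

Each stage yields its own topic set (the columns of `W`), so three stages with `k=2` give six topics in total. Because the clipped residual never exceeds the previous one entrywise, later stages see only the parts of the data that earlier stages did not explain.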

Original language: English
Pages (from-to): 503-531
Number of pages: 29
Journal: Knowledge and Information Systems
Volume: 56
Issue number: 3
DOIs: 10.1007/s10115-017-1147-9
Publication status: Published - 2018 Sep 1

Fingerprint

Factorization
Supervised learning

Keywords

  • Ensemble learning
  • Gradient boosting
  • Local weighting
  • Matrix factorization
  • Topic modeling

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Hardware and Architecture
  • Artificial Intelligence

Cite this

Localized user-driven topic discovery via boosted ensemble of nonnegative matrix factorization. / Suh, Sangho; Shin, Sungbok; Lee, Joonseok; Reddy, Chandan K.; Choo, Jaegul.

In: Knowledge and Information Systems, Vol. 56, No. 3, 01.09.2018, p. 503-531.

Suh, Sangho ; Shin, Sungbok ; Lee, Joonseok ; Reddy, Chandan K. ; Choo, Jaegul. / Localized user-driven topic discovery via boosted ensemble of nonnegative matrix factorization. In: Knowledge and Information Systems. 2018 ; Vol. 56, No. 3. pp. 503-531.
@article{a0ff11f95f1c4fecac700075017afdb4,
title = "Localized user-driven topic discovery via boosted ensemble of nonnegative matrix factorization",
keywords = "Ensemble learning, Gradient boosting, Local weighting, Matrix factorization, Topic modeling",
author = "Sangho Suh and Sungbok Shin and Joonseok Lee and Reddy, {Chandan K.} and Jaegul Choo",
year = "2018",
month = "9",
day = "1",
doi = "10.1007/s10115-017-1147-9",
language = "English",
volume = "56",
pages = "503--531",
journal = "Knowledge and Information Systems",
issn = "0219-1377",
publisher = "Springer London",
number = "3",
}

TY - JOUR

T1 - Localized user-driven topic discovery via boosted ensemble of nonnegative matrix factorization

AU - Suh, Sangho

AU - Shin, Sungbok

AU - Lee, Joonseok

AU - Reddy, Chandan K.

AU - Choo, Jaegul

PY - 2018/9/1

Y1 - 2018/9/1

N2 - Nonnegative matrix factorization (NMF) has been widely used in topic modeling of large-scale document corpora, where a set of underlying topics are extracted by a low-rank factor matrix from NMF. However, the resulting topics often convey only general, thus redundant information about the documents rather than information that might be minor, but potentially meaningful to users. To address this problem, we present a novel ensemble method based on nonnegative matrix factorization that discovers meaningful local topics. Our method leverages the idea of an ensemble model, which has shown advantages in supervised learning, into an unsupervised topic modeling context. That is, our model successively performs NMF given a residual matrix obtained from previous stages and generates a sequence of topic sets. The algorithm we employ to update is novel in two aspects. The first lies in utilizing the residual matrix inspired by a state-of-the-art gradient boosting model, and the second stems from applying a sophisticated local weighting scheme on the given matrix to enhance the locality of topics, which in turn delivers high-quality, focused topics of interest to users. We subsequently extend this ensemble model by adding keyword- and document-based user interaction to introduce user-driven topic discovery.

KW - Ensemble learning

KW - Gradient boosting

KW - Local weighting

KW - Matrix factorization

KW - Topic modeling

UR - http://www.scopus.com/inward/record.url?scp=85049339359&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049339359&partnerID=8YFLogxK

U2 - 10.1007/s10115-017-1147-9

DO - 10.1007/s10115-017-1147-9

M3 - Article

AN - SCOPUS:85049339359

VL - 56

SP - 503

EP - 531

JO - Knowledge and Information Systems

JF - Knowledge and Information Systems

SN - 0219-1377

IS - 3

ER -