L-EnsNMF: Boosted local topic discovery via ensemble of nonnegative matrix factorization

Sangho Suh, Jaegul Choo, Joonseok Lee, Chandan K. Reddy

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Citations (Scopus)

Abstract

Nonnegative matrix factorization (NMF) has been widely applied in many domains. In document analysis, it has been increasingly used in topic modeling applications, where a set of underlying topics are revealed by a low-rank factor matrix from NMF. However, it is often the case that the resulting topics give only general topic information in the data, which tends not to convey much information. To tackle this problem, we propose a novel ensemble model of nonnegative matrix factorization for discovering high-quality local topics. Our method leverages the idea of an ensemble model, which has been successful in supervised learning, into an unsupervised topic modeling context. That is, our model successively performs NMF given a residual matrix obtained from previous stages and generates a sequence of topic sets. Our algorithm for updating the input matrix has novelty in two aspects. The first lies in utilizing the residual matrix inspired by a state-of-the-art gradient boosting model, and the second stems from applying a sophisticated local weighting scheme on the given matrix to enhance the locality of topics, which in turn delivers high-quality, focused topics of interest to users. We evaluate our proposed method by comparing it against other topic modeling methods, such as a few variants of NMF and latent Dirichlet allocation, in terms of various evaluation measures representing topic coherence, diversity, coverage, computing time, and so on. We also present qualitative evaluation on the topics discovered by our method using several real-world data sets.
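
The staged procedure described in the abstract (run NMF, subtract what it explains, run NMF again on the residual) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the update rules, so the NMF solver here is the standard multiplicative-update method, the residual is simply clipped at zero to stay nonnegative, and the local weighting scheme is omitted entirely. The function names `nmf` and `residual_nmf_ensemble` and all parameters are hypothetical.

```python
import numpy as np


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF, X ~= W @ H, via standard multiplicative updates.

    Assumption: any NMF solver could be used here; this one is chosen
    only because it is short and keeps W and H nonnegative.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def residual_nmf_ensemble(X, k_per_stage, n_stages, n_iter=200):
    """Run NMF stage by stage, each time on the residual of the previous stage.

    Returns the list of (W, H) pairs (one topic set per stage) and the
    final residual. The local weighting step mentioned in the abstract
    is NOT implemented in this sketch.
    """
    R = np.asarray(X, dtype=float).copy()
    stages = []
    for s in range(n_stages):
        W, H = nmf(R, k_per_stage, n_iter=n_iter, seed=s)
        stages.append((W, H))
        # Assumption: clip negatives so the next stage still gets a
        # nonnegative input matrix.
        R = np.maximum(R - W @ H, 0.0)
    return stages, R


# Toy term-document matrix (rows: terms, columns: documents).
X_demo = np.random.default_rng(42).random((30, 20))
stages_demo, R_demo = residual_nmf_ensemble(X_demo, k_per_stage=2, n_stages=3)
```

In this sketch each stage's `W` holds one small topic set (term weights per topic), so three stages of two topics yield six topics in total. Because each stage only sees what earlier stages failed to explain, later topics are pushed away from the dominant, general structure in the data.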

Original language: English
Title of host publication: Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016
Editors: Francesco Bonchi, Xindong Wu, Ricardo Baeza-Yates, Josep Domingo-Ferrer, Zhi-Hua Zhou
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 479-488
Number of pages: 10
ISBN (Electronic): 9781509054725
DOI: https://doi.org/10.1109/ICDM.2016.108
Publication status: Published - 2017 Jan 31
Event: 16th IEEE International Conference on Data Mining, ICDM 2016 - Barcelona, Catalonia, Spain
Duration: 2016 Dec 12 to 2016 Dec 15

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 16th IEEE International Conference on Data Mining, ICDM 2016
Country: Spain
City: Barcelona, Catalonia
Period: 16/12/12 to 16/12/15

Fingerprint

Factorization
Supervised learning

Keywords

  • Ensemble learning
  • Gradient boosting
  • Local weighting
  • Matrix factorization
  • Topic modeling

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Suh, S., Choo, J., Lee, J., & Reddy, C. K. (2017). L-EnsNMF: Boosted local topic discovery via ensemble of nonnegative matrix factorization. In F. Bonchi, X. Wu, R. Baeza-Yates, J. Domingo-Ferrer, & Z-H. Zhou (Eds.), Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016 (pp. 479-488). [7837872] (Proceedings - IEEE International Conference on Data Mining, ICDM). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDM.2016.108

@inproceedings{63fb0414191b455fb575f59aa812c9ef,
title = "L-EnsNMF: Boosted local topic discovery via ensemble of nonnegative matrix factorization",
abstract = "Nonnegative matrix factorization (NMF) has been widely applied in many domains. In document analysis, it has been increasingly used in topic modeling applications, where a set of underlying topics are revealed by a low-rank factor matrix from NMF. However, it is often the case that the resulting topics give only general topic information in the data, which tends not to convey much information. To tackle this problem, we propose a novel ensemble model of nonnegative matrix factorization for discovering high-quality local topics. Our method leverages the idea of an ensemble model, which has been successful in supervised learning, into an unsupervised topic modeling context. That is, our model successively performs NMF given a residual matrix obtained from previous stages and generates a sequence of topic sets. Our algorithm for updating the input matrix has novelty in two aspects. The first lies in utilizing the residual matrix inspired by a state-of-the-art gradient boosting model, and the second stems from applying a sophisticated local weighting scheme on the given matrix to enhance the locality of topics, which in turn delivers high-quality, focused topics of interest to users. We evaluate our proposed method by comparing it against other topic modeling methods, such as a few variants of NMF and latent Dirichlet allocation, in terms of various evaluation measures representing topic coherence, diversity, coverage, computing time, and so on. We also present qualitative evaluation on the topics discovered by our method using several real-world data sets.",
keywords = "Ensemble learning, Gradient boosting, Local weighting, Matrix factorization, Topic modeling",
author = "Sangho Suh and Jaegul Choo and Joonseok Lee and Reddy, {Chandan K.}",
year = "2017",
month = "1",
day = "31",
doi = "10.1109/ICDM.2016.108",
language = "English",
series = "Proceedings - IEEE International Conference on Data Mining, ICDM",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "479--488",
editor = "Francesco Bonchi and Xindong Wu and Ricardo Baeza-Yates and Josep Domingo-Ferrer and Zhi-Hua Zhou",
booktitle = "Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016",

}

TY - GEN

T1 - L-EnsNMF

T2 - Boosted local topic discovery via ensemble of nonnegative matrix factorization

AU - Suh, Sangho

AU - Choo, Jaegul

AU - Lee, Joonseok

AU - Reddy, Chandan K.

PY - 2017/1/31

Y1 - 2017/1/31

AB - Nonnegative matrix factorization (NMF) has been widely applied in many domains. In document analysis, it has been increasingly used in topic modeling applications, where a set of underlying topics are revealed by a low-rank factor matrix from NMF. However, it is often the case that the resulting topics give only general topic information in the data, which tends not to convey much information. To tackle this problem, we propose a novel ensemble model of nonnegative matrix factorization for discovering high-quality local topics. Our method leverages the idea of an ensemble model, which has been successful in supervised learning, into an unsupervised topic modeling context. That is, our model successively performs NMF given a residual matrix obtained from previous stages and generates a sequence of topic sets. Our algorithm for updating the input matrix has novelty in two aspects. The first lies in utilizing the residual matrix inspired by a state-of-the-art gradient boosting model, and the second stems from applying a sophisticated local weighting scheme on the given matrix to enhance the locality of topics, which in turn delivers high-quality, focused topics of interest to users. We evaluate our proposed method by comparing it against other topic modeling methods, such as a few variants of NMF and latent Dirichlet allocation, in terms of various evaluation measures representing topic coherence, diversity, coverage, computing time, and so on. We also present qualitative evaluation on the topics discovered by our method using several real-world data sets.

KW - Ensemble learning

KW - Gradient boosting

KW - Local weighting

KW - Matrix factorization

KW - Topic modeling

UR - http://www.scopus.com/inward/record.url?scp=85014559734&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85014559734&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2016.108

DO - 10.1109/ICDM.2016.108

M3 - Conference contribution

AN - SCOPUS:85014559734

T3 - Proceedings - IEEE International Conference on Data Mining, ICDM

SP - 479

EP - 488

BT - Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016

A2 - Bonchi, Francesco

A2 - Wu, Xindong

A2 - Baeza-Yates, Ricardo

A2 - Domingo-Ferrer, Josep

A2 - Zhou, Zhi-Hua

PB - Institute of Electrical and Electronics Engineers Inc.

ER -