TY - GEN
T1 - L-EnsNMF
T2 - 16th IEEE International Conference on Data Mining, ICDM 2016
AU - Suh, Sangho
AU - Choo, Jaegul
AU - Lee, Joonseok
AU - Reddy, Chandan K.
N1 - Funding Information:
This work was partially supported by National Science Foundation grants IIS-1231742, IIS-1527827, and IIS-1646881 and by Institute for Information & communications Technology Promotion (IITP) and Basic Science Research Program through the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2015K2A1A2070536). Any opinions, findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect the views of funding agencies.
Publisher Copyright:
© 2016 IEEE.
Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2017/1/31
Y1 - 2017/1/31
N2 - Nonnegative matrix factorization (NMF) has been widely applied in many domains. In document analysis, it has been increasingly used in topic modeling applications, where a set of underlying topics is revealed by a low-rank factor matrix obtained from NMF. However, the resulting topics often reflect only general, dominant themes in the data and therefore convey little specific information. To tackle this problem, we propose a novel ensemble model of nonnegative matrix factorization for discovering high-quality local topics. Our method extends the idea of an ensemble model, which has been successful in supervised learning, to an unsupervised topic modeling context. Specifically, our model successively performs NMF on the residual matrix obtained from the previous stages, generating a sequence of topic sets. Our algorithm for updating the input matrix is novel in two respects. First, it uses a residual matrix, inspired by state-of-the-art gradient boosting models; second, it applies a sophisticated local weighting scheme to the given matrix to enhance the locality of topics, which in turn delivers high-quality, focused topics of interest to users. We evaluate our proposed method by comparing it against other topic modeling methods, such as several variants of NMF and latent Dirichlet allocation, using evaluation measures of topic coherence, diversity, coverage, and computing time, among others. We also present a qualitative evaluation of the topics discovered by our method using several real-world data sets.
KW - Ensemble learning
KW - Gradient boosting
KW - Local weighting
KW - Matrix factorization
KW - Topic modeling
UR - http://www.scopus.com/inward/record.url?scp=85014559734&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85014559734&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2016.108
DO - 10.1109/ICDM.2016.108
M3 - Conference contribution
AN - SCOPUS:85014559734
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 479
EP - 488
BT - Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016
A2 - Bonchi, Francesco
A2 - Wu, Xindong
A2 - Baeza-Yates, Ricardo
A2 - Domingo-Ferrer, Josep
A2 - Zhou, Zhi-Hua
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 12 December 2016 through 15 December 2016
ER -