TY - JOUR
T1 - Localized user-driven topic discovery via boosted ensemble of nonnegative matrix factorization
AU - Suh, Sangho
AU - Shin, Sungbok
AU - Lee, Joonseok
AU - Reddy, Chandan K.
AU - Choo, Jaegul
N1 - Funding Information:
This work was supported in part by the National Science Foundation Grants IIS-1707498, IIS-1619028, and IIS-1646881 and by Basic Science Research Program through the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2016R1C1B2015924). Any opinions, findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect the views of funding agencies. This work is an extended version of [48].
Publisher Copyright:
© 2018, Springer-Verlag London Ltd., part of Springer Nature.
PY - 2018/9/1
Y1 - 2018/9/1
AB - Nonnegative matrix factorization (NMF) has been widely used in topic modeling of large-scale document corpora, where a set of underlying topics are extracted by a low-rank factor matrix from NMF. However, the resulting topics often convey only general, thus redundant information about the documents rather than information that might be minor, but potentially meaningful to users. To address this problem, we present a novel ensemble method based on nonnegative matrix factorization that discovers meaningful local topics. Our method leverages the idea of an ensemble model, which has shown advantages in supervised learning, into an unsupervised topic modeling context. That is, our model successively performs NMF given a residual matrix obtained from previous stages and generates a sequence of topic sets. The algorithm we employ to update is novel in two aspects. The first lies in utilizing the residual matrix inspired by a state-of-the-art gradient boosting model, and the second stems from applying a sophisticated local weighting scheme on the given matrix to enhance the locality of topics, which in turn delivers high-quality, focused topics of interest to users. We subsequently extend this ensemble model by adding keyword- and document-based user interaction to introduce user-driven topic discovery.
KW - Ensemble learning
KW - Gradient boosting
KW - Local weighting
KW - Matrix factorization
KW - Topic modeling
UR - http://www.scopus.com/inward/record.url?scp=85049339359&partnerID=8YFLogxK
U2 - 10.1007/s10115-017-1147-9
DO - 10.1007/s10115-017-1147-9
M3 - Article
AN - SCOPUS:85049339359
VL - 56
SP - 503
EP - 531
JO - Knowledge and Information Systems
JF - Knowledge and Information Systems
SN - 0219-1377
IS - 3
ER -