TY - GEN
T1 - Simultaneous discovery of common and discriminative topics via joint nonnegative matrix factorization
AU - Kim, Hannah
AU - Choo, Jaegul
AU - Kim, Jingu
AU - Reddy, Chandan K.
AU - Park, Haesun
N1 - Publisher Copyright:
© 2015 ACM.
PY - 2015/8/10
Y1 - 2015/8/10
N2 - Understanding large-scale document collections in an efficient manner is an important problem. Usually, document data are associated with other information (e.g., an author's gender, age, and location) and their links to other entities (e.g., co-authorship and citation networks). For the analysis of such data, we often have to reveal common as well as discriminative characteristics of documents with respect to their associated information, e.g., male-vs. female-authored documents, old vs. new documents, etc. To address such needs, this paper presents a novel topic modeling method based on joint nonnegative matrix factorization, which simultaneously discovers common as well as discriminative topics given multiple document sets. Our approach is based on a block-coordinate descent framework and is capable of utilizing only the most representative, thus meaningful, keywords in each topic through a novel pseudodeflation approach. We perform both quantitative and qualitative evaluations using synthetic as well as real-world document data sets such as research paper collections and nonprofit micro-finance data. We show our method has a great potential for providing indepth analyses by clearly identifying common and discriminative topics among multiple document sets.
AB - Understanding large-scale document collections in an efficient manner is an important problem. Usually, document data are associated with other information (e.g., an author's gender, age, and location) and their links to other entities (e.g., co-authorship and citation networks). For the analysis of such data, we often have to reveal common as well as discriminative characteristics of documents with respect to their associated information, e.g., male-vs. female-authored documents, old vs. new documents, etc. To address such needs, this paper presents a novel topic modeling method based on joint nonnegative matrix factorization, which simultaneously discovers common as well as discriminative topics given multiple document sets. Our approach is based on a block-coordinate descent framework and is capable of utilizing only the most representative, thus meaningful, keywords in each topic through a novel pseudodeflation approach. We perform both quantitative and qualitative evaluations using synthetic as well as real-world document data sets such as research paper collections and nonprofit micro-finance data. We show our method has a great potential for providing indepth analyses by clearly identifying common and discriminative topics among multiple document sets.
KW - Discriminative pattern mining
KW - Nonnegative matrix factorization
KW - Topic modeling
UR - http://www.scopus.com/inward/record.url?scp=84954150798&partnerID=8YFLogxK
U2 - 10.1145/2783258.2783338
DO - 10.1145/2783258.2783338
M3 - Conference contribution
AN - SCOPUS:84954150798
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 567
EP - 576
BT - KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015
Y2 - 10 August 2015 through 13 August 2015
ER -