Simultaneous discovery of common and discriminative topics via joint nonnegative matrix factorization

Hannah Kim, Jaegul Choo, Jingu Kim, Chandan K. Reddy, Haesun Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

26 Citations (Scopus)

Abstract

Understanding large-scale document collections in an efficient manner is an important problem. Usually, document data are associated with other information (e.g., an author's gender, age, and location) and with links to other entities (e.g., co-authorship and citation networks). To analyze such data, we often have to reveal common as well as discriminative characteristics of documents with respect to their associated information, e.g., male- vs. female-authored documents, old vs. new documents, etc. To address such needs, this paper presents a novel topic modeling method based on joint nonnegative matrix factorization, which simultaneously discovers common as well as discriminative topics given multiple document sets. Our approach is based on a block-coordinate descent framework and uses only the most representative, and thus most meaningful, keywords in each topic through a novel pseudo-deflation approach. We perform both quantitative and qualitative evaluations using synthetic as well as real-world document data sets such as research paper collections and nonprofit micro-finance data. We show that our method has great potential for providing in-depth analyses by clearly identifying common and discriminative topics among multiple document sets.
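
To make the idea of a joint factorization concrete, the following is a minimal sketch and not the authors' algorithm. It assumes two term-by-document matrices `X1` and `X2` over a shared vocabulary, treats the first `k_common` topic columns as common and the remaining `k_disc` columns as discriminative, and uses two illustrative penalties that are assumptions here: a squared difference that pulls the common topics of the two sets together, and a sum of inner products that discourages discriminative topics from sharing keywords. Plain multiplicative updates stand in for the block-coordinate descent framework named in the abstract, and the pseudo-deflation step is not shown. All function and parameter names (`joint_nmf`, `alpha`, `beta`, and so on) are hypothetical.

```python
import numpy as np


def objective(X1, X2, W1, H1, W2, H2, kc, alpha, beta):
    """Fit error plus the two (assumed) coupling penalties."""
    fit = np.linalg.norm(X1 - W1 @ H1) ** 2 + np.linalg.norm(X2 - W2 @ H2) ** 2
    common = alpha * np.linalg.norm(W1[:, :kc] - W2[:, :kc]) ** 2
    disc = beta * np.sum(W1[:, kc:].T @ W2[:, kc:])
    return fit + common + disc


def _update_W(X, W, H, W_other, kc, alpha, beta, eps):
    """One multiplicative update of a term-by-topic matrix W."""
    num = X @ H.T            # negative part of the gradient
    den = W @ (H @ H.T)      # positive part of the gradient
    # Common topics (first kc columns): pull toward the other set's columns.
    num[:, :kc] += alpha * W_other[:, :kc]
    den[:, :kc] += alpha * W[:, :kc]
    # Discriminative topics (remaining columns): penalize shared keywords.
    den[:, kc:] += 0.5 * beta * W_other[:, kc:].sum(axis=1, keepdims=True)
    return W * num / (den + eps)


def _update_H(X, W, H, eps):
    """Standard multiplicative update of a topic-by-document matrix H."""
    return H * (W.T @ X) / (W.T @ W @ H + eps)


def joint_nmf(X1, X2, k_common, k_disc, alpha=1.0, beta=1.0,
              n_iter=200, seed=0, eps=1e-9):
    """Jointly factorize X1 ~ W1 @ H1 and X2 ~ W2 @ H2 (illustrative only)."""
    rng = np.random.default_rng(seed)
    m, k, kc = X1.shape[0], k_common + k_disc, k_common
    W1, W2 = rng.random((m, k)), rng.random((m, k))
    H1, H2 = rng.random((k, X1.shape[1])), rng.random((k, X2.shape[1]))
    history = [objective(X1, X2, W1, H1, W2, H2, kc, alpha, beta)]
    for _ in range(n_iter):
        W1 = _update_W(X1, W1, H1, W2, kc, alpha, beta, eps)
        W2 = _update_W(X2, W2, H2, W1, kc, alpha, beta, eps)
        H1 = _update_H(X1, W1, H1, eps)
        H2 = _update_H(X2, W2, H2, eps)
        history.append(objective(X1, X2, W1, H1, W2, H2, kc, alpha, beta))
    return W1, H1, W2, H2, history
```

Under these assumptions, the largest entries in a column of `W1` or `W2` can be read as that topic's top keywords: the first `k_common` columns should look similar across the two sets, while the remaining columns should emphasize different terms. Larger `alpha` and `beta` trade reconstruction quality for stronger commonality and stronger separation, respectively.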

Original language: English
Title of host publication: KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 567-576
Number of pages: 10
ISBN (Electronic): 9781450336642
DOI: https://doi.org/10.1145/2783258.2783338
Publication status: Published - 2015 Aug 10
Event: 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015 - Sydney, Australia
Duration: 2015 Aug 10 to 2015 Aug 13

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: 2015-August

Conference

Conference: 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015
Country: Australia
City: Sydney
Period: 15/8/10 to 15/8/13

Keywords

  • Discriminative pattern mining
  • Nonnegative matrix factorization
  • Topic modeling

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Kim, H., Choo, J., Kim, J., Reddy, C. K., & Park, H. (2015). Simultaneous discovery of common and discriminative topics via joint nonnegative matrix factorization. In KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 567-576). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. 2015-August). Association for Computing Machinery. https://doi.org/10.1145/2783258.2783338

@inproceedings{e9dab0084bca4ef993a8f77b24d24963,
title = "Simultaneous discovery of common and discriminative topics via joint nonnegative matrix factorization",
keywords = "Discriminative pattern mining, Nonnegative matrix factorization, Topic modeling",
author = "Hannah Kim and Jaegul Choo and Jingu Kim and Reddy, {Chandan K.} and Haesun Park",
year = "2015",
month = "8",
day = "10",
doi = "10.1145/2783258.2783338",
language = "English",
series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
publisher = "Association for Computing Machinery",
pages = "567--576",
booktitle = "KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining",

}

TY - GEN
T1 - Simultaneous discovery of common and discriminative topics via joint nonnegative matrix factorization
AU - Kim, Hannah
AU - Choo, Jaegul
AU - Kim, Jingu
AU - Reddy, Chandan K.
AU - Park, Haesun
PY - 2015/8/10
Y1 - 2015/8/10
KW - Discriminative pattern mining
KW - Nonnegative matrix factorization
KW - Topic modeling
UR - http://www.scopus.com/inward/record.url?scp=84954150798&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84954150798&partnerID=8YFLogxK
U2 - 10.1145/2783258.2783338
DO - 10.1145/2783258.2783338
M3 - Conference contribution
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 567
EP - 576
BT - KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
ER -