A text classification method based on latent topics

Yanshan Wang, In Chan Choi

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

2 Citations (Scopus)

Abstract

Latent Dirichlet Allocation (LDA) is a generative model that outperforms other topic-modelling algorithms at uncovering the latent topics of text data. Indexing by LDA is a recent LDA-based method that gives a new definition of document probability vectors, which can then be used as feature vectors. In this paper, we propose a joint text classification process that combines DBSCAN, indexing by LDA, and a Support Vector Machine (SVM). The DBSCAN algorithm is applied as a pre-processing step for LDA to determine the number of topics, and the resulting LDA document-indexing features are then used as input to an SVM text classifier.
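The first two stages of the pipeline described above can be sketched as follows. This is a minimal, dependency-free illustration under several assumptions that the abstract does not confirm: documents are clustered by DBSCAN on raw term-count vectors with cosine distance, the number of non-noise clusters is taken directly as the number of LDA topics, and the parameters `eps=0.6`, `min_pts=2`, `alpha`, `beta`, the iteration count and the toy corpus are all hypothetical. The document vectors are the standard collapsed-Gibbs posterior-mean estimate of each document's topic mixture, not the paper's own "indexing by LDA" definition, which the abstract does not spell out. The function names are likewise hypothetical.

```python
import math
import random
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(points, eps, min_pts, dist):
    """Plain DBSCAN. Returns one label per point: cluster id >= 0, or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
    return labels


def estimate_num_topics(labels):
    """Hypothetical rule: one LDA topic per DBSCAN cluster (noise ignored), at least 1."""
    return max(1, len({label for label in labels if label >= 0}))


def lda_doc_topic_vectors(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns each document's topic mixture theta."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    v = len(vocab)
    ndk = [[0] * k for _ in docs]      # topic counts per document
    nkw = [[0] * v for _ in range(k)]  # word counts per topic
    nk = [0] * k                       # total tokens per topic
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][vocab[w]] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                wi, t = vocab[w], z[d][n]
                # Remove the token's current assignment, then resample it.
                ndk[d][t] -= 1
                nkw[t][wi] -= 1
                nk[t] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][n] = t
                ndk[d][t] += 1
                nkw[t][wi] += 1
                nk[t] += 1
    # Posterior-mean estimate of each document's topic proportions.
    return [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
            for d, doc in enumerate(docs)]


# Hypothetical toy corpus with two obvious themes.
docs = [
    ["ball", "goal", "team", "goal"], ["ball", "team", "match"],
    ["goal", "team", "match", "ball"], ["stock", "market", "price"],
    ["stock", "price", "bank"], ["market", "bank", "stock", "price"],
]
labels = dbscan([Counter(doc) for doc in docs], eps=0.6, min_pts=2, dist=cosine_distance)
k = estimate_num_topics(labels)
features = lda_doc_topic_vectors(docs, k)  # one length-k feature vector per document
```

On this toy corpus DBSCAN finds two clusters, so `k` is 2 and each row of `features` is a two-component probability vector. In the third stage these rows, paired with class labels, would be passed to an SVM; scikit-learn's `sklearn.svm.SVC` is one option, left out here to keep the sketch free of third-party dependencies.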

Original language: English
Title of host publication: ICORES 2012 - Proceedings of the 1st International Conference on Operations Research and Enterprise Systems
Pages: 212-214
Number of pages: 3
Publication status: Published - 2012
Event: 1st International Conference on Operations Research and Enterprise Systems, ICORES 2012 - Vilamoura, Algarve, Portugal
Duration: 2012 Feb 4 to 2012 Feb 6

Publication series

Name: ICORES 2012 - Proceedings of the 1st International Conference on Operations Research and Enterprise Systems

Other

Other: 1st International Conference on Operations Research and Enterprise Systems, ICORES 2012
Country/Territory: Portugal
City: Vilamoura, Algarve
Period: 12/2/4 to 12/2/6

Keywords

  • Indexing by LDA
  • Latent topic
  • Text classification

ASJC Scopus subject areas

  • Management Science and Operations Research
