BL-LDA

Bringing bigram to supervised topic model

Youngsun Park, Md Hijbul Alam, Woo Jong Ryu, Sang-Geun Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

With the increasing amount of data being published on the Web, it is difficult to analyze its content within a short time. Topic modeling techniques can summarize textual data that contains several topics. Both the label (such as a category or tag) and word co-occurrence play a significant role in understanding textual data. However, many conventional topic modeling techniques are limited by the bag-of-words assumption. In this paper, we develop a probabilistic model called Bigram Labeled Latent Dirichlet Allocation (BL-LDA) to address this limitation. The proposed BL-LDA incorporates bigrams into the Labeled LDA (L-LDA) technique. Extensive experiments on Yelp data show that the proposed scheme outperforms L-LDA in terms of accuracy.
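As a rough illustration of the bag-of-words limitation mentioned in the abstract, the sketch below compares unigram counts with bigram counts for two token sequences. It is not the BL-LDA model and does not reproduce anything from the paper; the function names and the two example snippets are hypothetical and chosen only to show what information word order carries.

```python
from collections import Counter


def unigram_counts(tokens):
    """Bag-of-words view: word frequencies only, word order is discarded."""
    return Counter(tokens)


def bigram_counts(tokens):
    """Bigram view: counts of adjacent word pairs, so local order is kept."""
    return Counter(zip(tokens, tokens[1:]))


# Two hypothetical review snippets that use the same words in a different order.
doc_a = "service good food not good".split()
doc_b = "service not good food good".split()

# Under the bag-of-words assumption the two snippets are indistinguishable.
same_unigrams = unigram_counts(doc_a) == unigram_counts(doc_b)

# Their bigram counts differ, for example ("food", "not") versus ("service", "not").
same_bigrams = bigram_counts(doc_a) == bigram_counts(doc_b)

print(same_unigrams, same_bigrams)
```

A model restricted to unigram counts receives identical input for both snippets, whereas one that also sees adjacent word pairs can tell them apart. That is the kind of signal a bigram extension of a labeled topic model is meant to exploit.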

Original language: English
Title of host publication: Proceedings - 2015 International Conference on Computational Science and Computational Intelligence, CSCI 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 83-88
Number of pages: 6
ISBN (Print): 9781467397957
DOIs: https://doi.org/10.1109/CSCI.2015.146
Publication status: Published - 2016 Mar 2
Event: International Conference on Computational Science and Computational Intelligence, CSCI 2015 - Las Vegas, United States
Duration: 2015 Dec 7 to 2015 Dec 9

Other

Other: International Conference on Computational Science and Computational Intelligence, CSCI 2015
Country: United States
City: Las Vegas
Period: 15/12/7 to 15/12/9

Fingerprint

Labels
Experiments
Statistical Models

Keywords

  • Data Analysis
  • Data Mining
  • Text Classification
  • Topic Modeling

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Signal Processing

Cite this

Park, Y., Alam, M. H., Ryu, W. J., & Lee, S-G. (2016). BL-LDA: Bringing bigram to supervised topic model. In Proceedings - 2015 International Conference on Computational Science and Computational Intelligence, CSCI 2015 (pp. 83-88). [7424068] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CSCI.2015.146

@inproceedings{d4e74a90b9d849fd9513bc1967de0588,
title = "BL-LDA: Bringing bigram to supervised topic model",
abstract = "With the increasing amount of data being published on the Web, it is difficult to analyze their content within a short time. Topic modeling techniques can summarize textual data that contains several topics. Both the label (such as category or tag) and word co-occurrence play a significant role in understanding textual data. However, many conventional topic modeling techniques are limited to the bag-of-words assumption. In this paper, we develop a probabilistic model called Bigram Labeled Latent Dirichlet Allocation (BL-LDA), to address the limitation of the bag-of-words assumption. The proposed BL-LDA incorporates the bigram into the Labeled LDA (L-LDA) technique. Extensive experiments on Yelp data show that the proposed scheme is better than the L-LDA in terms of accuracy.",
keywords = "Data Analysis, Data Mining, Text Classification, Topic Modeling",
author = "Youngsun Park and Alam, {Md Hijbul} and Ryu, {Woo Jong} and Sang-Geun Lee",
year = "2016",
month = "3",
day = "2",
doi = "10.1109/CSCI.2015.146",
language = "English",
isbn = "9781467397957",
pages = "83--88",
booktitle = "Proceedings - 2015 International Conference on Computational Science and Computational Intelligence, CSCI 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - BL-LDA

T2 - Bringing bigram to supervised topic model

AU - Park, Youngsun

AU - Alam, Md Hijbul

AU - Ryu, Woo Jong

AU - Lee, Sang-Geun

PY - 2016/3/2

Y1 - 2016/3/2

AB - With the increasing amount of data being published on the Web, it is difficult to analyze their content within a short time. Topic modeling techniques can summarize textual data that contains several topics. Both the label (such as category or tag) and word co-occurrence play a significant role in understanding textual data. However, many conventional topic modeling techniques are limited to the bag-of-words assumption. In this paper, we develop a probabilistic model called Bigram Labeled Latent Dirichlet Allocation (BL-LDA), to address the limitation of the bag-of-words assumption. The proposed BL-LDA incorporates the bigram into the Labeled LDA (L-LDA) technique. Extensive experiments on Yelp data show that the proposed scheme is better than the L-LDA in terms of accuracy.

KW - Data Analysis

KW - Data Mining

KW - Text Classification

KW - Topic Modeling

UR - http://www.scopus.com/inward/record.url?scp=84964476038&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84964476038&partnerID=8YFLogxK

U2 - 10.1109/CSCI.2015.146

DO - 10.1109/CSCI.2015.146

M3 - Conference contribution

SN - 9781467397957

SP - 83

EP - 88

BT - Proceedings - 2015 International Conference on Computational Science and Computational Intelligence, CSCI 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -