Topic document model approach for naive Bayes text classification

Sang Bum Kim, Hae-Chang Rim, Jin Dong Kim

Research output: Contribution to journal, Article

1 Citation (Scopus)

Abstract

The multinomial naive Bayes model has been widely used for probabilistic text classification. However, the parameter estimation for this model sometimes generates inappropriate probabilities. In this paper, we propose a topic document model for the multinomial naive Bayes text classification, where the parameters are estimated from normalized term frequencies of each training document. Experiments are conducted on Reuters 21578 and 20 Newsgroup collections, and our proposed approach obtained a significant improvement in performance compared to the traditional multinomial naive Bayes.
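The contrast drawn in the abstract can be sketched in a few lines of Python. This is only one plausible reading of "normalized term frequencies of each training document", not the authors' estimator: the function names `train` and `classify`, the `per_document` flag, the add-`alpha` smoothing, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

def train(docs, labels, per_document=True, alpha=1.0):
    """Estimate log P(c) and log P(w|c) from tokenized documents.

    per_document=False pools raw term counts over each class
    (traditional multinomial naive Bayes).
    per_document=True divides each document's counts by its length first,
    so every training document contributes the same total mass
    (an assumed reading of the abstract, not the paper's exact formula).
    """
    vocab = sorted({w for d in docs for w in d})
    classes = sorted(set(labels))
    log_prior, log_cond = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        weight = Counter()
        for d in class_docs:
            if not d:
                continue  # an empty document carries no term evidence
            scale = 1.0 / len(d) if per_document else 1.0
            for w, tf in Counter(d).items():
                weight[w] += tf * scale
        # Add-alpha smoothing (assumed) keeps unseen terms at nonzero probability.
        total = sum(weight.values()) + alpha * len(vocab)
        log_cond[c] = {w: math.log((weight[w] + alpha) / total) for w in vocab}
    return log_prior, log_cond

def classify(doc, log_prior, log_cond):
    """Return the class with the highest log posterior; unknown terms are skipped."""
    def score(c):
        return log_prior[c] + sum(log_cond[c][w] for w in doc if w in log_cond[c])
    return max(log_prior, key=score)

# Hypothetical toy corpus: one long sport document repeats "ball" many times.
docs = [
    ["ball"] * 8 + ["goal"],
    ["goal", "team"],
    ["vote", "law"],
    ["law", "vote", "vote"],
]
labels = ["sport", "sport", "politics", "politics"]

pooled = train(docs, labels, per_document=False)
normalized = train(docs, labels, per_document=True)
print(classify(["goal", "team"], *normalized))
```

On this toy corpus the pooled estimate lets the single long document dominate P(ball | sport), while the per-document estimate caps each document's influence. That is the kind of skew the abstract's "inappropriate probabilities" may refer to, though the abstract does not spell it out.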

Original language: English
Pages (from-to): 1091-1094
Number of pages: 4
Journal: IEICE Transactions on Information and Systems
Volume: E88-D
Issue number: 5
DOI: 10.1093/ietisy/e88-d.5.1091
Publication status: Published - 2005 Sep 9

Fingerprint

Parameter estimation
Experiments

Keywords

  • Naive Bayes
  • Text classification

ASJC Scopus subject areas

  • Information Systems
  • Computer Graphics and Computer-Aided Design
  • Software

Cite this

Topic document model approach for naive Bayes text classification. / Kim, Sang Bum; Rim, Hae-Chang; Kim, Jin Dong.

In: IEICE Transactions on Information and Systems, Vol. E88-D, No. 5, 09.09.2005, p. 1091-1094.

@article{b8ffa139abbe479d85660debb80fe9b1,
title = "Topic document model approach for naive Bayes text classification",
abstract = "The multinomial naive Bayes model has been widely used for probabilistic text classification. However, the parameter estimation for this model sometimes generates inappropriate probabilities. In this paper, we propose a topic document model for the multinomial naive Bayes text classification, where the parameters are estimated from normalized term frequencies of each training document. Experiments are conducted on Reuters 21578 and 20 Newsgroup collections, and our proposed approach obtained a significant improvement in performance compared to the traditional multinomial naive Bayes.",
keywords = "Naive Bayes, Text classification",
author = "Kim, {Sang Bum} and Rim, Hae-Chang and Kim, {Jin Dong}",
year = "2005",
month = "9",
day = "9",
doi = "10.1093/ietisy/e88-d.5.1091",
language = "English",
volume = "E88-D",
pages = "1091--1094",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "5"
}

TY - JOUR

T1 - Topic document model approach for naive Bayes text classification

AU - Kim, Sang Bum

AU - Rim, Hae-Chang

AU - Kim, Jin Dong

PY - 2005/9/9

Y1 - 2005/9/9

AB - The multinomial naive Bayes model has been widely used for probabilistic text classification. However, the parameter estimation for this model sometimes generates inappropriate probabilities. In this paper, we propose a topic document model for the multinomial naive Bayes text classification, where the parameters are estimated from normalized term frequencies of each training document. Experiments are conducted on Reuters 21578 and 20 Newsgroup collections, and our proposed approach obtained a significant improvement in performance compared to the traditional multinomial naive Bayes.

KW - Naive Bayes

KW - Text classification

UR - http://www.scopus.com/inward/record.url?scp=24144440107&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=24144440107&partnerID=8YFLogxK

U2 - 10.1093/ietisy/e88-d.5.1091

DO - 10.1093/ietisy/e88-d.5.1091

M3 - Article

AN - SCOPUS:24144440107

VL - E88-D

SP - 1091

EP - 1094

JO - IEICE Transactions on Information and Systems

JF - IEICE Transactions on Information and Systems

SN - 0916-8532

IS - 5

ER -