TY - GEN
T1 - Effective methods for improving naive Bayes text classifiers
AU - Kim, Sang Bum
AU - Rim, Hae-Chang
AU - Yook, Dongsuk
AU - Lim, Heui Seok
PY - 2002
Y1 - 2002
N2 - Though naive Bayes text classifiers are widely used because of their simplicity, techniques for improving their performance have rarely been studied. In this paper, we propose and evaluate some general and effective techniques for improving the performance of the naive Bayes text classifier. We suggest document-model-based parameter estimation and document length normalization to alleviate the problems in the traditional multinomial approach to text classification. In addition, a mutual-information-weighted naive Bayes text classifier is proposed to increase the effect of highly informative words. Our techniques are evaluated on the Reuters-21578 and 20 Newsgroups collections, and significant improvements are obtained over the existing multinomial naive Bayes approach.
AB - Though naive Bayes text classifiers are widely used because of their simplicity, techniques for improving their performance have rarely been studied. In this paper, we propose and evaluate some general and effective techniques for improving the performance of the naive Bayes text classifier. We suggest document-model-based parameter estimation and document length normalization to alleviate the problems in the traditional multinomial approach to text classification. In addition, a mutual-information-weighted naive Bayes text classifier is proposed to increase the effect of highly informative words. Our techniques are evaluated on the Reuters-21578 and 20 Newsgroups collections, and significant improvements are obtained over the existing multinomial naive Bayes approach.
UR - http://www.scopus.com/inward/record.url?scp=84947928349&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84947928349&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84947928349
SN - 3540440380
SN - 9783540440383
VL - 2417
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 414
EP - 423
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PB - Springer Verlag
T2 - 7th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2002
Y2 - 18 August 2002 through 22 August 2002
ER -