Effective methods for improving naive Bayes text classifiers

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

43 Citations (Scopus)

Abstract

Though naive Bayes text classifiers are widely used because of their simplicity, techniques for improving their performance have rarely been studied. In this paper, we propose and evaluate general and effective techniques for improving the performance of the naive Bayes text classifier. We suggest document-model-based parameter estimation and document length normalization to alleviate problems in the traditional multinomial approach to text classification. In addition, a Mutual-Information-weighted naive Bayes text classifier is proposed to increase the effect of highly informative words. Our techniques are evaluated on the Reuters-21578 and 20 Newsgroups collections, and significant improvements are obtained over the existing multinomial naive Bayes approach.
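
The sketch below illustrates the general kind of approach the abstract describes: a multinomial naive Bayes text classifier in which each word's log-likelihood contribution is scaled by a mutual-information-based weight so that highly informative words carry more influence. The function names, smoothing choices, and the exact weighting scheme are illustrative assumptions for this sketch, not the formulation used in the paper.

# Minimal sketch: multinomial naive Bayes with per-word MI-based weights.
# Assumptions (not from the paper): Laplace smoothing, document-level presence
# counts for the MI estimate, and a class-independent per-word weight.
import math
from collections import Counter

def train(docs, labels, vocab):
    # Class priors and Laplace-smoothed word probabilities (multinomial model).
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(w for w in doc if w in vocab)
    cond = {}
    for c in classes:
        total = sum(counts[c].values())
        cond[c] = {w: (counts[c][w] + 1) / (total + len(vocab)) for w in vocab}
    return prior, cond

def mi_weights(docs, labels, vocab):
    # Per-word weight: the largest positive pointwise-MI between the word's
    # presence in a document and any class. The same weight is used for every
    # class at classification time, so it rescales a word's influence without
    # favoring a particular class.
    n = len(docs)
    classes = sorted(set(labels))
    p_c = {c: labels.count(c) / n for c in classes}
    df = Counter()                           # documents containing w
    df_c = {c: Counter() for c in classes}   # documents of class c containing w
    for doc, c in zip(docs, labels):
        for w in set(doc) & vocab:
            df[w] += 1
            df_c[c][w] += 1
    weights = {}
    for w in vocab:
        p_w = (df[w] + 1) / (n + 2)          # smoothed P(w present)
        best = 0.0
        for c in classes:
            p_wc = (df_c[c][w] + 1) / (n + 2)  # smoothed joint P(w present, c)
            best = max(best, math.log(p_wc / (p_w * p_c[c])))
        weights[w] = best
    return weights

def classify(doc, prior, cond, weights):
    # Score = log prior + sum over words of weight(w) * log P(w | c).
    scores = {}
    for c in prior:
        s = math.log(prior[c])
        for w in doc:
            if w in cond[c]:
                s += weights[w] * math.log(cond[c][w])
        scores[c] = s
    return max(scores, key=scores.get)

# Toy usage: docs are token lists, vocab is a set of words.
docs = [["stocks", "market", "price"], ["team", "game", "win"]]
labels = ["business", "sports"]
vocab = {w for d in docs for w in d}
prior, cond = train(docs, labels, vocab)
weights = mi_weights(docs, labels, vocab)
print(classify(["market", "price"], prior, cond, weights))  # -> "business"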

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 414-423
Number of pages: 10
Volume: 2417
ISBN (Print): 3540440380, 9783540440383
Publication status: Published - 2002
Event: 7th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2002 - Tokyo, Japan
Duration: 2002 Aug 18 - 2002 Aug 22

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2417
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 7th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2002
Country: Japan
City: Tokyo
Period: 02/8/18 - 02/8/22

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Kim, S. B., Rim, H-C., Yook, D., & Lim, H. S. (2002). Effective methods for improving naive Bayes text classifiers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2417, pp. 414-423). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2417). Springer Verlag.

@inproceedings{db4ec0c062f44e7585c3f7ecbbcab839,
title = "Effective methods for improving naive Bayes text classifiers",
abstract = "Though naive Bayes text classifiers are widely used because of their simplicity, techniques for improving their performance have rarely been studied. In this paper, we propose and evaluate general and effective techniques for improving the performance of the naive Bayes text classifier. We suggest document-model-based parameter estimation and document length normalization to alleviate problems in the traditional multinomial approach to text classification. In addition, a Mutual-Information-weighted naive Bayes text classifier is proposed to increase the effect of highly informative words. Our techniques are evaluated on the Reuters-21578 and 20 Newsgroups collections, and significant improvements are obtained over the existing multinomial naive Bayes approach.",
author = "Kim, {Sang Bum} and Hae-Chang Rim and Dongsuk Yook and Lim, {Heui Seok}",
year = "2002",
language = "English",
isbn = "3540440380",
volume = "2417",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "414--423",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}
