Decision tree based clustering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A decision tree can be used not only as a classifier but also as a clustering method. One such application can be found in automatic speech recognition using hidden Markov models (HMMs). Due to the insufficient amount of training data, similar states of triphone HMMs are grouped together using a decision tree to share a common probability distribution. At the same time, in order to predict the statistics of unseen triphones, the decision tree is used as a classifier as well. In this paper, we study several cluster split criteria in decision tree building algorithms for the case where the instances to be clustered are probability density functions. In particular, when Gaussian probability distributions are to be clustered, we have found that the Bhattacharyya distance based measures are more consistent than the conventional log likelihood based measure.
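
As a rough illustration of the kind of measure the abstract refers to, the Bhattacharyya distance between two Gaussians has a closed form. The sketch below assumes diagonal covariances (an assumption made here for brevity, not something the abstract states). The function name and its arguments are hypothetical, and the paper's actual split criteria may use this distance differently.

```python
import math


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    mean1, mean2: sequences of per-dimension means.
    var1, var2:   sequences of per-dimension variances (must be positive).
    Hypothetical helper for illustration only.
    """
    if not (len(mean1) == len(var1) == len(mean2) == len(var2)):
        raise ValueError("all inputs must have the same dimensionality")

    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        if v1 <= 0.0 or v2 <= 0.0:
            raise ValueError("variances must be positive")
        v_avg = 0.5 * (v1 + v2)  # averaged variance for this dimension
        # Mean-separation term: (1/8) * (m1 - m2)^2 / v_avg
        dist += 0.125 * (m1 - m2) ** 2 / v_avg
        # Covariance-mismatch term: (1/2) * ln(v_avg / sqrt(v1 * v2))
        dist += 0.5 * math.log(v_avg / math.sqrt(v1 * v2))
    return dist
```

The distance is zero for identical Gaussians and symmetric in its two arguments, so a candidate split could, for example, be scored by how far apart the two resulting clusters' Gaussians are.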

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 487-492
Number of pages: 6
Volume: 2412
ISBN (Print): 9783540440253
Publication status: Published - 2002
Event: 3rd International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2002 - Manchester, United Kingdom
Duration: 2002 Aug 12 → 2002 Aug 14

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2412
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 3rd International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2002
Country: United Kingdom
City: Manchester
Period: 02/8/12 → 02/8/14

Fingerprint

Decision trees
Decision tree
Clustering
Hidden Markov models
Probability distributions
Markov Model
Classifiers
Probability Distribution
Classifier
Automatic Speech Recognition
Speech recognition
Clustering Methods
Probability density function
Gaussian distribution
Likelihood
Statistics
Predict

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Yook, D. (2002). Decision tree based clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2412, pp. 487-492). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2412). Springer Verlag.

Decision tree based clustering. / Yook, Dongsuk.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 2412 Springer Verlag, 2002. p. 487-492 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2412).

Yook, D 2002, Decision tree based clustering. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 2412, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 2412, Springer Verlag, pp. 487-492, 3rd International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2002, Manchester, United Kingdom, 02/8/12.
Yook D. Decision tree based clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 2412. Springer Verlag. 2002. p. 487-492. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
Yook, Dongsuk. / Decision tree based clustering. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 2412 Springer Verlag, 2002. pp. 487-492 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{7843f568eb3c4acb9f71f611b4ffdb9b,
title = "Decision tree based clustering",
abstract = "A decision tree can be used not only as a classifier but also as a clustering method. One such application can be found in automatic speech recognition using hidden Markov models (HMMs). Due to the insufficient amount of training data, similar states of triphone HMMs are grouped together using a decision tree to share a common probability distribution. At the same time, in order to predict the statistics of unseen triphones, the decision tree is used as a classifier as well. In this paper, we study several cluster split criteria in decision tree building algorithms for the case where the instances to be clustered are probability density functions. In particular, when Gaussian probability distributions are to be clustered, we have found that the Bhattacharyya distance based measures are more consistent than the conventional log likelihood based measure.",
author = "Dongsuk Yook",
year = "2002",
language = "English",
isbn = "9783540440253",
volume = "2412",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "487--492",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Decision tree based clustering

AU - Yook, Dongsuk

PY - 2002

Y1 - 2002

N2 - A decision tree can be used not only as a classifier but also as a clustering method. One such application can be found in automatic speech recognition using hidden Markov models (HMMs). Due to the insufficient amount of training data, similar states of triphone HMMs are grouped together using a decision tree to share a common probability distribution. At the same time, in order to predict the statistics of unseen triphones, the decision tree is used as a classifier as well. In this paper, we study several cluster split criteria in decision tree building algorithms for the case where the instances to be clustered are probability density functions. In particular, when Gaussian probability distributions are to be clustered, we have found that the Bhattacharyya distance based measures are more consistent than the conventional log likelihood based measure.

AB - A decision tree can be used not only as a classifier but also as a clustering method. One such application can be found in automatic speech recognition using hidden Markov models (HMMs). Due to the insufficient amount of training data, similar states of triphone HMMs are grouped together using a decision tree to share a common probability distribution. At the same time, in order to predict the statistics of unseen triphones, the decision tree is used as a classifier as well. In this paper, we study several cluster split criteria in decision tree building algorithms for the case where the instances to be clustered are probability density functions. In particular, when Gaussian probability distributions are to be clustered, we have found that the Bhattacharyya distance based measures are more consistent than the conventional log likelihood based measure.

UR - http://www.scopus.com/inward/record.url?scp=84947976088&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84947976088&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9783540440253

VL - 2412

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 487

EP - 492

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -