Construction of decision tree from data driven clustering

Junho Park, Hanseok Ko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

In acoustic modeling for large-vocabulary speech recognition, context-dependent (CD) modeling is essential for achieving both improved recognition performance and rapid search. However, the sparse data problem caused by the huge number of CD models usually makes the estimated models unreliable. To cope with this, two major context-clustering approaches, data-driven and rule-based, have been investigated extensively. In this paper, we briefly review the two approaches and develop a new clustering method based on the ID3 decision tree learning algorithm that is effective for CD modeling. The proposed scheme essentially constructs a decision rule for pre-clustered triphones using the ID3 algorithm. In particular, the data-driven method is used as the clustering algorithm, and its result serves as the learning target of the ID3 algorithm. The proposed scheme is shown to be effective, in terms of recognition performance, on a database with a low unknown-context ratio. For a speaker-independent, task-independent continuous speech recognition task, the proposed method reduced the word error rate (WER) by 1.16% compared with the existing rule-based method alone.
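
The idea of using data-driven clusters as the learning target of ID3 can be illustrated with a small sketch. It is not the authors' implementation: the phone classes, the left/right context questions, the toy triphones, and the cluster labels C1 to C3 are all hypothetical, with the labels standing in for whatever a data-driven clustering step would produce. ID3 then picks, at each node, the binary context question with the highest information gain, so the learned tree can also assign triphones that were absent from the clustered data.

```python
import math
from collections import Counter

# Hypothetical phone classes used as binary context questions (assumption, not from the paper).
PHONE_CLASSES = {
    "Nasal": {"m", "n", "ng"},
    "Stop": {"p", "t", "k", "b", "d", "g"},
    "Vowel": {"a", "e", "i", "o", "u"},
}

def questions():
    """Enumerate (side, class) questions such as 'is the left context a Nasal?'."""
    return [(side, cls) for side in ("L", "R") for cls in PHONE_CLASSES]

def answer(triphone, q):
    """Answer question q for a (left, center, right) triphone."""
    left, _, right = triphone
    side, cls = q
    return (left if side == "L" else right) in PHONE_CLASSES[cls]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(data, q):
    """Entropy reduction of the cluster labels when data is split by question q."""
    yes = [c for t, c in data if answer(t, q)]
    no = [c for t, c in data if not answer(t, q)]
    if not yes or not no:
        return 0.0  # a question that does not split the data is useless
    n = len(data)
    total = entropy([c for _, c in data])
    return total - len(yes) / n * entropy(yes) - len(no) / n * entropy(no)

def id3(data, qs):
    """Grow a tree: a leaf is a cluster label, an inner node is (question, yes, no)."""
    labels = [c for _, c in data]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not qs:
        return majority
    best = max(qs, key=lambda q: info_gain(data, q))  # first maximum wins ties
    if info_gain(data, best) <= 0.0:
        return majority
    rest = [q for q in qs if q != best]
    yes = [(t, c) for t, c in data if answer(t, best)]
    no = [(t, c) for t, c in data if not answer(t, best)]
    return (best, id3(yes, rest), id3(no, rest))

def classify(tree, triphone):
    """Walk the tree to the cluster label for a (possibly unseen) triphone."""
    while isinstance(tree, tuple):
        q, yes, no = tree
        tree = yes if answer(triphone, q) else no
    return tree

# Toy triphones of center phone "a"; labels mimic a data-driven clustering result (hypothetical).
training = [
    (("m", "a", "t"), "C1"), (("n", "a", "k"), "C1"), (("ng", "a", "p"), "C1"),
    (("t", "a", "t"), "C2"), (("k", "a", "p"), "C2"),
    (("e", "a", "k"), "C3"), (("i", "a", "t"), "C3"),
]
tree = id3(training, questions())
print(tree)                            # (('L', 'Nasal'), 'C1', (('L', 'Stop'), 'C2', 'C3'))
print(classify(tree, ("m", "a", "k")))  # C1 (triphone not in the training list)
```

In this toy case the left-context nasal question separates C1 perfectly and is chosen first; the unseen triphone ("m", "a", "k") is then mapped to C1 by the rule rather than by any stored cluster membership.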

Original language: English
Title of host publication: 7th International Conference on Spoken Language Processing, ICSLP 2002
Publisher: International Speech Communication Association
Pages: 2657-2660
Number of pages: 4
Publication status: Published - 2002
Externally published: Yes
Event: 7th International Conference on Spoken Language Processing, ICSLP 2002 - Denver, United States
Duration: 2002 Sept 16 to 2002 Sept 20

Other

Other: 7th International Conference on Spoken Language Processing, ICSLP 2002
Country/Territory: United States
City: Denver
Period: 02/9/16 to 02/9/20

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
