Effective acoustic model clustering via decision-tree with supervised learning

Junho Park, Hanseok Ko

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

In large vocabulary speech recognition, context-dependent modeling is essential for improving both accuracy and speed. To cope with the sparse data problem that arises from the proliferation of context-dependent models, two kinds of clustering methods, data-driven and rule-based, have been vigorously investigated. The inherent difficulty of applying data-driven approaches to unknown contexts has motivated the development of better rule-based clustering methods. This paper develops a hybrid approach that constructs a supervised decision rule operating on pre-clustered triphones. The scheme employs the C4.5 decision-tree learning algorithm to extract the attributes that best support clustering of the training data. In particular, the data-driven method serves as the clustering algorithm, and its result serves as the learning target of the C4.5 algorithm. The proposed scheme provides an effective solution to the clustering error problem arising from unsupervised decision-tree learning and also successfully clusters multiple-mixture Gaussian state distributions. In speaker-independent, task-independent continuous speech recognition, the proposed method achieved a relative word error rate (WER) reduction of 3.93%.
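The two-stage idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the phone classes, the toy one-dimensional state means, and every function name below are assumptions made for illustration. Stage 1 groups seen triphones bottom-up by acoustic similarity. Stage 2 learns a tree over phonetic context questions, using the stage-1 cluster ids as targets and the gain ratio (the splitting criterion used by C4.5) to choose questions. Pruning and multi-mixture Gaussian distances are omitted.

```python
import math
from collections import Counter

# Hypothetical phonetic classes; each (side, class) pair is a yes/no question.
CLASSES = {"nasal": {"m", "n"}, "vowel": {"a", "i"}, "stop": {"p", "t"}}
QUESTIONS = [(side, name) for side in ("left", "right") for name in CLASSES]

def answer(tri, q):
    """Does the left or right context of triphone (l, c, r) belong to the class?"""
    side, name = q
    return (tri[0] if side == "left" else tri[2]) in CLASSES[name]

def data_driven_clusters(means, k):
    """Stage 1: merge the two closest clusters (by 1-D centroid) until k remain."""
    clusters = [[t] for t in means]
    def centroid(c):
        return sum(means[t] for t in c) / len(c)
    while len(clusters) > k:
        _, i, j = min((abs(centroid(a) - centroid(b)), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        clusters[i].extend(clusters.pop(j))
    return {t: cid for cid, c in enumerate(clusters) for t in c}

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(items, labels, q):
    """Information gain of question q divided by its split information."""
    yes = [l for t, l in zip(items, labels) if answer(t, q)]
    no = [l for t, l in zip(items, labels) if not answer(t, q)]
    if not yes or not no:
        return 0.0
    n = len(labels)
    gain = entropy(labels) - (len(yes) * entropy(yes) + len(no) * entropy(no)) / n
    return gain / entropy(["y"] * len(yes) + ["n"] * len(no))

def build_tree(items, labels):
    """Stage 2: supervised tree whose learning targets are stage-1 cluster ids."""
    best = max(QUESTIONS, key=lambda q: gain_ratio(items, labels, q))
    if len(set(labels)) == 1 or gain_ratio(items, labels, best) <= 0:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority cluster id
    yes = [(t, l) for t, l in zip(items, labels) if answer(t, best)]
    no = [(t, l) for t, l in zip(items, labels) if not answer(t, best)]
    return (best, build_tree(*zip(*yes)), build_tree(*zip(*no)))

def classify(tree, tri):
    """Walk the tree; this also works for triphones never seen in training."""
    while isinstance(tree, tuple):
        q, yes, no = tree
        tree = yes if answer(tri, q) else no
    return tree

# Toy state means for seen triphones (left, center, right); values are made up.
means = {
    ("m", "a", "t"): 1.0, ("n", "a", "p"): 1.1,  # nasal left context
    ("p", "a", "t"): 3.0, ("t", "a", "p"): 3.2,  # stop left context
    ("a", "a", "i"): 5.0, ("i", "a", "a"): 5.1,  # vowel left context
}
targets = data_driven_clusters(means, k=3)
items = list(means)
tree = build_tree(items, [targets[t] for t in items])

unseen = ("n", "a", "t")  # this context combination is absent from the toy data
print(classify(tree, unseen) == targets[("m", "a", "t")])
```

The last lines show the motivation for the hybrid: the data-driven stage alone has no answer for the unseen triphone, while the tree learned from its output assigns the triphone to an existing cluster through its context questions.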

Original language: English
Pages (from-to): 1-13
Number of pages: 13
Journal: Speech Communication
Volume: 46
Issue number: 1
DOI: 10.1016/j.specom.2004.12.003
Publication status: Published - 2005 May 1

Fingerprint

Acoustic Model
Decision Trees
Supervised Learning
Data-driven
Acoustics
Cluster Analysis
Learning
Clustering
Speech Recognition
Continuous speech recognition
Clustering Methods
Clustering algorithms
Learning algorithms
Sparse Data

Keywords

  • Acoustic modeling
  • Decision-tree
  • Large vocabulary continuous speech recognition

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Experimental and Cognitive Psychology
  • Linguistics and Language

Cite this

Effective acoustic model clustering via decision-tree with supervised learning. / Park, Junho; Ko, Hanseok.

In: Speech Communication, Vol. 46, No. 1, 01.05.2005, p. 1-13.


@article{32c3cf09aa9645118973a4d3a1049690,
title = "Effective acoustic model clustering via decision-tree with supervised learning",
abstract = "In large vocabulary speech recognition, context-dependent modeling is essential for improving both accuracy and speed. To cope with the sparse data problem that arises from the proliferation of context-dependent models, two kinds of clustering methods, data-driven and rule-based, have been vigorously investigated. The inherent difficulty of applying data-driven approaches to unknown contexts has motivated the development of better rule-based clustering methods. This paper develops a hybrid approach that constructs a supervised decision rule operating on pre-clustered triphones. The scheme employs the C4.5 decision-tree learning algorithm to extract the attributes that best support clustering of the training data. In particular, the data-driven method serves as the clustering algorithm, and its result serves as the learning target of the C4.5 algorithm. The proposed scheme provides an effective solution to the clustering error problem arising from unsupervised decision-tree learning and also successfully clusters multiple-mixture Gaussian state distributions. In speaker-independent, task-independent continuous speech recognition, the proposed method achieved a relative word error rate (WER) reduction of 3.93{\%}.",
keywords = "Acoustic modeling, Decision-tree, Large vocabulary continuous speech recognition",
author = "Junho Park and Hanseok Ko",
year = "2005",
month = "5",
day = "1",
doi = "10.1016/j.specom.2004.12.003",
language = "English",
volume = "46",
pages = "1--13",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
number = "1",

}

TY - JOUR

T1 - Effective acoustic model clustering via decision-tree with supervised learning

AU - Park, Junho

AU - Ko, Hanseok

PY - 2005/5/1

Y1 - 2005/5/1

AB - In large vocabulary speech recognition, context-dependent modeling is essential for improving both accuracy and speed. To cope with the sparse data problem that arises from the proliferation of context-dependent models, two kinds of clustering methods, data-driven and rule-based, have been vigorously investigated. The inherent difficulty of applying data-driven approaches to unknown contexts has motivated the development of better rule-based clustering methods. This paper develops a hybrid approach that constructs a supervised decision rule operating on pre-clustered triphones. The scheme employs the C4.5 decision-tree learning algorithm to extract the attributes that best support clustering of the training data. In particular, the data-driven method serves as the clustering algorithm, and its result serves as the learning target of the C4.5 algorithm. The proposed scheme provides an effective solution to the clustering error problem arising from unsupervised decision-tree learning and also successfully clusters multiple-mixture Gaussian state distributions. In speaker-independent, task-independent continuous speech recognition, the proposed method achieved a relative word error rate (WER) reduction of 3.93%.

KW - Acoustic modeling

KW - Decision-tree

KW - Large vocabulary continuous speech recognition

UR - http://www.scopus.com/inward/record.url?scp=17444397713&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=17444397713&partnerID=8YFLogxK

U2 - 10.1016/j.specom.2004.12.003

DO - 10.1016/j.specom.2004.12.003

M3 - Article

AN - SCOPUS:17444397713

VL - 46

SP - 1

EP - 13

JO - Speech Communication

JF - Speech Communication

SN - 0167-6393

IS - 1

ER -