Achieving a reliable compact acoustic model for embedded speech recognition system with high confusion frequency model handling

Junho Park, Hanseok Ko

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

An acoustic model for an embedded speech recognition system must satisfy two requirements: it must minimize the degradation in recognition performance, and it must fit in memory under the constraint of limited system resources. Moreover, for general speech recognition tasks, context-dependent models such as state-clustered tri-phones are used to ensure high recognition performance of the embedded system. To cope with these challenges, we introduce the state-clustered tied-mixture (SCTM) HMM as a method of optimizing an acoustic model. The proposed SCTM modeling system offers a significant improvement in recognition performance and also provides a solution to sparse training data problems. In addition, the state weight quantizing method achieves a drastic reduction in the size of the model. However, models constructed only in this way are insufficient to improve the recognition rate in tasks where a large mutual similarity exists, such as the Korean-digit recognition task. Hence, we also construct new dedicated HMMs for all or part of the Korean digits that have exclusive states and use the same Gaussian pool as the previous tri-phone models. In this paper, we describe the acoustic model optimization procedure for embedded speech recognition systems and the corresponding performance evaluation results.
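
The abstract does not give the SCTM formulas or the details of the state weight quantizing method, so the following is only a minimal sketch of the general tied-mixture idea it refers to: every state draws on one shared Gaussian pool and keeps only its own mixture weights, which can then be stored compactly. The uniform 8-bit quantizer, the diagonal covariances, the toy pool, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of vector x under one diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def quantize_weights(weights):
    """Map one state's mixture weights in [0, 1] to 8-bit codes (assumed uniform scheme)."""
    return [round(w * 255) for w in weights]

def dequantize_weights(codes):
    """Recover approximate weights from the codes, renormalized to sum to 1."""
    total = sum(codes)
    return [c / total for c in codes]

def state_log_likelihood(x, pool, weights):
    """log b_j(x) = log sum_k w_jk N(x; mu_k, var_k), with the pool shared by all states."""
    terms = [
        math.log(w) + log_gaussian(x, mean, var)
        for w, (mean, var) in zip(weights, pool)
        if w > 0.0  # components with zero weight contribute nothing
    ]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Hypothetical toy model: a pool of 3 Gaussians (mean, variance) shared by 2 states.
pool = [([0.0, 0.0], [1.0, 1.0]), ([2.0, 2.0], [1.0, 1.0]), ([-2.0, 1.0], [0.5, 0.5])]
state_weights = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
x = [0.5, 0.2]

for w in state_weights:
    w_hat = dequantize_weights(quantize_weights(w))
    # Compare the likelihood with full-precision and with quantized weights.
    print(state_log_likelihood(x, pool, w), state_log_likelihood(x, pool, w_hat))
```

Under these assumptions each state stores one byte per pool component instead of a floating-point weight, while the Gaussian parameters are stored once for the whole model. A dedicated model for a confusable word, as the abstract describes for Korean digits, would in this sketch simply be an additional weight vector over the same pool.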

Original language: English
Pages (from-to): 737-745
Number of pages: 9
Journal: Speech Communication
Volume: 48
Issue number: 6
DOI: 10.1016/j.specom.2005.10.001
Publication status: Published - 2006 Jun 1

Keywords

  • Compact acoustic modeling
  • Embedded speech recognition system
  • Tied-mixture HMM

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Experimental and Cognitive Psychology
  • Linguistics and Language

Cite this

Achieving a reliable compact acoustic model for embedded speech recognition system with high confusion frequency model handling. / Park, Junho; Ko, Hanseok.

In: Speech Communication, Vol. 48, No. 6, 01.06.2006, p. 737-745.

@article{ebcae5d52e5542af8abcb28e7a068c40,
title = "Achieving a reliable compact acoustic model for embedded speech recognition system with high confusion frequency model handling",
abstract = "An acoustic model for an embedded speech recognition system must satisfy two requirements: it must minimize the degradation in recognition performance, and it must fit in memory under the constraint of limited system resources. Moreover, for general speech recognition tasks, context-dependent models such as state-clustered tri-phones are used to ensure high recognition performance of the embedded system. To cope with these challenges, we introduce the state-clustered tied-mixture (SCTM) HMM as a method of optimizing an acoustic model. The proposed SCTM modeling system offers a significant improvement in recognition performance and also provides a solution to sparse training data problems. In addition, the state weight quantizing method achieves a drastic reduction in the size of the model. However, models constructed only in this way are insufficient to improve the recognition rate in tasks where a large mutual similarity exists, such as the Korean-digit recognition task. Hence, we also construct new dedicated HMMs for all or part of the Korean digits that have exclusive states and use the same Gaussian pool as the previous tri-phone models. In this paper, we describe the acoustic model optimization procedure for embedded speech recognition systems and the corresponding performance evaluation results.",
keywords = "Compact acoustic modeling, Embedded speech recognition system, Tied-mixture HMM",
author = "Junho Park and Hanseok Ko",
year = "2006",
month = "6",
day = "1",
doi = "10.1016/j.specom.2005.10.001",
language = "English",
volume = "48",
pages = "737--745",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
number = "6",
}

TY - JOUR

T1 - Achieving a reliable compact acoustic model for embedded speech recognition system with high confusion frequency model handling

AU - Park, Junho

AU - Ko, Hanseok

PY - 2006/6/1

Y1 - 2006/6/1

AB - An acoustic model for an embedded speech recognition system must satisfy two requirements: it must minimize the degradation in recognition performance, and it must fit in memory under the constraint of limited system resources. Moreover, for general speech recognition tasks, context-dependent models such as state-clustered tri-phones are used to ensure high recognition performance of the embedded system. To cope with these challenges, we introduce the state-clustered tied-mixture (SCTM) HMM as a method of optimizing an acoustic model. The proposed SCTM modeling system offers a significant improvement in recognition performance and also provides a solution to sparse training data problems. In addition, the state weight quantizing method achieves a drastic reduction in the size of the model. However, models constructed only in this way are insufficient to improve the recognition rate in tasks where a large mutual similarity exists, such as the Korean-digit recognition task. Hence, we also construct new dedicated HMMs for all or part of the Korean digits that have exclusive states and use the same Gaussian pool as the previous tri-phone models. In this paper, we describe the acoustic model optimization procedure for embedded speech recognition systems and the corresponding performance evaluation results.

KW - Compact acoustic modeling

KW - Embedded speech recognition system

KW - Tied-mixture HMM

UR - http://www.scopus.com/inward/record.url?scp=33646259972&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33646259972&partnerID=8YFLogxK

U2 - 10.1016/j.specom.2005.10.001

DO - 10.1016/j.specom.2005.10.001

M3 - Article

VL - 48

SP - 737

EP - 745

JO - Speech Communication

JF - Speech Communication

SN - 0167-6393

IS - 6

ER -