Design of audio-visual interface for aiding driver's voice commands in automotive environment

Kihyeon Kim, Changwon Jeon, Junho Park, Seokyeong Jeong, David K. Han, Hanseok Ko

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

This chapter describes the information modeling and integration of an embedded audio-visual speech recognition system aimed at improving speech recognition in the adverse, noisy environment of an automobile. In particular, we employ lip-reading as an added feature for enhanced speech recognition. Lip motion features are extracted by active shape models, and the corresponding hidden Markov models are constructed for lip-reading. To realize efficient hidden Markov models, a tied-mixture technique is introduced for both the visual and the acoustic information; it keeps the model structure simple and small while maintaining suitable recognition performance. In the decoding process, the audio-visual information is integrated into the state output probabilities of the hidden Markov model as multistream features. Each stream is weighted according to the signal-to-noise ratio, so that the visual information becomes more dominant in noisy automobile conditions. Representative experimental results demonstrate that the audio-visual speech recognition system achieves promising performance in adverse noisy conditions, making it suitable for embedded devices.
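The SNR-dependent combination described in the abstract corresponds to the common multistream HMM formulation in which each stream's state output probability is raised to a weight, with the weights summing to one across streams. The sketch below is illustrative only: the linear SNR-to-weight mapping and its 0-30 dB operating range are assumptions made for the example, not values taken from the chapter.

```python
def combined_log_likelihood(log_b_audio, log_b_visual, snr_db,
                            snr_low=0.0, snr_high=30.0):
    """SNR-dependent multistream combination of HMM state likelihoods.

    Implements b_j = b_audio**w_a * b_visual**w_v with w_a + w_v = 1,
    evaluated in the log domain, where the exponents become a weighted sum.
    The linear SNR-to-weight mapping below is an illustrative assumption.
    """
    # Map the estimated SNR (dB) to an audio-stream weight in [0, 1]:
    # low SNR -> rely more on the visual stream, high SNR -> on audio.
    w_audio = min(1.0, max(0.0, (snr_db - snr_low) / (snr_high - snr_low)))
    w_visual = 1.0 - w_audio

    return w_audio * log_b_audio + w_visual * log_b_visual


# Example: at 5 dB SNR the visual stream dominates the combined score.
score = combined_log_likelihood(log_b_audio=-12.3, log_b_visual=-8.7, snr_db=5.0)
print(round(score, 2))
```

In the log domain the exponential stream weighting reduces to a weighted sum of the per-stream log-likelihoods, which is how a decoder would typically apply it at each HMM state.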

Original language: English
Title of host publication: In-Vehicle Corpus and Signal Processing for Driver Behavior
Publisher: Springer US
Pages: 211-219
Number of pages: 9
ISBN (Print): 9780387795812
DOI: 10.1007/978-0-387-79582-9_17
Publication status: Published - 2009 Dec 1

Fingerprint

Speech recognition
Hidden Markov models
Automobiles
Model structures
Decoding
Signal to noise ratio

Keywords

  • Active shape model
  • Audio-visual speech interface
  • Automatic speech recognition
  • Hybrid integration
  • Lip-reading
  • Mel-frequency cepstrum coefficients
  • Mouth model
  • Multistream features
  • SNR-dependent audio-visual information combination
  • Tied-mixture hidden Markov model

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Kim, K., Jeon, C., Park, J., Jeong, S., Han, D. K., & Ko, H. (2009). Design of audio-visual interface for aiding driver's voice commands in automotive environment. In In-Vehicle Corpus and Signal Processing for Driver Behavior (pp. 211-219). Springer US. https://doi.org/10.1007/978-0-387-79582-9_17
