Design of audio-visual interface for aiding driver's voice commands in automotive environment

Kihyeon Kim, Changwon Jeon, Junho Park, Seokyeong Jeong, David K. Han, Hanseok Ko

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This chapter describes the information modeling and integration of an embedded audio-visual speech recognition system aimed at improving speech recognition in noisy automotive environments. In particular, we employ lip-reading as an added feature for enhanced speech recognition. Lip motion features are extracted by active shape models, and the corresponding hidden Markov models are constructed for lip-reading. To realize efficient hidden Markov models, a tied-mixture technique is introduced for both visual and acoustic information. This keeps the model structure simple and small while maintaining suitable recognition performance. In the decoding process, the audio-visual information is integrated into the state output probabilities of the hidden Markov model as multistream features. Each stream is weighted according to the signal-to-noise ratio so that the visual information becomes more dominant in the noisy environment of an automobile. Representative experimental results demonstrate that the audio-visual speech recognition system achieves promising performance under adverse noise conditions, making it suitable for embedded devices.
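The SNR-dependent stream weighting described in the abstract can be illustrated with a minimal sketch. It assumes the common multistream formulation in which the state output log-likelihood is a weighted sum of the per-stream log-likelihoods; the abstract does not give the exact formula. The linear SNR-to-weight ramp, its 0 dB and 20 dB endpoints, and the function names are hypothetical choices made for illustration, not the authors' method.

```python
def audio_weight(snr_db, snr_low=0.0, snr_high=20.0):
    """Map SNR (dB) to an audio stream weight in [0, 1].

    Hypothetical linear ramp: weight 0 at or below snr_low,
    weight 1 at or above snr_high. The endpoints are assumptions.
    """
    t = (snr_db - snr_low) / (snr_high - snr_low)
    return min(1.0, max(0.0, t))


def combined_log_likelihood(log_b_audio, log_b_visual, snr_db):
    """Combine per-stream state output log-likelihoods.

    Assumed form: lam * log b_audio + (1 - lam) * log b_visual,
    so the visual stream dominates as the SNR drops.
    """
    lam = audio_weight(snr_db)
    return lam * log_b_audio + (1.0 - lam) * log_b_visual


if __name__ == "__main__":
    # Illustrative log-likelihood values, not measured data.
    for snr in (-5.0, 10.0, 25.0):
        print(snr, combined_log_likelihood(-2.0, -4.0, snr))
```

Under these assumptions, a low-SNR frame is scored almost entirely by the lip-reading stream, while a clean frame is scored by the acoustic stream.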

Original language: English
Title of host publication: In-Vehicle Corpus and Signal Processing for Driver Behavior
Pages: 211-219
Number of pages: 9
Publication status: Published - 2009
Event: 3rd Biennial Workshop on Digital Signal Processing for Mobile and Vehicular Systems, DSP 2007 - Istanbul, Turkey
Duration: 2007 Jun 1 → 2007 Jun 1

Publication series

Name: In-Vehicle Corpus and Signal Processing for Driver Behavior

Other

Other: 3rd Biennial Workshop on Digital Signal Processing for Mobile and Vehicular Systems, DSP 2007
Country: Turkey
City: Istanbul
Period: 07/6/1 → 07/6/1

Keywords

  • Active shape model
  • Audio-visual speech interface
  • Automatic speech recognition
  • Hybrid integration
  • Lip-reading
  • Mel-frequency cepstrum coefficients
  • Mouth model
  • Multistream features
  • SNR-dependent audio-visual information combination
  • Tied-mixture hidden Markov model

ASJC Scopus subject areas

  • Radiation
