Robust speech recognition using articulatory gestures in a dynamic Bayesian network framework

Vikramjit Mitra, Hosung Nam, Carol Y. Espy-Wilson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

Articulatory Phonology models speech as a spatio-temporal constellation of constricting events (e.g., raising the tongue tip or narrowing the lips), known as articulatory gestures. These gestures are associated with distinct organs (lips, tongue tip, tongue body, velum, and glottis) along the vocal tract. In this paper, we present a Dynamic Bayesian Network-based speech recognition architecture that models the articulatory gestures as hidden variables and uses them for speech recognition. Using the proposed architecture, we performed (a) word recognition experiments on the noisy data of Aurora-2 and (b) phone recognition experiments on the University of Wisconsin X-ray microbeam database. Our results indicate that the use of gestural information helps improve the performance of the recognition system compared with a system that uses acoustic information only.

Original language: English
Title of host publication: 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings
Pages: 131-136
Number of pages: 6
DOIs: https://doi.org/10.1109/ASRU.2011.6163918
Publication status: Published - 2011
Event: 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011 - Waikoloa, HI, United States
Duration: 2011 Dec 11 to 2011 Dec 15

Publication series

Name: 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings

Conference

Conference: 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011
Country: United States
City: Waikoloa, HI
Period: 11/12/11 to 11/12/15

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction


  • Cite this

    Mitra, V., Nam, H., & Espy-Wilson, C. Y. (2011). Robust speech recognition using articulatory gestures in a dynamic Bayesian network framework. In 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings (pp. 131-136). [6163918] (2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings). https://doi.org/10.1109/ASRU.2011.6163918