TY - GEN
T1 - Robust speech recognition using articulatory gestures in a dynamic Bayesian network framework
AU - Mitra, Vikramjit
AU - Nam, Hosung
AU - Espy-Wilson, Carol Y.
PY - 2011
Y1 - 2011
N2 - Articulatory Phonology models speech as a spatio-temporal constellation of constricting events (e.g., raising the tongue tip, narrowing the lips), known as articulatory gestures. These gestures are associated with distinct organs (lips, tongue tip, tongue body, velum and glottis) along the vocal tract. In this paper we present a Dynamic Bayesian Network based speech recognition architecture that models the articulatory gestures as hidden variables and uses them for speech recognition. Using the proposed architecture we performed: (a) word recognition experiments on the noisy data of Aurora-2 and (b) phone recognition experiments on the University of Wisconsin X-ray microbeam database. Our results indicate that the use of gestural information helps to improve the performance of the recognition system compared to the system using acoustic information only.
AB - Articulatory Phonology models speech as a spatio-temporal constellation of constricting events (e.g., raising the tongue tip, narrowing the lips), known as articulatory gestures. These gestures are associated with distinct organs (lips, tongue tip, tongue body, velum and glottis) along the vocal tract. In this paper we present a Dynamic Bayesian Network based speech recognition architecture that models the articulatory gestures as hidden variables and uses them for speech recognition. Using the proposed architecture we performed: (a) word recognition experiments on the noisy data of Aurora-2 and (b) phone recognition experiments on the University of Wisconsin X-ray microbeam database. Our results indicate that the use of gestural information helps to improve the performance of the recognition system compared to the system using acoustic information only.
UR - http://www.scopus.com/inward/record.url?scp=84858964876&partnerID=8YFLogxK
U2 - 10.1109/ASRU.2011.6163918
DO - 10.1109/ASRU.2011.6163918
M3 - Conference contribution
AN - SCOPUS:84858964876
SN - 9781467303675
T3 - 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings
SP - 131
EP - 136
BT - 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings
T2 - 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011
Y2 - 11 December 2011 through 15 December 2011
ER -