Recognizing articulatory gestures from speech for robust speech recognition

Vikramjit Mitra, Hosung Nam, Carol Espy-Wilson, Elliot Saltzman, Louis Goldstein

Research output: Contribution to journal › Article

15 Citations (Scopus)

Abstract

Studies have shown that supplementary articulatory information can help to improve the recognition rate of automatic speech recognition systems. Unfortunately, articulatory information is not directly observable, so it must be estimated from the speech signal. This study describes a system that recognizes articulatory gestures from speech and uses the recognized gestures in a speech recognition system. Recognizing gestures for a given utterance involves recovering the set of underlying gestural activations and their associated dynamic parameters. This paper proposes a neural network architecture for recognizing articulatory gestures from speech and presents ways to incorporate articulatory gestures into a digit recognition task. The lack of a natural speech database containing gestural information prompted a three-stage evaluation. First, the proposed gestural annotation architecture was tested on a synthetic speech dataset, which showed that using estimated tract variable time functions improved gesture recognition performance. In the second stage, the gesture-recognition models were applied to natural speech waveforms, and word recognition experiments showed that the recognized gestures can improve the noise robustness of a word recognition system. In the final stage, a gesture-based Dynamic Bayesian Network was trained, and the results indicate that incorporating gestural information can improve word recognition performance over acoustic-only systems.
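The record does not reproduce the paper's actual network configuration. As a rough illustration only, the following minimal NumPy sketch shows the two-stage mapping the abstract describes (acoustic features → estimated tract variable time functions → gestural activations). All layer sizes, the plain tanh MLP form, the untrained random weights, and the thresholding step are assumptions for illustration, not the authors' design.

# A minimal, self-contained sketch (NOT the authors' architecture) of the
# two-stage mapping described in the abstract: acoustic features ->
# tract variable (TV) time functions -> gestural activations.
# All dimensions and the tanh-MLP form are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    """One hidden layer with tanh nonlinearity; linear output layer."""
    h = np.tanh(x @ w1 + b1)
    return h @ w2 + b2

# Hypothetical sizes: 13 cepstral coefficients over a 9-frame context
# window as input; 8 TV dimensions; 10 gesture-activation outputs.
n_in, n_hid, n_tv, n_gest = 13 * 9, 64, 8, 10

# Stage 1: estimate TV time functions from windowed acoustic features.
w1, b1 = rng.normal(0, 0.1, (n_in, n_hid)), np.zeros(n_hid)
w2, b2 = rng.normal(0, 0.1, (n_hid, n_tv)), np.zeros(n_tv)

# Stage 2: recognize gestural activations from the estimated TVs (the
# paper also feeds acoustic features at this stage; TVs alone here for brevity).
v1, c1 = rng.normal(0, 0.1, (n_tv, n_hid)), np.zeros(n_hid)
v2, c2 = rng.normal(0, 0.1, (n_hid, n_gest)), np.zeros(n_gest)

frames = rng.normal(size=(100, n_in))   # 100 frames of dummy features
tvs = mlp(frames, w1, b1, w2, b2)       # estimated TV trajectories
gestures = mlp(tvs, v1, c1, v2, c2)     # gestural activation scores
active = gestures > 0.0                 # threshold into on/off activations
print(tvs.shape, active.shape)          # (100, 8) (100, 10)

In the paper's pipeline, the recovered activations (and their dynamic parameters) would then serve as observations for a word recognizer such as the gesture-based Dynamic Bayesian Network evaluated in the final stage; that model is beyond the scope of this sketch.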

Original language: English
Pages (from-to): 2270-2287
Number of pages: 18
Journal: Journal of the Acoustical Society of America
Volume: 131
Issue number: 3
DOI: 10.1121/1.3682038
PMID: 22423722
Publication status: Published - 2012 Mar 1
Externally published: Yes

ASJC Scopus subject areas

  • Arts and Humanities (miscellaneous)
  • Acoustics and Ultrasonics

Cite this

Mitra, V., Nam, H., Espy-Wilson, C., Saltzman, E., & Goldstein, L. (2012). Recognizing articulatory gestures from speech for robust speech recognition. Journal of the Acoustical Society of America, 131(3), 2270-2287. https://doi.org/10.1121/1.3682038
@article{Mitra2012gestures,
  title     = "Recognizing articulatory gestures from speech for robust speech recognition",
  author    = "Vikramjit Mitra and Hosung Nam and Carol Espy-Wilson and Elliot Saltzman and Louis Goldstein",
  journal   = "Journal of the Acoustical Society of America",
  year      = "2012",
  month     = mar,
  volume    = "131",
  number    = "3",
  pages     = "2270--2287",
  doi       = "10.1121/1.3682038",
  issn      = "0001-4966",
  publisher = "Acoustical Society of America",
  language  = "English",
}
