A procedure for estimating gestural scores from speech acoustics

Hosung Nam, Vikramjit Mitra, Mark Tiede, Mark Hasegawa-Johnson, Carol Espy-Wilson, Elliot Saltzman, Louis Goldstein

Research output: Contribution to journal › Article

23 Citations (Scopus)

Abstract

Speech can be represented as a constellation of constricting vocal tract actions called gestures, whose temporal patterning with respect to one another is expressed in a gestural score. Current speech datasets do not come with gestural annotation and no formal gestural annotation procedure exists at present. This paper describes an iterative analysis-by-synthesis landmark-based time-warping architecture to perform gestural annotation of natural speech. For a given utterance, the Haskins Laboratories Task Dynamics and Application (TADA) model is employed to generate a corresponding prototype gestural score. The gestural score is temporally optimized through an iterative timing-warping process such that the acoustic distance between the original and TADA-synthesized speech is minimized. This paper demonstrates that the proposed iterative approach is superior to conventional acoustically-referenced dynamic timing-warping procedures and provides reliable gestural annotation for speech datasets.
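The iterative idea in the abstract (adjust the timing of a prototype score, resynthesize, and keep changes that reduce the acoustic distance to the original) can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `toy_synthesize` is a hypothetical stand-in for a synthesizer such as TADA, `frame_distance` is a simple frame-wise distance rather than the paper's acoustic measure, and `refine_landmarks` uses a plain greedy search over landmark positions rather than the authors' landmark-based time-warping architecture.

```python
from typing import Callable, List, Sequence, Tuple


def frame_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute frame-wise differences (assumed distance measure)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same number of frames")
    return sum(abs(x - y) for x, y in zip(a, b))


def toy_synthesize(landmarks: List[int], n_frames: int = 20) -> List[float]:
    """Hypothetical synthesizer: one 'gesture' that is active (1.0)
    from frame landmarks[0] up to, but not including, landmarks[1]."""
    onset, offset = landmarks
    return [1.0 if onset <= i < offset else 0.0 for i in range(n_frames)]


def refine_landmarks(
    landmarks: List[int],
    target: Sequence[float],
    synthesize: Callable[[List[int]], List[float]],
    step: int = 1,
    max_iter: int = 50,
) -> Tuple[List[int], float]:
    """Greedy analysis-by-synthesis loop: shift one landmark at a time
    and keep a shift only if the resynthesized signal is closer to target."""
    best = list(landmarks)
    best_cost = frame_distance(target, synthesize(best))
    for _ in range(max_iter):
        improved = False
        for k in range(len(best)):
            for delta in (-step, step):
                trial = list(best)
                trial[k] += delta
                if trial != sorted(trial):
                    continue  # landmarks must stay in temporal order
                cost = frame_distance(target, synthesize(trial))
                if cost < best_cost:
                    best, best_cost, improved = trial, cost, True
        if not improved:
            break  # no single shift helps any more
    return best, best_cost


# "Natural" utterance: gesture active on frames 6..13.
target = toy_synthesize([6, 14])
# Prototype score with mistimed onset and offset.
best, cost = refine_landmarks([3, 10], target, toy_synthesize)
```

In this toy case the search recovers the target timing exactly. With real speech the distance surface is not this well behaved, so a greedy search of this kind should be read only as an illustration of the synthesize-compare-adjust loop.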

Original language: English
Pages (from-to): 3980-3989
Number of pages: 10
Journal: Journal of the Acoustical Society of America
Volume: 132
Issue number: 6
DOI: https://doi.org/10.1121/1.4763545
Publication status: Published - 2012 Dec 1

ASJC Scopus subject areas

  • Arts and Humanities (miscellaneous)
  • Acoustics and Ultrasonics

Cite this

Nam, H., Mitra, V., Tiede, M., Hasegawa-Johnson, M., Espy-Wilson, C., Saltzman, E., & Goldstein, L. (2012). A procedure for estimating gestural scores from speech acoustics. Journal of the Acoustical Society of America, 132(6), 3980-3989. https://doi.org/10.1121/1.4763545