A task-dynamic toolkit for modeling the effects of prosodic structure on articulation

Elliot Saltzman, Hosung Nam, Jelena Krivokapic, Louis Goldstein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

53 Citations (Scopus)

Abstract

The original task-dynamic model of speech production incorporated the theoretical tenets of Articulatory Phonology and provided a dynamics of inter-articulator coordination for single and co-produced constriction gestures, given a gestural score that specifies a time-dependent vector of gestural activations for a given utterance. More recently, the model has been significantly extended to provide a framework for investigating the higher-order dynamics of prosodic phrasing, syllable structure, lexical stress, and the prominence (accentual) properties associated with higher-level prosodic constituents (e.g., foot, word, phrase, sentence). There are two new components in the model. The first is an ensemble of gestural planning oscillators that defines a dynamics of gestural score formation: once the ensemble reaches an entrained steady state of relative phasing, the waveform of each oscillator is used to specify the activation function of that oscillator's associated constriction gesture and thereby to trigger the onset of the gesture. The second component is a set of modulation gestures (μ-gestures) that, rather than activating constriction formation and release gestures in the vocal tract, serve to modulate the temporal and spatial properties of all concurrently active constriction gestures. Modulation gestures are of two types: temporal modulation gestures (μT-gestures) that alter the rate of utterance timeflow by smoothly changing all frequency parameters of the planning oscillator ensemble; and spatial modulation gestures (μS-gestures) that spatially strengthen or reduce the motions of constriction gestures by smoothly changing the spatial target parameters of these constriction gestures.
Key to the representation of prosodic phrasing has been the use of clock-slowing temporal modulation gestures (called prosodic gestures [π-gestures] in previous work) that are locally active in the region of phrasal boundaries and that slow the rate of utterance timeflow in direct proportion to the strength of the associated boundary. Central to the representation of syllable structure is the use of a coupling graph that defines the existence and strength of coupling in the network of gestural planning oscillators and shapes the manner in which gestures are coordinated. Concepts from graph theory have been crucial to understanding how hypothesized differences among coupling graphs have correctly predicted empirically demonstrated intra-syllabic differences between onsets and codas in both the mean values and variabilities of C-C, C-V, and V-C timing patterns. In this paper, we describe a set of recent developments to our task-dynamic 'toolkit' (planning oscillator ensemble and temporal modulation gestures) and how they have been used to interpret and simulate experimental data on the interactions of stress and prominence in shaping the "prosodically driven phonetic detail" [14] of speech.
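
The two mechanisms named in the abstract, coupled planning oscillators that settle into a target relative phase and a temporal modulation gesture that slows the shared clock, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes phase-only oscillators with sinusoidal coupling (a Kuramoto-style simplification), a raised-cosine activation for the modulation gesture, and forward-Euler integration. The function names (modulated_rate, simulate) and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def modulated_rate(t, window, strength):
    """Clock-rate multiplier: 1.0 outside the window, dipping smoothly
    to (1 - strength) at the window midpoint (raised-cosine activation).
    The activation shape is an assumption, not taken from the paper."""
    if window is None:
        return 1.0
    start, end = window
    if not (start <= t < end):
        return 1.0
    activation = 0.5 * (1.0 - math.cos(2.0 * math.pi * (t - start) / (end - start)))
    return 1.0 - strength * activation


def simulate(target_rel_phase=0.0, coupling=2.0, freq_hz=4.0,
             window=None, strength=0.0, dt=0.001, duration=2.0):
    """Forward-Euler integration of two coupled phase oscillators.

    The coupling pulls (phi2 - phi1) toward target_rel_phase. The
    modulation gesture scales only the frequency parameter, following
    the abstract's description of temporal modulation gestures.
    Returns the final unwrapped phases in radians."""
    phi1, phi2 = 0.0, 1.0  # arbitrary initial phases
    omega = 2.0 * math.pi * freq_hz
    for step in range(int(round(duration / dt))):
        t = step * dt
        w = omega * modulated_rate(t, window, strength)
        error = phi2 - phi1 - target_rel_phase
        phi1 += (w + coupling * math.sin(error)) * dt
        phi2 += (w - coupling * math.sin(error)) * dt
    return phi1, phi2


if __name__ == "__main__":
    # Anti-phase target: the pair entrains to a relative phase near pi.
    p1, p2 = simulate(target_rel_phase=math.pi)
    print("relative phase:", p2 - p1)

    # A stronger boundary gesture yields less total phase advance.
    for s in (0.0, 0.3, 0.6):
        q1, q2 = simulate(window=(0.5, 1.0), strength=s)
        print("strength", s, "mean phase", 0.5 * (q1 + q2))
```

In this sketch the relative phase converges regardless of the clock slowing, because the modulation scales only the shared frequency term. The total phase advance shrinks as the strength parameter grows, mirroring the abstract's statement that slowing is proportional to boundary strength.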

Original language: English
Title of host publication: Proceedings of the 4th International Conference on Speech Prosody, SP 2008
Publisher: International Speech Communications Association
Pages: 175-184
Number of pages: 10
ISBN (Print): 9780616220030
Publication status: Published - 2008 Jan 1
Externally published: Yes
Event: 4th International Conference on Speech Prosody 2008, SP 2008 - Campinas, Brazil
Duration: 2008 May 6 to 2008 May 9

Publication series

Name: Proceedings of the 4th International Conference on Speech Prosody, SP 2008

Conference

Conference: 4th International Conference on Speech Prosody 2008, SP 2008
Country: Brazil
City: Campinas
Period: 08/5/6 to 08/5/9

Fingerprint

Modulation
Planning
Chemical activation
Speech analysis
Graph theory
Toolkit
Prosodic Structure
Articulation
Gesture
Modeling
Dynamic models
Ensemble

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Software
  • Mechanical Engineering

Cite this

Saltzman, E., Nam, H., Krivokapic, J., & Goldstein, L. (2008). A task-dynamic toolkit for modeling the effects of prosodic structure on articulation. In Proceedings of the 4th International Conference on Speech Prosody, SP 2008 (pp. 175-184). (Proceedings of the 4th International Conference on Speech Prosody, SP 2008). International Speech Communications Association.

@inproceedings{f1a0ee4517444f66a89584f0cb1a3ad0,
title = "A task-dynamic toolkit for modeling the effects of prosodic structure on articulation",
author = "Elliot Saltzman and Hosung Nam and Jelena Krivokapic and Louis Goldstein",
year = "2008",
month = "1",
day = "1",
language = "English",
isbn = "9780616220030",
series = "Proceedings of the 4th International Conference on Speech Prosody, SP 2008",
publisher = "International Speech Communications Association",
pages = "175--184",
booktitle = "Proceedings of the 4th International Conference on Speech Prosody, SP 2008",

}

TY - GEN

T1 - A task-dynamic toolkit for modeling the effects of prosodic structure on articulation

AU - Saltzman, Elliot

AU - Nam, Hosung

AU - Krivokapic, Jelena

AU - Goldstein, Louis

PY - 2008/1/1

Y1 - 2008/1/1

UR - http://www.scopus.com/inward/record.url?scp=84902687217&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84902687217&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84902687217

SN - 9780616220030

T3 - Proceedings of the 4th International Conference on Speech Prosody, SP 2008

SP - 175

EP - 184

BT - Proceedings of the 4th International Conference on Speech Prosody, SP 2008

PB - International Speech Communications Association

ER -