Real-time continuous phoneme recognition system using class-dependent tied-mixture HMM with HBT structure for speech-driven lip-sync

Junho Park, Hanseok Ko

Research output: Contribution to journalArticle

8 Citations (Scopus)

Abstract

This work describes a real-time lip-sync method using which an avatar's lip shape is synchronized with the corresponding speech signal. Phoneme recognition is generally regarded as an important task in the operation of a real-time lip-sync system. In this work, the use of the Head-Body-Tail (HBT) model is proposed for the purpose of more efficiently recognizing phonemes which are variously uttered due to co-articulation effects. The HBT model effectively deals with the transition parts of context-dependent models for small-sized vocabulary tasks. These models provide better recognition performance than general context-dependent or context-independent models for the task of digit or vowel recognition. Moreover, each phoneme is categorized into one among four classes and the class-dependent codebook is generated to further improve the performance. Additionally, for the clear representation of the context dependency information in the transient parts, some Gaussians are excluded from class-dependent codebook. The proposed method leads to a lip-sync system that performs at a level that is similar to previous designs based on HBT and continuous hidden Markov models (CHMMs). However, our method reduces the number of model parameters by one-third and enables real-time operation.

Original languageEnglish
Article number4668497
Pages (from-to)1299-1306
Number of pages8
JournalIEEE Transactions on Multimedia
Volume10
Issue number7
DOIs
Publication statusPublished - 2008 Nov 1

Fingerprint

Hidden Markov models

Keywords

  • Head-body-tail HMM
  • Phoneme recognition
  • Real-time lip-sync

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Media Technology
  • Computer Science Applications

Cite this

Real-time continuous phoneme recognition system using class-dependent tied-mixture HMM with HBT structure for speech-driven lip-sync. / Park, Junho; Ko, Hanseok.

In: IEEE Transactions on Multimedia, Vol. 10, No. 7, 4668497, 01.11.2008, p. 1299-1306.

Research output: Contribution to journalArticle

@article{04f7eea7feb84fd68f2903cf2f3e8447,
title = "Real-time continuous phoneme recognition system using class-dependent tied-mixture HMM with HBT structure for speech-driven lip-sync",
abstract = "This work describes a real-time lip-sync method using which an avatar's lip shape is synchronized with the corresponding speech signal. Phoneme recognition is generally regarded as an important task in the operation of a real-time lip-sync system. In this work, the use of the Head-Body-Tail (HBT) model is proposed for the purpose of more efficiently recognizing phonemes which are variously uttered due to co-articulation effects. The HBT model effectively deals with the transition parts of context-dependent models for small-sized vocabulary tasks. These models provide better recognition performance than general context-dependent or context-independent models for the task of digit or vowel recognition. Moreover, each phoneme is categorized into one among four classes and the class-dependent codebook is generated to further improve the performance. Additionally, for the clear representation of the context dependency information in the transient parts, some Gaussians are excluded from class-dependent codebook. The proposed method leads to a lip-sync system that performs at a level that is similar to previous designs based on HBT and continuous hidden Markov models (CHMMs). However, our method reduces the number of model parameters by one-third and enables real-time operation.",
keywords = "Head-body-tail HMM, Phoneme recognition, Real-time lip-sync",
author = "Junho Park and Hanseok Ko",
year = "2008",
month = "11",
day = "1",
doi = "10.1109/TMM.2008.2004908",
language = "English",
volume = "10",
pages = "1299--1306",
journal = "IEEE Transactions on Multimedia",
issn = "1520-9210",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "7",

}

TY - JOUR

T1 - Real-time continuous phoneme recognition system using class-dependent tied-mixture HMM with HBT structure for speech-driven lip-sync

AU - Park, Junho

AU - Ko, Hanseok

PY - 2008/11/1

Y1 - 2008/11/1

N2 - This work describes a real-time lip-sync method using which an avatar's lip shape is synchronized with the corresponding speech signal. Phoneme recognition is generally regarded as an important task in the operation of a real-time lip-sync system. In this work, the use of the Head-Body-Tail (HBT) model is proposed for the purpose of more efficiently recognizing phonemes which are variously uttered due to co-articulation effects. The HBT model effectively deals with the transition parts of context-dependent models for small-sized vocabulary tasks. These models provide better recognition performance than general context-dependent or context-independent models for the task of digit or vowel recognition. Moreover, each phoneme is categorized into one among four classes and the class-dependent codebook is generated to further improve the performance. Additionally, for the clear representation of the context dependency information in the transient parts, some Gaussians are excluded from class-dependent codebook. The proposed method leads to a lip-sync system that performs at a level that is similar to previous designs based on HBT and continuous hidden Markov models (CHMMs). However, our method reduces the number of model parameters by one-third and enables real-time operation.

AB - This work describes a real-time lip-sync method using which an avatar's lip shape is synchronized with the corresponding speech signal. Phoneme recognition is generally regarded as an important task in the operation of a real-time lip-sync system. In this work, the use of the Head-Body-Tail (HBT) model is proposed for the purpose of more efficiently recognizing phonemes which are variously uttered due to co-articulation effects. The HBT model effectively deals with the transition parts of context-dependent models for small-sized vocabulary tasks. These models provide better recognition performance than general context-dependent or context-independent models for the task of digit or vowel recognition. Moreover, each phoneme is categorized into one among four classes and the class-dependent codebook is generated to further improve the performance. Additionally, for the clear representation of the context dependency information in the transient parts, some Gaussians are excluded from class-dependent codebook. The proposed method leads to a lip-sync system that performs at a level that is similar to previous designs based on HBT and continuous hidden Markov models (CHMMs). However, our method reduces the number of model parameters by one-third and enables real-time operation.

KW - Head-body-tail HMM

KW - Phoneme recognition

KW - Real-time lip-sync

UR - http://www.scopus.com/inward/record.url?scp=56549088313&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=56549088313&partnerID=8YFLogxK

U2 - 10.1109/TMM.2008.2004908

DO - 10.1109/TMM.2008.2004908

M3 - Article

AN - SCOPUS:56549088313

VL - 10

SP - 1299

EP - 1306

JO - IEEE Transactions on Multimedia

JF - IEEE Transactions on Multimedia

SN - 1520-9210

IS - 7

M1 - 4668497

ER -