Abstract
This work describes a real-time lip-sync method that synchronizes an avatar's lip shape with the corresponding speech signal. Phoneme recognition is generally regarded as a key task in the operation of a real-time lip-sync system. In this work, the Head-Body-Tail (HBT) model is proposed to recognize more efficiently phonemes whose realizations vary due to co-articulation effects. The HBT model effectively handles the transition parts of context-dependent models in small-vocabulary tasks, and it provides better recognition performance than conventional context-dependent or context-independent models on digit and vowel recognition. Moreover, each phoneme is assigned to one of four classes, and a class-dependent codebook is generated to further improve performance. Additionally, to represent the context-dependency information in the transient parts more clearly, some Gaussians are excluded from the class-dependent codebook. The proposed method leads to a lip-sync system that performs at a level similar to previous designs based on HBT and continuous hidden Markov models (CHMMs), while reducing the number of model parameters by one-third and enabling real-time operation.
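As a rough illustration of how HBT parameter sharing shrinks the model inventory relative to fully context-dependent models, the Python sketch below counts HMM states for a digit task. The state counts (1 head, 3 body, 1 tail states per unit) and the digit-plus-silence context set are illustrative assumptions, not the configuration reported in the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class HMM:
    """Placeholder for an HMM unit; only the state count matters here."""
    name: str
    n_states: int


def build_hbt_models(words, n_head=1, n_body=3, n_tail=1):
    """Build HBT units: heads and tails are context-dependent (one per
    left/right neighbour), bodies are context-independent (one per word)."""
    contexts = words + ["sil"]  # neighbouring word or silence
    heads = {(left, w): HMM(f"h_{left}-{w}", n_head)
             for left, w in product(contexts, words)}
    bodies = {w: HMM(f"b_{w}", n_body) for w in words}
    tails = {(w, right): HMM(f"t_{w}+{right}", n_tail)
             for w, right in product(words, contexts)}
    return heads, bodies, tails


digits = [str(d) for d in range(10)]
heads, bodies, tails = build_hbt_models(digits)

hbt_states = sum(m.n_states
                 for m in (*heads.values(), *bodies.values(), *tails.values()))

# A fully context-dependent baseline would keep a separate whole-word model
# (head + body + tail states) for every (left, word, right) context triple.
n_contexts = len(digits) + 1  # ten digits plus silence
full_cd_states = len(digits) * n_contexts**2 * (1 + 3 + 1)

print(f"HBT states:                {hbt_states}")       # 110 + 30 + 110 = 250
print(f"Fully context-dep. states: {full_cd_states}")   # 10 * 11^2 * 5 = 6050
```

Because the body states are shared across all contexts, the HBT inventory grows only linearly in the number of neighbouring contexts, whereas a fully context-dependent inventory grows quadratically; this is the source of the parameter savings the abstract cites.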
| Original language | English |
|---|---|
| Article number | 4668497 |
| Pages (from-to) | 1299-1306 |
| Number of pages | 8 |
| Journal | IEEE Transactions on Multimedia |
| Volume | 10 |
| Issue number | 7 |
| DOIs | |
| Publication status | Published - 2008 Nov |
Keywords
- Head-body-tail HMM
- Phoneme recognition
- Real-time lip-sync
ASJC Scopus subject areas
- Signal Processing
- Media Technology
- Computer Science Applications
- Electrical and Electronic Engineering