Abstract
In this paper, we develop a real-time lip-synch system that animates a 2D avatar's lip motion in synch with an incoming speech utterance. To achieve real-time operation, we bound the processing time by invoking a merge-and-split procedure that performs coarse-to-fine phoneme classification. At each stage of classification, we apply a support vector machine (SVM) to constrain the computational load while attaining the desired accuracy. The coarse-to-fine phoneme classification is accomplished in two stages of feature extraction: each speech frame is first acoustically analyzed into three classes of lip opening using MFCCs as features, and then further classified into a detailed lip shape using formant information. We implemented the system with 2D lip animation, demonstrating the effectiveness of the proposed two-stage procedure in accomplishing the real-time lip-synch task.
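The two-stage pipeline in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the random arrays stand in for real per-frame MFCC and formant features, the class counts beyond the stated three lip-opening classes (e.g. four lip shapes per class) are hypothetical, and scikit-learn's `SVC` is used as a generic SVM.

```python
# Sketch of coarse-to-fine phoneme classification with SVMs, per the abstract.
# All feature values are random stand-ins: a real system would extract MFCCs
# per speech frame for stage 1 and formant frequencies (e.g. F1/F2) for stage 2.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stage 1: classify each frame into 3 coarse lip-opening classes from MFCCs.
X_mfcc = rng.normal(size=(300, 13))        # 300 frames x 13 MFCCs (synthetic)
y_coarse = rng.integers(0, 3, size=300)    # 3 lip-opening classes
stage1 = SVC(kernel="rbf").fit(X_mfcc, y_coarse)

# Stage 2: one refined SVM per coarse class, using formant features,
# to pick a detailed lip shape (4 hypothetical shapes per coarse class).
stage2 = {}
for c in range(3):
    X_formant = rng.normal(size=(100, 2))  # F1, F2 per frame (synthetic)
    y_shape = rng.integers(0, 4, size=100)
    stage2[c] = SVC(kernel="rbf").fit(X_formant, y_shape)

def classify_frame(mfcc_vec, formant_vec):
    """Coarse lip-opening class first, then the refined shape within it."""
    coarse = int(stage1.predict(mfcc_vec.reshape(1, -1))[0])
    shape = int(stage2[coarse].predict(formant_vec.reshape(1, -1))[0])
    return coarse, shape

coarse, shape = classify_frame(rng.normal(size=13), rng.normal(size=2))
```

The point of the hierarchy is computational: each SVM only discriminates among a few classes, so per-frame classification stays cheap enough for real-time animation.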
Original language | English |
---|---|
Title of host publication | Proceedings - 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 299-304 |
Number of pages | 6 |
ISBN (Print) | 0769518346, 9780769518343 |
DOIs | |
Publication status | Published - 2002 |
Event | 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002 - Pittsburgh, United States. Duration: 2002 Oct 14 → 2002 Oct 16 |
Other
Other | 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002 |
---|---|
Country/Territory | United States |
City | Pittsburgh |
Period | 02/10/14 → 02/10/16 |
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Graphics and Computer-Aided Design
- Computer Vision and Pattern Recognition
- Hardware and Architecture