Achieving real-time lip-synch via SVM-based phoneme classification and lip shape refinement

Taeyoon Kim, Yongsung Kang, Hanseok Ko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

In this paper, we develop a real-time lip-synch system that drives a 2D avatar's lip motion in sync with incoming speech. To achieve real-time operation, we bound the processing time with a merge-and-split procedure that performs coarse-to-fine phoneme classification. At each classification stage, a support vector machine (SVM) constrains the computational load while attaining the desired accuracy. Coarse-to-fine phoneme classification proceeds in two stages of feature extraction: each speech frame is first analyzed acoustically into one of three lip-opening classes using MFCC features, and then refined into a detailed lip shape using formant information. We implemented the system with 2D lip animation, demonstrating that the proposed two-stage procedure accomplishes the real-time lip-synch task.
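The coarse-to-fine pipeline described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the feature vectors are synthetic stand-ins for MFCC and formant measurements, and the two detailed shapes per coarse class and the cluster centers are invented for the demo.

```python
# Sketch of the paper's coarse-to-fine idea: stage 1 classifies each speech
# frame's MFCC vector into one of three coarse lip-opening classes with an
# SVM; stage 2 uses a per-class SVM on formant features (F1, F2) to refine
# the coarse class into a detailed lip shape.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_cluster(center, n, dim):
    """Tight synthetic cluster around a scalar center (stand-in features)."""
    return center + 0.05 * rng.standard_normal((n, dim))

n = 30
# Stage 1 training data: 13-dim "MFCC" vectors, one cluster per coarse class.
mfcc = np.vstack([make_cluster(c, n, 13) for c in (0.0, 1.0, 2.0)])
coarse_y = np.repeat([0, 1, 2], n)          # closed / mid / open
coarse_svm = SVC(kernel="rbf").fit(mfcc, coarse_y)

# Stage 2: one refinement SVM per coarse class, trained on 2-dim "formant"
# vectors; here each coarse class splits into two detailed shapes.
fine_svms = {}
for c in range(3):
    formants = np.vstack([make_cluster(f, n, 2) for f in (0.0, 1.0)])
    fine_y = np.repeat([0, 1], n)
    fine_svms[c] = SVC(kernel="rbf").fit(formants, fine_y)

def classify_frame(mfcc_vec, formant_vec):
    """Return (coarse lip-opening class, detailed lip-shape index)."""
    c = int(coarse_svm.predict(mfcc_vec[None, :])[0])
    s = int(fine_svms[c].predict(formant_vec[None, :])[0])
    return c, s

coarse, shape = classify_frame(np.full(13, 2.0), np.full(2, 1.0))
```

The two-stage split is what keeps the per-frame cost bounded: the coarse SVM runs on every frame, and only one small class-specific SVM is consulted afterward, rather than a single classifier over all detailed lip shapes.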

Original language: English
Title of host publication: Proceedings - 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 299-304
Number of pages: 6
ISBN (Print): 0769518346, 9780769518343
DOI: 10.1109/ICMI.2002.1167010
Publication status: Published - 2002
Event: 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002 - Pittsburgh, United States
Duration: 2002 Oct 14 - 2002 Oct 16


Fingerprint

  • Support vector machines
  • Animation
  • Feature extraction
  • Processing

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture

Cite this

Kim, T., Kang, Y., & Ko, H. (2002). Achieving real-time lip-synch via SVM-based phoneme classification and lip shape refinement. In Proceedings - 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002 (pp. 299-304). [1167010] Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICMI.2002.1167010

