SVM-based phoneme classification and lip shape refinement in real-time lip-synch system

Hanseok Ko, David K. Han

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

In this paper, we present a real-time lip-synch system that activates a 2-D avatar's lip motion in synch with an incoming speech utterance. To achieve real-time operation, the processing time was minimized by "merge and split" procedures resulting in coarse-to-fine phoneme classification. At each stage of phoneme classification, the support vector machine (SVM) method was applied to reduce the computational load while maintaining the desired accuracy. The coarse-to-fine phoneme classification is accomplished via two stages of feature extraction: in the first stage, each speech frame is acoustically analyzed for three classes of lip opening using Mel Frequency Cepstral Coefficients (MFCC) as a feature; in the second stage, each frame is further refined for detailed lip shape using formant information. The method was implemented in 2-D lip animation, and it was demonstrated that the system was effective in accomplishing real-time lip-synch. This approach was tested on a PC using Microsoft Visual Studio with an Intel Pentium IV 1.4 GHz CPU and 384 MB RAM. It was observed that the methods of phoneme merging and SVM achieved about twice the recognition speed of the method employing the Hidden Markov Model (HMM). A typical latency per frame observed using the proposed method was on the order of 18.22 milliseconds, while an HMM method under identical conditions resulted in about 30.67 milliseconds.
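The coarse-to-fine flow described in the abstract can be sketched roughly as follows. This is only an illustration of two-stage dispatch, not the authors' implementation: the class labels (`closed`, `half_open`, `open`), the toy threshold rules standing in for trained SVMs, and all function names are assumptions made for this example. The `speedup` helper simply divides the two per-frame latencies quoted in the abstract.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]


def classify_frame(
    mfcc: Features,
    formants: Features,
    coarse: Classifier,
    fine: Dict[str, Classifier],
) -> str:
    """Stage 1 picks a lip-opening class from MFCC features; stage 2 refines
    it into a lip shape from formants, consulting only that class's model."""
    opening = coarse(mfcc)
    refine = fine.get(opening)
    # With no refiner registered for this class, keep the coarse label.
    return refine(formants) if refine is not None else opening


# Toy stand-ins for trained classifiers (thresholds are invented, hypothetical).
def toy_coarse(mfcc: Features) -> str:
    energy = mfcc[0]
    if energy < 1.0:
        return "closed"
    return "half_open" if energy < 5.0 else "open"


def toy_open_refiner(formants: Features) -> str:
    f1, f2 = formants[0], formants[1]
    return "open_wide" if f2 - f1 < 1000.0 else "open_spread"


def speedup(baseline_ms: float, proposed_ms: float) -> float:
    """Ratio of per-frame latencies (baseline divided by proposed)."""
    return baseline_ms / proposed_ms


if __name__ == "__main__":
    refiners = {"open": toy_open_refiner}
    print(classify_frame([6.0], [700.0, 1200.0], toy_coarse, refiners))
    print(round(speedup(30.67, 18.22), 2))
```

Only the refiner for the class chosen in the first stage runs on a given frame, which is one plausible reading of how a coarse-to-fine scheme limits per-frame work. Any trained classifier exposing the same call signature, an SVM included, could replace the toy rules.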

Original language: English
Pages (from-to): 1029-1051
Number of pages: 23
Journal: International Journal of Pattern Recognition and Artificial Intelligence
Volume: 20
Issue number: 7
DOI: 10.1142/S0218001406005113
Publication status: Published - 2006 Nov 1

Keywords

  • Lip-synch
  • Real-time
  • Speech
  • Support vector machine
  • Viseme

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Artificial Intelligence
  • Computer Vision and Pattern Recognition

Cite this

SVM-based phoneme classification and lip shape refinement in real-time lip-synch system. / Ko, Hanseok; Han, David K.

In: International Journal of Pattern Recognition and Artificial Intelligence, Vol. 20, No. 7, 01.11.2006, p. 1029-1051.

@article{18abed23e5f14581a908541cbe7736e2,
title = "SVM-based phoneme classification and lip shape refinement in real-time lip-synch system",
abstract = "In this paper, we present a real-time lip-synch system that activates a 2-D avatar's lip motion in synch with an incoming speech utterance. To achieve real-time operation, the processing time was minimized by {"}merge and split{"} procedures resulting in coarse-to-fine phoneme classification. At each stage of phoneme classification, the support vector machine (SVM) method was applied to reduce the computational load while maintaining the desired accuracy. The coarse-to-fine phoneme classification is accomplished via two stages of feature extraction: in the first stage, each speech frame is acoustically analyzed for three classes of lip opening using Mel Frequency Cepstral Coefficients (MFCC) as a feature; in the second stage, each frame is further refined for detailed lip shape using formant information. The method was implemented in 2-D lip animation, and it was demonstrated that the system was effective in accomplishing real-time lip-synch. This approach was tested on a PC using Microsoft Visual Studio with an Intel Pentium IV 1.4 GHz CPU and 384 MB RAM. It was observed that the methods of phoneme merging and SVM achieved about twice the recognition speed of the method employing the Hidden Markov Model (HMM). A typical latency per frame observed using the proposed method was on the order of 18.22 milliseconds, while an HMM method under identical conditions resulted in about 30.67 milliseconds.",
keywords = "Lip-synch, Real-time, Speech, Support vector machine, Viseme",
author = "Ko, Hanseok and Han, {David K.}",
year = "2006",
month = "11",
day = "1",
doi = "10.1142/S0218001406005113",
language = "English",
volume = "20",
pages = "1029--1051",
journal = "International Journal of Pattern Recognition and Artificial Intelligence",
issn = "0218-0014",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "7",

}

TY - JOUR

T1 - SVM-based phoneme classification and lip shape refinement in real-time lip-synch system

AU - Ko, Hanseok

AU - Han, David K.

PY - 2006/11/1

Y1 - 2006/11/1

N2 - In this paper, we present a real-time lip-synch system that activates a 2-D avatar's lip motion in synch with an incoming speech utterance. To achieve real-time operation, the processing time was minimized by "merge and split" procedures resulting in coarse-to-fine phoneme classification. At each stage of phoneme classification, the support vector machine (SVM) method was applied to reduce the computational load while maintaining the desired accuracy. The coarse-to-fine phoneme classification is accomplished via two stages of feature extraction: in the first stage, each speech frame is acoustically analyzed for three classes of lip opening using Mel Frequency Cepstral Coefficients (MFCC) as a feature; in the second stage, each frame is further refined for detailed lip shape using formant information. The method was implemented in 2-D lip animation, and it was demonstrated that the system was effective in accomplishing real-time lip-synch. This approach was tested on a PC using Microsoft Visual Studio with an Intel Pentium IV 1.4 GHz CPU and 384 MB RAM. It was observed that the methods of phoneme merging and SVM achieved about twice the recognition speed of the method employing the Hidden Markov Model (HMM). A typical latency per frame observed using the proposed method was on the order of 18.22 milliseconds, while an HMM method under identical conditions resulted in about 30.67 milliseconds.

KW - Lip-synch

KW - Real-time

KW - Speech

KW - Support vector machine

KW - Viseme

UR - http://www.scopus.com/inward/record.url?scp=33845963091&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33845963091&partnerID=8YFLogxK

U2 - 10.1142/S0218001406005113

DO - 10.1142/S0218001406005113

M3 - Article

VL - 20

SP - 1029

EP - 1051

JO - International Journal of Pattern Recognition and Artificial Intelligence

JF - International Journal of Pattern Recognition and Artificial Intelligence

SN - 0218-0014

IS - 7

ER -