Effective lip localization and tracking for achieving multimodal speech recognition

Wei Chuan Ooi, Changwon Jeon, Kihyeon Kim, Hanseok Ko, David K. Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Effective fusion of acoustic and visual modalities in speech recognition has been an important issue in Human Computer Interfaces, warranting further improvements in intelligibility and robustness. Speaker lip motion stands out as the most linguistically relevant visual feature for speech recognition. In this paper, we present a new hybrid approach to improve lip localization and tracking, aimed at improving speech recognition in noisy environments. This hybrid approach begins with a new color space transformation for enhancing lip segmentation. In the color space transformation, a PCA method is employed to derive a new one dimensional color space which maximizes discrimination between lip and non-lip colors. Intensity information is also incorporated in the process to improve contrast of upper and corner lip segments. In the subsequent step, a constrained deformable lip model with high flexibility is constructed to accurately capture and track lip shapes. The model requires only six degrees of freedom, yet provides a precise description of lip shapes using a simple least square fitting method. Experimental results indicate that the proposed hybrid approach delivers reliable and accurate localization and tracking of lip motions under various measurement conditions.
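The abstract's first step, deriving a one-dimensional color space via PCA that separates lip from non-lip colors, can be illustrated with a minimal sketch. The exact formulation in the paper (which color channels are used and how intensity is folded in) is not given in the abstract, so the sample pixel colors and channel means below are made-up placeholders:

```python
import numpy as np

# Hypothetical RGB samples: the class means and spreads are invented
# for illustration, not taken from the paper.
rng = np.random.default_rng(0)
lip = rng.normal([170, 90, 100], 12, size=(500, 3))    # reddish lip pixels
skin = rng.normal([160, 130, 110], 12, size=(500, 3))  # skin-tone pixels

pixels = np.vstack([lip, skin]).astype(float)
mean = pixels.mean(axis=0)
centered = pixels - mean

# First principal axis = leading eigenvector of the pooled color covariance.
cov = centered.T @ centered / (len(centered) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
axis = eigvecs[:, -1]

# Project every pixel color onto the new 1-D axis.
proj_lip = (lip - mean) @ axis
proj_skin = (skin - mean) @ axis

sep = abs(proj_lip.mean() - proj_skin.mean())
print(f"class separation: {sep:.1f}, within-class std: {proj_lip.std():.1f}")
```

Because the between-class color difference dominates the pooled variance in this toy setup, the leading principal axis aligns with the lip/skin direction and the projected class means end up several within-class standard deviations apart; a threshold on the 1-D projection then segments lip pixels.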

Original language: English
Title of host publication: Lecture Notes in Electrical Engineering
Pages: 33-43
Number of pages: 11
Volume: 35 LNEE
DOIs: 10.1007/978-3-540-89859-7_3
Publication status: Published - 2009 Sep 25
Event: 7th IEEE International Conference on Multi-Sensor Integration and Fusion, IEEE MFI 2008 - Seoul, Korea, Republic of
Duration: 2008 Aug 20 - 2008 Aug 22

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 35 LNEE
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Other

Other: 7th IEEE International Conference on Multi-Sensor Integration and Fusion, IEEE MFI 2008
Country: Korea, Republic of
City: Seoul
Period: 08/8/20 - 08/8/22

Fingerprint

Speech recognition
Color
Interfaces (computer)
Fusion reactions
Acoustics

ASJC Scopus subject areas

  • Industrial and Manufacturing Engineering

Cite this

Ooi, W. C., Jeon, C., Kim, K., Ko, H., & Han, D. K. (2009). Effective lip localization and tracking for achieving multimodal speech recognition. In Lecture Notes in Electrical Engineering (Vol. 35 LNEE, pp. 33-43). (Lecture Notes in Electrical Engineering; Vol. 35 LNEE). https://doi.org/10.1007/978-3-540-89859-7_3

Effective lip localization and tracking for achieving multimodal speech recognition. / Ooi, Wei Chuan; Jeon, Changwon; Kim, Kihyeon; Ko, Hanseok; Han, David K.

Lecture Notes in Electrical Engineering. Vol. 35 LNEE 2009. p. 33-43 (Lecture Notes in Electrical Engineering; Vol. 35 LNEE).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Ooi, WC, Jeon, C, Kim, K, Ko, H & Han, DK 2009, Effective lip localization and tracking for achieving multimodal speech recognition. in Lecture Notes in Electrical Engineering. vol. 35 LNEE, Lecture Notes in Electrical Engineering, vol. 35 LNEE, pp. 33-43, 7th IEEE International Conference on Multi-Sensor Integration and Fusion, IEEE MFI 2008, Seoul, Korea, Republic of, 08/8/20. https://doi.org/10.1007/978-3-540-89859-7_3
Ooi WC, Jeon C, Kim K, Ko H, Han DK. Effective lip localization and tracking for achieving multimodal speech recognition. In Lecture Notes in Electrical Engineering. Vol. 35 LNEE. 2009. p. 33-43. (Lecture Notes in Electrical Engineering). https://doi.org/10.1007/978-3-540-89859-7_3
Ooi, Wei Chuan ; Jeon, Changwon ; Kim, Kihyeon ; Ko, Hanseok ; Han, David K. / Effective lip localization and tracking for achieving multimodal speech recognition. Lecture Notes in Electrical Engineering. Vol. 35 LNEE 2009. pp. 33-43 (Lecture Notes in Electrical Engineering).
@inproceedings{da7d1663d5144075bf183b7df8202602,
title = "Effective lip localization and tracking for achieving multimodal speech recognition",
abstract = "Effective fusion of acoustic and visual modalities in speech recognition has been an important issue in Human Computer Interfaces, warranting further improvements in intelligibility and robustness. Speaker lip motion stands out as the most linguistically relevant visual feature for speech recognition. In this paper, we present a new hybrid approach to improve lip localization and tracking, aimed at improving speech recognition in noisy environments. This hybrid approach begins with a new color space transformation for enhancing lip segmentation. In the color space transformation, a PCA method is employed to derive a new one dimensional color space which maximizes discrimination between lip and non-lip colors. Intensity information is also incorporated in the process to improve contrast of upper and corner lip segments. In the subsequent step, a constrained deformable lip model with high flexibility is constructed to accurately capture and track lip shapes. The model requires only six degrees of freedom, yet provides a precise description of lip shapes using a simple least square fitting method. Experimental results indicate that the proposed hybrid approach delivers reliable and accurate localization and tracking of lip motions under various measurement conditions.",
author = "Ooi, {Wei Chuan} and Changwon Jeon and Kihyeon Kim and Hanseok Ko and Han, {David K.}",
year = "2009",
month = "9",
day = "25",
doi = "10.1007/978-3-540-89859-7_3",
language = "English",
isbn = "9783540898580",
volume = "35 LNEE",
series = "Lecture Notes in Electrical Engineering",
pages = "33--43",
booktitle = "Lecture Notes in Electrical Engineering",

}

TY - GEN

T1 - Effective lip localization and tracking for achieving multimodal speech recognition

AU - Ooi, Wei Chuan

AU - Jeon, Changwon

AU - Kim, Kihyeon

AU - Ko, Hanseok

AU - Han, David K.

PY - 2009/9/25

Y1 - 2009/9/25

N2 - Effective fusion of acoustic and visual modalities in speech recognition has been an important issue in Human Computer Interfaces, warranting further improvements in intelligibility and robustness. Speaker lip motion stands out as the most linguistically relevant visual feature for speech recognition. In this paper, we present a new hybrid approach to improve lip localization and tracking, aimed at improving speech recognition in noisy environments. This hybrid approach begins with a new color space transformation for enhancing lip segmentation. In the color space transformation, a PCA method is employed to derive a new one dimensional color space which maximizes discrimination between lip and non-lip colors. Intensity information is also incorporated in the process to improve contrast of upper and corner lip segments. In the subsequent step, a constrained deformable lip model with high flexibility is constructed to accurately capture and track lip shapes. The model requires only six degrees of freedom, yet provides a precise description of lip shapes using a simple least square fitting method. Experimental results indicate that the proposed hybrid approach delivers reliable and accurate localization and tracking of lip motions under various measurement conditions.

AB - Effective fusion of acoustic and visual modalities in speech recognition has been an important issue in Human Computer Interfaces, warranting further improvements in intelligibility and robustness. Speaker lip motion stands out as the most linguistically relevant visual feature for speech recognition. In this paper, we present a new hybrid approach to improve lip localization and tracking, aimed at improving speech recognition in noisy environments. This hybrid approach begins with a new color space transformation for enhancing lip segmentation. In the color space transformation, a PCA method is employed to derive a new one dimensional color space which maximizes discrimination between lip and non-lip colors. Intensity information is also incorporated in the process to improve contrast of upper and corner lip segments. In the subsequent step, a constrained deformable lip model with high flexibility is constructed to accurately capture and track lip shapes. The model requires only six degrees of freedom, yet provides a precise description of lip shapes using a simple least square fitting method. Experimental results indicate that the proposed hybrid approach delivers reliable and accurate localization and tracking of lip motions under various measurement conditions.

UR - http://www.scopus.com/inward/record.url?scp=78651558287&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78651558287&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-89859-7_3

DO - 10.1007/978-3-540-89859-7_3

M3 - Conference contribution

AN - SCOPUS:78651558287

SN - 9783540898580

VL - 35 LNEE

T3 - Lecture Notes in Electrical Engineering

SP - 33

EP - 43

BT - Lecture Notes in Electrical Engineering

ER -