Effective lip localization and tracking for achieving multimodal speech recognition

Chuan Ooi Wei, Changwon Jeon, Kihyeon Kim, David K. Han, Hanseok Ko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

Effective fusion of acoustic and visual modalities in speech recognition has been an important issue in human-computer interfaces, warranting further improvements in intelligibility and robustness. Speaker lip motion stands out as the most linguistically relevant visual feature for speech recognition. In this paper, we present a new hybrid approach to lip localization and tracking, aimed at improving speech recognition in noisy environments. The approach begins with a new color space transformation for enhancing lip segmentation: a PCA method derives a one-dimensional color space that maximizes discrimination between lip and non-lip colors, and intensity information is incorporated to improve the contrast of the upper and corner lip segments. In the subsequent step, a constrained deformable lip model with high flexibility is constructed to accurately capture and track lip shapes. The model requires only six degrees of freedom, yet provides a precise description of lip shapes using a simple least-squares fitting method. Experimental results indicate that the proposed hybrid approach delivers reliable and accurate localization and tracking of lip motion under various measurement conditions.
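The first step of the approach, projecting colors onto a one-dimensional axis that separates lip from non-lip pixels, can be sketched as follows. This is a minimal illustration only: the synthetic sample colors and the use of plain PCA on pooled labeled pixels are assumptions for the sketch, not the paper's exact formulation, and the intensity-contrast refinement is omitted.

```python
import numpy as np

# Synthetic labeled pixel samples (illustrative, not from the paper):
# reddish lip tones vs. skin-like non-lip tones in RGB.
rng = np.random.default_rng(0)
lip = rng.normal([180, 90, 100], 15.0, size=(500, 3))
non_lip = rng.normal([160, 130, 110], 15.0, size=(500, 3))

# Pool the samples and compute the principal component: the eigenvector
# of the covariance matrix with the largest eigenvalue. When the
# between-class color difference dominates the within-class spread,
# this axis serves as a 1-D "lip color" space.
samples = np.vstack([lip, non_lip])
cov = np.cov(samples - samples.mean(axis=0), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
axis = eigvecs[:, -1]                    # dominant direction

# Project both classes onto the new 1-D color space; a well-chosen
# axis separates the class means clearly.
lip_proj = lip @ axis
non_lip_proj = non_lip @ axis
separation = abs(lip_proj.mean() - non_lip_proj.mean())
```

A pixel can then be classified as lip or non-lip by thresholding its 1-D projection, which is far cheaper than operating in the full RGB space.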

Original language: English
Title of host publication: IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems
Pages: 90-93
Number of pages: 4
DOI: 10.1109/MFI.2008.4648114
Publication status: Published - 2008 Dec 1
Event: 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, MFI - Seoul, Korea, Republic of
Duration: 2008 Aug 20 - 2008 Aug 22



ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • Computer Science Applications

Cite this

Wei, C. O., Jeon, C., Kim, K., Han, D. K., & Ko, H. (2008). Effective lip localization and tracking for achieving multimodal speech recognition. In IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (pp. 90-93). [4648114] https://doi.org/10.1109/MFI.2008.4648114

