Visual speech recognition using weighted dynamic time warping

Kyungsun Lee, Minseok Keum, David K. Han, Hanseok Ko

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

It is unclear whether Hidden Markov Model (HMM) or Dynamic Time Warping (DTW) mapping is more appropriate for visual speech recognition when only a small number of data samples is available. In this letter, the two approaches are compared in terms of sensitivity to the number of training samples and computing time, with the objective of determining the tipping point. The limited training data problem is addressed by straightforward template matching via weighted DTW (WDTW). The proposed framework refines DTW by adjusting the warping paths with judiciously injected weights to ensure a smooth diagonal path for accurate alignment without added computational load. The proposed WDTW is evaluated on three databases (two in the public domain and one developed in-house) for visual recognition performance. The experiments indicate that the proposed WDTW significantly enhances the recognition rate compared to the DTW- and HMM-based algorithms, especially when data samples are limited.
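The abstract does not give the weighting scheme, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a hypothetical weight `1 + penalty * |i/n - j/m|` that scales each local cost by the cell's normalized distance from the diagonal, which discourages warping paths that stray far from it. The names `wdtw_distance`, `classify`, and `penalty` are invented for this example, and scalar features stand in for real lip-region features. The table is filled in the same O(nm) time as plain DTW.

```python
import math


def wdtw_distance(a, b, penalty=0.1):
    """Weighted DTW distance between two 1-D feature sequences.

    Assumption (not from the paper): the local cost at cell (i, j) is
    multiplied by 1 + penalty * off, where off is the normalized
    distance of the cell from the diagonal. penalty=0 gives plain DTW.
    """
    n, m = len(a), len(b)
    # D[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Normalized positions in [0, 1]; equal values lie on the diagonal.
            off = abs((i - 1) / max(n - 1, 1) - (j - 1) / max(m - 1, 1))
            weight = 1.0 + penalty * off
            cost = weight * abs(a[i - 1] - b[j - 1])
            # Standard DTW step pattern: insertion, deletion, match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def classify(query, templates, penalty=0.1):
    """Template matching: return the label of the nearest template.

    templates is a list of (label, sequence) pairs.
    """
    best_label, _ = min(
        templates,
        key=lambda item: wdtw_distance(query, item[1], penalty),
    )
    return best_label


if __name__ == "__main__":
    templates = [("up", [0, 1, 2, 3]), ("down", [3, 2, 1, 0])]
    print(classify([0, 0, 1, 2, 3], templates))
```

Because classification only needs one stored template per word, this kind of matcher can be used with very few training samples, which is the setting the abstract targets.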

Original language: English
Pages (from-to): 1430-1433
Number of pages: 4
Journal: IEICE Transactions on Information and Systems
Volume: E98D
Issue number: 7
DOI: 10.1587/transinf.2015EDL8002
Publication status: Published - 2015 Jul 1

Fingerprint

Speech recognition
Hidden Markov models
Template matching
Experiments

Keywords

  • Lip reading
  • Visual speech recognition

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Software
  • Artificial Intelligence
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition

Cite this

Visual speech recognition using weighted dynamic time warping. / Lee, Kyungsun; Keum, Minseok; Han, David K.; Ko, Hanseok.

In: IEICE Transactions on Information and Systems, Vol. E98D, No. 7, 01.07.2015, p. 1430-1433.


@article{39a0cb0b325d43cc89adcd5acf317fab,
title = "Visual speech recognition using weighted dynamic time warping",
abstract = "It is unclear whether Hidden Markov Model (HMM) or Dynamic Time Warping (DTW) mapping is more appropriate for visual speech recognition when only a small number of data samples is available. In this letter, the two approaches are compared in terms of sensitivity to the number of training samples and computing time, with the objective of determining the tipping point. The limited training data problem is addressed by straightforward template matching via weighted DTW (WDTW). The proposed framework refines DTW by adjusting the warping paths with judiciously injected weights to ensure a smooth diagonal path for accurate alignment without added computational load. The proposed WDTW is evaluated on three databases (two in the public domain and one developed in-house) for visual recognition performance. The experiments indicate that the proposed WDTW significantly enhances the recognition rate compared to the DTW- and HMM-based algorithms, especially when data samples are limited.",
keywords = "Lip reading, Visual speech recognition",
author = "Kyungsun Lee and Minseok Keum and Han, {David K.} and Hanseok Ko",
year = "2015",
month = "7",
day = "1",
doi = "10.1587/transinf.2015EDL8002",
language = "English",
volume = "E98D",
pages = "1430--1433",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "7",

}

TY - JOUR

T1 - Visual speech recognition using weighted dynamic time warping

AU - Lee, Kyungsun

AU - Keum, Minseok

AU - Han, David K.

AU - Ko, Hanseok

PY - 2015/7/1

Y1 - 2015/7/1

AB - It is unclear whether Hidden Markov Model (HMM) or Dynamic Time Warping (DTW) mapping is more appropriate for visual speech recognition when only a small number of data samples is available. In this letter, the two approaches are compared in terms of sensitivity to the number of training samples and computing time, with the objective of determining the tipping point. The limited training data problem is addressed by straightforward template matching via weighted DTW (WDTW). The proposed framework refines DTW by adjusting the warping paths with judiciously injected weights to ensure a smooth diagonal path for accurate alignment without added computational load. The proposed WDTW is evaluated on three databases (two in the public domain and one developed in-house) for visual recognition performance. The experiments indicate that the proposed WDTW significantly enhances the recognition rate compared to the DTW- and HMM-based algorithms, especially when data samples are limited.

KW - Lip reading

KW - Visual speech recognition

UR - http://www.scopus.com/inward/record.url?scp=84937597424&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84937597424&partnerID=8YFLogxK

U2 - 10.1587/transinf.2015EDL8002

DO - 10.1587/transinf.2015EDL8002

M3 - Article

VL - E98D

SP - 1430

EP - 1433

JO - IEICE Transactions on Information and Systems

JF - IEICE Transactions on Information and Systems

SN - 0916-8532

IS - 7

ER -