Viseme recognition experiment using context dependent hidden Markov models

Soonkyu Lee, Dongsuk Yook

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Citations (Scopus)

Abstract

Visual images synchronized with audio signals can provide a user-friendly interface for man-machine interaction. Visual speech can be represented as a sequence of visemes, which are the generic face images corresponding to particular sounds. We use HMMs (hidden Markov models) to convert audio signals to a sequence of visemes. In this paper, we compare two approaches to using HMMs. In the first approach, an HMM is trained for each triviseme, which is a viseme with its left and right context, and the audio signals are directly recognized as a sequence of trivisemes. In the second approach, each triphone is modeled with an HMM, and a general triphone recognizer is used to produce a triphone sequence from the audio signals. The triviseme or triphone sequence is then converted to a viseme sequence. The performance of the two viseme recognition systems is evaluated on the TIMIT speech corpus.
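
The final conversion step described in the abstract, from a triviseme or triphone sequence to a viseme sequence, might look roughly like the sketch below. It is an illustration only, not the authors' implementation. It assumes HTK-style `left-center+right` labels for both triphones and trivisemes, and the `PHONE_TO_VISEME` table, its viseme class names, and all function names are hypothetical.

```python
# Hypothetical phone-to-viseme table. The classes and labels are illustrative
# assumptions, not the mapping used in the paper.
PHONE_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "aa": "V_open", "ae": "V_open",
    "sil": "V_sil",
}


def center_unit(context_label):
    """Return the center unit of an assumed 'left-center+right' label.

    Labels without context markers (e.g. 'sil') are returned unchanged.
    """
    center = context_label
    if "-" in center:
        center = center.split("-", 1)[1]  # drop the left context
    if "+" in center:
        center = center.split("+", 1)[0]  # drop the right context
    return center


def triphones_to_visemes(triphones):
    """Second approach: strip context, then map each phone to its viseme."""
    return [PHONE_TO_VISEME[center_unit(t)] for t in triphones]


def trivisemes_to_visemes(trivisemes):
    """First approach: the center unit of a triviseme is already a viseme."""
    return [center_unit(t) for t in trivisemes]
```

For example, `triphones_to_visemes(["sil-m+aa", "m-aa+p", "aa-p+sil"])` strips the context from each label and looks up the center phones `m`, `aa`, and `p` in the table. Under this assumed mapping several phones share one viseme class, which is why the triviseme inventory is smaller than the triphone inventory.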

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 557-561
Number of pages: 5
Volume: 2412
ISBN (Print): 9783540440253
Publication status: Published - 2002
Event: 3rd International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2002 - Manchester, United Kingdom
Duration: 2002 Aug 12 to 2002 Aug 14

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2412
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 3rd International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2002
Country: United Kingdom
City: Manchester
Period: 02/8/12 to 02/8/14

Fingerprint

Hidden Markov models
Markov Model
Dependent
Experiment
Experiments
Human computer interaction
User interfaces
Acoustic waves
User Interface
Convert
Context
Face
Interaction

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Lee, S., & Yook, D. (2002). Viseme recognition experiment using context dependent hidden Markov models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2412, pp. 557-561). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2412). Springer Verlag.

@inproceedings{e383ad51b6d843a4aa963e7a80904188,
title = "Viseme recognition experiment using context dependent hidden Markov models",
abstract = "Visual images synchronized with audio signals can provide user-friendly interface for man machine interactions. The visual speech can be represented as a sequence of visemes, which are the generic face images corresponding to particular sounds. We use HMMs (hidden Markov models) to convert audio signals to a sequence of visemes. In this paper, we compare two approaches in using HMMs. In the first approach, an HMM is trained for each triviseme which is a viseme with its left and right context, and the audio signals are directly recognized as a sequence of trivisemes. In the second approach, each triphone is modeled with an HMM, and a general triphone recognizer is used to produce a triphone sequence from the audio signals. The triviseme or triphone sequence is then converted to a viseme sequence. The performances of the two viseme recognition systems are evaluated on the TIMIT speech corpus.",
author = "Soonkyu Lee and Dongsuk Yook",
year = "2002",
language = "English",
isbn = "9783540440253",
volume = "2412",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "557--561",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Viseme recognition experiment using context dependent hidden Markov models

AU - Lee, Soonkyu

AU - Yook, Dongsuk

PY - 2002

Y1 - 2002

AB - Visual images synchronized with audio signals can provide user-friendly interface for man machine interactions. The visual speech can be represented as a sequence of visemes, which are the generic face images corresponding to particular sounds. We use HMMs (hidden Markov models) to convert audio signals to a sequence of visemes. In this paper, we compare two approaches in using HMMs. In the first approach, an HMM is trained for each triviseme which is a viseme with its left and right context, and the audio signals are directly recognized as a sequence of trivisemes. In the second approach, each triphone is modeled with an HMM, and a general triphone recognizer is used to produce a triphone sequence from the audio signals. The triviseme or triphone sequence is then converted to a viseme sequence. The performances of the two viseme recognition systems are evaluated on the TIMIT speech corpus.

UR - http://www.scopus.com/inward/record.url?scp=84947935184&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84947935184&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9783540440253

VL - 2412

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 557

EP - 561

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -