Audio-to-visual conversion using hidden Markov models

Soonkyu Lee, Dongsuk Yook

Research output: Chapter in Book/Report/Conference proceedingConference contribution

18 Citations (Scopus)

Abstract

We describe audio-to-visual conversion techniques for efficient multimedia communications. The audio signals are automatically converted to visual images of mouth shape. The visual speech can be represented as a sequence of visemes, which are the generic face images corresponding to particular sounds. Visual images synchronized with audio signals can provide userfriendly interface for man machine interactions. Also, it can be used to help the people with impaired-hearing. We use HMMs (hidden Markov models) to convert audio signals to a sequence of visemes. In this paper, we compare two approaches in using HMMs. In the first approach, an HMM is trained for each viseme, and the audio signals are directly recognized as a sequence of visemes. In the second approach, each phoneme is modeled with an HMM, and a general phoneme recognizer is utilized to produce a phoneme sequence from the audio signals. The phoneme sequence is then converted to a viseme sequence. We implemented the two approaches and tested them on the TIMIT speech corpus. The viseme recognizer shows 33.9% error rate, and the phoneme-based approach exhibits 29.7% viseme recognition error rate. When similar viseme classes are merged, we have found that the error rates can be reduced to 20.5% and 13.9%, respectably.

Original languageEnglish
Title of host publicationLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PublisherSpringer Verlag
Pages563-570
Number of pages8
Volume2417
ISBN (Print)3540440380, 9783540440383
Publication statusPublished - 2002
Event7th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2002 - Tokyo, Japan
Duration: 2002 Aug 182002 Aug 22

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume2417
ISSN (Print)03029743
ISSN (Electronic)16113349

Other

Other7th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2002
CountryJapan
CityTokyo
Period02/8/1802/8/22

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Fingerprint Dive into the research topics of 'Audio-to-visual conversion using hidden Markov models'. Together they form a unique fingerprint.

  • Cite this

    Lee, S., & Yook, D. (2002). Audio-to-visual conversion using hidden Markov models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2417, pp. 563-570). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2417). Springer Verlag.