A voice trigger system using keyword and speaker recognition for mobile devices

Hyeopwoo Lee, Sukmoon Chang, Dongsuk Yook, Yongserk Kim

Research output: Contribution to journal › Article

17 Citations (Scopus)

Abstract

Voice activity detection plays an important role in an efficient voice interface between humans and mobile devices, since it can be used as a trigger to activate the automatic speech recognition module of a mobile device. If the input speech signal can be recognized as a predefined magic word coming from a legitimate user, it can be utilized as a trigger. In this paper, we propose a voice trigger system using a keyword-dependent speaker recognition technique. To trigger a mobile device properly with low computational power consumption, the voice trigger must be able to perform keyword recognition, as well as speaker recognition, without using computationally demanding speech recognizers. We propose a template-based method and a hidden Markov model (HMM) based method for the voice trigger to solve this problem. Experiments using a Korean word corpus show that the template-based method ran 4.1 times faster than the HMM-based method. However, the HMM-based method reduced the recognition error by 27.8% in relative terms compared to the template-based method. The proposed methods are complementary and can be used selectively depending on the device of interest.
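The abstract does not spell out how the template-based method works. Since the keyword list for this article mentions dynamic time warping (DTW), the following is a minimal sketch of DTW template matching, given only as an illustration of the general idea and not as the authors' implementation. The function names `dtw_distance` and `is_trigger`, the Euclidean frame distance, the length normalization, and the threshold value are all assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Length-normalized DTW distance between two sequences of feature vectors.

    Illustrative only: Euclidean frame distance and (n + m) normalization
    are assumptions, not details taken from the paper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m] / (n + m)


def is_trigger(utterance, templates, threshold):
    """Accept when the closest enrolled template is within the threshold."""
    return min(dtw_distance(utterance, t) for t in templates) <= threshold


# Toy one-dimensional "features"; a real system would use spectral features.
template = [(0.0,), (1.0,), (2.0,), (1.0,)]
stretched = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (1.0,)]
other = [(5.0,), (5.0,), (5.0,)]
```

In this sketch, enrolling templates recorded by the legitimate user saying the magic word would make a single distance test depend on both the keyword and the speaker, which is one plausible reading of "keyword-dependent speaker recognition".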

Original language: English
Article number: 5373813
Pages (from-to): 2377-2384
Number of pages: 8
Journal: IEEE Transactions on Consumer Electronics
Volume: 55
Issue number: 4
DOI: 10.1109/TCE.2009.5373813
Publication status: Published - 2009 Nov 1

Fingerprint

Mobile devices
Hidden Markov models
Speech recognition
Electric power utilization
Experiments

Keywords

  • Dynamic time warping
  • Gaussian mixture model
  • Hidden Markov model
  • Keyword recognition
  • Speaker recognition
  • Vector quantization
  • Voice trigger

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Media Technology

Cite this

A voice trigger system using keyword and speaker recognition for mobile devices. / Lee, Hyeopwoo; Chang, Sukmoon; Yook, Dongsuk; Kim, Yongserk.

In: IEEE Transactions on Consumer Electronics, Vol. 55, No. 4, 5373813, 01.11.2009, p. 2377-2384.

Lee, Hyeopwoo ; Chang, Sukmoon ; Yook, Dongsuk ; Kim, Yongserk. / A voice trigger system using keyword and speaker recognition for mobile devices. In: IEEE Transactions on Consumer Electronics. 2009 ; Vol. 55, No. 4. pp. 2377-2384.
@article{2278b2d30a9d4b64a0133462868fc2af,
title = "A voice trigger system using keyword and speaker recognition for mobile devices",
abstract = "Voice activity detection plays an important role in an efficient voice interface between humans and mobile devices, since it can be used as a trigger to activate the automatic speech recognition module of a mobile device. If the input speech signal can be recognized as a predefined magic word coming from a legitimate user, it can be utilized as a trigger. In this paper, we propose a voice trigger system using a keyword-dependent speaker recognition technique. To trigger a mobile device properly with low computational power consumption, the voice trigger must be able to perform keyword recognition, as well as speaker recognition, without using computationally demanding speech recognizers. We propose a template-based method and a hidden Markov model (HMM) based method for the voice trigger to solve this problem. Experiments using a Korean word corpus show that the template-based method ran 4.1 times faster than the HMM-based method. However, the HMM-based method reduced the recognition error by 27.8{\%} in relative terms compared to the template-based method. The proposed methods are complementary and can be used selectively depending on the device of interest.",
keywords = "Dynamic time warping, Gaussian mixture model, Hidden Markov model, Keyword recognition, Speaker recognition, Vector quantization, Voice trigger",
author = "Hyeopwoo Lee and Sukmoon Chang and Dongsuk Yook and Yongserk Kim",
year = "2009",
month = "11",
day = "1",
doi = "10.1109/TCE.2009.5373813",
language = "English",
volume = "55",
pages = "2377--2384",
journal = "IEEE Transactions on Consumer Electronics",
issn = "0098-3063",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "4",

}

TY - JOUR

T1 - A voice trigger system using keyword and speaker recognition for mobile devices

AU - Lee, Hyeopwoo

AU - Chang, Sukmoon

AU - Yook, Dongsuk

AU - Kim, Yongserk

PY - 2009/11/1

Y1 - 2009/11/1

N2 - Voice activity detection plays an important role in an efficient voice interface between humans and mobile devices, since it can be used as a trigger to activate the automatic speech recognition module of a mobile device. If the input speech signal can be recognized as a predefined magic word coming from a legitimate user, it can be utilized as a trigger. In this paper, we propose a voice trigger system using a keyword-dependent speaker recognition technique. To trigger a mobile device properly with low computational power consumption, the voice trigger must be able to perform keyword recognition, as well as speaker recognition, without using computationally demanding speech recognizers. We propose a template-based method and a hidden Markov model (HMM) based method for the voice trigger to solve this problem. Experiments using a Korean word corpus show that the template-based method ran 4.1 times faster than the HMM-based method. However, the HMM-based method reduced the recognition error by 27.8% in relative terms compared to the template-based method. The proposed methods are complementary and can be used selectively depending on the device of interest.

KW - Dynamic time warping

KW - Gaussian mixture model

KW - Hidden Markov model

KW - Keyword recognition

KW - Speaker recognition

KW - Vector quantization

KW - Voice trigger

UR - http://www.scopus.com/inward/record.url?scp=75449114041&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=75449114041&partnerID=8YFLogxK

U2 - 10.1109/TCE.2009.5373813

DO - 10.1109/TCE.2009.5373813

M3 - Article

AN - SCOPUS:75449114041

VL - 55

SP - 2377

EP - 2384

JO - IEEE Transactions on Consumer Electronics

JF - IEEE Transactions on Consumer Electronics

SN - 0098-3063

IS - 4

M1 - 5373813

ER -