Formant-based robust voice activity detection

In Chul Yoo, Hyeontaek Lim, Dongsuk Yook

Research output: Contribution to journal › Article

20 Citations (Scopus)

Abstract

Voice activity detection (VAD) can be used to distinguish human speech from other sounds, and various applications can benefit from VAD, including speech coding and speech recognition. To accurately detect voice activity, the algorithm must take into account the characteristic features of human speech and/or background noise. In many real-life applications, noise frequently occurs in an unexpected manner, and in such situations it is difficult to determine the characteristics of the noise with sufficient accuracy. As a result, robust VAD algorithms that depend less on accurate noise estimates are desirable for real-life applications. Formants are the major spectral peaks of the human voice, and they are highly useful for distinguishing vowel sounds. These spectral peaks are likely to survive in a signal even after severe corruption by noise, so formants are attractive features for voice activity detection under low signal-to-noise ratio (SNR) conditions. However, it is difficult to accurately extract formants from noisy signals when background noise introduces unrelated spectral peaks. Therefore, this paper proposes a simple formant-based VAD algorithm to overcome the problem of detecting formants under severe noise conditions. The proposed method achieves a much faster processing time and outperforms standard VAD algorithms under various noise conditions. Because the proposed method is robust against various types of noise and imposes a light computational load, it is suitable for use in a wide range of applications.
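To make the abstract's core idea concrete, the sketch below scores each frame by how strongly its energy concentrates at a few spectral peaks, which is a rough proxy for formant structure. This is an illustrative sketch only, not the algorithm from the paper; the function names, frame length, and threshold are assumptions chosen for the example.

```python
import numpy as np

def spectral_peak_score(frame, n_peaks=3):
    """Fraction of spectral magnitude held by the n_peaks largest local
    maxima of the frame's magnitude spectrum. Vowel-like frames, whose
    energy concentrates at a few formant peaks, score high; broadband
    noise, whose energy is spread across many bins, scores low.
    (Illustrative sketch, not the paper's algorithm.)"""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    total = spec.sum() + 1e-12
    # indices of local maxima in the magnitude spectrum
    peaks = np.where((spec[1:-1] > spec[:-2]) & (spec[1:-1] > spec[2:]))[0] + 1
    if len(peaks) == 0:
        return 0.0
    top = np.sort(spec[peaks])[-n_peaks:]
    return float(top.sum() / total)

def simple_vad(signal, frame_len=256, threshold=0.2):
    """Label each frame voiced (True) when its peak score exceeds a
    threshold; frame_len and threshold are illustrative choices."""
    n = len(signal) // frame_len
    return [spectral_peak_score(signal[i * frame_len:(i + 1) * frame_len]) > threshold
            for i in range(n)]
```

A frame containing a few strong sinusoids (a crude stand-in for formants) scores far higher than a white-noise frame, so a fixed threshold separates them without any explicit noise model, which is the kind of robustness the abstract motivates.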

Original language: English
Article number: 7239555
Pages (from-to): 2238-2245
Number of pages: 8
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 23
Issue number: 12
DOI: 10.1109/TASLP.2015.2476762
Publication status: Published - 2015 Dec 1

Keywords

  • Formants
  • Spectral peaks
  • Voice activity detection (VAD)

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

Formant-based robust voice activity detection. / Yoo, In Chul; Lim, Hyeontaek; Yook, Dongsuk.

In: IEEE Transactions on Audio, Speech and Language Processing, Vol. 23, No. 12, 7239555, 01.12.2015, p. 2238-2245.

Research output: Contribution to journal › Article

@article{100ae2766fa7404d96166b0cd49a3cb8,
title = "Formant-based robust voice activity detection",
abstract = "Voice activity detection (VAD) can be used to distinguish human speech from other sounds, and various applications can benefit from VAD, including speech coding and speech recognition. To accurately detect voice activity, the algorithm must take into account the characteristic features of human speech and/or background noise. In many real-life applications, noise frequently occurs in an unexpected manner, and in such situations it is difficult to determine the characteristics of the noise with sufficient accuracy. As a result, robust VAD algorithms that depend less on accurate noise estimates are desirable for real-life applications. Formants are the major spectral peaks of the human voice, and they are highly useful for distinguishing vowel sounds. These spectral peaks are likely to survive in a signal even after severe corruption by noise, so formants are attractive features for voice activity detection under low signal-to-noise ratio (SNR) conditions. However, it is difficult to accurately extract formants from noisy signals when background noise introduces unrelated spectral peaks. Therefore, this paper proposes a simple formant-based VAD algorithm to overcome the problem of detecting formants under severe noise conditions. The proposed method achieves a much faster processing time and outperforms standard VAD algorithms under various noise conditions. Because the proposed method is robust against various types of noise and imposes a light computational load, it is suitable for use in a wide range of applications.",
keywords = "Formants, Spectral peaks, Voice activity detection (VAD)",
author = "Yoo, {In Chul} and Hyeontaek Lim and Dongsuk Yook",
year = "2015",
month = dec,
day = "1",
doi = "10.1109/TASLP.2015.2476762",
language = "English",
volume = "23",
pages = "2238--2245",
journal = "IEEE Transactions on Speech and Audio Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "12",

}

TY - JOUR

T1 - Formant-based robust voice activity detection

AU - Yoo, In Chul

AU - Lim, Hyeontaek

AU - Yook, Dongsuk

PY - 2015/12/1

Y1 - 2015/12/1

N2 - Voice activity detection (VAD) can be used to distinguish human speech from other sounds, and various applications can benefit from VAD, including speech coding and speech recognition. To accurately detect voice activity, the algorithm must take into account the characteristic features of human speech and/or background noise. In many real-life applications, noise frequently occurs in an unexpected manner, and in such situations it is difficult to determine the characteristics of the noise with sufficient accuracy. As a result, robust VAD algorithms that depend less on accurate noise estimates are desirable for real-life applications. Formants are the major spectral peaks of the human voice, and they are highly useful for distinguishing vowel sounds. These spectral peaks are likely to survive in a signal even after severe corruption by noise, so formants are attractive features for voice activity detection under low signal-to-noise ratio (SNR) conditions. However, it is difficult to accurately extract formants from noisy signals when background noise introduces unrelated spectral peaks. Therefore, this paper proposes a simple formant-based VAD algorithm to overcome the problem of detecting formants under severe noise conditions. The proposed method achieves a much faster processing time and outperforms standard VAD algorithms under various noise conditions. Because the proposed method is robust against various types of noise and imposes a light computational load, it is suitable for use in a wide range of applications.

AB - Voice activity detection (VAD) can be used to distinguish human speech from other sounds, and various applications can benefit from VAD, including speech coding and speech recognition. To accurately detect voice activity, the algorithm must take into account the characteristic features of human speech and/or background noise. In many real-life applications, noise frequently occurs in an unexpected manner, and in such situations it is difficult to determine the characteristics of the noise with sufficient accuracy. As a result, robust VAD algorithms that depend less on accurate noise estimates are desirable for real-life applications. Formants are the major spectral peaks of the human voice, and they are highly useful for distinguishing vowel sounds. These spectral peaks are likely to survive in a signal even after severe corruption by noise, so formants are attractive features for voice activity detection under low signal-to-noise ratio (SNR) conditions. However, it is difficult to accurately extract formants from noisy signals when background noise introduces unrelated spectral peaks. Therefore, this paper proposes a simple formant-based VAD algorithm to overcome the problem of detecting formants under severe noise conditions. The proposed method achieves a much faster processing time and outperforms standard VAD algorithms under various noise conditions. Because the proposed method is robust against various types of noise and imposes a light computational load, it is suitable for use in a wide range of applications.

KW - Formants

KW - Spectral peaks

KW - Voice activity detection (VAD)

UR - http://www.scopus.com/inward/record.url?scp=84954127731&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84954127731&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2015.2476762

DO - 10.1109/TASLP.2015.2476762

M3 - Article

AN - SCOPUS:84954127731

VL - 23

SP - 2238

EP - 2245

JO - IEEE Transactions on Speech and Audio Processing

JF - IEEE Transactions on Speech and Audio Processing

SN - 1558-7916

IS - 12

M1 - 7239555

ER -