Unsupervised adaptation without estimated transcriptions

Hyeopwoo Lee, Dongsuk Yook

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Unsupervised adaptation typically relies on estimated transcriptions to estimate the unknown distortion parameters from input test signals. In low signal-to-noise ratio (SNR) conditions, the transcription estimated by a decoding procedure can be error-prone because of the high mismatch between the acoustic models and the input signal, which can degrade the performance of the adapted system. To address this problem, we propose an unsupervised adaptation method that adapts the acoustic models without an estimated transcription. Instead, Gaussian mixture models (GMMs) and pseudo phoneme models (PPMs) are used. Using these models, the unknown distortion parameters are estimated based on the vector Taylor series (VTS) model adaptation scheme. On the Aurora2 task, we obtained a relative reduction of 5.4% in word error rate (WER).
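
For readers unfamiliar with VTS model adaptation, the following is a minimal sketch of the standard mismatch function that such schemes are built on, not the authors' estimation procedure. It assumes features in the log-spectral domain (so the DCT matrices of the cepstral formulation are omitted) and diagonal covariances. The function names `vts_compensate_mean`, `vts_jacobian`, and `vts_compensate_var` are hypothetical and chosen only for illustration. How the noise mean and channel offset are estimated (here, with GMMs and PPMs rather than a transcription) is outside the scope of this sketch.

```python
import numpy as np


def vts_compensate_mean(mu_x, mu_n, mu_h):
    """Zeroth-order VTS compensation of a clean-speech mean.

    Log-spectral mismatch function (assumed form):
        y = x + h + log(1 + exp(n - x - h)) = log(exp(x + h) + exp(n))
    where x is clean speech, n is additive noise, h is the channel offset.
    """
    mu_x = np.asarray(mu_x, dtype=float)
    mu_n = np.asarray(mu_n, dtype=float)
    mu_h = np.asarray(mu_h, dtype=float)
    # np.logaddexp(a, b) computes log(exp(a) + exp(b)) in a stable way.
    return np.logaddexp(mu_x + mu_h, mu_n)


def vts_jacobian(mu_x, mu_n, mu_h):
    """Diagonal of dy/dx at the expansion point: 1 / (1 + exp(n - x - h))."""
    mu_x = np.asarray(mu_x, dtype=float)
    mu_n = np.asarray(mu_n, dtype=float)
    mu_h = np.asarray(mu_h, dtype=float)
    return 1.0 / (1.0 + np.exp(mu_n - mu_x - mu_h))


def vts_compensate_var(var_x, var_n, g):
    """First-order VTS variance compensation for diagonal covariances.

    g is the diagonal Jacobian dy/dx; dy/dn is then (1 - g).
    """
    var_x = np.asarray(var_x, dtype=float)
    var_n = np.asarray(var_n, dtype=float)
    return g ** 2 * var_x + (1.0 - g) ** 2 * var_n
```

When the noise mean is far below the speech mean, the Jacobian approaches 1 and the compensated Gaussian stays close to the clean one; when noise dominates, the Jacobian approaches 0 and the compensated Gaussian approaches the noise model.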

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Pages: 7918-7921
Number of pages: 4
DOI: https://doi.org/10.1109/ICASSP.2013.6639206
Publication status: Published - 2013 Oct 18
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: 2013 May 26 to 2013 May 31

Other

Other: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country: Canada
City: Vancouver, BC
Period: 2013 May 26 to 2013 May 31

Fingerprint

Transcription
Acoustics
Taylor series
Decoding
Signal to noise ratio
Degradation

Keywords

  • robust speech recognition
  • unsupervised adaptation
  • vector Taylor series

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Lee, H., & Yook, D. (2013). Unsupervised adaptation without estimated transcriptions. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 7918-7921). [6639206] https://doi.org/10.1109/ICASSP.2013.6639206

@inproceedings{f516f2d73b6b4c76b3b90e565299ec99,
title = "Unsupervised adaptation without estimated transcriptions",
keywords = "robust speech recognition, Unsupervised adaptation, vector Taylor series",
author = "Hyeopwoo Lee and Dongsuk Yook",
year = "2013",
month = "10",
day = "18",
doi = "10.1109/ICASSP.2013.6639206",
language = "English",
isbn = "9781479903566",
pages = "7918--7921",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

}
