Recursive whitening transformation for speaker recognition on language mismatched condition

Suwon Shon, Seongkyu Mun, Hanseok Ko

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

Recently in speaker recognition, performance degradation due to the channel domain mismatched condition has been actively addressed. However, the mismatch arising from language has yet to be sufficiently addressed. This paper proposes an approach that employs a recursive whitening transformation to mitigate the language mismatched condition. The proposed method is based on multiple whitening transformations, which are intended to remove un-whitened residual components in the dataset associated with i-vector length normalization. The experiments were conducted on the Speaker Recognition Evaluation 2016 trials, in which the task is non-English speaker recognition using a development dataset consisting of both a large-scale out-of-domain (English) dataset and an extremely low-quantity in-domain (non-English) dataset. For performance comparison, we develop a state-of-the-art system using a deep neural network and bottleneck features, which is based on a phonetically aware model. The experimental results, along with other prior studies, validate the effectiveness of the proposed method on the language mismatched condition.
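The idea of repeating a whitening step followed by length normalization can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the fixed iteration count `n_iter`, and the choice to estimate the statistics at each stage from a single set of vectors (for example the small in-domain set) are all hypothetical, since the abstract does not specify them.

```python
import numpy as np


def fit_whitener(x):
    """Return (mean, w) such that (x - mean) @ w has identity covariance."""
    mean = x.mean(axis=0)
    cov = np.cov(x - mean, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    eigval = np.maximum(eigval, 1e-10)  # guard against tiny or negative eigenvalues
    w = eigvec / np.sqrt(eigval)        # scales column j by 1 / sqrt(eigval[j])
    return mean, w


def length_normalize(x):
    """Project each row vector onto the unit sphere."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def recursive_whitening(train, n_iter=3):
    """Fit n_iter successive (whitening, length-normalization) stages.

    Assumption: every stage is estimated from the same training vectors,
    after they have been passed through all earlier stages.
    """
    stages = []
    x = train
    for _ in range(n_iter):
        mean, w = fit_whitener(x)
        stages.append((mean, w))
        x = length_normalize((x - mean) @ w)
    return stages


def apply_stages(x, stages):
    """Apply previously fitted stages to new vectors (e.g. test i-vectors)."""
    for mean, w in stages:
        x = length_normalize((x - mean) @ w)
    return x
```

Length normalization is nonlinear, so the covariance of the normalized vectors is generally no longer the identity; refitting a whitener on the normalized vectors is one way to remove that residual, which is the motivation the abstract gives for repeating the step.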

Original language: English
Pages (from-to): 2869-2873
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
DOIs: 10.21437/Interspeech.2017-545
Publication status: Published - 2017 Jan 1

Keywords

  • Language mismatched condition
  • Speaker recognition
  • Whitening transform

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

@article{f794c5b11cb94dfc989d338276340d90,
title = "Recursive whitening transformation for speaker recognition on language mismatched condition",
abstract = "Recently in speaker recognition, performance degradation due to the channel domain mismatched condition has been actively addressed. However, the mismatch arising from language has yet to be sufficiently addressed. This paper proposes an approach that employs a recursive whitening transformation to mitigate the language mismatched condition. The proposed method is based on multiple whitening transformations, which are intended to remove un-whitened residual components in the dataset associated with i-vector length normalization. The experiments were conducted on the Speaker Recognition Evaluation 2016 trials, in which the task is non-English speaker recognition using a development dataset consisting of both a large-scale out-of-domain (English) dataset and an extremely low-quantity in-domain (non-English) dataset. For performance comparison, we develop a state-of-the-art system using a deep neural network and bottleneck features, which is based on a phonetically aware model. The experimental results, along with other prior studies, validate the effectiveness of the proposed method on the language mismatched condition.",
keywords = "Language mismatched condition, Speaker recognition, Whitening transform",
author = "Suwon Shon and Seongkyu Mun and Hanseok Ko",
year = "2017",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2017-545",
language = "English",
volume = "2017-August",
pages = "2869--2873",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",
}

TY - JOUR

T1 - Recursive whitening transformation for speaker recognition on language mismatched condition

AU - Shon, Suwon

AU - Mun, Seongkyu

AU - Ko, Hanseok

PY - 2017/1/1

Y1 - 2017/1/1

AB - Recently in speaker recognition, performance degradation due to the channel domain mismatched condition has been actively addressed. However, the mismatch arising from language has yet to be sufficiently addressed. This paper proposes an approach that employs a recursive whitening transformation to mitigate the language mismatched condition. The proposed method is based on multiple whitening transformations, which are intended to remove un-whitened residual components in the dataset associated with i-vector length normalization. The experiments were conducted on the Speaker Recognition Evaluation 2016 trials, in which the task is non-English speaker recognition using a development dataset consisting of both a large-scale out-of-domain (English) dataset and an extremely low-quantity in-domain (non-English) dataset. For performance comparison, we develop a state-of-the-art system using a deep neural network and bottleneck features, which is based on a phonetically aware model. The experimental results, along with other prior studies, validate the effectiveness of the proposed method on the language mismatched condition.

KW - Language mismatched condition

KW - Speaker recognition

KW - Whitening transform

UR - http://www.scopus.com/inward/record.url?scp=85039167957&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85039167957&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2017-545

DO - 10.21437/Interspeech.2017-545

M3 - Article

VL - 2017-August

SP - 2869

EP - 2873

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -