PVAE-TTS: ADAPTIVE TEXT-TO-SPEECH VIA PROGRESSIVE STYLE ADAPTATION

Ji Hyun Lee, Sang Hoon Lee, Ji Hoon Kim, Seong Whan Lee

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Adaptive text-to-speech (TTS) has attracted increasing interest as a way to train TTS systems without large amounts of high-quality data. Nevertheless, existing adaptive TTS systems still show low adaptation quality for novel speakers, because it is hard to learn an extensive range of speaking styles from limited data. To address this issue, we propose the progressive variational autoencoder (PVAE), which generates data while gradually adapting to the target style. PVAE learns a progressively style-normalized representation, which is a key component of progressive style adaptation. We extend PVAE to PVAE-TTS, a multi-speaker adaptive TTS model that generates natural speech with high adaptation quality for novel speakers. To further improve adaptation quality, we also propose dynamic style layer normalization (DSLN), which utilizes a convolution operation. The experimental results demonstrate the superiority of PVAE-TTS in both subjective and objective evaluations.
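
The abstract states only that DSLN "utilizes a convolution operation". As a rough illustration of what a style-conditioned layer normalization driven by a convolution could look like, the following is a minimal pure-Python sketch under stated assumptions: a style vector predicts the weights of a depthwise 1-D convolution and a per-channel bias, which are then applied to layer-normalized hidden features. The class name `DynamicStyleLayerNorm`, the linear projections, the random initialization, the kernel size, and the zero padding are all hypothetical choices made for this sketch; it is not the authors' implementation.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def layer_norm(h: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalize each time step across its channels (no learned affine)."""
    out = []
    for frame in h:
        mean = sum(frame) / len(frame)
        var = sum((x - mean) ** 2 for x in frame) / len(frame)
        std = math.sqrt(var + eps)
        out.append([(x - mean) / std for x in frame])
    return out


def linear(x: List[float], w: Matrix, b: List[float]) -> List[float]:
    """y = W x + b, with W stored as rows of shape [out][in]."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


class DynamicStyleLayerNorm:
    """Hypothetical style-conditioned layer norm (an assumption, not the paper's code):
    the style vector predicts a depthwise 1-D conv kernel and a bias that are
    applied to the layer-normalized features."""

    def __init__(self, channels: int, style_dim: int, kernel_size: int = 3, seed: int = 0):
        assert kernel_size % 2 == 1, "odd kernel keeps 'same' padding symmetric"
        rng = random.Random(seed)
        self.c, self.k = channels, kernel_size

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Hypothetical projections: style -> (channels * kernel_size) kernel weights.
        self.w_kernel = init(channels * kernel_size, style_dim)
        self.b_kernel = [0.0] * (channels * kernel_size)
        # Hypothetical projection: style -> per-channel bias.
        self.w_bias = init(channels, style_dim)
        self.b_bias = [0.0] * channels

    def __call__(self, h: Matrix, style: List[float]) -> Matrix:
        # Predict one kernel per channel (depthwise) and one bias per channel.
        flat = linear(style, self.w_kernel, self.b_kernel)
        kernels = [flat[c * self.k:(c + 1) * self.k] for c in range(self.c)]
        bias = linear(style, self.w_bias, self.b_bias)

        normed = layer_norm(h)
        T, pad = len(normed), self.k // 2
        out = []
        for t in range(T):
            frame = []
            for c in range(self.c):
                acc = bias[c]
                # 1-D convolution along time (cross-correlation, as in most DL frameworks).
                for j in range(self.k):
                    src = t + j - pad
                    if 0 <= src < T:  # zero padding keeps the output length equal to T
                        acc += kernels[c][j] * normed[src][c]
                frame.append(acc)
            out.append(frame)
        return out


if __name__ == "__main__":
    dsln = DynamicStyleLayerNorm(channels=4, style_dim=2, kernel_size=3)
    h = [[float(t + c) for c in range(4)] for t in range(5)]  # 5 frames, 4 channels
    y = dsln(h, style=[1.0, -0.5])
    print(len(y), len(y[0]))  # 5 4
```

In this sketch the style vector controls the affine part of the normalization through a small temporal filter rather than a single scale and shift per channel, so neighbouring frames can influence how each frame is re-styled; a real model would learn the projections jointly with the rest of the network.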

Original language: English
Title of host publication: 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6312-6316
Number of pages: 5
ISBN (Electronic): 9781665405409
Publication status: Published - 2022
Event: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore
Duration: 2022 May 23 to 2022 May 27

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2022-May
ISSN (Print): 1520-6149

Conference

Conference: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Country/Territory: Singapore
City: Virtual, Online
Period: 22/5/23 to 22/5/27

Keywords

  • adaptive TTS
  • speaker adaptation
  • speech synthesis
  • text-to-speech

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
