Combining multi-task autoencoder with Wasserstein generative adversarial networks for improving speech recognition performance

Chao Yuan Kao, Hanseok Ko

Research output: Contribution to journal › Article


Background noise in an acoustic signal degrades the performance of speech and acoustic event recognition, and extracting noise-robust acoustic features from a noisy signal remains challenging. In this paper, we propose a deep learning architecture that combines a Wasserstein Generative Adversarial Network (WGAN) with a MultiTask AutoEncoder (MTAE), integrating the strengths of both so that it estimates not only the noise but also the speech features from a noisy acoustic source. The proposed MTAE-WGAN structure estimates the speech signal and the residual noise by employing a gradient penalty and a weight initialization method for the Leaky Rectified Linear Unit (LReLU) and Parametric ReLU (PReLU). With the adopted gradient penalty loss function, the proposed MTAE-WGAN structure enhances the speech features and subsequently achieves substantial Phoneme Error Rate (PER) improvements over the stand-alone Deep Denoising Autoencoder (DDAE), MTAE, Redundant Convolutional Encoder-Decoder (R-CED), and Recurrent MTAE (RMTAE) models for robust speech recognition.
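The abstract names a gradient penalty and a weight initialization for LReLU/PReLU but gives no formulas. The sketch below is only an illustration under two stated assumptions: that the penalty takes the widely used WGAN-GP form, lambda times the mean of (gradient norm - 1)^2, and that the initialization is a He-style scaling adjusted for the negative slope of the activation. The function names (`leaky_relu_gain`, `init_weights`, `gradient_penalty`) and the default values are hypothetical and are not taken from the paper.

```python
import math
import random


def leaky_relu_gain(negative_slope: float) -> float:
    """He-style gain for a leaky/parametric ReLU with the given negative slope.

    Assumption: gain = sqrt(2 / (1 + slope^2)), the common variance-preserving
    choice; the paper's exact initialization may differ.
    """
    return math.sqrt(2.0 / (1.0 + negative_slope ** 2))


def init_weights(fan_in: int, fan_out: int, negative_slope: float = 0.01, seed: int = 0):
    """Draw a fan_out x fan_in matrix from N(0, std^2) with std = gain / sqrt(fan_in)."""
    rng = random.Random(seed)
    std = leaky_relu_gain(negative_slope) / math.sqrt(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def gradient_penalty(grad_norms, lam: float = 10.0) -> float:
    """WGAN-GP style penalty computed from precomputed critic gradient norms.

    grad_norms holds the L2 norm of the critic's gradient at each sampled
    point; computing those gradients is left to an autodiff framework.
    """
    return lam * sum((g - 1.0) ** 2 for g in grad_norms) / len(grad_norms)
```

In this form the penalty is zero when every gradient norm equals 1, which is the soft 1-Lipschitz constraint a Wasserstein critic is meant to satisfy, and the gain reduces to the plain ReLU value sqrt(2) when the negative slope is 0.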

Original language: English
Pages (from-to): 670-677
Number of pages: 8
Journal: Journal of the Acoustical Society of Korea
Issue number: 6
Publication status: Published - 2019 Jan 1



Keywords

  • Deep Neural Network (DNN)
  • Robust speech recognition
  • Speech enhancement
  • Wasserstein Generative Adversarial Network (WGAN)
  • Weight initialization

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Instrumentation
  • Applied Mathematics
  • Signal Processing
  • Speech and Hearing
