Abstract
Performance in Automatic Speech Recognition (ASR) degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional neural networks and recurrent neural networks have been proposed, typically trained with an L1 or L2 loss. In this Letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to denoising and despeeching models. The WGAN integrates a multi-task autoencoder that estimates not only speech features but also noise features from noisy speech. The proposed OGP improves the convergence rate of the Wasserstein distance by 14.1%, and when the OGP-enhanced features are tested in ASR, they achieve 9.7%, 8.6%, 6.2%, and 4.8% WER improvements over DDAE, MTAE, R-CED (CNN), and RNN models, respectively.
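For context, the sketch below shows a standard WGAN critic loss with a gradient penalty term (WGAN-GP style), since the abstract does not give the exact formula of the proposed orthogonal gradient penalty. It illustrates only where such a penalty plugs into critic training on clean versus enhanced speech features; the function and variable names are illustrative, not from the paper.

```python
# Minimal sketch (assumed, not the paper's exact OGP): a WGAN critic loss with a
# gradient penalty on interpolates between clean and enhanced feature batches.
import torch

def gradient_penalty(critic, clean_feats, enhanced_feats, lambda_gp=10.0):
    """Standard WGAN-GP penalty; the paper's OGP would replace this term."""
    batch_size = clean_feats.size(0)
    eps = torch.rand(batch_size, 1, device=clean_feats.device)
    interp = eps * clean_feats + (1.0 - eps) * enhanced_feats
    interp.requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(outputs=scores.sum(), inputs=interp,
                                create_graph=True)[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()

def critic_loss(critic, clean_feats, enhanced_feats):
    """Wasserstein critic objective regularized by the gradient penalty."""
    loss_w = critic(enhanced_feats).mean() - critic(clean_feats).mean()
    return loss_w + gradient_penalty(critic, clean_feats, enhanced_feats)
```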
| Original language | English |
|---|---|
| Pages (from-to) | 1195-1198 |
| Number of pages | 4 |
| Journal | IEICE Transactions on Information and Systems |
| Volume | E103D |
| Issue number | 5 |
| DOIs | |
| Publication status | Published - May 2020 |
Keywords
- Deep learning
- Generative adversarial networks
- Robust speech recognition
- Speech enhancement
ASJC Scopus subject areas
- Software
- Hardware and Architecture
- Computer Vision and Pattern Recognition
- Electrical and Electronic Engineering
- Artificial Intelligence