TY - JOUR
T1 - Efficient adversarial audio synthesis via progressive upsampling
AU - Cho, Youngwoo
AU - Chang, Minwook
AU - Lee, Sanghyeon
AU - Lee, Hyoungwoo
AU - Kim, Gerard Jounghyun
AU - Choo, Jaegul
N1 - Funding Information:
Acknowledgements. This work was supported by Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government (20ZS1200, Fundamental Technology Research for Human-Centric Autonomous Intelligent Systems) and Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2019-0-00075, Artificial Intelligence Graduate School Program (KAIST)).
Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
AB - This paper proposes a novel generative model, progressive upsampling GAN (PUGAN), which progressively synthesizes high-quality audio as a raw waveform. PUGAN generates progressively higher-resolution output by stacking multiple encoder-decoder architectures. Unlike WaveGAN, an existing state-of-the-art model that uses a single decoder architecture, our model generates audio signals and progressively converts them to a higher resolution, while using significantly fewer parameters than WaveGAN (e.g., 3.17x fewer for 16 kHz output). Our experiments show that PUGAN generates audio signals in real time with quality comparable to that of WaveGAN in terms of inception scores and human perception.
KW - Generative adversarial networks (GANs)
KW - Real-time sound effect synthesis
UR - http://www.scopus.com/inward/record.url?scp=85115183852&partnerID=8YFLogxK
U2 - 10.1109/ICASSP39728.2021.9413954
DO - 10.1109/ICASSP39728.2021.9413954
M3 - Conference article
AN - SCOPUS:85115183852
SN - 0736-7791
VL - 2021-June
SP - 3410
EP - 3414
JO - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
JF - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
T2 - 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021
Y2 - 6 June 2021 through 11 June 2021
ER -