TY - JOUR
T1 - Amphibian Sounds Generating Network Based on Adversarial Learning
AU - Park, Sangwook
AU - Elhilali, Mounya
AU - Han, David K.
AU - Ko, Hanseok
N1 - Funding Information:
Manuscript received February 23, 2020; accepted April 10, 2020. Date of publication April 20, 2020; date of current version May 12, 2020. This work was supported in part by the Korea Environment Industry & Technology Institute (KEITI) through the Public Technology Program based on Environmental Policy and in part by the Korea Ministry of Environment (MOE) under Grant 2017000210001. The work of David K. Han was supported by the U.S. Army Research Laboratory. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Daniele Giacobello. (Corresponding author: Hanseok Ko.) Sangwook Park and Mounya Elhilali are with the Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218 USA (e-mail: spark190@jhu.edu; mounya@jhu.edu).
Publisher Copyright:
© 1994-2012 IEEE.
PY - 2020/4/20
Y1 - 2020/4/20
N2 - This letter proposes a generative network based on adversarial learning for synthesizing short-time audio streams and investigates the effectiveness of data augmentation for amphibian call sound classification. Based on Fourier analysis, the generator is designed as a multi-layer perceptron composed of frequency basis learning layers and an output layer, and the discriminator is constructed as a convolutional neural network. Additionally, weight regularization is introduced to train the networks on practical data that includes some disturbances. Synthetic audio streams are evaluated quantitatively using the inception score, and classification results are compared for real versus synthetic data. The proposed generative network is shown to produce realistic sounds and is therefore useful for data augmentation.
KW - Generative model
KW - Wasserstein distance
KW - adversarial networks
KW - audio stream generation
UR - http://www.scopus.com/inward/record.url?scp=85087384642&partnerID=8YFLogxK
U2 - 10.1109/LSP.2020.2988199
DO - 10.1109/LSP.2020.2988199
M3 - Article
AN - SCOPUS:85087384642
VL - 27
SP - 640
EP - 644
JO - IEEE Signal Processing Letters
JF - IEEE Signal Processing Letters
SN - 1070-9908
M1 - 9072273
ER -