TY - GEN
T1 - Specmix
T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
AU - Kim, Gwantae
AU - Han, David K.
AU - Ko, Hanseok
N1 - Funding Information:
This material is based upon work supported by the Air Force Office of Scientific Research under award number FA2386-19-1-4001.
Publisher Copyright:
Copyright © 2021 ISCA.
PY - 2021
Y1 - 2021
N2 - A mixed sample data augmentation strategy is proposed to enhance the performance of models on audio scene classification, sound event classification, and speech enhancement tasks. While there have been several augmentation methods shown to be effective in improving image classification performance, their efficacy toward time-frequency domain features of audio is not assured. We propose a novel audio data augmentation approach named "Specmix" specifically designed for dealing with time-frequency domain features. The augmentation method consists of mixing two different data samples by applying time-frequency masks effective in preserving the spectral correlation of each audio sample. Our experiments on acoustic scene classification, sound event classification, and speech enhancement tasks show that the proposed Specmix improves the performance of various neural network architectures by a maximum of 2.7%.
AB - A mixed sample data augmentation strategy is proposed to enhance the performance of models on audio scene classification, sound event classification, and speech enhancement tasks. While there have been several augmentation methods shown to be effective in improving image classification performance, their efficacy toward time-frequency domain features of audio is not assured. We propose a novel audio data augmentation approach named "Specmix" specifically designed for dealing with time-frequency domain features. The augmentation method consists of mixing two different data samples by applying time-frequency masks effective in preserving the spectral correlation of each audio sample. Our experiments on acoustic scene classification, sound event classification, and speech enhancement tasks show that the proposed Specmix improves the performance of various neural network architectures by a maximum of 2.7%.
KW - Acoustic scene classification
KW - Data augmentation
KW - Deep neural networks
KW - Sound event classification
KW - Speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85119245236&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2021-103
DO - 10.21437/Interspeech.2021-103
M3 - Conference contribution
AN - SCOPUS:85119245236
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 6
EP - 10
BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
PB - International Speech Communication Association
Y2 - 30 August 2021 through 3 September 2021
ER -