TY - JOUR
T1 - Dual stage learning based dynamic time-frequency mask generation for audio event classification
AU - Kim, Donghyeon
AU - Park, Jaihyun
AU - Han, David K.
AU - Ko, Hanseok
N1 - Funding Information:
This work was supported by the Korea Environmental Industry & Technology Institute (KEITI) through the Public Technology Program based on environmental policy funded by the Korean Ministry of Environment (MOE; 2017000210001), and the contribution of David Han was supported by the US Army Research Laboratory.
Publisher Copyright:
Copyright © 2020 ISCA
PY - 2020
Y1 - 2020
N2 - Audio-based event recognition becomes quite challenging in real-world noisy environments. To alleviate the noise issue, time-frequency mask-based feature enhancement methods have been proposed. While these methods with fixed filter settings have been shown to be effective in familiar noise backgrounds, they become brittle when exposed to unexpected noise. To address the unknown noise problem, we develop an approach based on dynamic filter generation learning. In particular, we propose a dual-stage dynamic filter generator network that can be trained to generate a time-frequency mask tailored to each input audio signal. Two alternative approaches to training the mask generator network are developed for feature enhancement in high-noise environments. Our proposed method shows improved performance and robustness in both clean and unseen noise environments.
AB - Audio-based event recognition becomes quite challenging in real-world noisy environments. To alleviate the noise issue, time-frequency mask-based feature enhancement methods have been proposed. While these methods with fixed filter settings have been shown to be effective in familiar noise backgrounds, they become brittle when exposed to unexpected noise. To address the unknown noise problem, we develop an approach based on dynamic filter generation learning. In particular, we propose a dual-stage dynamic filter generator network that can be trained to generate a time-frequency mask tailored to each input audio signal. Two alternative approaches to training the mask generator network are developed for feature enhancement in high-noise environments. Our proposed method shows improved performance and robustness in both clean and unseen noise environments.
KW - Audio recognition
KW - Dual stage
KW - Dynamic filter network
KW - Feature enhancement
UR - http://www.scopus.com/inward/record.url?scp=85098177852&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2020-2152
DO - 10.21437/Interspeech.2020-2152
M3 - Conference article
AN - SCOPUS:85098177852
VL - 2020-October
SP - 836
EP - 840
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN - 2308-457X
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Y2 - 25 October 2020 through 29 October 2020
ER -