TY - JOUR
T1 - Multi-band CNN architecture using adaptive frequency filter for acoustic event classification
AU - Kim, Donghyeon
AU - Park, Sangwook
AU - Han, David K.
AU - Ko, Hanseok
N1 - Funding Information:
This work was supported by the Korea Environmental Industry & Technology Institute (KEITI) through the Public Technology Program based on environmental policy funded by the Korean Ministry of Environment (MOE; 2017000210001), and the contribution of David Han was supported by the US Army Research Laboratory.
Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2021/1/15
Y1 - 2021/1/15
AB - Although Convolutional Neural Network (CNN) based learning systems have shown impressive results on numerous classification tasks, their effectiveness has been limited in certain acoustic classification settings. This weakness is particularly evident in acoustic event classification tasks that use spectral features. For example, spectral features may suffer from the typical normalization applied before they are fed to a neural network for training, since magnitudes in the high-frequency band are inadvertently attenuated even though they may still contain useful discriminant information. Some research efforts mitigate this problem by introducing a multi-band approach to obtain salient and stable features, but they require empirically preset frequency bands to separate the spectral features. Because this process is heuristic, it is difficult to ensure the consistency required for a strong link between the manually separated features and good classification performance. In this paper, we propose a novel filter parameter modeling framework that performs optimized frequency sub-band separation via CNN-based end-to-end training to achieve high acoustic event classification performance. In particular, the filter response characteristics, namely the cut-off frequencies and the damping ratio governing roll-off, are treated as additional learnable parameters of the CNN architecture in the proposed end-to-end framework, so that the filter's frequency response is optimized to produce salient features. The proposed training process is shown not only to select the filter parameters for multi-band frequency separation automatically but also to yield sub-band features that correlate strongly with accurate classification performance.
KW - Convolutional neural network
KW - Filter parameter training
KW - High energy frequency
KW - Low energy feature vanishing
KW - Sub-band
UR - http://www.scopus.com/inward/record.url?scp=85090295544&partnerID=8YFLogxK
DO - 10.1016/j.apacoust.2020.107579
M3 - Article
AN - SCOPUS:85090295544
VL - 172
JO - Applied Acoustics
JF - Applied Acoustics
SN - 0003-682X
M1 - 107579
ER -