A novel discriminative feature extraction for acoustic scene classification using RNN based source separation

Seongkyu Mun, Suwon Shon, Wooil Kim, David K. Han, Hanseok Ko

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Various types of classifiers and feature extraction methods for acoustic scene classification were recently proposed in the IEEE Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Challenge Task 1. The final evaluation results, however, showed that even the top 10 ranked teams achieved extremely low accuracy on particular pairs of classes with similar sounds. Because such sound classes are difficult to distinguish even by human ears, the conventional deep-learning-based feature extraction methods used by most DCASE participating teams appear to face performance limitations. To address the low performance on similar class pairs, this letter proposes to employ recurrent neural network (RNN) based source separation for each class prior to the classification step. Since the RNN structure can effectively extract the sound components it was trained on, its mid-layer can be regarded as capturing discriminative information about the trained class. This letter therefore proposes to use this mid-layer information as a novel discriminative feature. The proposed feature yields an average classification rate improvement of 2.3% over the conventional method, which uses additional classifiers to handle the similar class pair issue.
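To make the described pipeline concrete, below is a minimal sketch of a masking-style RNN separator whose narrow mid-layer ("bottleneck") activations are reused as scene features. It is written in PyTorch purely for illustration; the layer sizes, the GRU cell, the soft-mask formulation, and the time-averaging of the features are assumptions for this sketch, not details taken from the letter.

    # Minimal illustrative sketch, NOT the authors' implementation: an RNN
    # separator that masks a magnitude spectrogram; its narrow mid-layer
    # ("bottleneck") activations double as discriminative scene features.
    import torch
    import torch.nn as nn

    class RNNSeparator(nn.Module):
        # Hypothetical sizes: 257 STFT bins, 256 hidden units, 64-dim bottleneck.
        def __init__(self, n_bins=257, hidden=256, bottleneck=64):
            super().__init__()
            self.rnn = nn.GRU(n_bins, hidden, batch_first=True)
            self.mid = nn.Linear(hidden, bottleneck)    # mid-layer of interest
            self.mask = nn.Linear(bottleneck, n_bins)   # per-bin soft mask

        def forward(self, spec):                        # spec: (batch, frames, bins)
            h, _ = self.rnn(spec)
            z = torch.tanh(self.mid(h))                 # discriminative mid-layer
            m = torch.sigmoid(self.mask(z))             # mask values in [0, 1]
            return m * spec, z                          # separated source, features

    sep = RNNSeparator()
    spec = torch.rand(8, 100, 257)                      # dummy spectrogram batch
    separated, feats = sep(spec)
    clip_feature = feats.mean(dim=1)                    # (batch, 64) per clip

In the letter's setting, one such separator would be trained per scene class, with the mid-layer activations from all separators then combined and passed to a back-end classifier.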

Original language: English
Pages (from-to): 3041-3044
Number of pages: 4
Journal: IEICE Transactions on Information and Systems
Volume: E100D
Issue number: 12
DOI: 10.1587/transinf.2017EDL8132
ISSN: 0916-8532
Publication status: Published - 2017 Dec 1

Fingerprint

  • Source separation
  • Recurrent neural networks
  • Feature extraction
  • Acoustics
  • Acoustic waves
  • Classifiers

Keywords

  • Acoustic scene classification
  • Bottleneck feature
  • Recurrent neural network
  • Transfer learning

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Artificial Intelligence

Cite this

Mun, Seongkyu; Shon, Suwon; Kim, Wooil; Han, David K.; Ko, Hanseok. A novel discriminative feature extraction for acoustic scene classification using RNN based source separation. In: IEICE Transactions on Information and Systems, Vol. E100D, No. 12, 01.12.2017, p. 3041-3044. DOI: 10.1587/transinf.2017EDL8132.
