MUSEMBLE: A novel music retrieval system with automatic voice query transcription and reformulation

Seungmin Rho, Byeong jun Han, Een Jun Hwang, Minkoo Kim

Research output: Contribution to journal › Article

22 Citations (Scopus)

Abstract

Much research has been devoted to developing efficient music retrieval systems, and query-by-humming is considered one of the most intuitive and effective query methods for music retrieval. For voice humming to be a reliable query source, elaborate signal processing and acoustic similarity measurement schemes are necessary. Meanwhile, there has recently been increased interest in query reformulation using relevance feedback with evolutionary techniques, such as genetic algorithms, for multimedia information retrieval. However, these techniques have not been widely exploited in the field of music retrieval. In this paper, we develop a novel music retrieval system called MUSEMBLE (MUSic enEMBLE) based on two distinct features: (i) A sung or hummed query is automatically transcribed into a sequence of pitch and duration pairs with improved accuracy for music representation. More specifically, we developed two new techniques, WAE (windowed average energy) and dynamic ADF (amplitude-based difference function) onsets, for more accurate note segmentation and onset/offset detection in the acoustic signal, respectively. The former improves on energy-based approaches such as AE by defining small but coherent windows with local and global threshold values. The latter improves on the AF (amplitude function), which calculates the summation of the absolute values of signal differences for the clustering energy contour. (ii) A user query is reformulated using user relevance feedback with a genetic algorithm to improve retrieval performance. Although we focus on humming queries in this paper, MUSEMBLE provides versatile query and browsing interfaces for various kinds of users. We have carried out extensive experiments on the prototype system to evaluate the performance of our voice query transcription and genetic algorithm-based relevance feedback schemes. We demonstrate that our proposed method improves retrieval accuracy by up to 20-40% compared with other popular RF methods. We also show that both the WAE and dynamic ADF methods improve transcription accuracy up to 95%.
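The abstract names the ingredients of the transcription front end (energy averaged over small windows, local and global thresholds, and a sum of absolute signal differences) but gives no formulas. The following is a minimal Python sketch of those ideas under stated assumptions, not the authors' WAE or dynamic ADF algorithm. The function names, the non-overlapping windows, the threshold ratios, and the `context` neighbourhood size are all hypothetical choices made for illustration.

```python
import math


def tone(freq, amp, count, rate=8000):
    """Synthetic test tone: `count` samples of a sine wave (illustrative helper)."""
    return [amp * math.sin(2 * math.pi * freq * n / rate) for n in range(count)]


def windowed_average_energy(signal, window):
    """Mean squared amplitude over consecutive non-overlapping windows."""
    return [
        sum(x * x for x in signal[i:i + window]) / window
        for i in range(0, len(signal) - window + 1, window)
    ]


def amplitude_difference(signal, window):
    """Sum of absolute sample-to-sample differences inside each window."""
    return [
        sum(abs(signal[j + 1] - signal[j]) for j in range(i, i + window - 1))
        for i in range(0, len(signal) - window + 1, window)
    ]


def segment_notes(energy, global_ratio=0.1, local_ratio=0.5, context=3):
    """Return (onset, offset) window indices, offset exclusive.

    Assumed rule: a window is active when its energy exceeds a global
    threshold (a fraction of the overall maximum) and also reaches a local
    threshold (a fraction of the maximum among nearby windows).
    """
    if not energy:
        return []
    global_thr = global_ratio * max(energy)
    active = []
    for k, e in enumerate(energy):
        neighbours = energy[max(0, k - context):k + context + 1]
        local_thr = local_ratio * max(neighbours)
        active.append(e > global_thr and e >= local_thr)

    # Collect maximal runs of active windows as note segments.
    segments, start = [], None
    for k, is_active in enumerate(active):
        if is_active and start is None:
            start = k
        elif not is_active and start is not None:
            segments.append((start, k))
            start = None
    if start is not None:
        segments.append((start, len(active)))
    return segments


# Two hummed "notes" separated by silence, at an assumed 8 kHz sample rate.
signal = tone(440, 1.0, 800) + [0.0] * 400 + tone(660, 0.8, 800)
energy = windowed_average_energy(signal, window=80)  # 10 ms windows
print(segment_notes(energy))
```

On this synthetic input the two tones occupy windows 0-9 and 15-24, so the sketch reports the segments (0, 10) and (15, 25). Real humming would need pitch tracking and tuned thresholds, which the abstract does not specify.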

Original language: English
Pages (from-to): 1065-1080
Number of pages: 16
Journal: Journal of Systems and Software
Volume: 81
Issue number: 7
DOI: 10.1016/j.jss.2007.05.038
Publication status: Published - 2008 Jul 1

Fingerprint

Transcription
Genetic algorithms
Feedback
Acoustics
Information retrieval
Signal processing
Experiments

Keywords

  • Genetic algorithm
  • Multimedia database
  • Music retrieval
  • Pitch tracking
  • Relevance feedback
  • Signal processing

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Software

Cite this

MUSEMBLE: A novel music retrieval system with automatic voice query transcription and reformulation. / Rho, Seungmin; Han, Byeong jun; Hwang, Een Jun; Kim, Minkoo.

In: Journal of Systems and Software, Vol. 81, No. 7, 01.07.2008, p. 1065-1080.


@article{8a3753bdc1ff4bb3ad8ffe5063f587c7,
title = "MUSEMBLE: A novel music retrieval system with automatic voice query transcription and reformulation",
abstract = "Much research has been devoted to developing efficient music retrieval systems, and query-by-humming is considered one of the most intuitive and effective query methods for music retrieval. For voice humming to be a reliable query source, elaborate signal processing and acoustic similarity measurement schemes are necessary. Meanwhile, there has recently been increased interest in query reformulation using relevance feedback with evolutionary techniques, such as genetic algorithms, for multimedia information retrieval. However, these techniques have not been widely exploited in the field of music retrieval. In this paper, we develop a novel music retrieval system called MUSEMBLE (MUSic enEMBLE) based on two distinct features: (i) A sung or hummed query is automatically transcribed into a sequence of pitch and duration pairs with improved accuracy for music representation. More specifically, we developed two new techniques, WAE (windowed average energy) and dynamic ADF (amplitude-based difference function) onsets, for more accurate note segmentation and onset/offset detection in the acoustic signal, respectively. The former improves on energy-based approaches such as AE by defining small but coherent windows with local and global threshold values. The latter improves on the AF (amplitude function), which calculates the summation of the absolute values of signal differences for the clustering energy contour. (ii) A user query is reformulated using user relevance feedback with a genetic algorithm to improve retrieval performance. Although we focus on humming queries in this paper, MUSEMBLE provides versatile query and browsing interfaces for various kinds of users. We have carried out extensive experiments on the prototype system to evaluate the performance of our voice query transcription and genetic algorithm-based relevance feedback schemes. We demonstrate that our proposed method improves retrieval accuracy by up to 20-40{\%} compared with other popular RF methods. We also show that both the WAE and dynamic ADF methods improve transcription accuracy up to 95{\%}.",
keywords = "Genetic algorithm, Multimedia database, Music retrieval, Pitch tracking, Relevance feedback, Signal processing",
author = "Rho, Seungmin and Han, {Byeong jun} and Hwang, {Een Jun} and Kim, Minkoo",
year = "2008",
month = "7",
day = "1",
doi = "10.1016/j.jss.2007.05.038",
language = "English",
volume = "81",
pages = "1065--1080",
journal = "Journal of Systems and Software",
issn = "0164-1212",
publisher = "Elsevier Inc.",
number = "7",

}

TY - JOUR

T1 - MUSEMBLE

T2 - A novel music retrieval system with automatic voice query transcription and reformulation

AU - Rho, Seungmin

AU - Han, Byeong jun

AU - Hwang, Een Jun

AU - Kim, Minkoo

PY - 2008/7/1

Y1 - 2008/7/1

N2 - Much research has been devoted to developing efficient music retrieval systems, and query-by-humming is considered one of the most intuitive and effective query methods for music retrieval. For voice humming to be a reliable query source, elaborate signal processing and acoustic similarity measurement schemes are necessary. Meanwhile, there has recently been increased interest in query reformulation using relevance feedback with evolutionary techniques, such as genetic algorithms, for multimedia information retrieval. However, these techniques have not been widely exploited in the field of music retrieval. In this paper, we develop a novel music retrieval system called MUSEMBLE (MUSic enEMBLE) based on two distinct features: (i) A sung or hummed query is automatically transcribed into a sequence of pitch and duration pairs with improved accuracy for music representation. More specifically, we developed two new techniques, WAE (windowed average energy) and dynamic ADF (amplitude-based difference function) onsets, for more accurate note segmentation and onset/offset detection in the acoustic signal, respectively. The former improves on energy-based approaches such as AE by defining small but coherent windows with local and global threshold values. The latter improves on the AF (amplitude function), which calculates the summation of the absolute values of signal differences for the clustering energy contour. (ii) A user query is reformulated using user relevance feedback with a genetic algorithm to improve retrieval performance. Although we focus on humming queries in this paper, MUSEMBLE provides versatile query and browsing interfaces for various kinds of users. We have carried out extensive experiments on the prototype system to evaluate the performance of our voice query transcription and genetic algorithm-based relevance feedback schemes. We demonstrate that our proposed method improves retrieval accuracy by up to 20-40% compared with other popular RF methods. We also show that both the WAE and dynamic ADF methods improve transcription accuracy up to 95%.

KW - Genetic algorithm

KW - Multimedia database

KW - Music retrieval

KW - Pitch tracking

KW - Relevance feedback

KW - Signal processing

UR - http://www.scopus.com/inward/record.url?scp=43849100579&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=43849100579&partnerID=8YFLogxK

U2 - 10.1016/j.jss.2007.05.038

DO - 10.1016/j.jss.2007.05.038

M3 - Article

AN - SCOPUS:43849100579

VL - 81

SP - 1065

EP - 1080

JO - Journal of Systems and Software

JF - Journal of Systems and Software

SN - 0164-1212

IS - 7

ER -