Automatic voice query transformation for query-by-humming systems

Sangbo Park, Suckchul Kim, Eenjun Hwang, Kwangjun Byeon

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Citations (Scopus)

Abstract

Many content-based music retrieval systems represent music in the MIDI format to improve retrieval efficiency. In such systems, voice queries such as humming must be transcribed into MIDI notes before matching music can be found in the database. In this paper, we present an ADF-based voice query processing system that transforms the original voice signal into the MIDI format. To perform the transformation, a sequence of pitch and duration pairs is extracted from the original voice signal. The pitch is tracked by an autocorrelation function, which is frequently used for time-domain pitch analysis. For exact duration detection, we propose a novel algorithm that combines an ADF-based onset detection method with a pitch tracking-based duration detection method. To estimate the accuracy of the transformation, we tested various queries on the prototype retrieval system and report some of the results.
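The abstract does not give the authors' implementation, so the following is only a minimal sketch of the general idea it names: estimating the pitch of one audio frame with a time-domain autocorrelation function and mapping the result to a MIDI note number. The function names, the `fmin`/`fmax` search range, and the frame handling are assumptions made for illustration; the ADF-based onset and duration detection is not covered, since the abstract does not define it.

```python
import math

def autocorr_pitch(frame, sample_rate, fmin=80.0, fmax=800.0):
    """Estimate the fundamental frequency (Hz) of one frame.

    Illustrative only: picks the lag with the largest autocorrelation
    inside an assumed humming range [fmin, fmax]. Returns None when no
    lag has positive correlation (for example, a silent frame).
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    min_lag = int(sample_rate / fmax)      # shortest period considered
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    if best_lag == 0:
        return None
    return sample_rate / best_lag

def hz_to_midi(freq_hz):
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))
```

Applied frame by frame, such a function yields a pitch track; runs of consecutive frames with the same MIDI note could then be merged into (pitch, duration) pairs, although the paper's own duration detection additionally relies on onset detection.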

Original language: English
Title of host publication: Proceedings of the 9th IASTED International Conference on Internet and Multimedia Systems and Applications, IMSA 2005
Editors: M.H. Hamza
Pages: 197-202
Number of pages: 6
Publication status: Published - 2005
Event: 9th IASTED International Conference on Internet and Multimedia Systems and Applications, IMSA 2005 - Honolulu, HI, United States
Duration: 2006 Aug 15 to 2006 Aug 17

Publication series

Name: Proceedings of the IASTED International Conference on Internet and Multimedia Systems and Applications, IMSA

Other

Other: 9th IASTED International Conference on Internet and Multimedia Systems and Applications, IMSA 2005
Country: United States
City: Honolulu, HI
Period: 06/8/15 to 06/8/17

Keywords

  • Humming
  • Multimedia
  • Query
  • Transformation

ASJC Scopus subject areas

  • Engineering(all)
