In this paper, we propose an effective mask-estimation method for missing-feature reconstruction to achieve robust speech recognition in unknown noise environments. Previous work found that training a mask-estimation model on speech corrupted by white noise did not provide environment-independent recognition accuracy. We describe a training method based on bands of colored noise that more effectively reflects spectral variations across neighboring frames and subbands. We also achieve a further improvement in recognition accuracy by reconsidering frames that appeared to be unvoiced in the initial pitch analysis. Performance is evaluated using the Aurora 2.0 database in the presence of various types of noise maskers. Experimental results indicate that the proposed methods are effective in estimating masks for missing-feature reconstruction while depending less on the noise conditions.
|Number of pages||4|
|Publication status||Published - 2005|
|Event||9th European Conference on Speech Communication and Technology - Lisbon, Portugal (2005 Sep 4 → 2005 Sep 8)|