In large-vocabulary speech recognition, context-dependent modeling is essential for improving both accuracy and speed. To cope with the sparse-data problem that arises from the proliferation of context-dependent models, two kinds of clustering methods, data-driven and rule-based, have been vigorously investigated. The inherent difficulty of applying data-driven approaches to unseen contexts has motivated the development of better rule-based clustering methods. This paper develops a hybrid approach that constructs a supervised decision rule operating on pre-clustered triphones. The scheme employs the C4.5 decision-tree learning algorithm to extract the attributes that best support clustering of the training data. Specifically, the data-driven method serves as the clustering algorithm, and its result serves as the learning target of the C4.5 algorithm. The proposed scheme provides an effective solution to the clustering-error problem arising from unsupervised decision-tree learning and also enables successful clustering of multiple-mixture Gaussian state distributions. In speaker-independent, task-independent continuous speech recognition, the proposed method achieved a relative word error rate (WER) reduction of 3.93%.
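The supervised step described above can be illustrated with a minimal sketch. It assumes that a data-driven clustering pass has already assigned a cluster ID to each triphone, and it then ranks phonetic context attributes by gain ratio, the split criterion used by C4.5. The attribute names (`left_nasal`, `right_vowel`), the toy data, and the function names are hypothetical and are not taken from the paper; a full C4.5 tree would apply this selection recursively and add pruning.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, labels, attr):
    """Gain ratio of splitting `labels` on attribute `attr` of `rows`."""
    # Group the cluster labels by the value the attribute takes.
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    total = len(labels)
    # Expected entropy of the labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels):
    """Return the attribute name with the highest gain ratio."""
    return max(rows[0].keys(), key=lambda a: gain_ratio(rows, labels, a))


# Hypothetical toy data: one dict of context attributes per triphone.
triphones = [
    {"left_nasal": True, "right_vowel": True},
    {"left_nasal": True, "right_vowel": False},
    {"left_nasal": False, "right_vowel": True},
    {"left_nasal": False, "right_vowel": False},
]
# Hypothetical cluster IDs, standing in for the data-driven clustering result.
cluster_ids = [0, 0, 1, 1]

best = best_attribute(triphones, cluster_ids)
```

In this toy case the cluster IDs coincide with the left-context attribute, so that attribute is selected as the root question. Because the learned rule is expressed in terms of phonetic attributes rather than observed triphone identities, it can also be applied to triphones that did not occur in the training data.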
Keywords
- Acoustic modeling
- Large vocabulary continuous speech recognition
ASJC Scopus subject areas
- Signal Processing
- Electrical and Electronic Engineering
- Experimental and Cognitive Psychology
- Linguistics and Language