Sub-band Information Fusion Based on Wavelet Thresholding for Robust Speech Recognition

Authors

1 School of Computer Engineering, Faculty of Engineering, University of Guilan, Rasht, Iran; Audio and Speech Processing Lab, Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

2 Audio and Speech Processing Lab, Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Abstract

In recent years, sub-band speech recognition has proved useful for improving the robustness of speech recognition, especially for speech contaminated by band-limited noise. In sub-band speech recognition, the full-band speech signal is divided into several frequency sub-bands, and the result of the recognition task is obtained by combining the sub-band feature vectors, or the likelihoods produced by the corresponding sub-band recognizers. In this paper, we use the discrete wavelet transform to divide the speech signal into sub-bands. We also extract robust features within the sub-bands in order to obtain a higher sub-band speech recognition rate. In addition, we propose a likelihood weighting and fusion method based on the wavelet thresholding technique. The experimental results indicate that the proposed weighting methods for likelihood combination and classifier fusion improve the sub-band speech recognition rate in noisy conditions.
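The two building blocks named above can be illustrated with a minimal sketch: a single level of the discrete wavelet transform (here with the Haar wavelet, chosen only for simplicity; the paper does not fix a wavelet in the abstract) splits a signal into a low-frequency and a high-frequency sub-band, and soft thresholding in Donoho's sense shrinks coefficients toward zero. The function names and the threshold value are illustrative assumptions, not the authors' implementation.

```python
import math

def haar_dwt(signal):
    """One level of the discrete wavelet transform with the Haar wavelet.
    Splits the signal into a low-frequency (approximation) sub-band and a
    high-frequency (detail) sub-band, each half the original length."""
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def soft_threshold(coeffs, lam):
    """Donoho-style soft thresholding: shrink each coefficient's magnitude
    by lam, zeroing those whose magnitude is below lam."""
    return [math.copysign(max(abs(c) - lam, 0.0), c) for c in coeffs]

# Illustrative usage on a toy 4-sample frame.
approx, detail = haar_dwt([4.0, 2.0, 6.0, 6.0])
shrunk = soft_threshold(detail, 1.0)
```

Repeating `haar_dwt` on the approximation coefficients yields a multi-level sub-band decomposition; the thresholding step is the kind of shrinkage operation the proposed weighting method builds on.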

Keywords

