Loweimi, E.; Barker, J.; Hain, T. (2016)
Languages: English
Types: Other
Designing good normalisation to counter the effect of environmental distortions is one of the major challenges for automatic speech recognition (ASR). The Vector Taylor series (VTS) method is a powerful and mathematically well-principled technique that can be applied in both the feature and model domains to compensate for both additive and convolutional noises. One limitation of this approach, however, is that it is tied to MFCC (and log-filterbank) features and does not extend to other representations, such as PLP, PNCC and phase-based front-ends, that use a power transformation rather than log compression. This paper aims at broadening the scope of the VTS method by deriving a new formulation that assumes a power transformation is used as the non-linearity during feature extraction. It is shown that the conventional VTS, in the log domain, is a special case of the new extended framework. In addition, the new formulation introduces one more degree of freedom, which makes it possible to tune the algorithm to better fit the data to the statistical requirements of the ASR back-end. Compared with MFCC and conventional VTS, the proposed approach provides up to 12.2% and 2.0% absolute performance improvements on average, respectively, on Aurora-4 tasks.
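The idea in the abstract can be illustrated with a small sketch. This is not the paper's exact derivation; it is a hedged toy model that assumes additive noise in the linear power-spectral domain and a Box-Cox-style power compression f(E) = (E**alpha - 1)/alpha, whose alpha → 0 limit is log(E). Under those assumptions, a first-order Taylor (VTS-style) expansion of the noisy feature can be formed for any alpha, and the conventional log-domain case is recovered at alpha = 0, mirroring the claim that log VTS is a special case of the power-transformation framework. All function names here are illustrative, not from the paper.

```python
import numpy as np

def compress(E, alpha):
    """Box-Cox power compression of a linear-domain energy E.

    alpha = 0 falls back to plain log compression (the MFCC case);
    alpha > 0 models power-law front-ends such as PLP/PNCC.
    """
    return np.log(E) if alpha == 0 else (E ** alpha - 1.0) / alpha

def noisy_feature(x, n, alpha):
    """Compressed-domain feature of the noisy energy X_lin + N_lin,
    written as a function of the compressed clean/noise features x, n."""
    inv = lambda f: np.exp(f) if alpha == 0 else (alpha * f + 1.0) ** (1.0 / alpha)
    return compress(inv(x) + inv(n), alpha)

def vts_first_order(x, n, x0, n0, alpha, eps=1e-6):
    """First-order Taylor (VTS-style) expansion of noisy_feature around
    the expansion point (x0, n0), with numerical partial derivatives."""
    g0 = noisy_feature(x0, n0, alpha)
    Jx = (noisy_feature(x0 + eps, n0, alpha) - g0) / eps
    Jn = (noisy_feature(x0, n0 + eps, alpha) - g0) / eps
    return g0 + Jx * (x - x0) + Jn * (n - n0)

alpha = 0.1                                           # the extra degree of freedom
x0, n0 = compress(4.0, alpha), compress(1.0, alpha)   # expansion point (clean, noise)
x, n = x0 + 0.05, n0 - 0.05                           # a nearby operating point

exact = noisy_feature(x, n, alpha)
approx = vts_first_order(x, n, x0, n0, alpha)
print(abs(exact - approx))  # small: the linearisation is good near (x0, n0)

# alpha = 0 reproduces the conventional log-domain mismatch function
print(noisy_feature(np.log(4.0), np.log(1.0), 0))  # equals log(4 + 1)
```

The single scalar `alpha` is what the abstract calls the additional degree of freedom: it can in principle be tuned so that the compressed features better match the statistical assumptions of the ASR back-end, with `alpha = 0` reducing everything to conventional log-domain VTS.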