Song, Yan; Cui, Ruilian; McLoughlin, Ian Vince; Dai, Li-Rong (2016)
Languages: English
Types: Unknown
Subjects: T
Recently, the i-vector representation based on deep bottleneck networks (DBN) pre-trained for automatic speech recognition has received significant interest for both speaker verification (SV) and language identification (LID). In particular, a recent unified DBN-based i-vector framework, referred to as DBN-pGMM i-vector, has performed well.

In this paper, we replace the pGMM with a phonetic mixture of factor analyzers (pMFA) and propose a new DBN-pMFA i-vector. The DBN-pMFA i-vector includes the following improvements: (i) a pMFA model is derived from the DBN, which can jointly perform feature dimensionality reduction and de-correlation in a single linear transformation; (ii) a shifted DBF, termed SDBF, is proposed to exploit temporal contextual information; (iii) a senone selection scheme is proposed to make i-vector extraction more efficient.

We evaluate the proposed DBN-pMFA i-vector on the six most confusable languages selected from NIST LRE 2009. The experimental results demonstrate that DBN-pMFA consistently outperforms the previous DBN-based framework, and that the computational complexity can be significantly reduced by applying a simple senone selection scheme.
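Two of the abstract's ingredients lend themselves to a short illustration: shifting and stacking bottleneck features to capture temporal context (SDBF), and pruning senones by posterior mass before accumulating i-vector statistics. The sketch below is a generic reconstruction of those two ideas, not the authors' implementation; the shift offsets, the top-k value, and both function names are illustrative assumptions.

```python
import numpy as np

def shifted_dbf(dbf, shifts=(-2, -1, 0, 1, 2)):
    """Stack deep bottleneck features (DBF) at several temporal
    offsets to form an SDBF-style representation.

    dbf: (T, D) array of per-frame bottleneck features.
    Returns a (T, D * len(shifts)) array; frames near the edges
    are handled by clamping indices to the first/last frame.
    The offsets here are illustrative, not the paper's values."""
    T, _ = dbf.shape
    idx = np.arange(T)
    blocks = [dbf[np.clip(idx + s, 0, T - 1)] for s in shifts]
    return np.concatenate(blocks, axis=1)

def select_senones(posteriors, top_k):
    """Toy senone selection: keep the top_k senones with the
    largest total posterior mass over the utterance, so that
    sufficient statistics are accumulated only for those
    components, and renormalise the remaining posteriors.

    posteriors: (T, C) frame-level senone posteriors."""
    mass = posteriors.sum(axis=0)
    keep = np.argsort(mass)[::-1][:top_k]
    reduced = posteriors[:, keep]
    reduced = reduced / reduced.sum(axis=1, keepdims=True)
    return keep, reduced
```

Dropping low-mass senones shrinks the effective mixture size, which is where the complexity reduction in the abstract comes from: the cost of accumulating zeroth- and first-order statistics scales with the number of active components per frame.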
