Publisher: IEEE
Languages: English
Effective representation plays an important role in automatic spoken language identification (LID). Recently, several representations that employ a pre-trained deep neural network (DNN) as the front-end feature extractor have achieved state-of-the-art performance. However, performance is still far from satisfactory for dialect identification and short-duration utterance identification tasks, owing to deficiencies in existing representations. To address this issue, this paper proposes improved representations that exploit the information extracted from different layers of the DNN structure. This is conceptually motivated by regarding the DNN as a bridge between low-level acoustic input features and high-level phonetic output features. Specifically, we employ a deep bottleneck network (DBN), a DNN with an internal bottleneck layer, as the feature extractor, and extract representations from two layers of this single network, i.e., DBN-TopLayer and DBN-MidLayer. Evaluations on the NIST LRE2009 dataset, as well as on the more specific dialect recognition task, show that each representation can achieve an incremental performance gain. Furthermore, a simple fusion of the representations is shown to exceed current state-of-the-art performance.
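The idea of reading representations off two layers of one network can be sketched with a toy forward pass. The sketch below is illustrative only: the layer sizes, random weights, and ReLU activations are assumptions for demonstration, not the paper's actual DBN configuration or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: a 39-dim acoustic input frame, wide hidden
# layers, a narrow internal bottleneck, and a phonetic output layer.
# These numbers are illustrative, not taken from the paper.
sizes = [39, 1024, 64, 1024, 3000]
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x):
    """Propagate one frame through the network, keeping every layer's
    activation so that any internal layer can serve as a feature."""
    activations = [x]
    for w in weights:
        x = np.maximum(x @ w, 0.0)  # ReLU hidden units (an assumption)
        activations.append(x)
    return activations

frame = rng.standard_normal(39)          # one acoustic feature frame
acts = forward(frame)

dbn_midlayer = acts[2]    # bottleneck activations -> DBN-MidLayer feature
dbn_toplayer = acts[-1]   # output-layer activations -> DBN-TopLayer feature

# A simple fusion, here just concatenation of the two representations:
fused = np.concatenate([dbn_midlayer, dbn_toplayer])
```

The point of the sketch is that both representations come from a single trained network: the bottleneck layer yields a compact 64-dim feature close to the acoustics, while the top layer yields a high-dimensional feature close to the phonetic targets, and the two can then be fused downstream.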

