Hmad, NF (2015)
Languages: English
Types: Doctoral thesis
Speech is a desirable communication method between humans and computers. The major concerns of automatic speech recognition (ASR) are determining a set of classification features and finding a suitable recognition model for these features. Hidden Markov Models (HMMs) have been demonstrated to be powerful models for representing time-varying signals. Artificial Neural Networks (ANNs) have also been widely used for representing time-varying quasi-stationary signals. Arabic is one of the oldest living languages and one of the oldest Semitic languages in the world; it is also the fifth most widely used language and is the mother tongue of roughly 200 million people. Arabic speech recognition has been a fertile area of research over the previous two decades, as attested by the various papers that have been published on this subject.

This thesis investigates phoneme and acoustic models based on Deep Neural Networks (DNNs) and Deep Echo State Networks for multi-dialect Arabic speech recognition. The TIMIT corpus, which covers a wide variety of American dialects, is also used to evaluate the proposed models.

The availability of speech data that is time-aligned and labelled at the phonemic level is a fundamental requirement for building speech recognition systems. An Arabic phoneme database (APD) was developed, manually timed and phonetically labelled. This dataset was constructed from the King Abdul-Aziz Arabic Phonetics Database (KAPD) for the Saudi Arabian dialect and the Centre for Spoken Language Understanding (CSLU2002) database for different Arabic dialects. This dataset covers 8148 Arabic phonemes. In addition, a corpus randomly selected from the Levantine Arabic dialect database, with 120 speakers (13 hours of Arabic speech) for training and 24 speakers (2.4 hours) for testing, was revised and its transcription errors were manually corrected. The selected dataset was labelled automatically using the HTK Hidden Markov Model toolkit.
The TIMIT corpus is also used for the phone recognition and acoustic modelling tasks. We used 462 speakers (3.14 hours) for training and 24 speakers (0.81 hours) for testing. For automatic speech recognition, a Deep Neural Network (DNN) is evaluated for framewise phoneme recognition and as an acoustic modelling system for Arabic speech recognition. Restricted Boltzmann Machine (RBM) DNN models had not previously been explored for any Arabic corpora. This allows us to claim priority for adopting the RBM DNN model for Levantine Arabic acoustic models. A post-processing enhancement was also applied to the DNN acoustic model outputs in order to improve the recognition accuracy and to obtain accuracy at the phoneme level instead of the frame level. This post-processing significantly improved the recognition performance. An Echo State Network (ESN) is developed and evaluated for Arabic phoneme recognition with different learning algorithms. This involved investigating the conventional ESN trained with supervised and forced learning algorithms. A novel combined supervised/forced supervised learning algorithm (unsupervised adaptation) was developed and tested on the proposed optimised Arabic phoneme recognition datasets. This new model is evaluated on the Levantine dataset and empirically compared with the results obtained from the baseline DNNs. A significant improvement in recognition performance was achieved with the ESN model compared to the baseline RBM DNN model's result. The results show that the ESN model is better able to recognise phoneme sequences than the DNN model for a small-vocabulary dataset.
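The abstract does not say how the post-processing step works, so the following is only an illustrative sketch of one common way to move from frame-level to phoneme-level output: smooth the framewise labels with a sliding majority vote, then collapse runs of identical labels into segments. The function names, the window size and the example labels are assumptions made for illustration, not the method used in the thesis.

```python
from collections import Counter
from itertools import groupby


def majority_smooth(frame_labels, window=5):
    """Replace each frame label by the most common label in a centred window."""
    half = window // 2
    smoothed = []
    for i in range(len(frame_labels)):
        lo = max(0, i - half)
        hi = min(len(frame_labels), i + half + 1)
        counts = Counter(frame_labels[lo:hi])
        best = max(counts.values())
        # Keep the frame's own label when it ties for the most common one.
        if counts[frame_labels[i]] == best:
            smoothed.append(frame_labels[i])
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


def collapse_frames(frame_labels):
    """Collapse runs of identical frame labels into (phoneme, n_frames) segments."""
    return [(label, len(list(run))) for label, run in groupby(frame_labels)]


# Made-up framewise classifier output with one spurious "a" and one spurious "s".
frames = ["s", "s", "s", "a", "s", "s", "a", "a", "a", "a", "l", "l", "l"]
segments = collapse_frames(majority_smooth(frames, window=3))
phonemes = [label for label, _ in segments]
```

Here `phonemes` is the phoneme-level sequence and `segments` keeps the duration of each phoneme in frames, which is the information a phoneme-level accuracy measure would be computed from.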
The adoption of the ESN model for acoustic modelling is seen to be more valid than the adoption of the DNN model, as ESNs are recurrent models and are expected to support sequence modelling better than RBM DNN models, even when the latter use a contextual input window. The TIMIT corpus is also used to investigate deep learning for framewise phoneme classification and acoustic modelling using DNNs and ESNs, allowing a direct and valid comparison between the systems investigated in this thesis and published work on framewise phoneme recognition using the TIMIT corpus. Our main finding on this corpus is that ESN networks outperform time-windowed RBM DNN ones. However, our ESN-based system shows 10% lower performance than other systems recently reported in the literature that used the same corpus. This is due to hardware availability and to not applying speaker and noise adaptation, which could improve the results in this thesis; our aim is to investigate the proposed models for speech recognition and to make a direct comparison between these models.
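To make the contrast between a contextual input window and a recurrent reservoir concrete, here is a minimal sketch. `stack_context` builds the kind of fixed window of neighbouring frames that a feed-forward DNN would receive, while `reservoir_states` runs the standard ESN state update x(t) = tanh(W_in u(t) + W x(t-1)) with fixed random weights. The sizes, weight ranges, function names and toy data are illustrative assumptions; no readout training is shown, and this is not the configuration used in the thesis.

```python
import math
import random


def stack_context(frames, left=2, right=2):
    """Concatenate each frame with its neighbours, repeating edge frames as padding."""
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window = []
        for k in range(t - left, t + right + 1):
            window.extend(frames[min(max(k, 0), last)])
        stacked.append(window)
    return stacked


def reservoir_states(frames, n_units=20, scale=0.1, seed=0):
    """Run x(t) = tanh(W_in u(t) + W x(t-1)) over a sequence with fixed random weights."""
    rng = random.Random(seed)
    n_in = len(frames[0])
    w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_units)]
    # Small recurrent weights are a simple, assumed stand-in for the explicit
    # spectral-radius scaling normally applied to an ESN reservoir.
    w = [[rng.uniform(-scale, scale) for _ in range(n_units)] for _ in range(n_units)]
    x = [0.0] * n_units
    states = []
    for u in frames:
        x = [
            math.tanh(
                sum(wi * ui for wi, ui in zip(w_in[j], u))
                + sum(wj * xj for wj, xj in zip(w[j], x))
            )
            for j in range(n_units)
        ]
        states.append(x)
    return states


# Toy sequence of 6 frames with 3 features each (made-up values).
frames = [[0.1 * t, 0.2, -0.1 * t] for t in range(6)]
windows = stack_context(frames)    # 6 vectors of length (2 + 1 + 2) * 3 = 15
states = reservoir_states(frames)  # 6 reservoir state vectors of length 20
```

Each `windows[t]` only sees two frames on either side of frame `t`, whereas `states[t]` is a function of every frame up to `t`, which is the property the abstract appeals to when it expects ESNs to support sequence modelling better.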
  • The results below are discovered through our pilot algorithms. Let us know how we are doing!

    • 3.1 Arabic Phoneme categorization. ……………………………………………………. 37 3.2 The frequency of Arabic phonemes based for the 17 female and 17 38 male speakers from the CSLU2002 database.…………………………………..
    • 3.3 The frequency of Arabic phonemes extracted from the Levantine 39 database. ..……………………………………………………………………………………..
    • AHMED, A., YU, K., XU, W., GONG, Y. & XING, E. P. 2008. Training hierarchical feed-forward visual recognition models using transfer learning from pseudo tasks. in Proceedings of the 10th European Conference on Computer Vision (ECCV'08), 69-82.
    • AL-MANIE, M. A., ALKANHAL, M. I. & AL-GHAMDI, M. M. Automatic Speech Segmentation Using the Arabic Phonetic Database. Proceedings of the 10th WSEAS International Conference on AUTOMATION & INFORMATION, 2006. 76-79.
    • AL-RADAIDEH, Q. A. & MASRI, K. H. 2011. Improving Mobile Multi-Tap Text Entry for Arabic Language. Computer Standards & Interfaces, 33 108-113.
    • AL-SHAREEF, S. & HAIN, T. 2011. An Investigation in Speech Recognition for Colloquial Arabic. INTERSPEECH, 2869-2872.
    • AL-SHAREEF, S. & HAIN, T. 2012. CRF-based Diacritisation of Colloquial Arabic for Automatic Speech Recognition. Interspeech.
    • ALALSHEKMUBARAK, A. 2014. Towards A Robust Arabic Speech Recognition System Based On Reservoir Computing. PhD, University of Stirling.
    • ALALSHEKMUBARAK, A. & SMITH, L. S. 2014. On Improving the Classification Capability of Reservoir Computing For Arabic Speech Recognition. inWermter, S.,Weber, C., Duch,W., Honkela, T., Koprinkova-Hristova, P., Magg, S., Palm, G., Villa, A.E.P. (Eds.) , Artificial Neural Networks and Machine Learning-ICANN 2014, 24th International.
    • ALI, A. A. & HWAIDY, I. T. 2007. Hierarchical Arabic Phoneme Recognition Using MFCC Analysis. Iraq J. Electrical and Electronic Engineering, 3 97-106.
    • ANWAR, M. J., AWAIS, M. M., MASUD, S. & SHAMAIL, S. 2006 Automatic Arabic speech segmentation system. International Journal of Information Technology, 12, 102-111.
    • APPEN 2007. Levantine arabic conversational telephone speech. Linguistic Data Consortium, Philadelphia, Sydney, Australia, 2007. Catalog No: LDC2007S01 & LDC2007T01. In: LTD, A. P. (ed.).
    • BENGIO, Y. 1991. Artificial Neural Networks and Their Application to Sequence Recognition. PhD thesis, McGill University.
    • BENGIO, Y. 2009. Learning deep architectures for AI. Found. TrendsMach. Learn., 2 (1) (2009), 1-127. Foundations and Trends in Machine Learning, 2, 1-127.
    • BENGIO, Y., BOULANGER, N. & PASCANU, R. 2013a. Advances in optimizing recurrent networks. in Proc. ICASSP.
    • BENGIO, Y., COURVILLE, A. & VINCENT, P. 2013b. Representation learning: a review and new perspectives. IEEETrans.PatternAnal.Mach. Intell., 35, 1798-1828.
    • BENGIO, Y., LAMBLIN, P., POPOVICI, D. & LAROCHELLE, H. 2007. Greedy layer-wise training of deep networks. in Advances in Neural Information Processing Systems 19 (NIPS'06), 153- 160.
    • BENGIO, Y., MORI, R. D., FLAMMIA, G. & KOMPE, F. 1991. Global optimization of a neural network - Hidden Markov model hybrid. in Proc. Proc. Eurospeech.
    • BENGIO, Y., SIMARD, P. & FRASCONI, P. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on, Neural Networks, 5, 157-166.
    • BENZEGHIBA, M., MORI, R. D., DEROO, O., DUPONT, S., ERBES, T., JOUVET, D., FISSORE, L., LAFACE, P., MERTINS, A., RIS, C., ROSE, R., TYAGI, V. & WELLEKENS, C. 2007. Automatic Speech Recognition and Speech Variability: A Review. Speech Communication 49, 763- 786.
    • BIADSY, F. 2011. Automatic Dialect and Accent Recognition and its Application to Speech Recognition. COLUMBIA UNIVERSITY.
    • BIADSY, F., HABASH, N. & HIRSCHBERG, J. 2009. Improving the Arabic pronunciation dictionary for phone and word recognition with linguistically-based pronunciation rules. in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL '09, (Stroudsburg, PA, USA), 397-405.
    • BILLA, J., NOAMANY, M., SRIVASTAVA, A., LIU, D., STONE, R., XU, J., MAKHOUL, J. & KUBALA, F. 2002a. Audio Indexing of Arabic broadcast news. Acoustics, Speech, and Signal Processing (ICASSP), 1, I-5 - I-8
    • BILLA, J., NOAMANY, M., SRIVASTAVA, A., MAKHOUL, J. & KUBALA, F. 2002b. Arabic Speech and Text in TIDES OnTAP. in Proceedings of the second international conference on Human Language Technology Research, HLT '02, (San Francisco, CA, USA), 7-11 Morgan Kaufmann Publishers Inc.
    • BOURLARD, H. & MORGAN, N. 1993. Connectionist Speech Recognition: A Hybrid Approach, KLUWER ACADEMIC PUBLISHERS.
    • BROWN, P. 1987. The Acoustic-Modeling Problem in Automatic Speech Recognition. PhD thesis, Carnegie-Mellon University,USA,.
    • BRUGNARA, F., FALAVIGNA, D. & OMOLOGO, M. 1992. A HMM-based system for automatic segmentation and labeling of speech. The Second International Conference on Spoken Language Processing, ICSLP 1992. Banff, Alberta, Canada.
    • BRUGNARA, F., FALAVIGNA, D. & OMOLOGO, M. 1993. Automatic segmentation and labeling of speech based on hidden Markov models. Speech Communication, 12, 357-370.
    • BUCKWALTER, T. 2002. LDC Buckwalter Arabic Morphological Analyzer (BAMA). Version 1.0. LDC Catalog No. LDC2002L49, ISBN: 1-58563-257-0.
    • CAMPBELL, N. 1996. Autolabelling Japanese ToBI. Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference, 4, 2399 - 2402.
    • CHEN, C. H. 1988. Signal processing handbook, New York.
    • CHEN, R. & JAMIESON, L. 1996. Experiments on the implementation of recurrent neural networks for speech phone recognition Proceedings of the thirtieth annual asilomar conference on signals. systems and computers 779-782.
    • COLLOBERT, R. & WESTON, J. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), 160-167.
    • COSI., P., D. FALAVIGNA & OMOLOGO, M. 1991. A Preliminary Statistical Evaluation of Manual and Automatic Segmentation Discrepancies In EUROSPEECH, 693-696.
    • COX, S., BRADY, R. & JACKSON, P. 1998. TECHNIQUES FOR ACCURATE AUTOMATIC ANNOTATION OF SPEECH WAVEFORMS. In Proceedings of ICSLP '98 (Sydney, Australia), 5, 1947-1950.
    • DAHL, G., YU, D., DENG, L. & ACERO, A. 2011. Context-dependent DBN-HMMs in large vocabulary continuous speech recognition. in Proc. ICASSP.
    • DAHL, G. E., RANZATO, M., MOHAMED, A. & HINTON, G. E. 2010. Phone recognition with the mean-covariance restricted Boltzmann machine. In NIPS'2010.
    • DAHL, G. E., YU, D., DENG, L. & ACERO, A. 2012. Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20, 33-42.
    • DAI, J., VENAYAGAMOORTHY, G. K. & HARLEY, R. G. 2010. An Introduction to the Echo State Network and its Applications in Power System. IEEE Xplore.
    • DAVIS, S. B. & MERMELSTEIN, P. 1980. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences. IEEE Transactions On Acoustic, Speech, And Signal Processing, ASSP-28, 357-366.
    • DEHAK, N., KENNY, P. J., DEHAK, R., DUMOUCHEL, P. & P.OUELLET 2011. Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech and Language Processing, 19, 788-798.
    • DEMPSTER, A., LAIRD, N. & RUBIN, D. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, 39, 1-38.
    • DENG, L., SELTZER, M., YU, D., ACERO, A., MOHAMED, A. & HINTON, G. 2010. Binary coding of speech spectrograms using a deep auto-encoder. in Proc. Interspeech.
    • DOYA, K. 1992. Bifurcations in the learning of recurrent neural networks. In: Proceedings of IEEE International Symposium on Circuits and Systems, 6, 2777-2780.
    • DUDA, R. O., HART, P. E. & STORK, D. G. 2001. Pattern Classification.
    • EL-IMAM, Y. A. 2004. Phonetization of Arabic: rules and algorithms. In Computer Speech and Language, 18, 339-373.
    • ELLIS, D. P. W., SINGH, R. & SIVADAS, S. 2001. Tandem acoustic modeling in large-vocabulary recognition. in Proc. ICASSP.
    • ELMAHDY, M., GRUHN, R. & MINKER, W. 2012a. Fundamentals. Novel Techniques for Dialectal Arabic Speech Recognition. Sphingers.
    • ELMAHDY, M., HASEGAWA-JOHNSON, M. & MUSTAFAWI, E. 2012b. A Baseline Speech Recognition System for Levantine Colloquial Arabic.
    • FRANZ, A. & MILCH, B. 2002. Searching the web by voice. Proc. Computational Linguistics, 1213- 1217.
    • FREUND, Y. & HAUSSLER, D. 1994. Unsupervised learning of distributions on binary vectors using two layer networks. In: REPORT, T. (ed.). University of California at Santa Cruz, Santa Cruz, CA, USA.
    • FURUI, S. 1986. Speaker independent isolated word recognition using dynamic features of speech spectrum. IEEE Trans. Acoustics, Speech and Signal Processing, 52-59.
    • G.-DOMINGUEZ, J., L.-MORENO, I., MORENO, P. J. & G.-RODRIGUEZ, J. 2014. Frame-by-frame language identification in short utterances using deep neural networks. Artical in press,Neural Networks.
    • GALES, M. & YOUNG, S. 2007. The application of hidden Markov models in speech recognition. Foundations and Trends in Signal Processing, 1, 195-304.
    • GALES, M. J. F. 1998. Maximum likelihood linear transformations for HMM-based speech recognition. Computer Speech & Language, 12, 75-98.
    • GALES, M. J. F. 2007. Discriminative Models for Speech Recognition. in ITA Workshop, University San Diego, USA, February 2007.
    • GAMMAS, D., JUDITH, N. & NORA, C. 2013. Arabic Pod 101 [Online].
    • GAROFOLO, J. S., LAMEL, L. F., FISHER, W. M., FISCUS, J. G., PALLETT, D. S. & DAHLGREN, N. L. 1993. Darpa timit acoustic phonetic continuous speech corpus cdrom.
    • GLOROT, X. & BENGIO, Y. 2010. Understanding the difficulty of training deep feedforward neural networks. in Proc. AISTAT.
    • GOLDMAN, J.-P. 2011. EasyAlign: a friendly automatic phonetic alignment tool under Praat.
    • GR´EZL, F., KARAFIA´T, M., KONTA´R, S. & CERNOCKY, J. 2007. Probabilistic and bottle-neck features for LVCSR of meetings. ICASSP'07. Hononulu.
    • GRAVES, A. 2008. Supervised Sequence Labelling with Recurrent Neural Networks.
    • GRAVES, A. 2012. Supervised Sequence Labelling with Recurrent Neural Networks, volume 385 of Studies in Computational Intelligence. Springer, .
    • GRAVES, A., FERNANDEZ, S., GOMEZ, F. & SCHMIDHUBER, J. 2006. Connectionist temporal classification: labeling unsegmented sequence data with recurrent neural networks. in Proc. ICML.
    • GRAVES, A. & JAITLY, N. 2014. Towards End-to-End Speech Recognition with Recurrent Neural Networks. Proceedings of the 31 st International Conference on Machine Learning. Beijing, China: JMLR: W&CP.
    • GRAVES, A., MOHAMED, A. & HINTON, G. 2013. Speech recognition with deep recurrent neural networks. In Proc ICASSP 2013. Vancouver, Canada.
    • GRAVES, A. & SCHMIDHUBER, J. 2005. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18, 602-610.
    • GRAVES, A. & SCHMIDHUBER, J. 2009. Offline handwriting recognition with multidimensional recurrent neural networks. in Neural Information Processing Systems 21, 545-552.
    • GRE´ZL, F. & FOUSEK, P. 2008. Optimizing bottle-neck features for lvcsr. Acoustics, Speech and Signal Processing. ICASSP 2008. IEEE International Conference on Las Vegas, NV IEEE.
    • GRUBB, A. & BAGNELL, J. A. 2013. Stacked Training for Overfitting Avoidance in Deep Networks. Appearing at the ICML 2013 Workshop on Representation Learning.
    • HABASH, N. 2010. Introduction to Arabic natural language processing. Synthesis Lectures on Human Language Technologies, 3, 1-187.
    • HACHKAR, Z., MOUNIR, B., FARCHI, A. & ABBADI, J. E. 2011. Comparison of MFCC and PLP Parameterization in Pattern Recognition of Arabic Alphabet Speech. Canadian Journal on Artificial Intelligence, Machine Learning & Pattern Recognition, 2, 56-60.
    • HADSELL, R., ERKAN, A., SERMANET, P., SCOFFIER, M., MULLER, U. & LECUN, Y. 2008. Deep belief net learning in a long-range vision system for autonomous offroad driving. in Proc. Intelligent Robots and Systems (IROS'08), 628-633.
    • HAIN, T. 2001. Hidden Model Sequence Models for Automatic Speech Recognition. PhD thesis, University of Cambridge.
    • HAIN, T. & WOODLAND, P. C. 1998. CU-HTK acoustic modeling experiments. The 9th Conversational Speech Recognition Workshop, . MITAGS, Linthicum Heights, Maryland.
    • HALBERSTADT, A. 1998. Heterogeneous acoustic measurements and multiple classifiers for speech recognition. PhD thesis, Massachusetts Institute of Technology.
    • HAWKINS, P. 1988. Introducing phonology, Australia, Pty Ltd.
    • HEINTZ, I. 2010. Arabic Language Modeling With Stem-Derived Morphemes For Automatic Speech Recognition. PhD thesis, The Ohio State University.
    • HERMANS, M. & SCHRAUWEN, B. 2013. Training and Analyzing Deep Recurrent Neural Networks.
    • HERMANSKY, H. 1990. Perceptual linear predictive (PLP) analysis of speech. Acoustical Society of America Journal, 87, 1738-1752.
    • HERMANSKY, H., ELLIS, D. & SHARMA, S. 2000. Tandem connectionist feature extraction for conventional HMM systems. ICASSP-2000, Istanbul, 1635-1638.
    • HERMANSKY, H. & SHARMA, S. 1998. TRAPS - Classifiers of Temporal Patterns. In Proc. International Conference on Spoken Language Processing (ICSLP), 1003-1006.
    • HINTON, G., DENG, L., YU, D., DAHL, G., MOHAMED, A., JAITLY, N., SENIOR, A., VANHOUCKE, V., NGUYEN, P., SAINATH, T. & KINGSBURY, B. 2012. Deep Neural Networks for Acoustic Modeling in Speech Recognition. IEEE Signal Processing Magazine, 29, 82-97.
    • HINTON, G. & SALAKHUTDINOV, R. 2006. Reducing the dimensionality of data with neural networks.
    • HINTON, G. E. 2002. Training products of experts by minimizing contrastive divergence. Neural Computation, 14, 1771-1800.
    • HINTON, G. E., OSINDERO, S. & TEH, Y. 2006. A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527-1554.
    • HMAD, N. & ALLEN, T. Biologically inspired Continuous Arabic Speech Recognition. In: PETRIDIS, M. B. A. M., ed. Research and Development in intelligent systems XXIX, 2012 Cambridge, UK. Springer London, 245-258.
    • HMAD, N. & ALLEN, T. 2013. Echo State Networks for Arabic phoneme recognition. World Academy of Science, Engineering and Technology, International Journal of Computer, Control, Quantum and Information Engineering 7.
    • HMIDT, S. P., WIERING, M. A., ROSSUM, A. C. V., ELBURG, R. A. J. V., ANDRINGA, T. C. & VALKENIER, B. 2010. Robust Real-TimeVowel Classification with an Echo StateNetwork. [Accessed 2010].
    • HOCHREITER, S. & SCHMIDHUBER, J. 1997. Long short-term memory. Neural Computation, 9, 1735-1780.
    • HOLZMANN, G. 2008. Echo State Networks with Filter Neurons and a Delay&Sum Readout with Applications in Audio Signal Processing. Master Master's Thesis, Graz University of Technology.
    • HOSOM, J. P. 2000a. Automatic Time Alignment of Phonemes Using Acoustic-Phonetic Information. Oregon Graduate Institute of Science and Technology.
    • HOSOM, J. P. 2000b. A Comparison of Speech Recognizers Created Using Manually-Aligned and Automatically-Aligned Training Data. Technical Report ESE-00-002. Beaverton: Center for Spoken Language Understanding (CSLU), Oregon Graduate Institute of Science and Technology (OGI).
    • HOSOM, J. P. 2009. Speaker-independent phoneme alignment using transition-dependent states. Speech Communication, 51, 352-368.
    • HUANG, X., ACERO, A., HON, H. W. & REDDY, R. 2001. Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice Hall PTR, 1st edition.
    • ISMAIL, S. & AHMAD, A. B. 2004. Recurrent Neural Network with Backpropagation Through Time Algorithm for Arabic Recognition. European Simulation Multiconference, 2004 Graham Horton. SCS Europe.
    • JAEGER, H. 2001. The "echo state" approach to analysing and training recurrent neural networks. In: 148, T. R. G. R. (ed.). German National Research Center for Information Technology.
    • JAEGER, H. 2005. A Tutorial On Training Recurrent Neural Networks, Covering BPPT, RTRL, EKF And The "Echo State Network" Approach. Fraunhofer Institute for Autonomous Intelligent Systems (AIS).
    • JAEGER, H. 2007. Echo state network. Scholarpedia 2, 2330.
    • JAEGER, H. 2012. Long Short-Term Memory in Echo State Networks: Details of a simulation study. Technical Report 27 School of Engineering and Science: Jacobs University Bremen
    • JAEGER, H. & HAAS, H. 2004. Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Telecommunication. Science.
    • JAEGER, H., LUKOSEVICIUS, M. & POPOVICI, D. 2007. Optimization and Applications of Echo State Networks with Leaky Integrator Neurons.
    • JALALVAND, A., TRIEFENBACH, F., VERSTRAETEN, D. & MARTENS, J. P. 2011. Connected digit recognition by means of reservoir computing. In: Proceedings of Interspeech 2011, 1725- 1728.
    • JEAGER, H. 2003. Adaptive Nonlinear System Identification With Echo State Networks.
    • JOU, S. C. S. 2008. Automatic Speech Recognition on Vibrocervigraphic and Electromyographic Signals. PhD, Carnegie Mellon University.
    • KINGSBURY, B., SAINATH, T. N. & SOLTAU, H. 2012. Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization in Proc. interspeech.
    • KIRCHHOFF, K., BILMES, J., DAS, S., DUTA, N., EGAN, M., JI, G., HE, F., HENDERSON, J., LIU, D., NOAMANY, M., SCHONE, P., SCHWARTZ, R. & VERGYRI, D. 2002. Novel Approaches To Arabic Speech Recognition. the 2002 johns-hopkins summer workshop.
    • KVALE, K. 1994. On the connection between manual segmentation conventions and "errors" made by automatic segmentation. In Proceeding of ICSLP '94 (Yokohama, Japan) 3.
    • LAVIE, A., WAIBEL, A., LEVIN, L., FINKE, M., GATES, D., GAVALDA, M., ZEPPENFELD, T. & ZHAN, P. 1997. Jansus-III: speech-to-speech translation in multiple languages. Proc. ICASSP, 99- 102.
    • LEE, H., GROSSE, R., RANGANATH, R. & NG, A. Y. 2009. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. In Proceedings of the 26th International Conference on Machine Learning, 609-616.
    • LEE, L.-S. 1997. Voice dictation of Mandarin Chinese. IEEE Signal Processing Magazine, 14, 63- 101.
    • LEE, S.-M., FANG, S.-H., HUNG, J.-W. & LEE, L.-S. 2001. Improved MFCC Feature Extraction by PCA-Optimized Filter Bank for Speech Recognition. Automatic Speech Recognition and Understanding, 49-52.
    • LEGGETTER, C. J. & WOODLAND, P. C. 1995. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech & Language, 9, 171 - 185.
    • LEUNG, H. & ZUE, V. W. 1984. A procedure for automatic alignment of phonetic transcriptions with continuous speech. In Proceedings of ICASSP '84 (San Diego, California), 9, 2.7.1- 2.7.4.
    • LEVNER, I. 2008. Data Driven Object Segmentation. PhD Univesity of Alberta.
    • LIMA, A., ZEN, H., NANKAKU, Y., MIYAJIMA, C., TOKUDA, K. & KITAMURA, T. 2004. On the Use of Kernel PCA for Feature Extraction in Speech Recognition. IEICE Trans. Inf. & Syst., E87-D, 2802-2811.
    • LIMA, A., ZEN, H., NANKAKU, Y., TOKUDA, K., KITAMURA, T. & RESENDE, F. G. 2005. Applying Sparse KPCA for Feature Extraction in Speech Recognition. IEICE Trans. Inf. & Syst., E88- D, 401-409.
    • LJOLJE, A., HIRSCHBERG, J. & SANTEN, J. P. H. V. 1994. Automatic speech segmentation for concatenative inventory selection. In SSW2, 93-96.
    • LJOLJE, A., HIRSCHBERG, J. & SANTEN, J. V. 1997. Automatic speech segmentation for concatenative inventory selection. Progress in Speech Synthesis, Springer Verlag, New York, 305-311.
    • LLOYD, S. P. 1982. least Squares quantization in PCM. In IEEE Transactions on Information Theory, 28 129-137.
    • LOAN, C. V. 1992. Computational Frameworks for the Fast Fourier Transform.
    • MALFRÈRE, F. & DEROO, O. 1998. Phonetic alignment: speech synthesis based vs. hybrid HMM/ANN. The 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference. Sydney Convention Centre, Sydney, Australia.
    • MARTENS, J. 2010. Deep learning with Hessian-free optimization. in Proc. ICML.
    • MARTENS, J. & SUTSKEVER, I. 2011. Learning recurrent neural networks with Hessian-free optimization. in Proc. ICML.
    • MIAO, Y., ZHANG, H. & METZE, F. 2014. Towards Speaker Adaptive Training of Deep Neural Network Acoustic Models.
    • MIKOLOV, T., KARAFIAT, M., BURGET, L., CERNOCKY, J. & KHUDANPUR, S. 2010. Recurrent neural network based language model. in Proc. ICASSP, 1045-1048.
    • MITRA, V., WANG, W., FRANCO, H., LEI, Y., BARTELS, C. & GRACIARENA, M. 2014. Evaluating robust features on Deep Neural Networks for speech recognition in noisy and channel mismatched conditions. INTERSPEECH 2014. Singapore: ISCA.
    • MNIH, A. & HINTON, G. E. 2009. A scalable hierarchical distributed language model. in Advances in Neural Information Processing Systems 21 (NIPS'08), 1081-1088.
    • MOHAMED, A.-R. 2014. Deep Neural Network acoustic models for ASR. PhD thesis, University of Toronto.
    • MOHAMED, A., DAHL, G. & HINTON, G. 2009. Deep belief networks for phone recognition. in NIPS Workshop on Deep Learning for Speech Recognition and Related Applications.
    • MOHAMED, A., DAHL, G. & HINTON, G. 2012. Acoustic modeling using deep belief networks. IEEE Trans. on Audio, Speech and Language Processing, 20, 14-22.
    • MOHAMED, A., YU, D. & DENG, L. 2010. Investigation of full-sequence training of deep belief networks for speech recognition. in Proc. Interspeech.
    • MORGAN, N. 1990. The ring array processor (RAP): Algorithms and architecture. International Computer Science Institute.
    • MORGAN, N. 2012. Deep and wide: multiple layers in automatic speech recognition. IEEE Trans.Audio Speech, Lang. Process, 20, 7-13.
    • MORGAN, N. & FOSLER-LUSSIER, E. 1998. Combining Multiple Estimators Of Speaking Rate.
    • MORGAN, N., ZHU, Q., STOLCKE, A., SÖNMEZ, K., SIVADAS, S., SHINOZAKI, T., OSTENDORF, M., JAIN, P., H. HERMANSKY, ELLIS, D., DODDINGTON, G., CHEN, B., ÇETIN, Ö., BOURLARD, H. & ATHINEOS, M. 2005. Pushing the Envelope-Aside. Signal Processing Magazine, IEEE, 22, 81-88.
    • MOSA, G. S. & ALI, A. A. 2009. Arabic Phoneme Recognition using Hierarchical Neural Fuzzy Petri Net and LPC Feature Extraction. Signal Processing: An International Journal (SPIJ), 3, 161- 171.
    • MURVEIT, H., BUTZBERGER, J., DIGALAKIS, V. & WEINTRAUB, M. 1993. Large-vocabulary dictation using SRI's DECIPHER speech recognition system: progressive search techniques. Proc. ICASSP, 319-322.
    • NAHAR, K. M. O., ELSHAFEI, M., AL-KHATIB, W. G., AL-MUHTASEB, H. & ALGHAMDI, M. M. 2012. Statistical Analysis of Arabic Phonemes for Continuous Arabic Speech Recognition. International Journal of Computer and Information Technology, 01.
    • NOVOTNEY, S., SCHWARTZ, R. M. & KHUDANPUR, S. 2011. Unsupervised Arabic Dialect Adaptation with Self-Training. in INTERSPEECH, 541-544.
    • O'SHAUGHNESSY, D. 1987. Speech communication - human and machine.
    • OZTURK, M. C. & PRINCIPE, J. C. 2007. An Associative Memory Rreadout for ESNs with Applications to Dynamical Pattern Recognition. Neural Networks 20, 377-390.
    • P.S. GOPALAKRISHNAN, D. KANEVSKY, A. N´ADAS & NAHAMOO, D. 1991. An inequality for rational functions with applications to some statistical estimation problems. IEEE Trans. Information Theory.
    • PEABODY, M. A. 2011. Methods for pronunciation assessment in computer aided language learning. PhD, Cambridge.
    • POVEY, D. 2004. Discriminative Training for Large Vocabulary Speech Recognition. Ph.D. thesis, Cambridge University.
    • POVEY, D. & WOODLAND, P. C. 2002. Minimum phone error and I-smoothing for improved discriminative training,” ., , May i.n Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Orlando, FL.
    • RABINER, L. R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. of the IEEE, 77, 257-286.
    • RABINER, L. R. & JUANG, B. H. (eds.) 1993. Fundamentals of Speech Recognition, Englewood Cliffs, New Jersey: Prentice Hall PTR.
    • RABINER, L. R. & SCHAFER, R. W. (eds.) 1978. Digital Processing of Speech Signals, New Jersey: Prentice-Hall.
    • RANZATO, M., BOUREAU, Y. & LECUN, Y. 2007. Sparse feature learning for deep belief networks. in Proc. NIPS.
    • RANZATO, M. & SZUMMER, M. 2008. Semi-supervised learning of compact document representations with deep networks. in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), 307, 792-799.
    • RAPP, S. 1995. Automatic phonemic transcription and linguistic annotation from known text with Hidden Markov Models. In Proceeding of ELSNET Goes East and IMACS Workshop.
    • ROBINSON, A. J. 1994. An application of recurrent nets to phone probability estimation IEEE, Transactions on Neural Networks, 5, 298 - 305.
    • SAINATH, T. N., KINGSBURY, B., SAON, G., SOLTAU, H., MOHAMED, A.-R., DAH, G. & RAMABHADRAN, B. 2014. Deep convolutional neural networks for large-scale speech tasks. Neural Networks.
    • SAKENAS, V. 2010. Distortion Invariant Feature Extraction with Echo State Networks. School of Engineering and Science. School of Engineering and Science Jacobs University Bremen gGmbH.
    • SALAKHUTDINOV, R. & HINTON, G. E. 2007a. Learning a nonlinear embedding by preserving class neighbourhood structure. in Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS'07).
    • SALAKHUTDINOV, R. & HINTON, G. E. 2007b. Semantic hashing. in Proceedings of the 2007 Workshop on Information Retrieval and applications of Graphical Models (SIGIR 2007).
    • SALAKHUTDINOV, R. & HINTON, G. E. 2008. Using deep belief nets to learn covariance kernels for Gaussian processes. in Advances in Neural Information Processing Systems 20 (NIPS'07), 1249-1256.
    • SALIMI, H., GIVEKI, D., SOLTANSHAHI, M. A. & HATAMI, J. 2012. Extended Mixture of MLP Experts by Hybrid of Conjugate Gradient Method and Modified Cuckoo Search. International Journal of Artificial Intelligence & Applications (IJAIA), 3.
    • SALMEN, M. & PLOGER, P. G. 2005. Echo State Networks used for Motor Control. IEEE, 1953-1958.
    • SANTEN, J. V. & SPROAT, R. 1999. High-accuracy automatic segmentation. In Proc. EuroSpeech Budapest, Hungary.
    • SAON, G., SOLTAU, H., NAHAMOO, D. & PICHENY, M. 2013. Speaker adaptation of neural network acoustic models using i-vectors. in Proc. ASRU, 55-59.
    • SCHAFER, R. W. & RABINER, L. R. 1975. Digital representations of speech signals. Proc. of the IEEE, 63, 662-677.
    • SCHRAUWEN, B. & BUSING, L. 2010. A Hierarchy of Recurrent Networks for Speech Recognition.
    • SCHUSTER, M. 1999. On supervised learning from sequential data with applications for speech recognition. PhD thesis, Nara Institute of Science and Technology.
    • SEIDE, F., LI, G. & YU, D. 2011. Conversational speech transcription using context-dependent deep neural networks. In Interspeech 2011, 437-440.
    • SELOUANI, S. A. & CAELEN, J. 1999. Arabic Phonetic Features Recognition using Modular Connectionist Architectures.
    • SIEGLER, M. A. & STERN, R. M. 1995. On The Effects Of Speech Rate in Large Vocabulary Speech Recognition Systems.
    • SJÖLANDER, K. 2003. An HMM-based system for automatic segmentation and alignment of speech. Umeå University, Department of Philosophy and Linguistics, PHONUM, 9, 93-96.
    • SKOWRONSKI, M. D. & HARRIS, J. G. 2006. Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model. Island of Kos, Greece: IEEE International Symposium on Circuits and Systems.
    • SKOWRONSKI, M. D. & HARRIS, J. G. 2007. Automatic Speech Recognition Using a Predictive Echo State Network Classifier. Neural Networks, 20, 414-423.
    • SMOLENSKY, P. 1986. Information processing in dynamical systems: foundations of harmony theory, Parall. distrib. process.
    • SOLTAU, H., SAON, G., KINGSBURY, B., KUO, J., MANGU, L., POVEY, D. & ZWEIG, G. 2007. The IBM 2006 Gale Arabic ASR System. Acoustics, Speech and Signal Processing, ICASSP 2007, IEEE International Conference, 4, IV-349 - IV-352.
    • SRIVASTAVA, G. H. N., KRIZHEVSKY, A., SUTSKEVER, I. & SALAKHUTDINOV, R. 2012. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580.
    • SRIVASTAVA, N., HINTON, G., KRIZHEVSKY, A., SUTSKEVER, I. & SALAKHUTDINOV, R. 2014. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15, 1929-1958.
    • SUTSKEVER, I., MARTENS, J. & HINTON, G. 2011. Generating text with recurrent neural networks. in Proc. ICML.
    • SVENDSEN, T. & KVALE, K. 1990. Automatic alignment of phonemic labels with continuous speech. The First International Conference on Spoken Language Processing, ICSLP 1990. Kobe, Japan.
    • TASHAN, T. 2012. Biologically Inspired Speaker Verification. PhD thesis, Nottingham Trent University.
    • TAYLOR, G. & HINTON, G. 2009. Factored conditional restricted Boltzmann machines for modeling motion style. in Proceedings of the 26th International Conference on Machine Learning (ICML'09), 1025-1032.
    • TAYLOR, G. W., FERGUS, R., LECUN, Y. & BREGLER, C. 2010. Convolutional learning of spatiotemporal features. In European Conference on Computer Vision.
    • TEBELSKIS, J. 1995. Speech Recognition using Neural Networks. PhD, Carnegie Mellon University.
    • TOLEDANO, D. T., GOMEZ, L. A. H. & GRANDE, L. V. 2003. Automatic phoneme segmentation. IEEE Trans. Speech and Audio Proc., 11, 617-625.
    • TONG, M. H., BICKETT, A. D., CHRISTIANSEN, E. M. & COTTRELL, G. W. 2007. Learning Grammatical Structure with Echo State Networks. Neural Networks, 20, 424-432.
    • TRIEFENBACH, F., DEMUYNCK, K. & MARTENS, J.-P. 2012. Improving large vocabulary continuous speech recognition by combining GMM-based and reservoir-based acoustic modeling. Spoken Language Technology Workshop (SLT), IEEE, 107-112.
    • TRIEFENBACH, F., JALALVAND, A., DEMUYNCK, K. & MARTENS, J. 2013. Acoustic Modeling With Hierarchical Reservoirs. IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 21, 2439-2450.
    • TRIEFENBACH, F., JALALVAND, A., SCHRAUWEN, B. & MARTENS, J. P. 2011. Phoneme recognition with large hierarchical reservoirs. In: Advances in Neural Information Processing Systems 23 (NIPS 2010), MIT Press, Cambridge, 2307-2315.
    • VENAYAGAMOORTHY, G. K. 2007. Online Design of an Echo State Network Based Wide Area Monitor for a Multimachine Power System. Neural Networks, 20, 404-413.
    • VERGYRI, D. & KIRCHHOFF, K. 2004. Automatic diacritization of Arabic for Acoustic Modeling in Speech Recognition. COLING 2004 Computational Approaches to Arabic Script-based Languages, 66-73.
    • VERGYRI, D., KIRCHHOFF, K., GADDE, R., STOLCKE, A. & ZHENG, J. 2005. Development of a Conversational Telephone Speech Recognizer for Levantine Arabic. In Proceedings SRI Publication.
    • VERGYRI, D., MANDAL, A., WANG, W., STOLCKE, A., ZHENG, J., GRACIARENA, M., RYBACH, D., GOLLAN, C., SCHLUTER, R., KIRCHHOFF, K., FARIA, A. & MORGAN, N. 2008. Development of the SRI/Nightingale Arabic ASR system. In Proceedings of Interspeech, 1437-1440.
    • VERSTRAETEN, D., SCHRAUWEN, B., D'HAENE, M. & STROOBANDT, D. 2007. An experimental unification of reservoir computing methods. Neural Networks, 20, 391-403.
    • VESELÝ, K., GHOSHAL, A., BURGET, L. & POVEY, D. 2013. Sequence-discriminative training of deep neural networks. Interspeech.
    • VETTER, R., VIRAG, N., RENEVEY, P. & VESIN, J.-M. 1999. Single Channel Speech Enhancement Using Principal Component Analysis and MDL Subspace Selection. Eurospeech.
    • WAGNER, M. 1981. Automatic labelling of continuous speech with a given phonetic transcription using dynamic programming algorithms. Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '81, 6.
    • WAHEED, K., WEAVER, K. & SALAM, F. M. 2002. A Robust Algorithm for Detecting Speech Segments Using an Entropic Contrast. In proc. of the IEEE Midwest Symposium on Circuits and Systems. Lida Ray Technologies Inc., 45.
    • WAHLSTER, W. 2000. Verbmobil: foundations of speech-to-speech translation. Springer-Verlag Berlin Heidelberg.
    • WERBOS, P. J. 1990. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78, 1550-1560.
    • WESTON, J., RATLE, F. & COLLOBERT, R. 2008. Deep learning via semi-supervised embedding. in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), 1168-1175.
    • WIGHTMAN, C. & TALKIN, D. 1997. The Aligner: Text to speech alignment using Markov Models. Progress in Speech Synthesis, Springer Verlag, New York, 313-323.
    • WILLIAMS, R. J. & ZIPSER, D. 1989. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1, 270-280.
    • WITT, S. & YOUNG, S. 1997. Language learning based on non-native speech recognition. Proc. Eurospeech, 633-636.
    • WOODLAND, P. C., LEGGETTER, C. J., ODELL, J. J., VALTCHEV, V. & YOUNG, S. J. 1995. The 1994 HTK large vocabulary speech recognition system. In Proceedings of ICASSP '95 (Detroit, MI).
    • WOODLAND, P. C. & POVEY, D. 2002. Large scale discriminative training of hidden Markov models for speech recognition. Computer Speech & Language, 16, 25-47.
    • HUANG, X. D., ACERO, A. & HON, H. W. 2001. Spoken Language Processing.
    • XU, Y., GOLDIE, A. & SENEFF, S. 2009. Automatic question generation and answer judging: a Q&A game for language learning. Proc. SIGSLaTE.
    • YOUNG, S., EVERMANN, G., GALES, M., HAIN, T., KERSHAW, D., LIU, X., MOORE, G., ODELL, J., OLLASON, D., POVEY, D., VALTCHEV, V. & WOODLAND, P. (eds.) 2006. The HTK book, version 3.4.