Zhou, Deyu; He, Yulan
Languages: English
Types: Article
In this paper, we discuss how discriminative training can be applied to the hidden vector state (HVS) model in different task domains. The HVS model is a discrete hidden Markov model (HMM) in which each HMM state represents the state of a push-down automaton with a finite stack size. In previous applications, maximum-likelihood estimation (MLE) has been used to derive the parameters of the HVS model. However, MLE makes a number of assumptions, some of which do not hold in practice. Discriminative training, which does not rely on those assumptions, can improve the performance of the HVS model by discriminating the correct hypothesis from the competing hypotheses. Experiments have been conducted in two domains: the travel domain, for the semantic parsing task, using the DARPA Communicator data and the Air Travel Information Services (ATIS) data; and the bioinformatics domain, for the information extraction task, using the GENIA corpus. The results demonstrate modest improvements in the performance of the HVS model with discriminative training. In the travel domain, discriminative training of the HVS model gives a relative error reduction of 31 percent in F-measure compared with MLE on the DARPA Communicator data and 9 percent on the ATIS data. In the bioinformatics domain, a relative error reduction of 4 percent in F-measure is achieved on the GENIA corpus.
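
The abstract compresses two technical ideas: an HVS state is a bounded stack of semantic concept labels manipulated by pop/push transitions, and discriminative training replaces the MLE objective with one that scores the correct hypothesis against its competitors. The Python sketch below illustrates those two ideas only; the class and function names, the concept labels, and the stack depth are invented here for illustration and do not come from the paper.

    # A rough sketch of the two ideas summarized above, not the authors'
    # implementation. VectorState, discriminative_objective, MAX_DEPTH,
    # and the concept labels are all illustrative assumptions.

    import math
    from dataclasses import dataclass
    from typing import List, Optional, Tuple

    MAX_DEPTH = 4  # the "finite stack size" the abstract refers to (value assumed)


    @dataclass(frozen=True)
    class VectorState:
        """One HVS state: a bounded stack of semantic concept labels."""
        stack: Tuple[str, ...]

        def transition(self, n_pop: int, push: Optional[str]) -> "VectorState":
            """Pop n_pop concepts off the stack, then optionally push one new one.

            Restricting each HMM state transition to these stack operations is
            what lets a discrete HMM mimic a push-down automaton.
            """
            if n_pop > len(self.stack):
                raise ValueError("cannot pop more concepts than are stacked")
            new_stack = self.stack[: len(self.stack) - n_pop]
            if push is not None:
                new_stack += (push,)
            if len(new_stack) > MAX_DEPTH:
                raise ValueError("finite stack size exceeded")
            return VectorState(new_stack)


    def discriminative_objective(log_p_ref: float,
                                 log_p_competitors: List[float]) -> float:
        """Conditional (MMIE-style) objective in the spirit of the abstract:
        reward the correct hypothesis relative to all competing hypotheses,
        rather than maximizing its likelihood in isolation as MLE does.
        """
        all_hyps = [log_p_ref] + log_p_competitors
        log_denom = math.log(sum(math.exp(lp) for lp in all_hyps))
        return log_p_ref - log_denom


    # Example: a plausible state sequence for "flights to Boston" in an
    # ATIS-like travel domain (concept labels invented for illustration).
    s = VectorState(("SS",))            # sentence-start root
    s = s.transition(0, "FLIGHT")       # "flights" -> push FLIGHT
    s = s.transition(0, "TOLOC")        # "to"      -> push TOLOC
    s = s.transition(0, "CITY")         # "Boston"  -> push CITY
    print(s.stack)                      # ('SS', 'FLIGHT', 'TOLOC', 'CITY')

    # Discriminative training would push this score up for the correct parse:
    print(discriminative_objective(-10.0, [-11.5, -12.0, -13.7]))

The last line makes the difference from MLE visible: raising the score of the correct hypothesis or lowering the scores of its competitors both increase the objective, and it is exactly that separation between hypotheses that plain maximum-likelihood training never optimizes.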

Yulan He is a Lecturer in the Informatics Research Centre, School of Business, University of Reading, UK. She obtained her BASc (first-class honors) and MEng degrees from Nanyang Technological University, Singapore, in 1997 and 2001 respectively. In 2004, she received her PhD degree from the Cambridge University Engineering Department, UK. Between 2004 and 2007, she was an Assistant Professor with the School of Computer Engineering at Nanyang Technological University, Singapore. Her current research interests include text and data mining, machine learning, information extraction, natural language processing, and spoken dialogue systems.