Krotov, A.; Hepple, M.; Gaizauskas, R.; Wilks, Y. (1999)
Publisher: Cambridge University Press
Languages: English
Types: Article
Subjects:

Classified by OpenAIRE into

ACM Ref: Theory of Computation / Mathematical Logic and Formal Languages
Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad-coverage grammars. In the simplest case, rules can simply be ‘read off’ the parse annotations of the corpus, producing either a simple or a probabilistic context-free grammar. Such grammars, however, can be very large, which raises the computational cost of subsequent parsing with the grammar.

In this paper, we explore ways in which a treebank grammar can be reduced in size, or ‘compacted’, using two kinds of technique: (i) thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which has both probabilistic and non-probabilistic variants. Our results show that, by combining these two techniques, a probabilistic context-free grammar can be reduced in size by 62% without any loss in parsing performance, and by 71% with a gain in recall but some loss in precision.
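
As a rough illustration of the first technique, the following Python sketch reads rules off a toy treebank, drops rules that occur fewer than a given number of times, and re-estimates rule probabilities by relative frequency. The treebank, the tuple encoding of trees, the function names and the cut-off of 2 are all assumptions made for this example; none of them comes from the paper, and the rule-parsing method is not sketched here.

```python
from collections import Counter

# Hypothetical toy treebank (not Penn Treebank data). Each tree is a tuple
# (label, child, child, ...); a leaf is a plain string holding a POS tag.
TREEBANK = [
    ("S", ("NP", "DT", "NN"), ("VP", "VBD", ("NP", "DT", "NN"))),
    ("S", ("NP", "DT", "NN"), ("VP", "VBD", ("NP", "DT", "JJ", "NN"))),
    ("S", ("NP", "NNP"), ("VP", "VBD")),
]


def read_off_rules(tree, counts):
    """Count one rule per internal node: label -> sequence of child labels."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            read_off_rules(child, counts)


def threshold(counts, min_count):
    """Keep only rules seen at least min_count times (assumed cut-off)."""
    return Counter({rule: n for rule, n in counts.items() if n >= min_count})


def to_pcfg(counts):
    """Relative-frequency estimate: count(rule) / count(all rules with same LHS)."""
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


counts = Counter()
for tree in TREEBANK:
    read_off_rules(tree, counts)

full_pcfg = to_pcfg(counts)          # grammar as read off the treebank
compact = threshold(counts, 2)       # thresholded rule set
compact_pcfg = to_pcfg(compact)      # probabilities renormalised after pruning

for (lhs, rhs), p in sorted(compact_pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{p:.2f}]")
```

On this toy input the read-off grammar has six rules and the thresholded one has three. Probabilities are re-estimated after pruning so that the rules for each left-hand side still sum to one; whether the paper renormalises in this way is not stated in the abstract.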
