Hughes, J; Atwell, ES (1994)
Publisher: AISB
Languages: English
Types: Other
Automatic inference of a classification of words has been carried out by several researchers recently. Although they use a variety of methods, they all exploit the statistical redundancy inherent in the structure of language to differentiate words, the assumption being that words with similar roles occur in measurably similar contexts. This paper describes a general method by which clustering schemes can be qualitatively compared, allowing a systematic approach to finding the best word class formation scheme. The process by which words are automatically grouped into classes involves a number of decision points. These include: the contextual pattern in the language being measured; the metric by which words are compared according to the pattern; and the mechanism by which items judged to be similar are merged. Alternatives are presented for each of these factors, and the experiments rated each combination so that the most successful approach could be found. Previously, researchers relied on a looks-good-to-me method of self-evaluation to judge the quality of their derived word classifications. This paper directly compares some of their adopted approaches with alternative clustering schemes not previously attempted, which allows us to formally demonstrate when our approach to clustering is more successful. The evaluation method is also shown to be a valuable aid in highlighting approaches that are inefficient. Amongst the patterns investigated was the morphological context supplied by the previous word. Bigram counts of the collocation of the words to be clustered with the last three letters of the immediately preceding word were found to be a remarkably good differentiation criterion. The evaluation method demonstrated that the context of the last three letters (which on average contain a lot of morphological information in English) is even better than the context supplied by using the whole of the previous word in collocation counts.
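The three decision points named in the abstract (contextual pattern, comparison metric, merging mechanism) can be illustrated with a minimal Python sketch. Only the contextual pattern, the last three letters of the preceding word, comes from the abstract. The cosine metric, the greedy pairwise merging, and all function names are assumptions chosen for illustration; the abstract does not say which metrics or merging mechanisms the authors tested.

```python
from collections import Counter, defaultdict
from math import sqrt


def suffix_context_counts(tokens, targets, suffix_len=3):
    """Count, for each target word, the final letters of the word before it."""
    counts = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        if word in targets:
            counts[word][prev[-suffix_len:]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (an assumed metric)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(counts, n_classes):
    """Greedily merge the two most similar clusters (an assumed mechanism)."""
    clusters = {(w,): Counter(c) for w, c in counts.items()}
    while len(clusters) > n_classes:
        keys = sorted(clusters)
        _, a, b = max(
            (cosine(clusters[x], clusters[y]), x, y)
            for i, x in enumerate(keys)
            for y in keys[i + 1:]
        )
        # The merged cluster pools the context counts of both members.
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
    return sorted(sorted(c) for c in clusters)


# Toy text, invented for illustration only.
tokens = "the cat sat the dog ran a cat ran a dog sat".split()
counts = suffix_context_counts(tokens, {"cat", "dog", "sat", "ran"})
classes = cluster(counts, n_classes=2)
```

Swapping `cosine` for another metric, or replacing `prev[-suffix_len:]` with the whole previous word, gives an alternative scheme built from the same three parts.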
Results such as this should prove useful to handwriting recognition research. The authors believe this method provides a sensible first step for handwriting recognition researchers who wish to use statistical models of language to aid the disambiguation process: proposed contextual models can be evaluated relative to previously investigated models to indicate the likely success rate of employing them. This allows poor proposed disambiguation methods to be ruled out early on, saving valuable time and resources. We end by considering some further applications of automatic word class formation techniques. Although our experiments are exclusively with English corpus text, the general clustering and word-classifying algorithms should be applicable to text in other languages. This is likely to be particularly useful in the development of linguistic engineering technologies for emerging nations and their mother tongues, which have few or no computational linguistics resources, or computational linguists to "hand-craft" them.
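As a rough illustration of evaluating proposed models relative to one another, the Python sketch below ranks candidate word classifications against a small hand-tagged reference. The abstract does not describe the authors' scoring procedure, so the purity score used here is an assumption, as are the function names, the scheme labels, and the toy data.

```python
from collections import Counter


def purity(clusters, gold_tags):
    """Fraction of words whose tag matches the majority tag of their cluster."""
    total = sum(len(c) for c in clusters)
    agree = sum(
        Counter(gold_tags[w] for w in c).most_common(1)[0][1]
        for c in clusters
        if c
    )
    return agree / total if total else 0.0


def rank_schemes(schemes, gold_tags):
    """Sort named clusterings from highest to lowest purity."""
    scored = ((purity(c, gold_tags), name) for name, c in schemes.items())
    return sorted(scored, reverse=True)


# Hypothetical reference tags and two hypothetical clustering outputs.
gold = {"cat": "N", "dog": "N", "sat": "V", "ran": "V"}
schemes = {
    "scheme_a": [["cat", "dog"], ["ran", "sat"]],
    "scheme_b": [["cat", "sat"], ["dog", "ran"]],
}
ranking = rank_schemes(schemes, gold)
```

A scheme that scores well below previously investigated ones on such a benchmark can be set aside before any effort is spent integrating it into a recognition system.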