Smith, Catherine; Adolphs, Svenja; Harvey, Kevin; Mullany, Louise (2014)
Publisher: Edinburgh University Press
Languages: English
Types: Article
Subjects (classified by OpenAIRE), ACM Ref: InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
The abundance of language data that is now available in digital form, and the rise of distinct language varieties that are used for digital communication, mean that issues of non-standard spellings and spelling errors are likely to become more prominent for compilers of corpora in future. This paper examines the effect of spelling variation on keywords in a born-digital corpus in order to explore the extent and impact of this variation for future corpus studies. The corpus used in this study consists of e-mails about health concerns that adolescents sent to a health website. Keywords are generated from both the original version of the corpus and a version with spelling errors corrected, with the British National Corpus (BNC) acting as the reference corpus in each case. The ranks of the keywords are shown to be very similar, which suggests that, depending on the research goals, keywords could be generated reliably without any need for spelling correction.
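
The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say which keyness statistic or rank comparison the authors used, so the code below assumes the log-likelihood statistic that is common in corpus linguistics and a Spearman rank correlation over the shared keywords. All function names are hypothetical, and the token lists in the usage note are invented for illustration; they are not data from the study.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness (assumed statistic, not confirmed by the source).

    a, b: frequency of the word in the study and reference corpus.
    c, d: total token counts of the study and reference corpus.
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_study)
    if b > 0:
        ll += b * math.log(b / expected_ref)
    return 2 * ll


def keyword_ranks(study_tokens, reference_tokens):
    """Rank words that are relatively more frequent in the study corpus."""
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    c, d = sum(study.values()), sum(ref.values())
    scores = {}
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep positive keywords only: higher relative frequency in the study corpus.
        if a / c > b / d:
            scores[word] = log_likelihood(a, b, c, d)
    # Highest keyness first; ties are broken alphabetically for determinism.
    ordered = sorted(scores, key=lambda w: (-scores[w], w))
    return {word: rank for rank, word in enumerate(ordered, start=1)}


def spearman_rho(ranks_x, ranks_y):
    """Spearman correlation over the keywords present in both rankings."""
    shared = sorted(set(ranks_x) & set(ranks_y))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared keywords")

    def rerank(ranks):
        # Re-rank within the shared set so both lists use ranks 1..n without ties.
        ordered = sorted(shared, key=lambda w: ranks[w])
        return {w: i for i, w in enumerate(ordered, start=1)}

    rx, ry = rerank(ranks_x), rerank(ranks_y)
    d_squared = sum((rx[w] - ry[w]) ** 2 for w in shared)
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

Under these assumptions, one would call `keyword_ranks(original_tokens, reference_tokens)` and `keyword_ranks(corrected_tokens, reference_tokens)` on the uncorrected and spelling-corrected versions of the corpus, then pass both results to `spearman_rho`. A value close to 1 would correspond to the abstract's observation that the two keyword rankings are very similar.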
