Automatic error tagging of spelling mistakes in learner corpora

Associated organisational units

Computing and Communications

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Chapter

Published

Publication date	2011
Host publication	A Taste for Corpora: In honour of Sylviane Granger
Editors	Fanny Meunier, Sylvie De Cock, Gaëtanelle Gilquin, Magali Paquot
Place of Publication	Amsterdam
Publisher	John Benjamins
Pages	109-126
Number of pages	18
Volume	45
ISBN (electronic)	978 90 272 8708 3
ISBN (print)	978 90 272 0350 2
Original language	English

Publication series

Name	Studies in Corpus Linguistics
Publisher	John Benjamins
Volume	45
ISSN (Print)	1388-0373

Abstract

Manual error tagging of learner corpus data is time-consuming and creates a bottleneck in the analysis of learner corpora. This has led researchers to apply techniques from the area of natural language processing to assist in the automatic analysis of such data. This chapter presents the novel application of a hybrid approach to the detection of spelling errors in learner data. The Variant Detector (VARD) software was developed to match historical spelling variants to modern equivalents with the intention of improving the accuracy and robustness of corpus linguistics techniques when applied to historical corpora. Here, we describe its application to detect spelling errors in written learner corpora consisting of 50,000 words from each of three learner backgrounds (French, German and Spanish).
