Rights statement: © 2012 John Benjamins This article has been published in International Journal of Corpus Linguistics, 17:4 2012. The publisher should be contacted for permission to re-use the material in any form.
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - Children Online: A survey of child language and CMC corpora
AU - Baron, Alistair
AU - Rayson, Paul
AU - Greenwood, Phil
AU - Walkerdine, James
AU - Rashid, Awais
N1 - © 2012 John Benjamins This article has been published in International Journal of Corpus Linguistics, 17:4 2012. The publisher should be contacted for permission to re-use the material in any form.
PY - 2012
Y1 - 2012
N2 - The collection of representative corpus samples of both child language and online (CMC) language varieties is crucial for linguistic research that is motivated by applications to the protection of children online. In this paper, we present an extensive survey of corpora available for these two areas. Although a significant amount of research has been undertaken both on child language and on CMC language varieties, a much smaller number of datasets are made available as corpora. Especially lacking are corpora which match requirements for verifiable age and gender metadata, although some include self-reported information, which may be unreliable. Our survey highlights the lack of corpus data available for the intersecting area of child language in CMC environments. This lack of available corpus data is a significant drawback for those wishing to undertake replicable studies of child language and online language varieties.
AB - The collection of representative corpus samples of both child language and online (CMC) language varieties is crucial for linguistic research that is motivated by applications to the protection of children online. In this paper, we present an extensive survey of corpora available for these two areas. Although a significant amount of research has been undertaken both on child language and on CMC language varieties, a much smaller number of datasets are made available as corpora. Especially lacking are corpora which match requirements for verifiable age and gender metadata, although some include self-reported information, which may be unreliable. Our survey highlights the lack of corpus data available for the intersecting area of child language in CMC environments. This lack of available corpus data is a significant drawback for those wishing to undertake replicable studies of child language and online language varieties.
KW - child language
KW - CMC
KW - survey
U2 - 10.1075/ijcl.17.4.01bar
DO - 10.1075/ijcl.17.4.01bar
M3 - Journal article
VL - 17
SP - 443
EP - 481
JO - International Journal of Corpus Linguistics
JF - International Journal of Corpus Linguistics
SN - 1384-6655
IS - 4
ER -