Final published version
Licence: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Special issue › peer-review
}
TY - JOUR
T1 - The Spoken BNC2014
T2 - designing and building a spoken corpus of everyday conversations
AU - Love, Robbie
AU - Dembry, Claire
AU - Hardie, Andrew
AU - Brezina, Vaclav
AU - McEnery, Tony
PY - 2017
Y1 - 2017
N2 - This paper introduces the Spoken British National Corpus 2014, an 11-million-word corpus of orthographically transcribed conversations among L1 speakers of British English from across the UK, recorded in the years 2012-2016. After showing that a survey of the recent history of corpora of spoken British English justifies the compilation of this new corpus, we describe the main stages of the Spoken BNC2014’s creation: design, data and metadata collection, transcription, XML encoding, and annotation. In doing so we aim to i) encourage users of the corpus to approach the data with sensitivity to the many methodological issues we identified and attempted to overcome while compiling the Spoken BNC2014, and ii) inform (future) compilers of spoken corpora of the innovations we implemented to attempt to make the construction of corpora representing spontaneous speech in informal contexts more tractable, both logistically and practically, than in the past.
AB - This paper introduces the Spoken British National Corpus 2014, an 11-million-word corpus of orthographically transcribed conversations among L1 speakers of British English from across the UK, recorded in the years 2012-2016. After showing that a survey of the recent history of corpora of spoken British English justifies the compilation of this new corpus, we describe the main stages of the Spoken BNC2014’s creation: design, data and metadata collection, transcription, XML encoding, and annotation. In doing so we aim to i) encourage users of the corpus to approach the data with sensitivity to the many methodological issues we identified and attempted to overcome while compiling the Spoken BNC2014, and ii) inform (future) compilers of spoken corpora of the innovations we implemented to attempt to make the construction of corpora representing spontaneous speech in informal contexts more tractable, both logistically and practically, than in the past.
KW - Spoken BNC2014
KW - transcription
KW - corpus construction
KW - spoken corpora
U2 - 10.1075/ijcl.22.3.02lov
DO - 10.1075/ijcl.22.3.02lov
M3 - Special issue
VL - 22
SP - 319
EP - 344
JO - International Journal of Corpus Linguistics
JF - International Journal of Corpus Linguistics
SN - 1384-6655
IS - 3
ER -