Accepted author manuscript, 76.9 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - Building LANA-CASE, a spoken corpus of American English conversation
T2 - Challenges and innovations in corpus compilation
AU - Hanks, Elizabeth
AU - McEnery, Anthony
AU - Egbert, Jesse
AU - Larsson, Tove
AU - Biber, Douglas
AU - Reppen, Randi
AU - Baker, Paul
AU - Brezina, Vaclav
AU - Brookes, Gavin
AU - Clarke, Isobelle
AU - Bottini, Raffaella
PY - 2024/10/31
Y1 - 2024/10/31
N2 - The Lancaster-Northern Arizona Corpus of Spoken American English (LANA-CASE) is a collaborative project between Lancaster University and Northern Arizona University to create a publicly available, large-scale corpus of American English conversation. In this article, we describe the design of LANA-CASE in terms of the challenges that have arisen and how these have been addressed – including decisions related to operationalizing the domain, sampling the data, recruiting participants, and selecting instruments for data collection. In addressing these challenges, we were able to draw on and further develop strategies established in the creation of other spoken corpora (including the British English counterpart to LANA-CASE, the Spoken British National Corpus 2014) as well as to implement recent theoretical and technical innovations related to each step. We hope that this discussion can inform future projects focused on the design and construction of spoken corpora.
U2 - 10.32714/ricl.12.02.03
DO - 10.32714/ricl.12.02.03
M3 - Journal article
VL - 12
SP - 24
EP - 44
JO - Research in Corpus Linguistics
JF - Research in Corpus Linguistics
IS - 2
ER -