Electronic data

  • 2018hawtinphd

    Final published version, 3.01 MB, PDF document

    Available under license: CC BY-NC-ND

The written British National Corpus 2014: design, compilation and analysis

Research output: Thesis (Doctoral Thesis)

Unpublished
  • Abigail Hawtin
Publication date: 2019
Number of pages: 360
Qualification: PhD
Awarding Institution
Supervisors/Advisors
Publisher
  • Lancaster University
Original language: English

Abstract

The ESRC-funded Centre for Corpus Approaches to Social Science at Lancaster University (CASS) and the English Language Teaching Group at Cambridge University Press (CUP) have collaborated to compile a new, publicly accessible corpus of contemporary written British English, known as the Written British National Corpus 2014 (Written BNC2014). The Written BNC2014 is an updated version of the Written British National Corpus (Written BNC1994), which was created in the 1990s. The Written BNC1994 is often used as a proxy for present-day British English, so the Written BNC2014 has been created both to allow comparisons between the two corpora and to allow research on British English to be carried out using a state-of-the-art contemporary dataset. The Written BNC2014 contains approximately 90 million words of written British English, published between 2010 and 2018, from a wide variety of genres. The corpus will be publicly released in 2019.
This thesis presents a detailed account of the design and compilation of the corpus, focusing on the many challenges that had to be overcome to create it and on the solutions devised for them. It also demonstrates the utility of the corpus by presenting a diachronic comparison of academic writing in the 1990s and 2010s, with a focus on the theory of colloquialisation.
This thesis, whilst not a Written BNC2014 user guide, presents all of the decisions made in the design and creation of the corpus, and as such will help to make the corpus as useful as possible to as many people, and for as many purposes, as possible.