
Corpus methods for linguistic analysis

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Chapter (peer-reviewed)

Forthcoming
Publication date: 28/06/2024
Host publication: Research methods for applied linguistics: a practical guide
Place of publication: Edinburgh
Publisher: Edinburgh University Press
Original language: English

Abstract

This chapter introduces corpus linguistics as a scientific method for both the theoretical and applied analysis of written and spoken language data. It first provides key definitions in corpus linguistics (e.g., corpus, token, type, annotation, metadata) and introduces different types of corpora, describing the core features (e.g., representativeness, balance, sampling) to consider when building or selecting a corpus. In addition to well-known general corpora such as the British National Corpus (which represents British English used in a wide range of communicative contexts), the chapter also describes specialized corpora that contain, for example, transcripts of second language speech, multilingual data, and register-specific or topic-specific texts. It then provides an overview of key techniques in corpus linguistics (e.g., frequency counts, concordances, collocations) which allow both quantitative and qualitative analysis of linguistic data. These techniques demonstrate the wide range of applications of corpus methods, for instance to inform research in stylistics, forensic linguistics, language acquisition, and discourse analysis. The chapter includes a case study that analyses second language production to illustrate corpus techniques, freely available tools, and how to report corpus findings. Challenges in the application of corpus methods to linguistic analysis are identified (e.g., selecting or building a representative corpus, processing and annotating data, ethical issues and copyright) and possible solutions are outlined.
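As a minimal sketch of the three core techniques named in the abstract (frequency counts, concordances, collocations), the following Python example uses the NLTK library; NLTK is an assumption here (the chapter itself discusses freely available tools without this page naming them), and the tiny "corpus" text is invented purely for illustration.

```python
# Sketch of three corpus techniques: frequency counts, concordances, collocations.
# Assumes NLTK is installed (pip install nltk); the toy corpus below is invented.
import re
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

corpus_text = (
    "The learner used the new word in a sentence. "
    "The new word appeared again in the spoken data, "
    "and the learner repeated the new word once more."
)

# Simple regex tokenisation keeps the sketch self-contained
# (no NLTK data downloads required).
tokens = re.findall(r"[a-z']+", corpus_text.lower())

# 1. Frequency counts: how often each type occurs in the corpus.
freq = nltk.FreqDist(tokens)
print(freq.most_common(5))

# 2. Concordance: each occurrence of a node word with its surrounding context.
nltk.Text(tokens).concordance("word", width=60)

# 3. Collocations: word pairs that co-occur more often than chance,
#    ranked here by pointwise mutual information (PMI).
bigram_measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(tokens)
print(finder.nbest(bigram_measures.pmi, 3))
```

In practice these steps would be run over a much larger, representative corpus (or carried out in a dedicated concordancer), but the sketch shows how the quantitative outputs (frequencies, collocation rankings) sit alongside the qualitative reading of concordance lines described in the chapter.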