The utility of topic modelling for discourse studies

Linguistics and English Language

Electronic data

Brookes McEnery Discourse Studies accepted version
Accepted author manuscript, 79.3 KB, Word document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Text available via DOI:

https://doi.org/10.1177/1461445618814032
Final published version
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Keywords

Corpus linguistics, corpus-assisted discourse studies, latent Dirichlet allocation, patient feedback, topic modelling

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

The utility of topic modelling for discourse studies. / Brookes, Gavin ; McEnery, Anthony Mark.
In: Discourse Studies, Vol. 21, No. 1, 01.01.2019, p. 3-21.

Harvard

Brookes, G & McEnery, AM 2019, 'The utility of topic modelling for discourse studies', Discourse Studies, vol. 21, no. 1, pp. 3-21. https://doi.org/10.1177/1461445618814032

APA

Brookes, G., & McEnery, A. M. (2019). The utility of topic modelling for discourse studies. Discourse Studies, 21(1), 3-21. https://doi.org/10.1177/1461445618814032

Vancouver

Brookes G, McEnery AM. The utility of topic modelling for discourse studies. Discourse Studies. 2019 Jan 1;21(1):3-21. Epub 2018 Dec 21. doi: 10.1177/1461445618814032

Author

Brookes, Gavin ; McEnery, Anthony Mark. / The utility of topic modelling for discourse studies. In: Discourse Studies. 2019 ; Vol. 21, No. 1. pp. 3-21.

Bibtex

@article{245d47801b754d949ce2146e0558e9d1,

title = "The utility of topic modelling for discourse studies",

abstract = "This article explores and critically evaluates the potential contribution to discourse studies of topic modelling, a group of machine learning methods which have been used with the aim of automatically discovering thematic information in large collections of texts. We critically evaluate the utility of the thematic grouping of texts into {\textquoteleft}topics{\textquoteright} emerging from a large collection of online patient comments about the National Health Service (NHS) in England. We take two approaches to this, one inspired by methods adopted in existing topic modelling research and one using more established methods of discourse analysis. In the study, we compare the insights produced by each approach and consider the extent to which the automatically generated topics might be of use to discourse analysts attempting to organise and study sizeable datasets. We found that the topic modelling approach was able to group texts into {\textquoteleft}topics{\textquoteright} that were truly thematically coherent with a mixed degree of success while the more traditional approach to discourse analysis consistently provided a more nuanced perspective on the data that was ultimately closer to the {\textquoteleft}reality{\textquoteright} of the texts it contains. This study thus highlights issues concerning the use of topic modelling and offers recommendations and caveats to researchers employing such approaches to study discourse in the future.",

keywords = "Corpus linguistics, corpus-assisted discourse studies, latent Dirichlet allocation, patient feedback, topic modelling",

author = "Brookes, Gavin and McEnery, {Anthony Mark}",

year = "2019",

month = jan,

day = "1",

doi = "10.1177/1461445618814032",

language = "English",

volume = "21",

pages = "3--21",

journal = "Discourse Studies",

issn = "1461-4456",

publisher = "SAGE Publications Ltd",

number = "1",

}

RIS

TY - JOUR

T1 - The utility of topic modelling for discourse studies

AU - Brookes, Gavin

AU - McEnery, Anthony Mark

PY - 2019/1/1

Y1 - 2019/1/1

N2 - This article explores and critically evaluates the potential contribution to discourse studies of topic modelling, a group of machine learning methods which have been used with the aim of automatically discovering thematic information in large collections of texts. We critically evaluate the utility of the thematic grouping of texts into ‘topics’ emerging from a large collection of online patient comments about the National Health Service (NHS) in England. We take two approaches to this, one inspired by methods adopted in existing topic modelling research and one using more established methods of discourse analysis. In the study, we compare the insights produced by each approach and consider the extent to which the automatically generated topics might be of use to discourse analysts attempting to organise and study sizeable datasets. We found that the topic modelling approach was able to group texts into ‘topics’ that were truly thematically coherent with a mixed degree of success while the more traditional approach to discourse analysis consistently provided a more nuanced perspective on the data that was ultimately closer to the ‘reality’ of the texts it contains. This study thus highlights issues concerning the use of topic modelling and offers recommendations and caveats to researchers employing such approaches to study discourse in the future.

AB - This article explores and critically evaluates the potential contribution to discourse studies of topic modelling, a group of machine learning methods which have been used with the aim of automatically discovering thematic information in large collections of texts. We critically evaluate the utility of the thematic grouping of texts into ‘topics’ emerging from a large collection of online patient comments about the National Health Service (NHS) in England. We take two approaches to this, one inspired by methods adopted in existing topic modelling research and one using more established methods of discourse analysis. In the study, we compare the insights produced by each approach and consider the extent to which the automatically generated topics might be of use to discourse analysts attempting to organise and study sizeable datasets. We found that the topic modelling approach was able to group texts into ‘topics’ that were truly thematically coherent with a mixed degree of success while the more traditional approach to discourse analysis consistently provided a more nuanced perspective on the data that was ultimately closer to the ‘reality’ of the texts it contains. This study thus highlights issues concerning the use of topic modelling and offers recommendations and caveats to researchers employing such approaches to study discourse in the future.

KW - Corpus linguistics

KW - corpus-assisted discourse studies

KW - latent Dirichlet allocation

KW - patient feedback

KW - topic modelling

U2 - 10.1177/1461445618814032

DO - 10.1177/1461445618814032

M3 - Journal article

VL - 21

SP - 3

EP - 21

JO - Discourse Studies

JF - Discourse Studies

SN - 1461-4456

IS - 1

ER -
