A Multi-Dimensional Analysis of English Tweets

Linguistics and English Language

Text available via DOI:

https://doi.org/10.1177/09639470221090369
Final published version
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Keywords

Corpus linguistics, functional linguistic variation, multiple correspondence analysis, multidimensional analysis, stylistic variation, twitter

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

A Multi-Dimensional Analysis of English Tweets. / Clarke, Isobelle.
In: Language and Literature, Vol. 31, No. 2, 01.05.2022, p. 124-149.

Harvard

Clarke, I 2022, 'A Multi-Dimensional Analysis of English Tweets', Language and Literature, vol. 31, no. 2, pp. 124-149. https://doi.org/10.1177/09639470221090369

APA

Clarke, I. (2022). A Multi-Dimensional Analysis of English Tweets. Language and Literature, 31(2), 124-149. https://doi.org/10.1177/09639470221090369

Vancouver

Clarke I. A Multi-Dimensional Analysis of English Tweets. Language and Literature. 2022 May 1;31(2):124-149. doi: 10.1177/09639470221090369

Author

Clarke, Isobelle. / A Multi-Dimensional Analysis of English Tweets. In: Language and Literature. 2022 ; Vol. 31, No. 2. pp. 124-149.

Bibtex

@article{facc044fdf47438191712aa023ca30e6,

title = "A Multi-Dimensional Analysis of English Tweets",

abstract = "This paper applies Multi-Dimensional Analysis (MDA) to a corpus of English tweets to uncover the most common patterns of linguistic variation. MDA is a commonly applied method in corpus linguistics for the analysis of functional and/or stylistic variation in a particular language variety. Notably, MDA is an approach aimed at identifying and interpreting the frequent patterns of co-occurring linguistic features across a corpus, such as a corpus of spoken and written English registers (Biber, 1988). Traditionally, MDA is based on a factor analysis of the relative frequencies of numerous grammatical features measured across numerous texts drawn from that variety of language to identify a series of underlying dimensions of linguistic variation. Despite its popularity and utility, traditional MDA has an important limitation – it can only be used to analyse texts that are long enough to allow for the relative frequencies of many grammatical forms to be estimated accurately. If the texts under analysis are too short, then few forms can be expected to occur sufficiently frequently for their relative frequency to be accurately estimated. Tweets are characteristically short texts, meaning that traditional MDA cannot be used in the present research. To overcome this problem, this paper introduces a short-text version of MDA and applies it to a corpus of English tweets. Specifically, rather than measure the relative frequencies of forms in each tweet, the approach analyses their occurrence. This binary dataset is then aggregated using Multiple Correspondence Analysis (MCA), which is used much like factor analysis in traditional MDA – to return a series of dimensions that represent the most common patterns of linguistic variation in the dataset. After controlling for text length in the first dimension, four subsequent dimensions are interpreted. The results suggest that there is a great deal of linguistic variation on Twitter. 
Notably, the results show that Twitter is commonly used for self-commodification, as people manage their identities, engaging in practices of self-branding through stance-taking, self-reporting, promotion and persuasion, as well as broadcasting their message beyond their followership, distributing news, and expressing opposition and this often occurs in order to attract attention. Additionally, the results show that interaction is common, suggesting that Twitter is also used for social and interpersonal gain.",

keywords = "Corpus linguistics, functional linguistic variation, multiple correspondence analysis, multidimensional analysis, stylistic variation, twitter",

author = "Isobelle Clarke",

year = "2022",

month = may,

day = "1",

doi = "10.1177/09639470221090369",

language = "English",

volume = "31",

pages = "124--149",

journal = "Language and Literature",

issn = "0963-9470",

publisher = "SAGE Publications Ltd",

number = "2",

}

RIS

TY - JOUR

T1 - A Multi-Dimensional Analysis of English Tweets

AU - Clarke, Isobelle

PY - 2022/5/1

Y1 - 2022/5/1

N2 - This paper applies Multi-Dimensional Analysis (MDA) to a corpus of English tweets to uncover the most common patterns of linguistic variation. MDA is a commonly applied method in corpus linguistics for the analysis of functional and/or stylistic variation in a particular language variety. Notably, MDA is an approach aimed at identifying and interpreting the frequent patterns of co-occurring linguistic features across a corpus, such as a corpus of spoken and written English registers (Biber, 1988). Traditionally, MDA is based on a factor analysis of the relative frequencies of numerous grammatical features measured across numerous texts drawn from that variety of language to identify a series of underlying dimensions of linguistic variation. Despite its popularity and utility, traditional MDA has an important limitation – it can only be used to analyse texts that are long enough to allow for the relative frequencies of many grammatical forms to be estimated accurately. If the texts under analysis are too short, then few forms can be expected to occur sufficiently frequently for their relative frequency to be accurately estimated. Tweets are characteristically short texts, meaning that traditional MDA cannot be used in the present research. To overcome this problem, this paper introduces a short-text version of MDA and applies it to a corpus of English tweets. Specifically, rather than measure the relative frequencies of forms in each tweet, the approach analyses their occurrence. This binary dataset is then aggregated using Multiple Correspondence Analysis (MCA), which is used much like factor analysis in traditional MDA – to return a series of dimensions that represent the most common patterns of linguistic variation in the dataset. After controlling for text length in the first dimension, four subsequent dimensions are interpreted. The results suggest that there is a great deal of linguistic variation on Twitter. 
Notably, the results show that Twitter is commonly used for self-commodification, as people manage their identities, engaging in practices of self-branding through stance-taking, self-reporting, promotion and persuasion, as well as broadcasting their message beyond their followership, distributing news, and expressing opposition and this often occurs in order to attract attention. Additionally, the results show that interaction is common, suggesting that Twitter is also used for social and interpersonal gain.

AB - This paper applies Multi-Dimensional Analysis (MDA) to a corpus of English tweets to uncover the most common patterns of linguistic variation. MDA is a commonly applied method in corpus linguistics for the analysis of functional and/or stylistic variation in a particular language variety. Notably, MDA is an approach aimed at identifying and interpreting the frequent patterns of co-occurring linguistic features across a corpus, such as a corpus of spoken and written English registers (Biber, 1988). Traditionally, MDA is based on a factor analysis of the relative frequencies of numerous grammatical features measured across numerous texts drawn from that variety of language to identify a series of underlying dimensions of linguistic variation. Despite its popularity and utility, traditional MDA has an important limitation – it can only be used to analyse texts that are long enough to allow for the relative frequencies of many grammatical forms to be estimated accurately. If the texts under analysis are too short, then few forms can be expected to occur sufficiently frequently for their relative frequency to be accurately estimated. Tweets are characteristically short texts, meaning that traditional MDA cannot be used in the present research. To overcome this problem, this paper introduces a short-text version of MDA and applies it to a corpus of English tweets. Specifically, rather than measure the relative frequencies of forms in each tweet, the approach analyses their occurrence. This binary dataset is then aggregated using Multiple Correspondence Analysis (MCA), which is used much like factor analysis in traditional MDA – to return a series of dimensions that represent the most common patterns of linguistic variation in the dataset. After controlling for text length in the first dimension, four subsequent dimensions are interpreted. The results suggest that there is a great deal of linguistic variation on Twitter. 
Notably, the results show that Twitter is commonly used for self-commodification, as people manage their identities, engaging in practices of self-branding through stance-taking, self-reporting, promotion and persuasion, as well as broadcasting their message beyond their followership, distributing news, and expressing opposition and this often occurs in order to attract attention. Additionally, the results show that interaction is common, suggesting that Twitter is also used for social and interpersonal gain.

KW - Corpus linguistics

KW - functional linguistic variation

KW - multiple correspondence analysis

KW - multidimensional analysis

KW - stylistic variation

KW - twitter

U2 - 10.1177/09639470221090369

DO - 10.1177/09639470221090369

M3 - Journal article

VL - 31

SP - 124

EP - 149

JO - Language and Literature

JF - Language and Literature

SN - 0963-9470

IS - 2

ER -
