General Northern English. Exploring regional variation in the North of England with machine learning

Linguistics and English Language

Associated organisational unit

Phonetics Lab

Text available via DOI:

https://doi.org/10.3389/frai.2020.00048
Final published version
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Keywords

vowels, accent features, Northern English, random forests, feature selection, dialect leveling

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

General Northern English. Exploring regional variation in the North of England with machine learning. / Strycharczuk, Patrycja; Lopez-Ibanez, Manu; Brown, Georgina et al.
In: Frontiers in Artificial Intelligence: Language and Computation, Vol. 4, 48, 15.07.2020.

Harvard

Strycharczuk, P, Lopez-Ibanez, M, Brown, G & Leemann, A 2020, 'General Northern English. Exploring regional variation in the North of England with machine learning', Frontiers in Artificial Intelligence: Language and Computation, vol. 4, 48. https://doi.org/10.3389/frai.2020.00048

APA

Strycharczuk, P., Lopez-Ibanez, M., Brown, G., & Leemann, A. (2020). General Northern English. Exploring regional variation in the North of England with machine learning. Frontiers in Artificial Intelligence: Language and Computation, 4, Article 48. https://doi.org/10.3389/frai.2020.00048

Vancouver

Strycharczuk P, Lopez-Ibanez M, Brown G, Leemann A. General Northern English. Exploring regional variation in the North of England with machine learning. Frontiers in Artificial Intelligence: Language and Computation. 2020 Jul 15;4:48. doi: 10.3389/frai.2020.00048

Author

Strycharczuk, Patrycja ; Lopez-Ibanez, Manu ; Brown, Georgina et al. / General Northern English. Exploring regional variation in the North of England with machine learning. In: Frontiers in Artificial Intelligence: Language and Computation. 2020 ; Vol. 4.

Bibtex

@article{1ef2231f30344edda6bb45c0c463cb10,

title = "General Northern English. Exploring regional variation in the North of England with machine learning",

abstract = "In this paper, we present a novel computational approach to the analysis of accent variation. The case study is dialect leveling in the North of England, manifested as reduction of accent variation across the North and emergence of General Northern English (GNE), a pan-regional standard accent associated with middle-class speakers. We investigated this instance of dialect leveling using random forest classification, with audio data from a crowd-sourced corpus of 105 urban, mostly highly-educated speakers from five northern UK cities: Leeds, Liverpool, Manchester, Newcastle upon Tyne, and Sheffield. We trained random forest models to identify individual northern cities from a sample of other northern accents, based on first two formant measurements of full vowel systems. We tested the models using unseen data. We relied on undersampling, bagging (bootstrap aggregation) and leave-one-out cross-validation to address some challenges associated with the data set, such as unbalanced data and relatively small sample size. The accuracy of classification provides us with a measure of relative similarity between different pairs of cities, while calculating conditional feature importance allows us to identify which input features (which vowels and which formants) have the largest influence in the prediction. We do find a considerable degree of leveling, especially between Manchester, Leeds and Sheffield, although some differences persist. The features that contribute to these differences most systematically are typically not the ones discussed in previous dialect descriptions. We propose that the most systematic regional features are also not salient, and as such, they serve as sociolinguistic regional indicators. We supplement the random forest results with a more traditional variationist description of by-city vowel systems, and we use both sources of evidence to inform a description of the vowels of General Northern English.",

keywords = "vowels, accent features, Northern English, random forests, feature selection, dialect leveling",

author = "Patrycja Strycharczuk and Manu Lopez-Ibanez and Georgina Brown and Adrian Leemann",

year = "2020",

month = jul,

day = "15",

doi = "10.3389/frai.2020.00048",

language = "English",

volume = "4",

journal = "Frontiers in Artificial Intelligence: Language and Computation",

publisher = "Frontiers",

}
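The abstract above describes a one-city-versus-the-rest classification evaluated with undersampling and leave-one-out cross-validation. The following is a minimal sketch of that evaluation loop only, not the authors' implementation: a nearest-centroid rule stands in for the random forest, bagging and conditional feature importance are omitted, and the speaker counts, formant values, and all function names are hypothetical.

```python
import random
from statistics import mean


def undersample(rows, rng):
    """Randomly drop rows from the larger class until both classes match in size."""
    pos = [r for r in rows if r[1]]
    neg = [r for r in rows if not r[1]]
    n = min(len(pos), len(neg))
    return rng.sample(pos, n) + rng.sample(neg, n)


def centroid(vectors):
    """Per-feature mean of a list of feature vectors."""
    return [mean(col) for col in zip(*vectors)]


def predict(train, x):
    """Nearest-centroid stand-in classifier (assumption: NOT a random forest)."""
    c_pos = centroid([f for f, y in train if y])
    c_neg = centroid([f for f, y in train if not y])

    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return sq_dist(c_pos) < sq_dist(c_neg)


def loo_accuracy(rows, seed=0):
    """Hold out each speaker once; train on a balanced sample of the rest."""
    rng = random.Random(seed)
    hits = 0
    for i, (x, y) in enumerate(rows):
        train = undersample(rows[:i] + rows[i + 1:], rng)
        hits += predict(train, x) == y
    return hits / len(rows)


def make_speakers(n, f1, f2, label, rng):
    """Synthetic speakers: one (F1, F2) pair in Hz around a hypothetical mean."""
    return [([rng.gauss(f1, 30), rng.gauss(f2, 60)], label) for _ in range(n)]


rng = random.Random(1)
# Hypothetical, unbalanced data: 10 target-city speakers versus 40 others.
data = make_speakers(10, 450, 1900, True, rng) + make_speakers(40, 600, 1500, False, rng)
acc = loo_accuracy(data)
print(f"leave-one-out accuracy: {acc:.2f}")
```

Undersampling is repeated inside every fold, so the held-out speaker never influences which training rows are kept. In the study itself the features are the first two formants of the full vowel system rather than the single vowel used here.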

RIS

TY - JOUR

T1 - General Northern English. Exploring regional variation in the North of England with machine learning

AU - Strycharczuk, Patrycja

AU - Lopez-Ibanez, Manu

AU - Brown, Georgina

AU - Leemann, Adrian

PY - 2020/7/15

Y1 - 2020/7/15

AB - In this paper, we present a novel computational approach to the analysis of accent variation. The case study is dialect leveling in the North of England, manifested as reduction of accent variation across the North and emergence of General Northern English (GNE), a pan-regional standard accent associated with middle-class speakers. We investigated this instance of dialect leveling using random forest classification, with audio data from a crowd-sourced corpus of 105 urban, mostly highly-educated speakers from five northern UK cities: Leeds, Liverpool, Manchester, Newcastle upon Tyne, and Sheffield. We trained random forest models to identify individual northern cities from a sample of other northern accents, based on first two formant measurements of full vowel systems. We tested the models using unseen data. We relied on undersampling, bagging (bootstrap aggregation) and leave-one-out cross-validation to address some challenges associated with the data set, such as unbalanced data and relatively small sample size. The accuracy of classification provides us with a measure of relative similarity between different pairs of cities, while calculating conditional feature importance allows us to identify which input features (which vowels and which formants) have the largest influence in the prediction. We do find a considerable degree of leveling, especially between Manchester, Leeds and Sheffield, although some differences persist. The features that contribute to these differences most systematically are typically not the ones discussed in previous dialect descriptions. We propose that the most systematic regional features are also not salient, and as such, they serve as sociolinguistic regional indicators. We supplement the random forest results with a more traditional variationist description of by-city vowel systems, and we use both sources of evidence to inform a description of the vowels of General Northern English.

KW - vowels

KW - accent features

KW - Northern English

KW - random forests

KW - feature selection

KW - dialect leveling

U2 - 10.3389/frai.2020.00048

DO - 10.3389/frai.2020.00048

M3 - Journal article

VL - 4

JO - Frontiers in Artificial Intelligence: Language and Computation

JF - Frontiers in Artificial Intelligence: Language and Computation

M1 - 48

ER -
