Examining corpus prototypicality and keyness beyond the lexical level: Experiments with ProtAnt

Research output: Contribution to conference - Without ISBN/ISSN > Conference paper > peer-review

Published

Standard

Examining corpus prototypicality and keyness beyond the lexical level: Experiments with ProtAnt. / Smith, Nicholas; Anthony, Laurence; Hoffmann, Sebastian et al.
2023. Paper presented at Corpus Linguistics, Lancaster, United Kingdom.

Harvard

Smith, N, Anthony, L, Hoffmann, S & Rayson, P 2023, 'Examining corpus prototypicality and keyness beyond the lexical level: Experiments with ProtAnt', Paper presented at Corpus Linguistics, Lancaster, United Kingdom, 2/07/23 - 6/07/23.

APA

Smith, N., Anthony, L., Hoffmann, S., & Rayson, P. (2023). Examining corpus prototypicality and keyness beyond the lexical level: Experiments with ProtAnt. Paper presented at Corpus Linguistics, Lancaster, United Kingdom.

Vancouver

Smith N, Anthony L, Hoffmann S, Rayson P. Examining corpus prototypicality and keyness beyond the lexical level: Experiments with ProtAnt. 2023. Paper presented at Corpus Linguistics, Lancaster, United Kingdom.

Author

Smith, Nicholas ; Anthony, Laurence ; Hoffmann, Sebastian et al. / Examining corpus prototypicality and keyness beyond the lexical level: Experiments with ProtAnt. Paper presented at Corpus Linguistics, Lancaster, United Kingdom.

Bibtex

@conference{0590403377df4e28917d634d187fcdc8,
title = "Examining corpus prototypicality and keyness beyond the lexical level: Experiments with ProtAnt",
abstract = "For linguists working with corpora, a common difficulty after quantitative analysis is deciding which texts to select for follow-up, close analysis, without arousing suspicion of {\textquoteleft}cherry-picking{\textquoteright}. The ProtAnt tool (Anthony \& Baker 2015) provides a major boost in this respect. Building on a now well-established tradition of corpus keywords analysis (since Scott 1997), and the association between prototypes and frequency of instantiation (e.g., Rosch 1975, Gries 2003), ProtAnt ranks the texts in a target corpus from most to least prototypical according to the number of keywords they contain. ProtAnt{\textquoteright}s capabilities have been increasingly exploited in text/discourse analysis (e.g., Levon 2016, Bednarek and Caple 2017, Price 2022), but to date, all such studies have been confined to traditional lexical-based keywords, rather than keywords generated at other linguistic levels such as parts of speech (POS), semantic domains, and speech acts. The current paper seeks to address this research gap, posing the question: How successfully can ProtAnt identify prototypical and outlier texts in corpora at various non-lexical linguistic levels? We address this question through a series of experiments. Results show that ProtAnt is able to use key POS-tags to identify stylistically prototypical texts in registers of the American AmE06 corpus and also flag outlier texts that have been artificially included from another register. Using semantic tags, outliers are identified with still higher success. Other results show that speech act tags (in SPICE-Ireland) yield more mixed results. On the whole, non-lexical key items are able to complement those at the lexical level in profiling texts, with success seemingly affected by the granularity of the tags, accuracy of the linguistic annotations, and degree of specialization of the register.
We discuss the theoretical and practical implications of our work in areas such as grammar, stylistics, discourse analysis, and data-driven learning (DDL).
References
Anthony, L. \& Baker, P. (2015). ProtAnt: A tool for analysing the prototypicality of texts. International Journal of Corpus Linguistics, 20(3): 273--292.
Bednarek, M. \& Caple, H. (2017). The Discourse of News Values: How News Organizations Create Newsworthiness. Oxford: Oxford University Press.
Gries, S. (2003). Towards a corpus-based identification of prototypical instances of constructions. Annual Review of Cognitive Linguistics, 1: 1--27. DOI: 10.1075/arcl.1.02gri
Levon, E. (2016). Qualitative analysis of stance. In Baker, P. \& Egbert, J. (eds.) Triangulating Methodological Approaches in Corpus Linguistic Research. London/New York: Routledge.
Price, H. (2022). The Language of Mental Illness: Corpus Linguistics and the Construction of Mental Illness in the Press. Cambridge: Cambridge University Press.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104(3): 192--233. DOI: 10.1037/0096-3445.104.3.192
Scott, M. (1997). PC analysis of key words - and key key words. System, 25(2): 233--245.",
author = "Nicholas Smith and Laurence Anthony and Sebastian Hoffmann and Paul Rayson",
year = "2023",
month = jul,
day = "3",
language = "English",
note = "Corpus Linguistics, CL2023 ; Conference date: 02-07-2023 Through 06-07-2023",
url = "https://wp.lancs.ac.uk/cl2023/",

}

RIS

TY - CONF

T1 - Examining corpus prototypicality and keyness beyond the lexical level: Experiments with ProtAnt

AU - Smith, Nicholas

AU - Anthony, Laurence

AU - Hoffmann, Sebastian

AU - Rayson, Paul

N1 - Conference code: 12

PY - 2023/7/3

Y1 - 2023/7/3

AB - For linguists working with corpora, a common difficulty after quantitative analysis is deciding which texts to select for follow-up, close analysis, without arousing suspicion of ‘cherry-picking’. The ProtAnt tool (Anthony & Baker 2015) provides a major boost in this respect. Building on a now well-established tradition of corpus keywords analysis (since Scott 1997), and the association between prototypes and frequency of instantiation (e.g., Rosch 1975, Gries 2003), ProtAnt ranks the texts in a target corpus from most to least prototypical according to the number of keywords they contain. ProtAnt’s capabilities have been increasingly exploited in text/discourse analysis (e.g., Levon 2016, Bednarek and Caple 2017, Price 2022), but to date, all such studies have been confined to traditional lexical-based keywords, rather than keywords generated at other linguistic levels such as parts of speech (POS), semantic domains, and speech acts. The current paper seeks to address this research gap, posing the question: How successfully can ProtAnt identify prototypical and outlier texts in corpora at various non-lexical linguistic levels? We address this question through a series of experiments. Results show that ProtAnt is able to use key POS-tags to identify stylistically prototypical texts in registers of the American AmE06 corpus and also flag outlier texts that have been artificially included from another register. Using semantic tags, outliers are identified with still higher success. Other results show that speech act tags (in SPICE-Ireland) yield more mixed results. On the whole, non-lexical key items are able to complement those at the lexical level in profiling texts, with success seemingly affected by the granularity of the tags, accuracy of the linguistic annotations, and degree of specialization of the register.
We discuss the theoretical and practical implications of our work in areas such as grammar, stylistics, discourse analysis, and data-driven learning (DDL).
References
Anthony, L. & Baker, P. (2015). ProtAnt: A tool for analysing the prototypicality of texts. International Journal of Corpus Linguistics, 20(3): 273–292.
Bednarek, M. & Caple, H. (2017). The Discourse of News Values: How News Organizations Create Newsworthiness. Oxford: Oxford University Press.
Gries, S. (2003). Towards a corpus-based identification of prototypical instances of constructions. Annual Review of Cognitive Linguistics, 1: 1–27. DOI: 10.1075/arcl.1.02gri
Levon, E. (2016). Qualitative analysis of stance. In Baker, P. & Egbert, J. (eds.) Triangulating Methodological Approaches in Corpus Linguistic Research. London/New York: Routledge.
Price, H. (2022). The Language of Mental Illness: Corpus Linguistics and the Construction of Mental Illness in the Press. Cambridge: Cambridge University Press.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104(3): 192–233. DOI: 10.1037/0096-3445.104.3.192
Scott, M. (1997). PC analysis of key words - and key key words. System, 25(2): 233–245.

M3 - Conference paper

T2 - Corpus Linguistics

Y2 - 2 July 2023 through 6 July 2023

ER -