
Electronic data

  • Poster

    Final published version, 827 KB, PDF document

    Available under license: CC BY-NC-SA: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

  • Handout

    Final published version, 74.3 KB, PDF document

    Available under license: CC BY-NC-SA: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License


Extending the key semantic domains method beyond English corpora: Wmatrix version 5

Research output: Contribution to conference - Without ISBN/ISSN › Poster › peer-review

Published

Standard

Extending the key semantic domains method beyond English corpora: Wmatrix version 5. / Rayson, Paul.
2021. Poster session presented at Corpus Linguistics 2021, Limerick, Ireland.


Harvard

Rayson, P 2021, 'Extending the key semantic domains method beyond English corpora: Wmatrix version 5', Corpus Linguistics 2021, Limerick, Ireland, 13/07/21 - 16/07/21.

APA

Rayson, P. (2021). Extending the key semantic domains method beyond English corpora: Wmatrix version 5. Poster session presented at Corpus Linguistics 2021, Limerick, Ireland.

Vancouver

Rayson P. Extending the key semantic domains method beyond English corpora: Wmatrix version 5. 2021. Poster session presented at Corpus Linguistics 2021, Limerick, Ireland.

Author

Rayson, Paul. / Extending the key semantic domains method beyond English corpora: Wmatrix version 5. Poster session presented at Corpus Linguistics 2021, Limerick, Ireland.

Bibtex

@conference{0489807abcd940e2b4a45a9b7d9728e3,
title = "Extending the key semantic domains method beyond English corpora: Wmatrix version 5",
abstract = "The key semantic domains method (Rayson, 2008), implemented in Wmatrix (versions 1 to 4), extends the keywords approach that has been widely applied in corpus linguistics research. The method facilitates the discovery of concepts, and of groups of words collected within semantic fields, that are unusually frequent or infrequent compared to a reference corpus, and it can exploit significance and effect size measures in the same way as the keywords approach. Key semantic domains have proved useful in a number of different areas of linguistic research: literary characterisation (Balossi, 2014), the language of psychopaths (Hancock et al., 2013), corpus-assisted discourse analysis of social work writing (Leedham et al., 2020), enhancing critical thinking in higher education (O{\textquoteright}Halloran, 2020), and the construction of newsworthiness (Potts et al., 2015). However, one important drawback is that key semantic domains are currently restricted to English, because the method relies on the CLAWS Part-of-Speech (POS) tagger (Garside and Smith, 1997) and the UCREL Semantic Analysis System (USAS) for English (Rayson et al., 2004). In recent years, semantic taggers have been developed for other languages (Piao et al., 2015; Piao et al., 2016). These taggers use freely available POS taggers and lemmatisers for the new languages; they bootstrap the development of new semantic lexicons with a variety of methods, ranging from bilingual dictionaries, parallel aligned corpora and machine translation to crowdsourcing; and they use vector-based pre-trained embeddings and machine learning methods to improve contextual disambiguation (Ezeani et al., 2019). A beta version of the Spanish semantic tagger was previously incorporated into Wmatrix4. This poster will describe how the semantic taggers for further languages are being incorporated into Wmatrix5.
Crucially, there is a need to support community crowdsourcing to extend and check the new semantic lexicons, which are at varying stages of development, in order to improve their coverage and accuracy.",
author = "Paul Rayson",
year = "2021",
month = jul,
day = "13",
language = "English",
note = "Corpus Linguistics 2021, CL2021; Conference date: 13-07-2021 through 16-07-2021",
url = "https://www.cl2021.org/",

}

RIS

TY - CONF

T1 - Extending the key semantic domains method beyond English corpora: Wmatrix version 5

T2 - Corpus Linguistics 2021

AU - Rayson, Paul

PY - 2021/7/13

Y1 - 2021/7/13

N2 - The key semantic domains method (Rayson, 2008), implemented in Wmatrix (versions 1 to 4), extends the keywords approach that has been widely applied in corpus linguistics research. The method facilitates the discovery of concepts, and of groups of words collected within semantic fields, that are unusually frequent or infrequent compared to a reference corpus, and it can exploit significance and effect size measures in the same way as the keywords approach. Key semantic domains have proved useful in a number of different areas of linguistic research: literary characterisation (Balossi, 2014), the language of psychopaths (Hancock et al., 2013), corpus-assisted discourse analysis of social work writing (Leedham et al., 2020), enhancing critical thinking in higher education (O’Halloran, 2020), and the construction of newsworthiness (Potts et al., 2015). However, one important drawback is that key semantic domains are currently restricted to English, because the method relies on the CLAWS Part-of-Speech (POS) tagger (Garside and Smith, 1997) and the UCREL Semantic Analysis System (USAS) for English (Rayson et al., 2004). In recent years, semantic taggers have been developed for other languages (Piao et al., 2015; Piao et al., 2016). These taggers use freely available POS taggers and lemmatisers for the new languages; they bootstrap the development of new semantic lexicons with a variety of methods, ranging from bilingual dictionaries, parallel aligned corpora and machine translation to crowdsourcing; and they use vector-based pre-trained embeddings and machine learning methods to improve contextual disambiguation (Ezeani et al., 2019). A beta version of the Spanish semantic tagger was previously incorporated into Wmatrix4. This poster will describe how the semantic taggers for further languages are being incorporated into Wmatrix5.
Crucially, there is a need to support community crowdsourcing to extend and check the new semantic lexicons, which are at varying stages of development, in order to improve their coverage and accuracy.

AB - The key semantic domains method (Rayson, 2008), implemented in Wmatrix (versions 1 to 4), extends the keywords approach that has been widely applied in corpus linguistics research. The method facilitates the discovery of concepts, and of groups of words collected within semantic fields, that are unusually frequent or infrequent compared to a reference corpus, and it can exploit significance and effect size measures in the same way as the keywords approach. Key semantic domains have proved useful in a number of different areas of linguistic research: literary characterisation (Balossi, 2014), the language of psychopaths (Hancock et al., 2013), corpus-assisted discourse analysis of social work writing (Leedham et al., 2020), enhancing critical thinking in higher education (O’Halloran, 2020), and the construction of newsworthiness (Potts et al., 2015). However, one important drawback is that key semantic domains are currently restricted to English, because the method relies on the CLAWS Part-of-Speech (POS) tagger (Garside and Smith, 1997) and the UCREL Semantic Analysis System (USAS) for English (Rayson et al., 2004). In recent years, semantic taggers have been developed for other languages (Piao et al., 2015; Piao et al., 2016). These taggers use freely available POS taggers and lemmatisers for the new languages; they bootstrap the development of new semantic lexicons with a variety of methods, ranging from bilingual dictionaries, parallel aligned corpora and machine translation to crowdsourcing; and they use vector-based pre-trained embeddings and machine learning methods to improve contextual disambiguation (Ezeani et al., 2019). A beta version of the Spanish semantic tagger was previously incorporated into Wmatrix4. This poster will describe how the semantic taggers for further languages are being incorporated into Wmatrix5.
Crucially, there is a need to support community crowdsourcing to extend and check the new semantic lexicons, which are at varying stages of development, in order to improve their coverage and accuracy.

M3 - Poster

Y2 - 13 July 2021 through 16 July 2021

ER -
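The abstract says that key semantic domains can use significance and effect size measures in the same way as the keywords approach. The following is a minimal Python sketch of that comparison and is not Wmatrix's own code. It assumes that both corpora have already been semantically tagged, so that each token is reduced to a single domain tag. It also assumes log-likelihood (G2) as the significance measure and Log Ratio as the effect size. The function names, the 0.5 zero-frequency adjustment and the toy USAS-style tags and counts are hypothetical and used only for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for one item.

    a, b: frequency of the item (here a semantic domain) in the target and
    reference corpus; c, d: total number of tokens in each corpus.
    """
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    g2 = 0.0
    # A zero observed frequency contributes nothing (0 * log 0 is taken as 0).
    if a > 0:
        g2 += a * math.log(a / expected_target)
    if b > 0:
        g2 += b * math.log(b / expected_reference)
    return 2 * g2


def log_ratio(a, b, c, d, zero_adjust=0.5):
    """Log Ratio effect size: binary log of the ratio of relative frequencies.

    Substituting 0.5 for a zero frequency is an assumed convention here.
    """
    a = a if a > 0 else zero_adjust
    b = b if b > 0 else zero_adjust
    return math.log2((a / c) / (b / d))


def key_domains(target_tags, reference_tags):
    """Return (tag, target freq, reference freq, G2, Log Ratio), sorted by G2.

    A positive Log Ratio marks a domain overused in the target corpus;
    a negative one marks a domain underused relative to the reference.
    """
    target_counts = Counter(target_tags)
    reference_counts = Counter(reference_tags)
    c, d = len(target_tags), len(reference_tags)
    rows = []
    for tag in set(target_counts) | set(reference_counts):
        a, b = target_counts[tag], reference_counts[tag]
        rows.append((tag, a, b, log_likelihood(a, b, c, d), log_ratio(a, b, c, d)))
    # Largest (most significant) frequency difference first.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


if __name__ == "__main__":
    # Invented toy data: each token already replaced by a USAS-style domain tag.
    target = ["F1"] * 20 + ["Z5"] * 80
    reference = ["F1"] * 10 + ["Z5"] * 90
    for tag, a, b, g2, lr in key_domains(target, reference):
        print(f"{tag}\t{a}\t{b}\tLL={g2:.2f}\tLogRatio={lr:+.2f}")
```

Ranking by G2 alone mixes overused and underused domains, because G2 is always non-negative. Reporting the signed Log Ratio beside it shows the direction of each difference and how large it is.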