
Electronic data

  • Poster

    Final published version, 827 KB, PDF document

    Available under license: CC BY-NC-SA: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

  • Handout

    Final published version, 74.3 KB, PDF document

    Available under license: CC BY-NC-SA: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License


Extending the key semantic domains method beyond English corpora: Wmatrix version 5

Research output: Contribution to conference - Without ISBN/ISSN; Poster; peer-review

Publication date: 13/07/2021
Original language: English
Event: Corpus Linguistics 2021 - University of Limerick, Limerick, Ireland
Duration: 13/07/2021 to 16/07/2021


Conference: Corpus Linguistics 2021
Abbreviated title: CL2021


The key semantic domains method (Rayson, 2008), implemented in Wmatrix (versions 1 to 4), extends the keywords approach, which has been widely applied in corpus linguistics research. The method facilitates the discovery of concepts and groups of words collected within semantic fields that are unusually frequent or infrequent compared to a reference corpus, and it can exploit significance and effect size measures in the same way as the keywords approach. Key semantic domains have proved useful in a number of different areas of linguistic research: literary characterisation (Balossi, 2014), the language of psychopaths (Hancock et al., 2013), corpus-assisted discourse analysis of social work writing (Leedham et al., 2020), enhancing critical thinking in higher education (O’Halloran, 2020), and the construction of newsworthiness (Potts et al., 2015). However, one important drawback is that the method is currently restricted to English, because Wmatrix relies on the CLAWS Part-of-Speech (POS) tagger (Garside and Smith, 1997) and the UCREL Semantic Analysis System (USAS) for English (Rayson et al., 2004). In recent years, semantic taggers for other languages have been developed (Piao et al., 2015; Piao et al., 2016). These utilise freely available POS taggers and lemmatisers for the new languages and adapt a variety of methods: bilingual dictionaries, parallel aligned corpora, machine translation, and crowdsourcing to bootstrap the development of new semantic lexicons, along with vector-based pre-trained embeddings and machine learning methods to improve contextual disambiguation (Ezeani et al., 2019). A beta version of the Spanish semantic tagger was previously incorporated into Wmatrix4. This poster will describe how the semantic taggers for further languages are being incorporated into Wmatrix5.
Crucially, there is a need to support community crowdsourcing involvement in extending and checking the new semantic lexicons, which are at varying stages of development, in order to improve their coverage and accuracy.
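
As an illustration of how significance and effect size measures can be applied to semantic domains, the following is a minimal sketch and not the Wmatrix implementation. It assumes that each corpus has already been reduced to a list of semantic tags, and it assumes log-likelihood as the significance measure and log ratio as the effect size measure, two measures commonly used in keyness analysis. The function names, the 6.63 cut-off (the chi-square critical value for p < 0.01 with one degree of freedom), the 0.5 smoothing constant, and the sample tags and counts are all illustrative assumptions.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood for a domain occurring a times in a study corpus
    of c tagged items and b times in a reference corpus of d items."""
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:
        total += a * math.log(a / expected_study)
    if b > 0:
        total += b * math.log(b / expected_ref)
    return 2 * total


def log_ratio(a, b, c, d, smoothing=0.5):
    """Binary log of the ratio of relative frequencies (effect size).
    A zero count is replaced by a small constant (an assumption here)
    so that the ratio stays defined."""
    a = a if a > 0 else smoothing
    b = b if b > 0 else smoothing
    return math.log2((a / c) / (b / d))


def key_domains(study_tags, reference_tags, threshold=6.63):
    """Return (tag, study_freq, ref_freq, LL, log_ratio) rows for domains
    whose log-likelihood meets the threshold, sorted by LL descending."""
    study = Counter(study_tags)
    reference = Counter(reference_tags)
    c = sum(study.values())
    d = sum(reference.values())
    rows = []
    for tag in set(study) | set(reference):
        a, b = study[tag], reference[tag]
        ll = log_likelihood(a, b, c, d)
        if ll >= threshold:
            rows.append((tag, a, b, ll, log_ratio(a, b, c, d)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    # Invented counts; the tag labels are placeholders for semantic tags.
    study_tags = ["E4.1"] * 30 + ["Z5"] * 70
    reference_tags = ["E4.1"] * 10 + ["Z5"] * 90
    for tag, a, b, ll, lr in key_domains(study_tags, reference_tags):
        print(f"{tag}: study={a} ref={b} LL={ll:.2f} LogRatio={lr:.2f}")
```

A positive log ratio marks a domain that is overused in the study corpus relative to the reference corpus, and a negative value marks one that is underused. Because the calculation depends only on tag frequencies, the same sketch would apply to output from a semantic tagger for any language.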