

Towards a Welsh semantic tagger: creating lexicons for a resource poor language

Research output: Contribution to conference - Without ISBN/ISSN, Conference paper, peer-reviewed

Publication date: 24/07/2017
Number of pages: 4
Original language: English
Event: The Corpus Linguistics Conference 2017 - University of Birmingham, Birmingham, United Kingdom
Duration: 24/07/2017 to 28/07/2017


Conference: The Corpus Linguistics Conference 2017
Abbreviated title: CL2017
Country/Territory: United Kingdom


Semantic annotation is an important part of corpus linguistics. A major tool for semantic tagging is USAS, developed at Lancaster University, which was originally designed for English but has since been extended to cover many more languages. In the CorCenCC Project (http://sites.cardiff.ac.uk/corcencc), we are extending USAS to automatically annotate Welsh language data with the USAS semantic tagset. In this paper, we report on the development of Welsh semantic lexicons for the tagger. We have built a Welsh semantic lexicon containing 143,290 entries, which achieved a lexical coverage of 72.42% in an initial evaluation, and we have developed an initial version of the Welsh semantic tagger based on this lexical resource.
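
As a rough illustration of what a lexical coverage figure measures, the sketch below computes the share of tokens in a text that have at least one entry in a semantic lexicon. It is a minimal sketch and not the project's evaluation code: the function name `lexical_coverage`, the toy lexicon, and the tag codes attached to each word are hypothetical, and the abstract does not state whether coverage was counted over tokens or over distinct word types (tokens are assumed here).

```python
def lexical_coverage(tokens, lexicon):
    """Return the fraction of tokens that have an entry in the lexicon.

    Assumption: coverage is counted per token (not per distinct word type)
    and lookup is case-insensitive.
    """
    if not tokens:
        return 0.0
    covered = sum(1 for tok in tokens if tok.lower() in lexicon)
    return covered / len(tokens)


# Hypothetical toy lexicon mapping Welsh words to candidate semantic tags.
# The tag codes are illustrative placeholders, not entries from the real lexicon.
toy_lexicon = {
    "ci": ["L2"],      # "dog"
    "dŵr": ["O1.2"],   # "water"
    "ysgol": ["P1"],   # "school"
}

# "afon" ("river") is deliberately missing from the toy lexicon.
tokens = ["ci", "ysgol", "afon", "dŵr"]

coverage = lexical_coverage(tokens, toy_lexicon)
print(f"Lexical coverage: {coverage:.2%}")
```

On this toy input three of the four tokens are found, so the reported coverage is 75%. A figure such as 72.42% would be obtained the same way over a much larger evaluation text.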