Electronic data

  • lrec2018-cysemtagger

    Accepted author manuscript, 126 KB, PDF-document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

  • welsh-sem-tagger-lrec2018-proc

    Rights statement: The LREC 2018 Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License

    Final published version, 137 KB, PDF-document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Towards A Welsh Semantic Annotation System

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper

Published
Publication date: 9/05/2018
Host publication: LREC 2018, Eleventh International Conference on Language Resources and Evaluation
Editors: Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga
Publisher: European Language Resources Association (ELRA)
Pages: 980-985
Number of pages: 6
ISBN (Print): 9791095546009
Original language: English
Event: The 11th Edition of the Language Resources and Evaluation Conference - Miyazaki, Japan
Duration: 7/05/2018 to 12/05/2018
http://lrec2018.lrec-conf.org/en/

Conference

Conference: The 11th Edition of the Language Resources and Evaluation Conference
Abbreviated title: LREC2018
Country: Japan
City: Miyazaki
Period: 7/05/18 to 12/05/18
Internet address: http://lrec2018.lrec-conf.org/en/

Abstract

Automatic semantic annotation of natural language data is an important task in Natural Language Processing, and a variety of semantic taggers have been developed for it, particularly for English. For many other languages, however, and especially for low-resource languages, such tools have yet to be developed. In this paper, we report on the development of an automatic Welsh semantic annotation tool (named CySemTagger) in the CorCenCC Project, which will facilitate semantic-level analysis of Welsh language data on a large scale. Based on Lancaster's USAS semantic tagger framework, this tool tags words in Welsh texts with semantic tags from a semantic classification scheme. It is designed to be compatible with multiple Welsh POS taggers and POS tagsets by mapping the different tagsets into a core shared POS tagset that CySemTagger uses internally. Our initial evaluation shows that the tagger can cover up to 91.78% of words in Welsh text. The tagger is under continuous development and will provide a critical tool for Welsh language corpus and information processing at the semantic level.
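The abstract describes CySemTagger's design only at a high level. As a rough illustration, not the authors' implementation, the Python sketch below shows the general idea under stated assumptions: tags from two different POS taggers are mapped into one shared core tagset, a lexicon keyed on (word, core POS) supplies a USAS-style semantic tag, and coverage is taken to be the share of tokens that receive a tag other than the unmatched tag Z99. The tagger names, tagset mappings, lexicon entries and the coverage definition are all hypothetical, chosen for illustration; the paper's own evaluation method may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical external tagsets mapped onto a small shared core tagset.
# These tag names are illustrative; they are not CySemTagger's real tables.
TAGSET_MAPPINGS: Dict[str, Dict[str, str]] = {
    "tagger_a": {"E": "noun", "Ans": "adj", "YFB": "det", "B": "verb"},
    "tagger_b": {"NN": "noun", "JJ": "adj", "DT": "det", "VBD": "verb"},
}

# Toy semantic lexicon keyed on (word form, core POS), using USAS-style codes.
LEXICON: Dict[Tuple[str, str], str] = {
    ("ci", "noun"): "L2",       # dog: living creatures
    ("mawr", "adj"): "N3.2+",   # big: size
    ("y", "det"): "Z5",         # the: grammatical bin
}

UNMATCHED = "Z99"  # tag assigned when no lexicon entry is found


def to_core_pos(tag: str, tagset: str) -> str:
    """Map a tagger-specific POS tag to the shared core tagset."""
    return TAGSET_MAPPINGS[tagset].get(tag, "other")


def sem_tag(tokens: List[Tuple[str, str]], tagset: str) -> List[Tuple[str, str]]:
    """Assign one semantic tag per (word, POS) token via core-POS lookup."""
    tagged = []
    for word, pos in tokens:
        core = to_core_pos(pos, tagset)
        tagged.append((word, LEXICON.get((word.lower(), core), UNMATCHED)))
    return tagged


def coverage(tagged: List[Tuple[str, str]]) -> float:
    """Share of tokens that received a tag other than the unmatched tag."""
    if not tagged:
        return 0.0
    matched = sum(1 for _, tag in tagged if tag != UNMATCHED)
    return matched / len(tagged)


# "Rhedodd y ci mawr" ("The big dog ran"), tagged by two different POS taggers.
tokens_a = [("Rhedodd", "B"), ("y", "YFB"), ("ci", "E"), ("mawr", "Ans")]
tokens_b = [("Rhedodd", "VBD"), ("y", "DT"), ("ci", "NN"), ("mawr", "JJ")]

tagged_a = sem_tag(tokens_a, "tagger_a")
tagged_b = sem_tag(tokens_b, "tagger_b")
print(tagged_a == tagged_b)  # True: both inputs reach the same core tags
print(coverage(tagged_b))    # 0.75: the verb is missing from the toy lexicon
```

Because lookup happens only on core tags, supporting another POS tagger in this sketch means adding one mapping table rather than a new lexicon. A real Welsh tagger would also have to deal with lemmatisation and initial consonant mutation (e.g. cath vs gath), which this sketch ignores.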