Open-Source Thesaurus Development for Under-Resourced Languages: a Welsh Case Study

Research output: Contribution to conference - Without ISBN/ISSN › Conference paper › peer-review

Published

Standard

Open-Source Thesaurus Development for Under-Resourced Languages: a Welsh Case Study. / Khallaf, Nouran; Arfon, Elin; El-Haj, Mahmoud et al.
2023. Paper presented at Language, Data and Knowledge, Vienna, Austria.

Harvard

Khallaf, N, Arfon, E, El-Haj, M, Morris, J, Knight, D, Rayson, P, Hammouda, T & Jarrar, M 2023, 'Open-Source Thesaurus Development for Under-Resourced Languages: a Welsh Case Study', Paper presented at Language, Data and Knowledge, Vienna, Austria, 12/09/23 - 15/09/23. <https://aclanthology.org/2023.ldk-1.30>

APA

Khallaf, N., Arfon, E., El-Haj, M., Morris, J., Knight, D., Rayson, P., Hammouda, T., & Jarrar, M. (2023). Open-Source Thesaurus Development for Under-Resourced Languages: a Welsh Case Study. Paper presented at Language, Data and Knowledge, Vienna, Austria. https://aclanthology.org/2023.ldk-1.30

Vancouver

Khallaf N, Arfon E, El-Haj M, Morris J, Knight D, Rayson P et al. Open-Source Thesaurus Development for Under-Resourced Languages: a Welsh Case Study. 2023. Paper presented at Language, Data and Knowledge, Vienna, Austria.

Author

Khallaf, Nouran; Arfon, Elin; El-Haj, Mahmoud et al. / Open-Source Thesaurus Development for Under-Resourced Languages: a Welsh Case Study. Paper presented at Language, Data and Knowledge, Vienna, Austria.

Bibtex

@conference{df21b95ca9a94dbd86e27361a5168d22,
title = "Open-Source Thesaurus Development for Under-Resourced Languages: a Welsh Case Study",
abstract = "This paper introduces an open-access, user-friendly online thesaurus for the Welsh language, aimed at enriching digital resources for Welsh speakers and learners. Utilising advances in Natural Language Processing (NLP), our approach combines pre-existing word embeddings, a Welsh semantic tagger, and human evaluation to establish related terms. In this case, an initial list of 250 words was expanded by adding 6,953 synonyms provided by linguists, creating a more extensive foundation for building the gold-standards. With this expanded list, when a user queries a particular word, the thesaurus presents all of its synonyms, allowing them to choose from a wider range of options. This is especially helpful when a user is unsure of the exact word they want to use or wants to explore different ways to express a concept. The resulting thesaurus offers a comprehensive, reliable resource for Welsh language users, fostering enhanced communication and expression. Our work promotes Welsh NLP and showcases NLP{\textquoteright}s potential to support under-resourced languages. The thesaurus will be accessible via a bilingual website, and the accompanying Python code will be available in a bilingual, public GitHub repository, and it will be available as a web service. Our approach presents a more efficient, cost-effective method for thesaurus creation, with potential applicability to other under-resourced languages.",
author = "Nouran Khallaf and Elin Arfon and Mahmoud El-Haj and Jonathan Morris and Dawn Knight and Paul Rayson and Tymaa Hammouda and Mustafa Jarrar",
year = "2023",
month = sep,
day = "14",
language = "English",
note = "Language, Data and Knowledge, LDK 2023 ; Conference date: 12-09-2023 Through 15-09-2023",
url = "http://2023.ldk-conf.org/",

}

RIS

TY - CONF

T1 - Open-Source Thesaurus Development for Under-Resourced Languages: a Welsh Case Study

T2 - Language, Data and Knowledge

AU - Khallaf, Nouran

AU - Arfon, Elin

AU - El-Haj, Mahmoud

AU - Morris, Jonathan

AU - Knight, Dawn

AU - Rayson, Paul

AU - Hammouda, Tymaa

AU - Jarrar, Mustafa

N1 - Conference code: 4

PY - 2023/9/14

Y1 - 2023/9/14

N2 - This paper introduces an open-access, user-friendly online thesaurus for the Welsh language, aimed at enriching digital resources for Welsh speakers and learners. Utilising advances in Natural Language Processing (NLP), our approach combines pre-existing word embeddings, a Welsh semantic tagger, and human evaluation to establish related terms. In this case, an initial list of 250 words was expanded by adding 6,953 synonyms provided by linguists, creating a more extensive foundation for building the gold-standards. With this expanded list, when a user queries a particular word, the thesaurus presents all of its synonyms, allowing them to choose from a wider range of options. This is especially helpful when a user is unsure of the exact word they want to use or wants to explore different ways to express a concept. The resulting thesaurus offers a comprehensive, reliable resource for Welsh language users, fostering enhanced communication and expression. Our work promotes Welsh NLP and showcases NLP’s potential to support under-resourced languages. The thesaurus will be accessible via a bilingual website, and the accompanying Python code will be available in a bilingual, public GitHub repository, and it will be available as a web service. Our approach presents a more efficient, cost-effective method for thesaurus creation, with potential applicability to other under-resourced languages.

AB - This paper introduces an open-access, user-friendly online thesaurus for the Welsh language, aimed at enriching digital resources for Welsh speakers and learners. Utilising advances in Natural Language Processing (NLP), our approach combines pre-existing word embeddings, a Welsh semantic tagger, and human evaluation to establish related terms. In this case, an initial list of 250 words was expanded by adding 6,953 synonyms provided by linguists, creating a more extensive foundation for building the gold-standards. With this expanded list, when a user queries a particular word, the thesaurus presents all of its synonyms, allowing them to choose from a wider range of options. This is especially helpful when a user is unsure of the exact word they want to use or wants to explore different ways to express a concept. The resulting thesaurus offers a comprehensive, reliable resource for Welsh language users, fostering enhanced communication and expression. Our work promotes Welsh NLP and showcases NLP’s potential to support under-resourced languages. The thesaurus will be accessible via a bilingual website, and the accompanying Python code will be available in a bilingual, public GitHub repository, and it will be available as a web service. Our approach presents a more efficient, cost-effective method for thesaurus creation, with potential applicability to other under-resourced languages.

M3 - Conference paper

Y2 - 12 September 2023 through 15 September 2023

ER -