AraSAS: The Open Source Arabic Semantic Tagger

Research output: Contribution to conference - Without ISBN/ISSN › Conference paper › peer-review

Published
Publication date: 15/06/2022
Number of pages: 8
Pages: 23-31
Original language: English
Event: Open-Source Arabic Corpora and Processing Tools - France, Marseille, France
Duration: 20/06/2022 to 20/06/2022
Conference number: 5
https://osact-lrec.github.io/

Workshop

Workshop: Open-Source Arabic Corpora and Processing Tools
Abbreviated title: OSACT 2022
Country/Territory: France
City: Marseille
Period: 20/06/22 to 20/06/22

Abstract

This paper presents AraSAS, the first open-source Arabic semantic analysis tagging system. AraSAS is a software framework that provides full semantic tagging of text written in Arabic. It is based on the UCREL Semantic Analysis System (USAS), which was first developed to semantically tag English text. Like USAS, AraSAS uses a hierarchical semantic tag set that contains 21 major discourse fields and 232 fine-grained semantic field tags. The paper describes the creation, validation and evaluation of AraSAS. In addition, we present a first case study that illustrates the affordances of applying the USAS and AraSAS semantic taggers to the Zayed University Arabic-English Bilingual Undergraduate Corpus (ZAEBUC) (Palfreyman and Habash, 2022). We show and compare the coverage of the two semantic taggers by running them on Arabic and English essays on different topics. The analysis then compares the taggers on Arabic and English texts written by the same writer, and on texts written by male and female students. Variables for comparison include the frequency of use of particular semantic sub-domains and the diversity of semantic elements within a text.
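
The two comparison variables named at the end of the abstract can be illustrated with a minimal Python sketch. This is not the AraSAS code or its API. It assumes the tagger output has already been reduced to one USAS-style tag string per token, with the first letter identifying the major discourse field. The function names `major_field` and `tag_profile`, the sample tags, and the two diversity measures (distinct-tag ratio and Shannon entropy) are illustrative assumptions; the abstract does not say which diversity measure the paper uses.

```python
from collections import Counter
import math


def major_field(tag: str) -> str:
    """Return the top-level discourse field letter of a USAS-style tag.

    Assumption: the first character of the tag names the major field,
    e.g. 'A1.1.1' belongs to field 'A'.
    """
    return tag[0]


def tag_profile(tags: list[str]) -> dict:
    """Summarise a text's semantic tags (hypothetical helper).

    Returns sub-domain frequencies, major-field frequencies, and two
    simple diversity measures chosen here for illustration only.
    """
    total = len(tags)
    if total == 0:
        return {"sub_domains": Counter(), "major_fields": Counter(),
                "distinct_ratio": 0.0, "entropy_bits": 0.0}

    sub_domains = Counter(tags)
    major_fields = Counter(major_field(t) for t in tags)

    # Share of distinct tags among all tagged tokens.
    distinct_ratio = len(sub_domains) / total

    # Shannon entropy of the tag distribution, in bits.
    entropy_bits = -sum((c / total) * math.log2(c / total)
                        for c in sub_domains.values())

    return {"sub_domains": sub_domains, "major_fields": major_fields,
            "distinct_ratio": distinct_ratio, "entropy_bits": entropy_bits}


# Hypothetical tagger output for a four-token text.
sample_tags = ["A1.1.1", "A1.1.1", "E4.1", "Z5"]
profile = tag_profile(sample_tags)
print(profile["sub_domains"].most_common(1))  # most frequent sub-domain
print(profile["major_fields"])
print(profile["distinct_ratio"], profile["entropy_bits"])
```

Profiles built this way for two texts, for example an Arabic and an English essay by the same writer, could then be compared field by field.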