Exploring and classifying the Arabic copula and auxiliary kāna via enhanced part-of-speech tagging

Linguistics and English Language

Electronic data

Hardie & Ibrahim on 'Kana' - Authors' Final Version
Rights statement: This is an Accepted Manuscript of an article published by Edinburgh University Press in Corpora. The Version of Record is available online at: https://www.euppublishing.com/doi/abs/10.3366/cor.2021.0225
Accepted author manuscript, 1.11 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Text available via DOI:

https://doi.org/10.3366/cor.2021.0225
Final published version

Keywords

Arabic, aspect, auxiliary, copula, syntax, tense


Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Exploring and classifying the Arabic copula and auxiliary kāna via enhanced part-of-speech tagging. / Hardie, Andrew ; Ibrahim, Wesam.
In: Corpora, Vol. 16, No. 3, 30.11.2021, p. 305-335.


Bibtex

@article{2c3696c4a3e447a68e46a150036948d2,
  title = "Exploring and classifying the Arabic copula and auxiliary kāna via enhanced part-of-speech tagging",
  abstract = "Arabic syntax has yet to be studied in detail from a corpus-based perspective. The Arabic copula kāna, {\textquoteleft}be{\textquoteright}, functions additionally as an auxiliary, creating periphrastic tense-aspect constructions; but the literature on these functions is far from exhaustive. To analyse kāna within the million-word Leeds Corpus of Contemporary Arabic, part-of-speech tagging (using novel, targeted enhancements to a previously described program which improves the accessibility for linguistic analysis of the output of Habash et al.{\textquoteright}s 2012 MADA disambiguator for the Buckwalter Arabic morphological analyser) is applied to disambiguate copula and auxiliary at a high rate of accuracy. Concordances of both are extracted, and 10\% samples (499 instances of copula kāna, 387 of auxiliary kāna) are manually analysed to identify surface-level grammatical patterns and meanings. This raw analysis is then systematised according to the more general patterns{\textquoteright} main parameters of variation; special descriptions are developed for specific, apparently fixed-form expressions (including two phraseologies which afford expression of verbal and adjectival modality). Overall, substantial new detail, not mentioned in existing grammars, is discovered (e.g. the quantitative predominance of the past imperfect construction over other uses of auxiliary kāna); there exists notable potential for these corpus-based findings to inform and enhance not only grammatical descriptions, but also pedagogy of Arabic as a first or second/foreign language.",
  keywords = "Arabic, aspect, auxiliary, copula, syntax, tense",
  author = "Andrew Hardie and Wesam Ibrahim",
  note = "This is an Accepted Manuscript of an article published by Edinburgh University Press in Corpora. The Version of Record is available online at: https://www.euppublishing.com/doi/abs/10.3366/cor.2021.0225",
  year = "2021",
  month = nov,
  day = "30",
  doi = "10.3366/cor.2021.0225",
  language = "English",
  volume = "16",
  pages = "305--335",
  journal = "Corpora",
  issn = "1749-5032",
  publisher = "Edinburgh University Press",
  number = "3",
}

RIS

TY  - JOUR
T1  - Exploring and classifying the Arabic copula and auxiliary kāna via enhanced part-of-speech tagging
AU  - Hardie, Andrew
AU  - Ibrahim, Wesam
N1  - This is an Accepted Manuscript of an article published by Edinburgh University Press in Corpora. The Version of Record is available online at: https://www.euppublishing.com/doi/abs/10.3366/cor.2021.0225
PY  - 2021/11/30
Y1  - 2021/11/30
N2  - Arabic syntax has yet to be studied in detail from a corpus-based perspective. The Arabic copula kāna, ‘be’, functions additionally as an auxiliary, creating periphrastic tense-aspect constructions; but the literature on these functions is far from exhaustive. To analyse kāna within the million-word Leeds Corpus of Contemporary Arabic, part-of-speech tagging (using novel, targeted enhancements to a previously described program which improves the accessibility for linguistic analysis of the output of Habash et al.’s 2012 MADA disambiguator for the Buckwalter Arabic morphological analyser) is applied to disambiguate copula and auxiliary at a high rate of accuracy. Concordances of both are extracted, and 10% samples (499 instances of copula kāna, 387 of auxiliary kāna) are manually analysed to identify surface-level grammatical patterns and meanings. This raw analysis is then systematised according to the more general patterns’ main parameters of variation; special descriptions are developed for specific, apparently fixed-form expressions (including two phraseologies which afford expression of verbal and adjectival modality). Overall, substantial new detail, not mentioned in existing grammars, is discovered (e.g. the quantitative predominance of the past imperfect construction over other uses of auxiliary kāna); there exists notable potential for these corpus-based findings to inform and enhance not only grammatical descriptions, but also pedagogy of Arabic as a first or second/foreign language.
AB  - Arabic syntax has yet to be studied in detail from a corpus-based perspective. The Arabic copula kāna, ‘be’, functions additionally as an auxiliary, creating periphrastic tense-aspect constructions; but the literature on these functions is far from exhaustive. To analyse kāna within the million-word Leeds Corpus of Contemporary Arabic, part-of-speech tagging (using novel, targeted enhancements to a previously described program which improves the accessibility for linguistic analysis of the output of Habash et al.’s 2012 MADA disambiguator for the Buckwalter Arabic morphological analyser) is applied to disambiguate copula and auxiliary at a high rate of accuracy. Concordances of both are extracted, and 10% samples (499 instances of copula kāna, 387 of auxiliary kāna) are manually analysed to identify surface-level grammatical patterns and meanings. This raw analysis is then systematised according to the more general patterns’ main parameters of variation; special descriptions are developed for specific, apparently fixed-form expressions (including two phraseologies which afford expression of verbal and adjectival modality). Overall, substantial new detail, not mentioned in existing grammars, is discovered (e.g. the quantitative predominance of the past imperfect construction over other uses of auxiliary kāna); there exists notable potential for these corpus-based findings to inform and enhance not only grammatical descriptions, but also pedagogy of Arabic as a first or second/foreign language.
KW  - Arabic
KW  - aspect
KW  - auxiliary
KW  - copula
KW  - syntax
KW  - tense
U2  - 10.3366/cor.2021.0225
DO  - 10.3366/cor.2021.0225
M3  - Journal article
VL  - 16
SP  - 305
EP  - 335
JO  - Corpora
JF  - Corpora
SN  - 1749-5032
IS  - 3
ER  -
