Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Chapter (peer-reviewed)
}
TY - CHAP
T1 - A morphosyntactic categorisation scheme for the automated analysis of Nepali
AU - Hardie, Andrew
AU - Lohani, Ram Raj
AU - Regmi, Bhim N.
AU - Yadava, Yogendra P.
PY - 2009
Y1 - 2009
N2 - This paper describes the linguistic rationale underlying the part-of-speech tagset used for tagging the Nepali National Corpus. In particular, three conceptually complex areas are discussed in detail. In the first place, the nature of Nepali postpositions is explored, and the approach that the tagset takes to them – in which postpositions are tokenised separately to the nouns or other words to which they are attached – is justified. A similar exploration of gender marking, however, supports an opposite approach, where gender is treated as a feature of the word on which it is marked, and indicated in that word’s tag. It is further argued that an inconsistent treatment of gender on nouns, as opposed to adjectives and other words that agree with nouns, is justified for Nepali. Thirdly, the very great complexity of Nepali verb inflection (some of it created by very productive compounding) is shown to necessitate the use, within the tagset, of a simplified model of the Nepali verb. A brief analysis of the similarities and differences between this tagset and part-of-speech annotation schemes for some closely related languages is undertaken. Finally, the implementation of the tagset in an automated tagging system is summarised and some directions for future work outlined.
AB - This paper describes the linguistic rationale underlying the part-of-speech tagset used for tagging the Nepali National Corpus. In particular, three conceptually complex areas are discussed in detail. In the first place, the nature of Nepali postpositions is explored, and the approach that the tagset takes to them – in which postpositions are tokenised separately to the nouns or other words to which they are attached – is justified. A similar exploration of gender marking, however, supports an opposite approach, where gender is treated as a feature of the word on which it is marked, and indicated in that word’s tag. It is further argued that an inconsistent treatment of gender on nouns, as opposed to adjectives and other words that agree with nouns, is justified for Nepali. Thirdly, the very great complexity of Nepali verb inflection (some of it created by very productive compounding) is shown to necessitate the use, within the tagset, of a simplified model of the Nepali verb. A brief analysis of the similarities and differences between this tagset and part-of-speech annotation schemes for some closely related languages is undertaken. Finally, the implementation of the tagset in an automated tagging system is summarised and some directions for future work outlined.
M3 - Chapter (peer-reviewed)
SN - 9783110225594
T3 - Trends in linguistics. Studies and monographs
SP - 171
EP - 196
BT - Annual Review of South Asian Languages and Linguistics 2009
A2 - Singh, Rajendra
PB - Mouton de Gruyter
CY - Berlin
ER -