A morphosyntactic categorisation scheme for the automated analysis of Nepali

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN > Chapter (peer-reviewed)

Publication date: 2009
Host publication: Annual Review of South Asian Languages and Linguistics 2009
Editors: Rajendra Singh
Place of Publication: Berlin
Publisher: Mouton de Gruyter
Number of pages: 26
ISBN (Print): 9783110225594
Original language: English

Publication series

Name: Trends in linguistics. Studies and monographs
Publisher: Mouton de Gruyter


This paper describes the linguistic rationale underlying the part-of-speech
tagset used for tagging the Nepali National Corpus. Three conceptually
complex areas are discussed in detail. First, the nature of Nepali
postpositions is explored, and the tagset's approach to them, in which
postpositions are tokenised separately from the nouns or other words to
which they are attached, is justified. Second, a similar exploration of
gender marking supports the opposite approach: gender is treated as a
feature of the word on which it is marked, and is indicated in that word's
tag. It is further argued that an inconsistent treatment of gender on nouns,
as opposed to adjectives and other words that agree with nouns, is justified
for Nepali. Third, the great complexity of Nepali verb inflection (some of
it created by highly productive compounding) is shown to necessitate the
use, within the tagset, of a simplified model of the Nepali verb. The
similarities and differences between this tagset and the part-of-speech
annotation schemes for some closely related languages are briefly analysed.
Finally, the implementation of the tagset in an automated tagging system is
summarised and some directions for future work are outlined.
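The policy of tokenising postpositions separately from their host words can be illustrated with a minimal Python sketch. It is not the Nepali National Corpus tokeniser: the postposition list, the exception set, and the function names `split_postposition` and `tokenise` are all hypothetical, and plain suffix matching stands in for whatever lexicon-based method a real system would use. The sketch splits at most one attached postposition off each whitespace-delimited word, so that, for example, रामले becomes राम + ले.

```python
# Hypothetical list of postpositions that may be written attached to a host word.
# Sorted longest-first so that longer forms are tried before shorter ones.
POSTPOSITIONS = sorted(
    ["ले", "लाई", "को", "की", "का", "मा", "बाट", "देखि", "सँग"],
    key=len,
    reverse=True,
)

# Hypothetical exception set: words that merely end in a postposition-like string.
# A real tokeniser would consult a full lexicon instead.
EXCEPTIONS = {"आमा"}  # "mother": the final मा belongs to the stem


def split_postposition(token: str) -> list[str]:
    """Split one attached postposition off the end of a token, if present."""
    if token in EXCEPTIONS:
        return [token]
    for pp in POSTPOSITIONS:
        stem = token[: -len(pp)]
        # Only split if something remains in front of the postposition,
        # so a free-standing postposition stays a single token.
        if token.endswith(pp) and stem:
            return [stem, pp]
    return [token]


def tokenise(text: str) -> list[str]:
    """Whitespace-tokenise text, then separate attached postpositions."""
    tokens: list[str] = []
    for word in text.split():
        tokens.extend(split_postposition(word))
    return tokens


if __name__ == "__main__":
    print(tokenise("रामले घरमा"))
```

The exception set shows why pure suffix matching is not enough: आमा ("mother") happens to end in the same characters as the postposition मा, so without a lexicon check it would be wrongly split into आ + मा.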