A morphosyntactic categorisation scheme for the automated analysis of Nepali

Linguistics and English Language

Associated organisational unit

UCREL - University Centre for Computer Corpus Research on Language


Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Chapter (peer-reviewed)

Published

Andrew Hardie
Ram Raj Lohani
Bhim N. Regmi
Yogendra P. Yadava


Publication date	2009
Host publication	Annual Review of South Asian Languages and Linguistics 2009
Editors	Rajendra Singh
Place of Publication	Berlin
Publisher	Mouton de Gruyter
Pages	171-196
Number of pages	26
ISBN (print)	9783110225594
Original language	English

Publication series

Name	Trends in linguistics. Studies and monographs
Publisher	Mouton de Gruyter
Volume	222

Abstract

This paper describes the linguistic rationale underlying the part-of-speech
tagset used for tagging the Nepali National Corpus. In particular, three
conceptually complex areas are discussed in detail. In the first place, the
nature of Nepali postpositions is explored, and the approach that the tagset
takes to them – in which postpositions are tokenised separately from the
nouns or other words to which they are attached – is justified. A similar
exploration of gender marking, however, supports an opposite approach,
where gender is treated as a feature of the word on which it is marked, and
indicated in that word’s tag. It is further argued that an inconsistent
treatment of gender on nouns, as opposed to adjectives and other words
that agree with nouns, is justified for Nepali. Thirdly, the very great
complexity of Nepali verb inflection (some of it created by very productive
compounding) is shown to necessitate the use, within the tagset, of a
simplified model of the Nepali verb. A brief analysis of the similarities and
differences between this tagset and part-of-speech annotation schemes for
some closely related languages is undertaken. Finally, the implementation of the
tagset in an automated tagging system is summarised and some directions
for future work outlined.
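To make the tokenisation approach for postpositions concrete, the following is a minimal, hypothetical sketch of splitting a postposition off the word it is written attached to. It is not the Nepali National Corpus tokeniser, whose rules are not given in this record. The postposition list (मा "in/at", ले, लाई, को, बाट) is a small assumed sample, and the names `POSTPOSITIONS`, `split_postposition` and `tokenise` are illustrative only. Plain suffix stripping like this would also split words that merely end in the same letters, so a real system would need a lexicon check.

```python
# Hypothetical illustration only: a naive suffix-based splitter that
# separates a few common Nepali postpositions from the word they attach to.
# The list below is an assumed sample, not the tagset's actual inventory.

POSTPOSITIONS = ["लाई", "बाट", "मा", "ले", "को"]


def split_postposition(token: str) -> list[str]:
    """Return [stem, postposition] if the token ends in a listed
    postposition, otherwise [token]. Longer postpositions are tried first."""
    for pp in sorted(POSTPOSITIONS, key=len, reverse=True):
        # Require a non-empty stem so a free-standing postposition stays whole.
        if token.endswith(pp) and len(token) > len(pp):
            return [token[: -len(pp)], pp]
    return [token]


def tokenise(sentence: str) -> list[str]:
    """Split on whitespace, then split each word's postposition off."""
    tokens: list[str] = []
    for word in sentence.split():
        tokens.extend(split_postposition(word))
    return tokens


if __name__ == "__main__":
    # "Ram is at home": घरमा (ghar-mā) becomes the two tokens घर + मा.
    print(tokenise("राम घरमा छ"))
```

Once the postposition is a separate token, it can receive its own part-of-speech tag rather than being folded into the noun's tag.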
