Part-of-speech ratios in English corpora.

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Part-of-speech ratios in English corpora. / Hardie, Andrew.
In: International Journal of Corpus Linguistics, Vol. 12, No. 1, 2007, p. 55-81.

Harvard

Hardie, A 2007, 'Part-of-speech ratios in English corpora', International Journal of Corpus Linguistics, vol. 12, no. 1, pp. 55-81. https://doi.org/10.1075/ijcl.12.1.05har

APA

Hardie, A. (2007). Part-of-speech ratios in English corpora. International Journal of Corpus Linguistics, 12(1), 55-81. https://doi.org/10.1075/ijcl.12.1.05har

Vancouver

Hardie A. Part-of-speech ratios in English corpora. International Journal of Corpus Linguistics. 2007;12(1):55-81. doi: 10.1075/ijcl.12.1.05har

Author

Hardie, Andrew. / Part-of-speech ratios in English corpora. In: International Journal of Corpus Linguistics. 2007 ; Vol. 12, No. 1. pp. 55-81.

Bibtex

@article{4bc32b06d0d74d1f9e7f11869c2d0db1,
title = "Part-of-speech ratios in English corpora",
abstract = "Using part-of-speech (POS) tagged corpora, Hudson (1994) reports that approximately 37% of English tokens are nouns, where 'noun' is a superordinate category including nouns, pronouns and other word-classes. It is argued here that difficulties relating to the boundaries of Hudson's 'noun' category demonstrate that there is no uncontroversial way to derive such a superordinate category from POS tagging. Decisions regarding the boundary of the 'noun' category have small but statistically significant effects on the ratio that emerges for 'nouns' as a whole. Tokenisation and categorisation differences between tagging schemes make it problematic to compare the ratio of 'nouns' across different tagsets. The precise figures for POS ratios are therefore effectively artefacts of the tagset. However, these objections to the use of POS ratios do not apply to their use as a metric of variation for comparing data-sets tagged with the same tagging scheme.",
author = "Andrew Hardie",
year = "2007",
doi = "10.1075/ijcl.12.1.05har",
language = "English",
volume = "12",
pages = "55--81",
journal = "International Journal of Corpus Linguistics",
issn = "1384-6655",
publisher = "John Benjamins Publishing Company",
number = "1",

}

RIS

TY  - JOUR
T1  - Part-of-speech ratios in English corpora.
AU  - Hardie, Andrew
PY  - 2007
Y1  - 2007
N2  - Using part-of-speech (POS) tagged corpora, Hudson (1994) reports that approximately 37% of English tokens are nouns, where 'noun' is a superordinate category including nouns, pronouns and other word-classes. It is argued here that difficulties relating to the boundaries of Hudson's 'noun' category demonstrate that there is no uncontroversial way to derive such a superordinate category from POS tagging. Decisions regarding the boundary of the 'noun' category have small but statistically significant effects on the ratio that emerges for 'nouns' as a whole. Tokenisation and categorisation differences between tagging schemes make it problematic to compare the ratio of 'nouns' across different tagsets. The precise figures for POS ratios are therefore effectively artefacts of the tagset. However, these objections to the use of POS ratios do not apply to their use as a metric of variation for comparing data-sets tagged with the same tagging scheme.
AB  - Using part-of-speech (POS) tagged corpora, Hudson (1994) reports that approximately 37% of English tokens are nouns, where 'noun' is a superordinate category including nouns, pronouns and other word-classes. It is argued here that difficulties relating to the boundaries of Hudson's 'noun' category demonstrate that there is no uncontroversial way to derive such a superordinate category from POS tagging. Decisions regarding the boundary of the 'noun' category have small but statistically significant effects on the ratio that emerges for 'nouns' as a whole. Tokenisation and categorisation differences between tagging schemes make it problematic to compare the ratio of 'nouns' across different tagsets. The precise figures for POS ratios are therefore effectively artefacts of the tagset. However, these objections to the use of POS ratios do not apply to their use as a metric of variation for comparing data-sets tagged with the same tagging scheme.
U2  - 10.1075/ijcl.12.1.05har
DO  - 10.1075/ijcl.12.1.05har
M3  - Journal article
VL  - 12
SP  - 55
EP  - 81
JO  - International Journal of Corpus Linguistics
JF  - International Journal of Corpus Linguistics
SN  - 1384-6655
IS  - 1
ER  -