The Lancaster Corpus of Mandarin Chinese - Research Portal

Linguistics and English Language

Keywords

corpus, Chinese, contrastive study

The Lancaster Corpus of Mandarin Chinese: A corpus for monolingual and contrastive language study

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Chapter

Published

Standard

The Lancaster Corpus of Mandarin Chinese: A corpus for monolingual and contrastive language study. / McEnery, A. M.; Xiao, R. Z.
LREC. 2004.

Bibtex

@inbook{509ae7e76c8f4064b26545de413db433,

title = "The Lancaster Corpus of Mandarin Chinese: A corpus for monolingual and contrastive language study",

abstract = "This paper presents the newly released Lancaster Corpus of Mandarin Chinese (LCMC), a Chinese match for the FLOB and Frown corpora of British and American English. LCMC is a one-million-word balanced corpus of written Mandarin Chinese. The corpus contains five hundred 2,000-word samples of written Chinese texts sampled from fifteen text categories published in Mainland China around 1991, totalling one million words. LCMC is XML-compliant and conforms to CES, with each document containing a corpus header giving general information about the corpus and a body of text. The corpus is segmented and POS tagged with a tagging precision rate of over 98%. The corpus is a useful resource for research into modern Chinese as well as the cross-linguistic contrast between English and Chinese.",

keywords = "corpus, Chinese, contrastive study",

author = "McEnery, {A. M.} and Xiao, {R. Z.}",

year = "2004",

month = may,

language = "English",

booktitle = "LREC",

note = "LREC 2004 ; Conference date: 24-05-2004 Through 30-05-2004",

}

RIS

TY - CHAP

T1 - The Lancaster Corpus of Mandarin Chinese

T2 - LREC 2004

AU - McEnery, A. M.

AU - Xiao, R. Z.

PY - 2004/5

Y1 - 2004/5

N2 - This paper presents the newly released Lancaster Corpus of Mandarin Chinese (LCMC), a Chinese match for the FLOB and Frown corpora of British and American English. LCMC is a one-million-word balanced corpus of written Mandarin Chinese. The corpus contains five hundred 2,000-word samples of written Chinese texts sampled from fifteen text categories published in Mainland China around 1991, totalling one million words. LCMC is XML-compliant and conforms to CES, with each document containing a corpus header giving general information about the corpus and a body of text. The corpus is segmented and POS tagged with a tagging precision rate of over 98%. The corpus is a useful resource for research into modern Chinese as well as the cross-linguistic contrast between English and Chinese.

AB - This paper presents the newly released Lancaster Corpus of Mandarin Chinese (LCMC), a Chinese match for the FLOB and Frown corpora of British and American English. LCMC is a one-million-word balanced corpus of written Mandarin Chinese. The corpus contains five hundred 2,000-word samples of written Chinese texts sampled from fifteen text categories published in Mainland China around 1991, totalling one million words. LCMC is XML-compliant and conforms to CES, with each document containing a corpus header giving general information about the corpus and a body of text. The corpus is segmented and POS tagged with a tagging precision rate of over 98%. The corpus is a useful resource for research into modern Chinese as well as the cross-linguistic contrast between English and Chinese.

KW - corpus

KW - Chinese

KW - contrastive study

M3 - Chapter

BT - LREC

Y2 - 24 May 2004 through 30 May 2004

ER -
