Linguistics and English Language

Keywords

corpus, Chinese, contrastive study

The Lancaster Corpus of Mandarin Chinese: A corpus for monolingual and contrastive language study

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Chapter

Published

Publication date	05/2004
Host publication	LREC
Original language	English
Event	LREC 2004 - Lisbon, Portugal Duration: 24/05/2004 → 30/05/2004

Conference

Conference	LREC 2004
City	Lisbon, Portugal
Period	24/05/04 → 30/05/04

Abstract

This paper presents the newly released Lancaster Corpus of Mandarin Chinese (LCMC), a Chinese counterpart to the FLOB and Frown corpora of British and American English. LCMC is a one-million-word balanced corpus of written Mandarin Chinese, made up of five hundred 2,000-word samples drawn from fifteen text categories published in Mainland China around 1991. The corpus is XML-compliant and conforms to the Corpus Encoding Standard (CES); each document contains a corpus header, which gives general information about the corpus, and a body of text. The corpus is word-segmented and part-of-speech (POS) tagged, with a tagging precision of over 98%. It is a useful resource for research into modern Chinese and for cross-linguistic contrastive study of English and Chinese.
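
As a rough illustration of what working with a segmented, POS-tagged XML corpus of this kind might look like, the Python sketch below parses a small invented fragment and counts tags. The element and attribute names (`s`, `w`, `c`, `POS`), the tag values, and the sample sentence are assumptions made for this example; they are not taken from the abstract, and the real LCMC markup should be checked against the corpus documentation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented fragment for illustration only. The element names (text, file,
# p, s, w, c) and the POS attribute are ASSUMPTIONS, not the documented
# LCMC schema.
SAMPLE = """<text ID="A">
  <file ID="A01">
    <p>
      <s n="0001">
        <w POS="n">语料库</w>
        <w POS="v">包含</w>
        <w POS="m">五百</w>
        <w POS="q">个</w>
        <w POS="n">样本</w>
        <c POS="ew">。</c>
      </s>
    </p>
  </file>
</text>"""


def tokens(xml_text):
    """Return (word, tag) pairs for every <w> element, in document order."""
    root = ET.fromstring(xml_text)
    return [(w.text, w.get("POS")) for w in root.iter("w")]


def pos_counts(xml_text):
    """Count how often each POS tag occurs on <w> elements."""
    return Counter(tag for _, tag in tokens(xml_text))


# Corpus size as stated in the abstract: 500 samples of 2,000 words each.
SAMPLES = 500
WORDS_PER_SAMPLE = 2_000
TOTAL_WORDS = SAMPLES * WORDS_PER_SAMPLE

if __name__ == "__main__":
    for word, tag in tokens(SAMPLE):
        print(word, tag)
    print(pos_counts(SAMPLE).most_common())
    print(TOTAL_WORDS)
```

Keeping punctuation in a separate element from words, as assumed here, means word counts are not inflated by punctuation marks; whether the corpus itself makes that distinction is not stated in the abstract.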
