Character encoding in corpus construction.

Linguistics and English Language

Electronic data

character_encoding.pdf
125 KB, PDF document

Keywords

character encoding, Unicode, corpus creation

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Chapter

Published

Standard

Character encoding in corpus construction. / McEnery, A. M.; Xiao, R. Z.
Developing Linguistic Corpora : A Guide to Good Practice. ed. / M. Wynne. Oxford, UK: AHDS, 2005.

Harvard

McEnery, AM & Xiao, RZ 2005, Character encoding in corpus construction. in M Wynne (ed.), Developing Linguistic Corpora : A Guide to Good Practice. AHDS, Oxford, UK.

APA

McEnery, A. M., & Xiao, R. Z. (2005). Character encoding in corpus construction. In M. Wynne (Ed.), Developing Linguistic Corpora : A Guide to Good Practice. AHDS.

Vancouver

McEnery AM, Xiao RZ. Character encoding in corpus construction. In Wynne M, editor, Developing Linguistic Corpora : A Guide to Good Practice. Oxford, UK: AHDS. 2005.

Author

McEnery, A. M. ; Xiao, R. Z. / Character encoding in corpus construction. Developing Linguistic Corpora : A Guide to Good Practice. editor / M. Wynne. Oxford, UK : AHDS, 2005.

Bibtex

@inbook{82061abea2f4414597b1328eebdd9f55,
  title = "Character encoding in corpus construction.",
  abstract = "This chapter first briefly reviews the history of character encoding. Following from this is a discussion of standard and non-standard native encoding systems, and an evaluation of the efforts to unify these character codes. Then we move on to discuss Unicode as well as various Unicode Transformation Formats (UTFs). As a conclusion, we recommend that Unicode (UTF-8, to be precise) be used in corpus construction.",
  keywords = "character encoding, Unicode, corpus creation",
  author = "McEnery, {A. M.} and Xiao, {R. Z.}",
  note = "Standards Documentation",
  year = "2005",
  language = "English",
  editor = "M. Wynne",
  booktitle = "Developing Linguistic Corpora : A Guide to Good Practice",
  publisher = "AHDS",
}

RIS

TY  - CHAP
T1  - Character encoding in corpus construction.
AU  - McEnery, A. M.
AU  - Xiao, R. Z.
N1  - Standards Documentation
PY  - 2005
Y1  - 2005
N2  - This chapter first briefly reviews the history of character encoding. Following from this is a discussion of standard and non-standard native encoding systems, and an evaluation of the efforts to unify these character codes. Then we move on to discuss Unicode as well as various Unicode Transformation Formats (UTFs). As a conclusion, we recommend that Unicode (UTF-8, to be precise) be used in corpus construction.
AB  - This chapter first briefly reviews the history of character encoding. Following from this is a discussion of standard and non-standard native encoding systems, and an evaluation of the efforts to unify these character codes. Then we move on to discuss Unicode as well as various Unicode Transformation Formats (UTFs). As a conclusion, we recommend that Unicode (UTF-8, to be precise) be used in corpus construction.
KW  - character encoding
KW  - Unicode
KW  - corpus creation
M3  - Chapter
BT  - Developing Linguistic Corpora : A Guide to Good Practice
A2  - Wynne, M.
PB  - AHDS
CY  - Oxford, UK
ER  -
