Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Chapter
TY - CHAP
T1 - Character encoding in corpus construction.
AU - McEnery, A. M.
AU - Xiao, R. Z.
N1 - Standards Documentation
PY - 2005
Y1 - 2005
N2 - This chapter first briefly reviews the history of character encoding. Following from this is a discussion of standard and non-standard native encoding systems, and an evaluation of the efforts to unify these character codes. Then we move on to discuss Unicode as well as various Unicode Transformation Formats (UTFs). As a conclusion, we recommend that Unicode (UTF-8, to be precise) be used in corpus construction.
AB - This chapter first briefly reviews the history of character encoding. Following from this is a discussion of standard and non-standard native encoding systems, and an evaluation of the efforts to unify these character codes. Then we move on to discuss Unicode as well as various Unicode Transformation Formats (UTFs). As a conclusion, we recommend that Unicode (UTF-8, to be precise) be used in corpus construction.
KW - character encoding
KW - Unicode
KW - corpus creation
M3 - Chapter
BT - Developing Linguistic Corpora : A Guide to Good Practice
A2 - Wynne, M.
PB - AHDS
CY - Oxford, UK
ER -