Character encoding in corpus construction.

Research output: Contribution in Book/Report/Proceedings › Chapter

Published
Publication date: 2005
Host publication: Developing Linguistic Corpora: A Guide to Good Practice
Editor: M. Wynne
Place of publication: Oxford, UK
Publisher: AHDS
Original language: English

Abstract

This chapter begins with a brief review of the history of character encoding. It then discusses standard and non-standard native encoding systems and evaluates the efforts made to unify these character codes. We then move on to Unicode and its various Unicode Transformation Formats (UTFs). We conclude by recommending that Unicode, specifically UTF-8, be used in corpus construction.
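To make the idea of several transformation formats for a single character set concrete, here is a minimal illustrative sketch that does not come from the chapter itself. It assumes Python 3.9 or later, and the helper name `encode_all` and the three-character sample string are hypothetical choices made for this example. It encodes the same text as UTF-8, UTF-16 and UTF-32 and checks that each form decodes back to the original.

```python
# Illustrative sketch only (not from the chapter): the same Unicode text
# serialised under three Unicode Transformation Formats (UTFs).
# `encode_all` and the sample string are hypothetical choices.

UTF_CODECS = ("utf-8", "utf-16-be", "utf-32-be")  # big-endian: no byte-order mark


def encode_all(text: str) -> dict[str, bytes]:
    """Return the byte serialisation of `text` under each codec in UTF_CODECS."""
    return {codec: text.encode(codec) for codec in UTF_CODECS}


sample = "Aé中"  # U+0041 (ASCII), U+00E9 (Latin-1 range), U+4E2D (CJK ideograph)

for codec, data in encode_all(sample).items():
    # Show each encoding's length and raw bytes in hexadecimal.
    print(f"{codec:<10} {len(data):>2} bytes: {data.hex(' ')}")
    # Each UTF is lossless: decoding recovers exactly the same code points.
    assert data.decode(codec) == sample

# Pure-ASCII text is byte-for-byte identical in ASCII and in UTF-8.
assert "corpus".encode("utf-8") == "corpus".encode("ascii")
```

In this sample, UTF-8 uses 1, 2 and 3 bytes for the three characters, UTF-16 uses 2 bytes for each, and UTF-32 uses 4 bytes for each. Backward compatibility with ASCII, shown by the final assertion, is one property often cited in favour of UTF-8. The sketch does not attempt to reproduce the chapter's own argument for its recommendation.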

Bibliographic note

Standards Documentation