Rights statement: This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY-NC-ND 3.0)
Final published version, 578 KB, PDF document
Available under license: CC BY-NC-ND
Research output: Contribution to Journal/Magazine › Journal article › peer-review
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - Modest XML for corpora
T2 - not a standard, but a suggestion
AU - Hardie, Andrew
N1 - This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY-NC-ND 3.0)
PY - 2014/4
Y1 - 2014/4
N2 - This paper argues for, and presents, a modest approach to XML encoding for use by the majority of contemporary linguists who need to engage in corpus construction. While extensive standards for corpus encoding exist - most notably, the Text Encoding Initiative’s Guidelines and the Corpus Encoding Standard based on them - these are rather heavyweight approaches, implicitly intended for major corpus-building projects, which are rather different from the increasingly common efforts in corpus construction undertaken by individual researchers in support of their personal research goals. Therefore, there is a clear benefit to be had from a set of recommendations (not a standard) that outlines general best practices in the use of XML in corpora without going into any of the more technical aspects of XML or the full weight of TEI encoding. This paper presents such a set of suggestions, dubbed Modest XML for Corpora, and posits that such a set of pointers to a limited level of XML knowledge could work as part of the normal, general training of corpus linguists.The Modest XML recommendations cover the following set of things, which, according to the foregoing argument, are sufficient knowledge about XML for most corpus linguists’ day-to-day needs: use of tags; adding attribute value pairs; recommended use of attributes; nesting of tags; encoding of special characters; XML well-formedness; a collection of de facto standard tags and attributes; going beyond the basic de facto standard tags; and text headers.
AB - This paper argues for, and presents, a modest approach to XML encoding for use by the majority of contemporary linguists who need to engage in corpus construction. While extensive standards for corpus encoding exist - most notably, the Text Encoding Initiative’s Guidelines and the Corpus Encoding Standard based on them - these are rather heavyweight approaches, implicitly intended for major corpus-building projects, which are rather different from the increasingly common efforts in corpus construction undertaken by individual researchers in support of their personal research goals. Therefore, there is a clear benefit to be had from a set of recommendations (not a standard) that outlines general best practices in the use of XML in corpora without going into any of the more technical aspects of XML or the full weight of TEI encoding. This paper presents such a set of suggestions, dubbed Modest XML for Corpora, and posits that such a set of pointers to a limited level of XML knowledge could work as part of the normal, general training of corpus linguists.The Modest XML recommendations cover the following set of things, which, according to the foregoing argument, are sufficient knowledge about XML for most corpus linguists’ day-to-day needs: use of tags; adding attribute value pairs; recommended use of attributes; nesting of tags; encoding of special characters; XML well-formedness; a collection of de facto standard tags and attributes; going beyond the basic de facto standard tags; and text headers.
U2 - 10.2478/icame-2014-0004
DO - 10.2478/icame-2014-0004
M3 - Journal article
VL - 38
SP - 73
EP - 103
JO - ICAME Journal
JF - ICAME Journal
SN - 1502-5462
IS - 1
ER -