Sublanguage corpus analysis toolkit: a tool for assessing the representativeness and sublanguage characteristics of corpora

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN > Conference contribution/Paper > peer-review

Published

Standard

Sublanguage corpus analysis toolkit : a tool for assessing the representativeness and sublanguage characteristics of corpora. / Temnikova, Irina; Baumgartner, William; Hailu, Negacy; Nikolova, Ivelina; McEnery, Tony; Kilgarriff, Adam; Angelova, Galia; Cohen, Bretonnel.

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Vol. 2014. Reykjavik : European Language Resources Association (ELRA), 2014.

Harvard

Temnikova, I, Baumgartner, W, Hailu, N, Nikolova, I, McEnery, T, Kilgarriff, A, Angelova, G & Cohen, B 2014, Sublanguage corpus analysis toolkit: a tool for assessing the representativeness and sublanguage characteristics of corpora. in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). vol. 2014, European Language Resources Association (ELRA), Reykjavik. <http://www.lrec-conf.org/proceedings/lrec2014/index.html>

APA

Temnikova, I., Baumgartner, W., Hailu, N., Nikolova, I., McEnery, T., Kilgarriff, A., Angelova, G., & Cohen, B. (2014). Sublanguage corpus analysis toolkit: a tool for assessing the representativeness and sublanguage characteristics of corpora. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) (Vol. 2014). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2014/index.html

Vancouver

Temnikova I, Baumgartner W, Hailu N, Nikolova I, McEnery T, Kilgarriff A et al. Sublanguage corpus analysis toolkit: a tool for assessing the representativeness and sublanguage characteristics of corpora. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Vol. 2014. Reykjavik: European Language Resources Association (ELRA). 2014

Author

Temnikova, Irina ; Baumgartner, William ; Hailu, Negacy ; Nikolova, Ivelina ; McEnery, Tony ; Kilgarriff, Adam ; Angelova, Galia ; Cohen, Bretonnel. / Sublanguage corpus analysis toolkit : a tool for assessing the representativeness and sublanguage characteristics of corpora. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Vol. 2014. Reykjavik : European Language Resources Association (ELRA), 2014.

Bibtex

@inproceedings{71902092680d4bda850602c94b00251d,
title = "Sublanguage corpus analysis toolkit: a tool for assessing the representativeness and sublanguage characteristics of corpora",
abstract = "Sublanguages are varieties of language that form “subsets” of the general language, typically exhibiting particular types of lexical, semantic, and other restrictions and deviance. SubCAT, the Sublanguage Corpus Analysis Toolkit, assesses the representativeness and closure properties of corpora to analyze the extent to which they are either sublanguages, or representative samples of the general language. The current version of SubCAT contains scripts and applications for assessing lexical closure, morphological closure, sentence type closure, over-represented words, and syntactic deviance. Its operation is illustrated with three case studies concerning scientific journal articles, patents, and clinical records. Materials from two language families are analyzed―English (Germanic), and Bulgarian (Slavic). The software is available at sublanguage.sourceforge.net under a liberal Open Source license.",
author = "Irina Temnikova and William Baumgartner and Negacy Hailu and Ivelina Nikolova and Tony McEnery and Adam Kilgarriff and Galia Angelova and Bretonnel Cohen",
year = "2014",
month = may,
day = "1",
language = "English",
isbn = "9782951740884",
volume = "2014",
booktitle = "Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)",
publisher = "European Language Resources Association (ELRA)",
address = "Reykjavik",
}

RIS

TY - GEN

T1 - Sublanguage corpus analysis toolkit

T2 - a tool for assessing the representativeness and sublanguage characteristics of corpora

AU - Temnikova, Irina

AU - Baumgartner, William

AU - Hailu, Negacy

AU - Nikolova, Ivelina

AU - McEnery, Tony

AU - Kilgarriff, Adam

AU - Angelova, Galia

AU - Cohen, Bretonnel

PY - 2014/5/1

Y1 - 2014/5/1

N2 - Sublanguages are varieties of language that form “subsets” of the general language, typically exhibiting particular types of lexical, semantic, and other restrictions and deviance. SubCAT, the Sublanguage Corpus Analysis Toolkit, assesses the representativeness and closure properties of corpora to analyze the extent to which they are either sublanguages, or representative samples of the general language. The current version of SubCAT contains scripts and applications for assessing lexical closure, morphological closure, sentence type closure, over-represented words, and syntactic deviance. Its operation is illustrated with three case studies concerning scientific journal articles, patents, and clinical records. Materials from two language families are analyzed―English (Germanic), and Bulgarian (Slavic). The software is available at sublanguage.sourceforge.net under a liberal Open Source license.

AB - Sublanguages are varieties of language that form “subsets” of the general language, typically exhibiting particular types of lexical, semantic, and other restrictions and deviance. SubCAT, the Sublanguage Corpus Analysis Toolkit, assesses the representativeness and closure properties of corpora to analyze the extent to which they are either sublanguages, or representative samples of the general language. The current version of SubCAT contains scripts and applications for assessing lexical closure, morphological closure, sentence type closure, over-represented words, and syntactic deviance. Its operation is illustrated with three case studies concerning scientific journal articles, patents, and clinical records. Materials from two language families are analyzed―English (Germanic), and Bulgarian (Slavic). The software is available at sublanguage.sourceforge.net under a liberal Open Source license.

M3 - Conference contribution/Paper

SN - 9782951740884

VL - 2014

BT - Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

PB - European Language Resources Association (ELRA)

CY - Reykjavik

ER -