Linguistics and English Language

Associated organisational unit

UCREL - University Centre for Computer Corpus Research on Language

Sublanguage corpus analysis toolkit: a tool for assessing the representativeness and sublanguage characteristics of corpora

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Irina Temnikova
William Baumgartner
Negacy Hailu
Ivelina Nikolova
Tony McEnery
Adam Kilgarriff
Galia Angelova
Bretonnel Cohen

Publication date	1/05/2014
Host publication	Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Place of Publication	Reykjavik
Publisher	European Language Resources Association (ELRA)
Number of pages	5
Volume	2014
ISBN (print)	9782951740884
Original language	English

Abstract

Sublanguages are varieties of language that form “subsets” of the general language, typically exhibiting particular types of lexical, semantic, and other restrictions and deviance. SubCAT, the Sublanguage Corpus Analysis Toolkit, assesses the representativeness and closure properties of corpora to analyze the extent to which they are either sublanguages, or representative samples of the general language. The current version of SubCAT contains scripts and applications for assessing lexical closure, morphological closure, sentence type closure, over-represented words, and syntactic deviance. Its operation is illustrated with three case studies concerning scientific journal articles, patents, and clinical records. Materials from two language families are analyzed: English (Germanic) and Bulgarian (Slavic). The software is available at sublanguage.sourceforge.net under a liberal Open Source license.
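
As a rough illustration of what a lexical closure check might look like, the sketch below counts distinct word types as a corpus is read token by token. It is not SubCAT's own code: the function names `tokenize` and `type_growth`, the `step` parameter, and the simple regex tokenizer are all assumptions made for this example. The idea it illustrates is that a type-growth curve that flattens as more tokens are read suggests a lexically closed sublanguage, while a curve that keeps rising suggests an open vocabulary.

```python
import re
from typing import Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer (an assumption of this sketch): lowercase the
    # text and keep runs of word characters. In Python 3, \w is Unicode-aware
    # for str patterns, so Cyrillic text is handled as well as Latin text.
    return re.findall(r"\w+", text.lower())


def type_growth(tokens: Iterable[str], step: int = 1000) -> List[Tuple[int, int]]:
    """Return (tokens_seen, distinct_types_seen) pairs, sampled every `step` tokens.

    A final pair is appended for the last token if the corpus length is not
    a multiple of `step`.
    """
    seen = set()
    curve = []
    count = 0
    for count, token in enumerate(tokens, start=1):
        seen.add(token)
        if count % step == 0:
            curve.append((count, len(seen)))
    if count and count % step != 0:
        curve.append((count, len(seen)))
    return curve


if __name__ == "__main__":
    sample = "the patient was stable the patient was discharged"
    for n_tokens, n_types in type_growth(tokenize(sample), step=4):
        print(n_tokens, n_types)
```

Plotting the resulting pairs for two corpora of comparable size would give a quick visual comparison of how fast each vocabulary grows; the sampling interval `step` only controls how finely the curve is drawn.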
