LexiDB: Patterns & Methods for Corpus Linguistic Database Management

Electronic data

LREC2020 (1)
Accepted author manuscript, 202 KB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License
2020.lrec-1.383
Final published version, 263 KB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

LexiDB: Patterns & Methods for Corpus Linguistic Database Management. / Coole, Matthew ; Rayson, Paul ; Mariani, John.
Proceedings of The 12th Language Resources and Evaluation Conference. Paris: European Language Resources Association (ELRA), 2020. p. 3128-3135.

Harvard

Coole, M, Rayson, P & Mariani, J 2020, LexiDB: Patterns & Methods for Corpus Linguistic Database Management. in Proceedings of The 12th Language Resources and Evaluation Conference. European Language Resources Association (ELRA), Paris, pp. 3128-3135. <https://www.aclweb.org/anthology/2020.lrec-1.383>

APA

Coole, M., Rayson, P., & Mariani, J. (2020). LexiDB: Patterns & Methods for Corpus Linguistic Database Management. In Proceedings of The 12th Language Resources and Evaluation Conference (pp. 3128-3135). European Language Resources Association (ELRA). https://www.aclweb.org/anthology/2020.lrec-1.383

Vancouver

Coole M, Rayson P, Mariani J. LexiDB: Patterns & Methods for Corpus Linguistic Database Management. In Proceedings of The 12th Language Resources and Evaluation Conference. Paris: European Language Resources Association (ELRA). 2020. p. 3128-3135

Author

Coole, Matthew ; Rayson, Paul ; Mariani, John. / LexiDB: Patterns & Methods for Corpus Linguistic Database Management. Proceedings of The 12th Language Resources and Evaluation Conference. Paris: European Language Resources Association (ELRA), 2020. pp. 3128-3135

Bibtex

@inproceedings{269e01e8c5194702999156f77d42d15e,

title = "LexiDB: Patterns \& Methods for Corpus Linguistic Database Management",

abstract = "LexiDB is a tool for storing, managing and querying corpus data. In contrast to other database management systems (DBMSs), it is designed specifically for text corpora. It improves on other corpus management systems (CMSs) because data can be added and deleted from corpora on the fly with the ability to add live data to existing corpora. LexiDB sits between these two categories of DBMSs and CMSs, more specialised to language data than a general-purpose DBMS but more flexible than a traditional static corpus management system. Previous work has demonstrated the scalability of LexiDB in response to the growing need to be able to scale out for ever-growing corpus datasets. Here, we present the patterns and methods developed in LexiDB for storage, retrieval and querying of multi-level annotated corpus data. These techniques are evaluated and compared to an existing CMS (Corpus Workbench CWB - CQP) and indexer (Lucene). We find that LexiDB consistently outperforms existing tools for corpus queries. This is particularly apparent with large corpora and when handling queries with large result sets.",

author = "Matthew Coole and Paul Rayson and John Mariani",

year = "2020",

month = may,

day = "11",

language = "English",

isbn = "9791095546344",

pages = "3128--3135",

booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",

publisher = "European Language Resources Association (ELRA)",

}

RIS

TY - GEN

T1 - LexiDB: Patterns & Methods for Corpus Linguistic Database Management

AU - Coole, Matthew

AU - Rayson, Paul

AU - Mariani, John

PY - 2020/5/11

Y1 - 2020/5/11

N2 - LexiDB is a tool for storing, managing and querying corpus data. In contrast to other database management systems (DBMSs), it is designed specifically for text corpora. It improves on other corpus management systems (CMSs) because data can be added and deleted from corpora on the fly with the ability to add live data to existing corpora. LexiDB sits between these two categories of DBMSs and CMSs, more specialised to language data than a general-purpose DBMS but more flexible than a traditional static corpus management system. Previous work has demonstrated the scalability of LexiDB in response to the growing need to be able to scale out for ever-growing corpus datasets. Here, we present the patterns and methods developed in LexiDB for storage, retrieval and querying of multi-level annotated corpus data. These techniques are evaluated and compared to an existing CMS (Corpus Workbench CWB - CQP) and indexer (Lucene). We find that LexiDB consistently outperforms existing tools for corpus queries. This is particularly apparent with large corpora and when handling queries with large result sets.

AB - LexiDB is a tool for storing, managing and querying corpus data. In contrast to other database management systems (DBMSs), it is designed specifically for text corpora. It improves on other corpus management systems (CMSs) because data can be added and deleted from corpora on the fly with the ability to add live data to existing corpora. LexiDB sits between these two categories of DBMSs and CMSs, more specialised to language data than a general-purpose DBMS but more flexible than a traditional static corpus management system. Previous work has demonstrated the scalability of LexiDB in response to the growing need to be able to scale out for ever-growing corpus datasets. Here, we present the patterns and methods developed in LexiDB for storage, retrieval and querying of multi-level annotated corpus data. These techniques are evaluated and compared to an existing CMS (Corpus Workbench CWB - CQP) and indexer (Lucene). We find that LexiDB consistently outperforms existing tools for corpus queries. This is particularly apparent with large corpora and when handling queries with large result sets.

M3 - Conference contribution/Paper

SN - 9791095546344

SP - 3128

EP - 3135

BT - Proceedings of The 12th Language Resources and Evaluation Conference

PB - European Language Resources Association (ELRA)

CY - Paris

ER -
