LexiDB: Patterns & Methods for Corpus Linguistic Database Management

Electronic data

LREC2020 (1)
Accepted author manuscript, 202 KB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License
2020.lrec-1.383
Final published version, 263 KB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Publication date	11/05/2020
Host publication	Proceedings of The 12th Language Resources and Evaluation Conference
Place of Publication	Paris
Publisher	European Language Resources Association (ELRA)
Pages	3128-3135
Number of pages	8
ISBN (print)	9791095546344
Original language	English

Abstract

LexiDB is a tool for storing, managing and querying corpus data. In contrast to other database management systems (DBMSs), it is designed specifically for text corpora. It improves on other corpus management systems (CMSs) because data can be added and deleted from corpora on the fly with the ability to add live data to existing corpora. LexiDB sits between these two categories of DBMSs and CMSs, more specialised to language data than a general-purpose DBMS but more flexible than a traditional static corpus management system. Previous work has demonstrated the scalability of LexiDB in response to the growing need to be able to scale out for ever-growing corpus datasets. Here, we present the patterns and methods developed in LexiDB for storage, retrieval and querying of multi-level annotated corpus data. These techniques are evaluated and compared to an existing CMS (Corpus Workbench CWB - CQP) and indexer (Lucene). We find that LexiDB consistently outperforms existing tools for corpus queries. This is particularly apparent with large corpora and when handling queries with large result sets.
