Text and speech corpora for natural language processing and corpus linguistics

Biomedical and Life Sciences

Keywords

Natural Language Processing, Corpus Linguistics, corpora, Artificial Intelligence, Machine Learning, Bioinformatics


Research output: Contribution to Journal/Magazine › Special issue › peer-review

Published

Dina Demner-Fushman (Editor)
Derek Gatherer (Editor)
Jian Wu (Editor)


Journal publication date	24/07/2025
Journal	Scientific Data
Volume	Special Collection
Publication Status	Published
Original language	English

Abstract

Corpus Linguistics (CL) and Natural Language Processing (NLP) are two transformative forces in research across the sciences and humanities, reshaping how insights are drawn from large text and speech datasets. Their applications span the natural, medical, social and applied sciences, and they are at the cutting edge of fields such as healthcare diagnostics, biomedicine, environmental science, and computer vision. This Collection presents a series of annotated text and speech corpora, alongside linguistic models tailored for CL and NLP applications. These resources aim to expand the toolkits available to CL and NLP users and to facilitate interdisciplinary research.
