Creating language resources for under-resourced languages - Research Portal

Computing and Communications

Associated organisational units

Electronic data

ELHAJ_LREV
Rights statement: The final publication is available at Springer via http://dx.doi.org/10.1007/s10579-014-9274-3
Accepted author manuscript, 1.48 MB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Text available via DOI:

https://doi.org/10.1007/s10579-014-9274-3
Final published version

Keywords

Resources, Summarisation, Arabic, Under-resourced languages

Creating language resources for under-resourced languages: methodologies, and experiments with Arabic

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Creating language resources for under-resourced languages: methodologies, and experiments with Arabic. / El-Haj, Mahmoud; Kruschwitz, Udo; Fox, Chris.
In: Language Resources and Evaluation, Vol. 49, No. 3, 09.2015, p. 549-580.

Harvard

El-Haj, M, Kruschwitz, U & Fox, C 2015, 'Creating language resources for under-resourced languages: methodologies, and experiments with Arabic', Language Resources and Evaluation, vol. 49, no. 3, pp. 549-580. https://doi.org/10.1007/s10579-014-9274-3

APA

El-Haj, M., Kruschwitz, U., & Fox, C. (2015). Creating language resources for under-resourced languages: methodologies, and experiments with Arabic. Language Resources and Evaluation, 49(3), 549-580. https://doi.org/10.1007/s10579-014-9274-3

Vancouver

El-Haj M, Kruschwitz U, Fox C. Creating language resources for under-resourced languages: methodologies, and experiments with Arabic. Language Resources and Evaluation. 2015 Sep;49(3):549-580. Epub 2014 Aug 9. doi: 10.1007/s10579-014-9274-3

Author

El-Haj, Mahmoud ; Kruschwitz, Udo ; Fox, Chris. / Creating language resources for under-resourced languages : methodologies, and experiments with Arabic. In: Language Resources and Evaluation. 2015 ; Vol. 49, No. 3. pp. 549-580.

Bibtex

@article{e5d78ef4854f4f49ad613e2044e4fc51,

title = "Creating language resources for under-resourced languages: methodologies, and experiments with Arabic",

abstract = "Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources, namely: (1) using crowdsourcing to produce a small resource rapidly and relatively cheaply; (2) translating an existing gold-standard dataset, which is relatively easy but potentially of lower quality; and (3) using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality. The last of these was used as a test collection for TAC-2011. An evaluation of the resources is also presented.",

keywords = "Resources, Summarisation, Arabic, Under-resourced languages",

author = "Mahmoud El-Haj and Udo Kruschwitz and Chris Fox",

note = "The final publication is available at Springer via http://dx.doi.org/10.1007/s10579-014-9274-3",

year = "2015",

month = sep,

doi = "10.1007/s10579-014-9274-3",

language = "English",

volume = "49",

pages = "549--580",

journal = "Language Resources and Evaluation",

issn = "1574-020X",

publisher = "Springer Netherlands",

number = "3",

}

RIS

TY - JOUR

T1 - Creating language resources for under-resourced languages

T2 - methodologies, and experiments with Arabic

AU - El-Haj, Mahmoud

AU - Kruschwitz, Udo

AU - Fox, Chris

N1 - The final publication is available at Springer via http://dx.doi.org/10.1007/s10579-014-9274-3

PY - 2015/9

Y1 - 2015/9

N2 - Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources, namely: (1) using crowdsourcing to produce a small resource rapidly and relatively cheaply; (2) translating an existing gold-standard dataset, which is relatively easy but potentially of lower quality; and (3) using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality. The last of these was used as a test collection for TAC-2011. An evaluation of the resources is also presented.

AB - Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources, namely: (1) using crowdsourcing to produce a small resource rapidly and relatively cheaply; (2) translating an existing gold-standard dataset, which is relatively easy but potentially of lower quality; and (3) using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality. The last of these was used as a test collection for TAC-2011. An evaluation of the resources is also presented.

KW - Resources

KW - Summarisation

KW - Arabic

KW - Under-resourced languages

U2 - 10.1007/s10579-014-9274-3

DO - 10.1007/s10579-014-9274-3

M3 - Journal article

VL - 49

SP - 549

EP - 580

JO - Language Resources and Evaluation

JF - Language Resources and Evaluation

SN - 1574-020X

IS - 3

ER -
