EMILLE, A 67-million word corpus of Indic languages: Data collection, mark-up and harmonisation

Research output: Contribution to conference (without ISBN/ISSN), conference paper, peer-reviewed

Published

Standard

EMILLE, A 67-million word corpus of Indic languages: Data collection, mark-up and harmonisation. / Baker, Paul; Hardie, Andrew; McEnery, Tony et al.
2002. 819-825. Paper presented at 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain.

Harvard

Baker, P, Hardie, A, McEnery, T, Cunningham, H & Gaizauskas, R 2002, 'EMILLE, A 67-million word corpus of Indic languages: Data collection, mark-up and harmonisation', paper presented at 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain, 29/05/02 - 31/05/02, pp. 819-825. <http://www.lrec-conf.org/proceedings/lrec2002/pdf/319.pdf>

APA

Baker, P., Hardie, A., McEnery, T., Cunningham, H., & Gaizauskas, R. (2002). EMILLE, A 67-million word corpus of Indic languages: Data collection, mark-up and harmonisation. 819-825. Paper presented at 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain. http://www.lrec-conf.org/proceedings/lrec2002/pdf/319.pdf

Vancouver

Baker P, Hardie A, McEnery T, Cunningham H, Gaizauskas R. EMILLE, A 67-million word corpus of Indic languages: Data collection, mark-up and harmonisation. 2002. Paper presented at 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain.

Author

Baker, Paul ; Hardie, Andrew ; McEnery, Tony et al. / EMILLE, A 67-million word corpus of Indic languages : Data collection, mark-up and harmonisation. Paper presented at 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain. 7 p.

Bibtex

@conference{2bcecb22af334b6181bf4a6e389fdf82,
title = "{EMILLE}, A 67-million word corpus of {Indic} languages: Data collection, mark-up and harmonisation",
abstract = "The paper describes developments to date on the EMILLE Project (Enabling Minority Language Engineering) being carried out at the Universities of Lancaster and Sheffield. EMILLE was established to construct a 67 million word corpus of South Asian languages. In addition to undertaking this corpus construction, the project has had to address a number of related issues in the context of establishing a language engineering (LE) environment for South Asian language processing, such as translating 8-bit language data into Unicode and producing a number of basic LE tools. The development of tools on EMILLE has contributed to the on-going development of the LE architecture GATE.",
author = "Paul Baker and Andrew Hardie and Tony McEnery and Hamish Cunningham and Rob Gaizauskas",
year = "2002",
month = jan,
day = "1",
language = "English",
pages = "819--825",
note = "3rd International Conference on Language Resources and Evaluation, LREC 2002; Conference date: 29-05-2002 through 31-05-2002",

}

RIS

TY  - CONF
T1  - EMILLE, A 67-million word corpus of Indic languages
T2  - 3rd International Conference on Language Resources and Evaluation, LREC 2002
AU  - Baker, Paul
AU  - Hardie, Andrew
AU  - McEnery, Tony
AU  - Cunningham, Hamish
AU  - Gaizauskas, Rob
PY  - 2002/1/1
Y1  - 2002/1/1
N2  - The paper describes developments to date on the EMILLE Project (Enabling Minority Language Engineering) being carried out at the Universities of Lancaster and Sheffield. EMILLE was established to construct a 67 million word corpus of South Asian languages. In addition to undertaking this corpus construction, the project has had to address a number of related issues in the context of establishing a language engineering (LE) environment for South Asian language processing, such as translating 8-bit language data into Unicode and producing a number of basic LE tools. The development of tools on EMILLE has contributed to the on-going development of the LE architecture GATE.
AB  - The paper describes developments to date on the EMILLE Project (Enabling Minority Language Engineering) being carried out at the Universities of Lancaster and Sheffield. EMILLE was established to construct a 67 million word corpus of South Asian languages. In addition to undertaking this corpus construction, the project has had to address a number of related issues in the context of establishing a language engineering (LE) environment for South Asian language processing, such as translating 8-bit language data into Unicode and producing a number of basic LE tools. The development of tools on EMILLE has contributed to the on-going development of the LE architecture GATE.
M3  - Conference paper
AN  - SCOPUS:85000764797
SP  - 819
EP  - 825
Y2  - 29 May 2002 through 31 May 2002
ER  - 