Harvard
Baker, P, Hardie, A, McEnery, T, Cunningham, H & Gaizauskas, R 2002, 'EMILLE, A 67-million word corpus of Indic languages: Data collection, mark-up and harmonisation', paper presented at 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain, 29/05/02 - 31/05/02, pp. 819-825. <http://www.lrec-conf.org/proceedings/lrec2002/pdf/319.pdf>
APA
Baker, P., Hardie, A., McEnery, T., Cunningham, H., & Gaizauskas, R. (2002). EMILLE, A 67-million word corpus of Indic languages: Data collection, mark-up and harmonisation. Paper presented at 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain (pp. 819-825).
http://www.lrec-conf.org/proceedings/lrec2002/pdf/319.pdf
Vancouver
Baker P, Hardie A, McEnery T, Cunningham H, Gaizauskas R. EMILLE, A 67-million word corpus of Indic languages: Data collection, mark-up and harmonisation. 2002. Paper presented at 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain. p. 819-825.
BibTeX
@conference{2bcecb22af334b6181bf4a6e389fdf82,
title = "{EMILLE}, A 67-million word corpus of {Indic} languages: Data collection, mark-up and harmonisation",
abstract = "The paper describes developments to date on the EMILLE Project (Enabling Minority Language Engineering) being carried out at the Universities of Lancaster and Sheffield. EMILLE was established to construct a 67 million word corpus of South Asian languages. In addition to undertaking this corpus construction, the project has had to address a number of related issues in the context of establishing a language engineering (LE) environment for South Asian language processing, such as translating 8-bit language data into Unicode and producing a number of basic LE tools. The development of tools on EMILLE has contributed to the on-going development of the LE architecture GATE.",
author = "Paul Baker and Andrew Hardie and Tony McEnery and Hamish Cunningham and Rob Gaizauskas",
year = "2002",
month = jan,
day = "1",
language = "English",
pages = "819--825",
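booktitle = "3rd International Conference on Language Resources and Evaluation, LREC 2002",
url = "http://www.lrec-conf.org/proceedings/lrec2002/pdf/319.pdf",
note = "3rd International Conference on Language Resources and Evaluation, LREC 2002; Conference date: 29-05-2002 through 31-05-2002",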
}
RIS
TY  - CONF
T1  - EMILLE, A 67-million word corpus of Indic languages: Data collection, mark-up and harmonisation
T2  - 3rd International Conference on Language Resources and Evaluation, LREC 2002
AU  - Baker, Paul
AU  - Hardie, Andrew
AU  - McEnery, Tony
AU  - Cunningham, Hamish
AU  - Gaizauskas, Rob
PY  - 2002/1/1
Y1  - 2002/1/1
N2  - The paper describes developments to date on the EMILLE Project (Enabling Minority Language Engineering) being carried out at the Universities of Lancaster and Sheffield. EMILLE was established to construct a 67 million word corpus of South Asian languages. In addition to undertaking this corpus construction, the project has had to address a number of related issues in the context of establishing a language engineering (LE) environment for South Asian language processing, such as translating 8-bit language data into Unicode and producing a number of basic LE tools. The development of tools on EMILLE has contributed to the on-going development of the LE architecture GATE.
AB  - The paper describes developments to date on the EMILLE Project (Enabling Minority Language Engineering) being carried out at the Universities of Lancaster and Sheffield. EMILLE was established to construct a 67 million word corpus of South Asian languages. In addition to undertaking this corpus construction, the project has had to address a number of related issues in the context of establishing a language engineering (LE) environment for South Asian language processing, such as translating 8-bit language data into Unicode and producing a number of basic LE tools. The development of tools on EMILLE has contributed to the on-going development of the LE architecture GATE.
M3  - Conference paper
AN  - SCOPUS:85000764797
SP  - 819
EP  - 825
UR  - http://www.lrec-conf.org/proceedings/lrec2002/pdf/319.pdf
Y2  - 29 May 2002 through 31 May 2002
ER  - 