EMILLE, A 67-million word corpus of indic languages: Data collection, mark-up and harmonisation

Research output: Contribution to conference - Without ISBN/ISSN › Conference paper › peer-review

Published

Standard

EMILLE, A 67-million word corpus of indic languages: Data collection, mark-up and harmonisation. / Baker, Paul ; Hardie, Andrew ; McEnery, Tony et al.
2002. 819-825 Paper presented at 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain.

Harvard

Baker, P, Hardie, A, McEnery, T, Cunningham, H & Gaizauskas, R 2002, 'EMILLE, A 67-million word corpus of indic languages: Data collection, mark-up and harmonisation', Paper presented at 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain, 29/05/02 - 31/05/02, pp. 819-825. <http://www.lrec-conf.org/proceedings/lrec2002/pdf/319.pdf>

APA

Baker, P., Hardie, A., McEnery, T., Cunningham, H., & Gaizauskas, R. (2002). EMILLE, A 67-million word corpus of indic languages: Data collection, mark-up and harmonisation. 819-825. Paper presented at 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain. http://www.lrec-conf.org/proceedings/lrec2002/pdf/319.pdf

Vancouver

Baker P, Hardie A, McEnery T, Cunningham H, Gaizauskas R. EMILLE, A 67-million word corpus of indic languages: Data collection, mark-up and harmonisation. 2002. Paper presented at 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain.

Author

Baker, Paul ; Hardie, Andrew ; McEnery, Tony et al. / EMILLE, A 67-million word corpus of indic languages: Data collection, mark-up and harmonisation. Paper presented at 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain. 7 p.

Bibtex

@conference{2bcecb22af334b6181bf4a6e389fdf82,

title = "EMILLE, A 67-million word corpus of indic languages: Data collection, mark-up and harmonisation",

abstract = "The paper describes developments to date on the EMILLE Project (Enabling Minority Language Engineering) being carried out at the Universities of Lancaster and Sheffield. EMILLE was established to construct a 67 million word corpus of South Asian languages. In addition to undertaking this corpus construction, the project has had to address a number of related issues in the context of establishing a language engineering (LE) environment for South Asian language processing, such as translating 8-bit language data into Unicode and producing a number of basic LE tools. The development of tools on EMILLE has contributed to the on-going development of the LE architecture GATE.",

author = "Paul Baker and Andrew Hardie and Tony McEnery and Hamish Cunningham and Rob Gaizauskas",

year = "2002",

month = jan,

day = "1",

language = "English",

pages = "819--825",

note = "3rd International Conference on Language Resources and Evaluation, LREC 2002 ; Conference date: 29-05-2002 Through 31-05-2002",

}

RIS

TY - CONF

T1 - EMILLE, A 67-million word corpus of indic languages

T2 - 3rd International Conference on Language Resources and Evaluation, LREC 2002

AU - Baker, Paul

AU - Hardie, Andrew

AU - McEnery, Tony

AU - Cunningham, Hamish

AU - Gaizauskas, Rob

PY - 2002/1/1

Y1 - 2002/1/1

N2 - The paper describes developments to date on the EMILLE Project (Enabling Minority Language Engineering) being carried out at the Universities of Lancaster and Sheffield. EMILLE was established to construct a 67 million word corpus of South Asian languages. In addition to undertaking this corpus construction, the project has had to address a number of related issues in the context of establishing a language engineering (LE) environment for South Asian language processing, such as translating 8-bit language data into Unicode and producing a number of basic LE tools. The development of tools on EMILLE has contributed to the on-going development of the LE architecture GATE.

M3 - Conference paper

AN - SCOPUS:85000764797

SP - 819

EP - 825

Y2 - 29 May 2002 through 31 May 2002

ER -
