Research output: Contribution to conference - Without ISBN/ISSN › Conference paper › peer-review
Publication date | 1/01/2002 |
---|---|
Number of pages | 7 |
Pages | 819-825 |
Original language | English |
Event | 3rd International Conference on Language Resources and Evaluation, LREC 2002 - Las Palmas, Canary Islands, Spain. Duration: 29/05/2002 → 31/05/2002 |
Conference | 3rd International Conference on Language Resources and Evaluation, LREC 2002 |
---|---|
Country/Territory | Spain |
City | Las Palmas, Canary Islands |
Period | 29/05/02 → 31/05/02 |
The paper describes developments to date on the EMILLE Project (Enabling Minority Language Engineering), which is being carried out at the Universities of Lancaster and Sheffield. EMILLE was established to construct a 67-million-word corpus of South Asian languages. In addition to constructing this corpus, the project has had to address a number of related issues in establishing a language engineering (LE) environment for South Asian language processing, such as translating 8-bit language data into Unicode and producing a number of basic LE tools. The development of tools on EMILLE has contributed to the ongoing development of the LE architecture GATE.