Final published version
Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
| Publication date | 8/09/2018 |
| --- | --- |
| Host publication | Text, Speech, and Dialogue - 21st International Conference, TSD 2018, Proceedings |
| Editors | Petr Sojka, Aleš Horák, Ivan Kopeček, Karel Pala |
| Place of publication | Cham |
| Publisher | Springer-Verlag |
| Pages | 285-294 |
| Number of pages | 10 |
| ISBN (electronic) | 9783030007942 |
| ISBN (print) | 9783030007935 |
| Original language | English |
| Event | 21st International Conference on Text, Speech, and Dialogue, TSD 2018 - Brno, Czech Republic. Duration: 11/09/2018 → 14/09/2018 |
| Conference | 21st International Conference on Text, Speech, and Dialogue, TSD 2018 |
| --- | --- |
| Country/Territory | Czech Republic |
| City | Brno |
| Period | 11/09/18 → 14/09/18 |
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
| --- | --- |
| Volume | 11107 LNAI |
| ISSN (print) | 0302-9743 |
| ISSN (electronic) | 1611-3349 |
NLP research on low-resource African languages is often impeded by the unavailability of basic resources: tools, techniques, annotated corpora, and datasets. Besides the lack of funding for the manual development of these resources, building them from scratch would amount to reinventing the wheel. Adapting existing techniques and models from well-resourced languages is therefore an attractive option. One of the most widely applied NLP models is word embeddings, but embedding models typically require large amounts of training data, which are not available for most African languages. In this work, we adopt an alignment-based projection method to transfer trained English embeddings to the Igbo language. Various English embedding models were projected and evaluated intrinsically on the odd-word, analogy, and word-similarity tasks, and extrinsically on the diacritic restoration task. Our results show that the projected embeddings performed very well across these tasks.
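
For readers unfamiliar with alignment-based projection, the sketch below illustrates one common variant: learning an orthogonal map (orthogonal Procrustes) between two embedding spaces from a seed bilingual dictionary, then projecting all source-language vectors through it. This is a minimal illustration of the general technique, not necessarily the exact procedure used in the paper; the seed-dictionary format, variable names, and use of NumPy are assumptions for the example.

```python
import numpy as np

def learn_projection(X, Y):
    """Orthogonal Procrustes: find the orthogonal W minimizing
    ||X W - Y||_F, where rows of X and Y are embeddings of
    translation pairs from a seed dictionary.
    The closed-form solution is W = U V^T with X^T Y = U S V^T."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt  # (d, d) orthogonal matrix

def project_embeddings(en_vecs, ig_vecs, seed_pairs):
    """Project every English vector into the Igbo embedding space.
    en_vecs / ig_vecs: dicts mapping word -> d-dim NumPy vector
    seed_pairs: list of (english_word, igbo_word) translation pairs
    (all hypothetical names for illustration)."""
    X = np.stack([en_vecs[e] for e, i in seed_pairs])
    Y = np.stack([ig_vecs[i] for e, i in seed_pairs])
    W = learn_projection(X, Y)
    return {word: vec @ W for word, vec in en_vecs.items()}
```

The orthogonality constraint preserves distances and angles within the source space, which helps explain why intrinsic structure such as analogy and similarity relations can survive the transfer into the target space.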